WO2004106550A2 - Identification and/or analysis of nucleic acids and/or proteins associated with a chromosome location - Google Patents

Identification and/or analysis of nucleic acids and/or proteins associated with a chromosome location

Info

Publication number
WO2004106550A2
Authority
WO
WIPO (PCT)
Prior art keywords
protein
dna
nucleic acid
tracer
host
Prior art date
Application number
PCT/GB2004/002344
Other languages
French (fr)
Other versions
WO2004106550A3 (en)
Inventor
Adele Meinie Murrell
Original Assignee
The Babraham Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Babraham Institute
Publication of WO2004106550A2
Publication of WO2004106550A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1055: Protein x Protein interaction, e.g. two hybrid selection
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K: ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2217/00: Genetically modified animals
    • A01K2217/05: Animals comprising random inserted nucleic acids (transgenic)
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K: ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2227/00: Animals characterised by species
    • A01K2227/10: Mammal
    • A01K2227/105: Murine
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K: ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2267/00: Animals characterised by purpose
    • A01K2267/03: Animal model, e.g. for test or diseases
    • A01K2267/0393: Animal model comprising a reporter system for screening tests

Definitions

  • The present invention relates to methods for studying nucleic acid (for example, DNA-DNA), protein-protein and/or nucleic acid-protein (for example, DNA-protein) interactions.
  • the invention is applicable to studying chromatin structure and its involvement in gene regulation.
  • DNA does not only encode the genes, but also contains instructions for the regulation of gene expression. Furthermore, chromatin higher-order structure formed by interaction of DNA with various proteins (for example, histones) plays a critical role in gene regulation, presumably because regulatory regions that are separated by long stretches of DNA can be brought into close proximity. Further understanding of gene regulation will require the identification of DNA binding proteins, their DNA recognition sites, and how they interact to activate or silence genes. This task is not straightforward, since DNA binding proteins require specific conditions for their binding to the DNA target sites, and the underlying chromatin structure also plays a major role in this process.
  • Descriptive methods include RNA profiling and nuclease hypersensitivity assays in various tissues, as well as direct visualisation of chromatin and protein structures by crystallography, single molecule imaging, and electron and light microscopy techniques together with fluorescence resonance energy transfer (FRET). Electron microscopy techniques are laborious and not easily applicable to specific gene loci. Light microscopy has a resolution of 100 to 200nm, which is insufficient to resolve higher order chromatin structure. DNA binding proteins fused to green fluorescent protein permit visualisation of independent loci, but only a few positions can be examined simultaneously.
  • Macromolecular interactions between two or more proteins can be monitored by fusing green- and blue fluorescent proteins to potential interacting partners and directly visualising the association in a single cell (Mahajan et al, 1998, Nature Biotechnol. 16: 547-552).
  • the strength of in situ fluorescence techniques is that these interactions can be examined in single cells.
  • Biochemical methods for looking at various features in isolation are reductionist, in vitro studies and only hint at secondary structure. These methods include electromobility shift, footprinting and various transcription reporter assays in cell free systems. Many DNA binding proteins interact with other proteins to form regulatory complexes.
  • Yeast two-hybrid and one-hybrid assays are tools for detecting protein-protein and protein-DNA interactions respectively.
  • the drawback of this technology is that interactions detected in a yeast system need to be further tested in a mammalian system. Modifications of this technology include a mammalian two hybrid assay (BD Clontech) for studying protein-protein interactions in cell lines. Combinations of microchip array based methods, quantitative real time PCR technologies, immunochemical, and chromatography methods are providing alternative methods for identifying transcription regulatory proteins, DNA binding elements, and the genes affected.
  • Chromatin immuno-precipitation (ChIP) is a further technique which can be used for in vivo study of protein-DNA interactions. In that method, chromatin from a given tissue or cell line is treated with formaldehyde or another cross-linking agent to fix protein-DNA interactions, and the whole complex of proteins bound to the DNA is then precipitated with an antibody specific for a known DNA binding protein. After reversal of the cross-links, the genes that contain a binding site for the particular DNA binding protein can be identified. ChIP is a powerful and adaptable technique, but is limited by the availability of suitable antibodies to particular DNA binding proteins of interest.
  • The ChIP technique and specific modifications that can be used to identify a nucleotide sequence recognised by a binding factor have been described by Wells and Farnham (2002, Methods 1: 48-56). These modifications include "double antibody ChIP"; ChIP combined with Western blot analysis and mass spectrometry could also be used to detect additional proteins that interact with a protein of interest.
  • Limitations of the ChIP technique include: low efficiency of target sequence recovery; low amounts of co-precipitated protein; and inability to distinguish between direct protein-DNA interactions and indirect interactions. Negative results can be obtained where factors present in large complexes are inaccessible, or where biochemical properties of the protein have been changed during cross-linking.
  • An alternative to ChIP is to use in situ fluorescent techniques to detect interaction between two proteins in the same cell. This has the advantage of requiring fewer cells, but the method is a hypothesis-based approach, requiring some speculation as to the interacting proteins. The resolution of those techniques is not currently as high as that of ChIP.
  • a method for identifying and/or analysing nucleic acid and/or protein associated with a chromosome location in a host comprising: (i) providing a host including:
  • an in vivo method to examine, for example, DNA-DNA, DNA-protein and/or protein-protein interactions at a chromosome location (such as a gene region) of interest allows identification of nucleic acid and/or protein associated with the chromosome location in a wide range of tissues during different developmental stages of the host, allowing a comparison to be made between expressed and silent states of a gene.
  • an exogenous (or transgenic) tracer protein binds to a preferably unique binding site, which has been placed adjacent to the region of interest in the DNA.
  • the tracer protein bound to its target, together with surrounding proteins and DNA in the immediate vicinity, can be studied for specific protein-protein and/or protein-DNA and/or DNA-DNA interactions.
  • the method may further comprise the step of isolating the bound complex from the host or from a sample (for example, a cell or tissue) taken from the host before identifying and/or analysing nucleic acid and/or protein associated with the bound complex.
  • the nucleic acid and/or protein associated with the chromosome location may be endogenous. This allows in vivo associations to be studied.
  • the DNA-binding site may be exogenous, for example a unique binding site which is not recognised by endogenous DNA-binding proteins.
  • the DNA binding site may be positioned at a gene regulatory element.
  • the DNA binding site is preferably non-transcribed in vivo.
  • the tracer protein may be encoded by DNA introduced into the host.
  • the tracer protein binds DNA without activating and/or modifying the DNA, so that development of the host is not affected by the bound complex.
  • the tracer protein is preferably ubiquitously expressed in the host.
  • the tracer protein may be inducible, for example by heat or chemical activation.
  • the tracer protein may be fused to an activator or inactivator protein or polypeptide so that the tracer protein is expressed only when the activator or inactivator protein is expressed.
  • a bound complex may be formed only after the tracer protein has been expressed, allowing the timing of the formation of the bound complex to be controlled.
  • the tracer protein may comprise an epitope for antibody binding.
  • the bound complex may be isolated by immuno-precipitation.
  • the bound complex may be analysed by immunofluorescence, antibody array, microarray and/or quantitative real time PCR.
  • the host is in one embodiment formed by crossing a first host containing the tracer protein and a second host containing the DNA binding site.
  • the host may be non-yeast, for example a mammal.
  • the host may be an organism, for example a mouse, and preferably a non-human organism.
  • the host may alternatively be cells or a cell line, for example mammalian (such as human) cells or a mammalian (such as human) cell line.
  • a method for identifying and/or analysing endogenous nucleic acid and/or protein associated with a chromosome location in a mouse comprising:
  • the method may further comprise the step of isolating the bound complex from the hybrid mouse or from a sample (for example, a cell or tissue) taken from the hybrid mouse before identifying and/or analysing endogenous nucleic acid and/or protein associated with the bound complex.
  • the tracer protein of the invention in a preferred embodiment comprises a yeast Gal4 DNA-binding domain, or a functional equivalent thereof, fused to a myc epitope tag.
  • the DNA binding domain then preferably comprises the UAS binding site (SEQ ID NO: 3) or a functional equivalent thereof.
  • the tracer protein may comprise a zinc finger DNA-binding domain.
  • the DNA binding domain then preferably comprises a zinc finger sequence specific for the zinc finger protein binding domain.
  • the tracer protein may comprise a green fluorescent protein fused to a DNA-binding domain.
  • the DNA binding domain then preferably comprises a DNA sequence specific for the green fluorescent protein DNA-binding domain.
  • DNA binding domains of further proteins which are known in the prior art to bind specifically to DNA binding sites may be used in combination with the respective DNA binding sites in the present invention.
  • the DNA binding site may be a transcription factor recognition site, a restriction enzyme recognition site, an enhancer, a silencer, a specifically engineered target site and/or a recombinase enzyme binding site.
  • a method for identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression comprising:
  • step (iii) comparing the results obtained in step (i) and step (ii);
  • step (i) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression; (ii) generating a drug screening assay for identifying and/or analysing agents which inhibit or potentiate regulation of gene expression by the nucleic acid and/or protein identified in step (i);
  • step (iii) conducting animal toxicity profiles on an agent identified or analysed in step (ii), or an analogue thereof; (iv) manufacturing a pharmaceutical preparation of an agent having a suitable animal toxicity profile; and (v) marketing the pharmaceutical preparation to healthcare providers.
  • step (i) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with a gene at a chromosome location under a given condition; and repeating step (i) thereby
  • a host for use in the method defined herein in which the host comprises a tracer protein and a DNA-binding site.
  • the host may be a mouse or a human cell.
  • the tracer protein may comprise a yeast Gal4 DNA-binding domain.
  • the DNA-binding site may comprise the UAS binding site (SEQ ID NO: 3).
  • the tracer protein may be directed to a target site excluding DNA, for example a target protein or other non-DNA chemical structure.
  • a target may, for example, be located at a specific location in a protein complex.
  • a mammal or mammalian cell comprising a yeast Gal4 DNA-binding domain.
  • the invention also provides a mammal or mammalian cell comprising the UAS binding site (SEQ ID NO: 3).
  • the mammal or mammalian cell is preferably murine.
  • transgenic mouse in the method defined above.
  • the nucleic acid is DNA.
  • the nucleic acid may further be RNA.
  • the invention further provides a nucleic acid construct, for example a DNA vector, for insertion of an insertion sequence into a specific location of a host, comprising, in the following order, a first cloning site for insertion of a first nucleic acid sequence homologous to a sequence on one side of the specific location, the insertion sequence, one copy of a direct repeat sequence, a selective marker, a second copy of the direct repeat sequence, and a second cloning site for insertion of a second nucleic acid sequence homologous to a sequence on the other side of the specific location.
  • the insertion sequence may be a nucleic acid binding site such as the UAS binding site (SEQ ID NO: 3).
  • the direct repeats may be LoxP sequences.
  • the selective marker may be a neo gene.
  • the nucleic acid construct may be designed for use in eukaryotic organisms or cells, including mammalian organisms or cells. As exemplified using a TS vector (for example the H19TSUAS construct described below), the nucleic acid construct may be used to target an insertion sequence into the genome of a host at a specific location.
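
The element order of the construct described above can be sketched as a simple ordered list. This is only an illustrative model: the element labels and both function names are hypothetical, and the excision step assumes the standard behaviour of recombination between two direct repeats (the intervening sequence and one repeat copy are lost, one repeat copy remains).

```python
# Hypothetical labels; the order follows the construct description in the text.
EXPECTED_ORDER = [
    "cloning_site_1",      # receives the first homology arm
    "insertion_sequence",  # for example the UAS binding site
    "direct_repeat",       # first copy, for example LoxP
    "selective_marker",    # for example a neo gene
    "direct_repeat",       # second copy
    "cloning_site_2",      # receives the second homology arm
]


def has_expected_order(elements):
    """Return True if the elements appear in the order given in the text."""
    return list(elements) == EXPECTED_ORDER


def excise_between_repeats(elements):
    """Model recombination between the two direct repeats.

    Everything between the first and last repeat is removed together with
    one repeat copy, so a single repeat is left behind.
    """
    elements = list(elements)
    first = elements.index("direct_repeat")
    last = len(elements) - 1 - elements[::-1].index("direct_repeat")
    return elements[: first + 1] + elements[last + 1:]


remaining = excise_between_repeats(EXPECTED_ORDER)
```

After excision the model retains the insertion sequence and one repeat between the two cloning sites, which matches the later statement that UAS binding sites and lox P sites remain in the Cre-deleted targeted gene.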
  • Fig. 1 shows a graphical outline of a tracer target binding assay according to the invention
  • Fig. 2 shows generation of transgenic Tracer mice, using the Gal4 binding domain as the tracer protein
  • Fig. 3 illustrates a strategy for making targeted ES cells to target the UAS site for Gal4 binding into the H19 gene
  • Fig. 4 shows ChIP assay results for a proof of principle in vitro experiment using HEK 293 cells
  • Fig. 5 shows the H19 gene with respect to genes around it, and the location of primer sequences used in further experiments.
  • Fig. 6 depicts the results of two in vivo experiments, illustrating that the transgenic Gal4 can bind the UAS site in hybrid mice;
  • Fig. 7 illustrates the Igf2-H19 locus and the GAL4-UAS knock-in strategy to detect physical interactions between the DMRs
  • Fig. 8 illustrates the targeting strategy to introduce three copies of the UAS binding motif into the H19 DMR (part A) and northern analysis in transgenic mice of Igf2 and H19 RNA (part B);
  • Fig. 9 shows the results of ChIP with an anti-MYC antibody directed against the transgenic GAL4-MYC tagged protein following maternal or paternal transmission of the H19 DMR UAS;
  • Fig. 10 shows interactions between H19 and Igf2 DMRs identified using Chromosome Conformation Capture ("3C") assays; and
  • Fig. 11 depicts a model for parent specific interactions between H19 and Igf2 DMRs to provide an epigenetic switch for Igf2.
  • Fig. 1 shows a Cre-loxP transgenic strategy used to introduce a tracer-binding site ("TS") into a regulatory region of any gene of interest.
  • a specific vector named "TS-N" contains the tracer-binding site adjacent to a lox-P cassette flanking neomycin selection markers. Two multiple-enzyme cloning sites on either side of the lox-P cassette enable the insertion of homology arms by various cloning techniques.
  • Target mice (labelled “T”) have the TS targeted into their gene of interest. These mice are bred with tracer mice (labelled "TR”) which have been transfected with a tracer vector (“TR-N”) and consequently express high levels of tracer protein (preferably ubiquitous expression).
  • the offspring (labelled "T-TR") can be subjected to biochemical analysis such as ChIP using Tracer-specific antibody, followed by protein and/or DNA analysis (for example, quantitative [Q]-PCR and microarray analysis) and/or immunofluorescence analysis using Tracer-specific antibodies.
  • the Tracer protein binds to its Tracer binding site.
  • the tracer protein has strong epitopes for antibody binding and can be readily immunoprecipitated in ChIP assays. Since the tracer protein is unique, only proteins and DNA that interact at the targeted gene region will be isolated. Information that can be obtained from the biochemical analysis includes detection of proteins bound to sequences adjacent to the Tracer and detection of cross-linked DNA sequences bound to proteins.
  • Fig. 2A depicts the 1.5kb tracer construct: Gal4BD fused to a nuclear localisation signal (NLS) and the antigenic myc epitope (***, myc tag) under control of the CMV promoter.
  • Fig. 2B shows Southern blot analysis for genotyping the offspring of one of the founders. This founder had a high copy number of transgenes in two locations as detected by restriction of genomic DNA with EcoRI and probing with Gal4BD.
  • Fig. 2C shows Western blot analyses on protein extracts from livers (upper panel) and kidneys (lower panel), from the same animals as in B, using human anti-myc monoclonal antibodies.
  • the transgenic Gal4 protein is about 25kDa, while the endogenous myc would be 60kDa.
  • Fig. 3A illustrates homologous recombination between the target construct H19TSUAS (bottom) and the endogenous (wild type) H19 gene (top), after the target construct has been electroporated into ES cells. After neomycin selection, clones of ES cells that have been successfully targeted can be identified by Southern blot analysis, with SpeI digestion and using a downstream probe ("P") as indicated. "R" denotes repeat sequence.
  • Fig. 3B: Southern analysis shows the endogenous wt allele to be 13kb and the targeted allele (Uas neo) to be 15kb.
  • Fig. 3C: selected successfully targeted clones were transiently transfected with Cre recombinase to remove neo and ura, and after negative selection, these clones were screened by PCR with primers (shown as arrow heads) spanning either side of the UAS lox P sites.
  • Fig. 3D shows how these primers were used to amplify a 0.7kb fragment from the endogenous gene, and a 1.2kb fragment from the Cre deleted targeted gene, due to the remaining UAS binding sites and lox P sites.
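
The PCR screen described for Fig. 3D amounts to a comparison of observed band sizes against two expected fragment sizes. A minimal sketch of that comparison follows; the function name, the returned labels and the 0.1 kb tolerance are assumptions made for illustration and are not taken from the source.

```python
# Expected fragment sizes (kb) quoted in the Fig. 3D description.
ENDOGENOUS_KB = 0.7
CRE_DELETED_KB = 1.2


def classify_clone(band_sizes_kb, tolerance_kb=0.1):
    """Classify a clone from the PCR band sizes seen on a gel.

    tolerance_kb is an assumed allowance for gel sizing error.
    """
    has_endogenous = any(abs(b - ENDOGENOUS_KB) <= tolerance_kb for b in band_sizes_kb)
    has_deleted = any(abs(b - CRE_DELETED_KB) <= tolerance_kb for b in band_sizes_kb)
    if has_endogenous and has_deleted:
        return "endogenous and Cre-deleted targeted bands"
    if has_deleted:
        return "Cre-deleted targeted band only"
    if has_endogenous:
        return "endogenous band only"
    return "no expected band"


example = classify_clone([0.72, 1.18])
```

A clone carrying one targeted, Cre-deleted allele alongside the untargeted allele would be expected to show both bands, as in the `example` call.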
  • Fig. 4: real time PCR results show that after ChIP analysis on HEK cells transformed with Gal4BDmyctag (tracer) and H19TSUAS (target) constructs, there is enrichment for mouse H19 sequences present in the target construct, but not for the endogenous human H19 gene or an unrelated gene such as Xist (Fig. 4A and B). In the absence of tracer protein, the target mouse H19 sequences are not detected (Fig. 4C and D). Fig. 4A,B: tracer and target; Fig. 4C: no tracer, no target; Fig. 4D: target, no tracer.
  • In Fig. 5, a schematic depiction of the H19 gene relative to its position to the Ins2, Igf2 and Nctc1 genes on the chromosome is shown.
  • Fig. 5A shows the relative positions of the genes (boxes with names in), within the cluster and the position of the associated accession numbers for the Genbank sequences.
  • Fig. 5B shows an expanded view of the region, with the exons of the genes depicted as open boxes and positions of primer pairs depicted as small triangles above the line. DNA sequences of the primers are given in Table 1.
  • Fig. 5C is a further expansion of the H19 DMR region showing where the UAS binding site is with relation to the four primer pairs in this region.
  • Fig. 6 provides in vivo results showing that transgenic Gal4 can bind to the UAS binding site in hybrid mice.
  • Fig. 6A shows the results of a ChIP assay, giving the ratio of the bound sequences relative to the total input. Samples in which both Gal4 and the UAS binding site are present (WG), only Gal4 is present (Gal4) and wild type (WT) were compared. 13-14, at which the highest ratio was obtained, is the region immediately adjacent to the UAS binding site.
  • Fig. 6B is a slot blot and the results of densitometry analyses in which the bound DNA from a sample in which both the UAS binding site and the Gal4 protein are present (WG) is directly compared to a sample in which the target site, but not the Gal4 protein, is present (UAS).
  • the peak obtained at 13-14 (arrows) is that nearest to the UAS binding site.
  • the peak at DMR1 is an artefact due to background signal on a blot.
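
The text reports ChIP results as a ratio of bound sequence to total input but does not give the calculation. One common way to obtain such a ratio from Q-PCR threshold cycle (Ct) values is the percent-of-input convention sketched below. This is an assumption for illustration, not necessarily the calculation used for Fig. 6: it presumes perfect doubling per cycle, and the function and parameter names are hypothetical.

```python
import math


def bound_to_input_ratio(ct_bound, ct_input, input_fraction=1.0):
    """Estimate bound/input from Ct values, assuming 100% PCR efficiency.

    input_fraction is the fraction of the chromatin that was set aside as
    the input sample (for example 0.01 for a 1% input); the input Ct is
    first adjusted to what 100% of the chromatin would have given.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 2.0 ** (adjusted_input_ct - ct_bound)


def fold_enrichment(ratio_sample, ratio_control):
    """Compare a sample ratio with a control ratio (for example WG versus WT)."""
    return ratio_sample / ratio_control


# A bound sample two cycles later than an undiluted input gives a ratio of 0.25.
ratio = bound_to_input_ratio(ct_bound=27.0, ct_input=25.0)
```

Comparing such ratios across primer pairs and genotypes is what allows a region such as 13-14 to be identified as enriched relative to the controls.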
  • Fig. 7, part A shows Igf2 and H19 genes separated by 90 kb intervening sequences.
  • the Igf2 DMR1 and 2 and the H19 DMR regions are expanded to show the location of CTCF binding sites (*), restriction sites (B, BamHI; H, HindIII; K, KpnI), the knocked-in UAS/loxP site, and polymorphic restriction sites (sp, spretus; dom, domesticus) occurring in Bs (BsaAI), BsX (BstXI), D (DraI), E (EcoNI) and Sp (SphI).
  • Q-PCR primer locations are marked (Q) in DMR1, DMR2, the INS and H19 DMR, and 3C PCR primers are marked with roman numerals and arrow heads to indicate their direction.
  • Fig. 7, part B: the principle of the GAL4-UAS strategy is illustrated with a hypothetical secondary chromatin structure. The targeted UAS binding site upstream of H19 is bound by the transgenic GAL4-MYC tag fusion protein. Upon fixation with formaldehyde, sequences that are in close physical proximity to GAL4 are cross-linked together (shaded area).
  • Q-PCR detects DNA sequences which are in physical proximity to GAL4-UAS (H19 DMR, DMR2) but not those that are remote (DMR1, INS).
  • Fig. 8A shows the targeting strategy, analogous to that shown in Fig. 3 above, to introduce three copies of the GAL4 binding site UAS into the H19 DMR.
  • the endogenous H19 gene is shown on top, the targeting construct ("TC", which is inserted by homologous recombination downstream of the CTCF sites at a BglII restriction site) in the middle.
  • a downstream probe for the endogenous H19 gene is denoted "P".
  • After Cre-mediated deletion, the H19 DMR-UAS construct shown on the bottom results.
  • Fig. 9A: ChIP and Q-PCR were carried out on day 9 livers after maternal transmission of the H19 DMR-UAS. Note enrichment for H19 DMR and DMR1, but not INS and DMR2. Similar results were obtained in five independent experiments. A representative ChIP experiment is shown, with error bars indicating the variation in duplicate Q-PCR experiments.
  • Fig. 9B shows ChIP and Q-PCR results on day 9 livers after paternal transmission of the H19 DMR-UAS. Note enrichment for H19 DMR and DMR2 sequences, but not for INS and DMR1. Similar results were obtained in three independent experiments.
  • In Fig. 9C, a comparison is made of bound to input ratios for maternally and paternally transmitted H19 DMR-UAS, showing relative enrichment for the H19 DMR upon paternal and maternal transmission of the H19 DMR-UAS, enrichment for DMR1 after maternal transmission, and enrichment for DMR2 after paternal transmission.
  • In Fig. 10A, the orientation of primers (arrows) and their distances to KpnI sites in H19 DMR and Igf2 DMR2 are shown. Polymorphisms are depicted as dots.
  • In Fig. 10B, PCR products from primer combinations after ligation between KpnI fragments in the H19 DMR and Igf2 DMR2 region are shown. Primer combinations are shown above each gel. Experiments were carried out on reciprocal hybrids between SD7 and C57/BL6 (B6). All individual PCR experiments were replicated at least twice.
  • Fig. 10C: the sequence of the 768 bp PCR product obtained with primers III in H19 DMR and XII in Igf2 DMR2 on B6 X SD7 tissue reveals this to be the paternal Igf2 allele by the presence of the BsaAI spretus polymorphism.
  • the sequence of the 1.5kb PCR product obtained with primer PA in H19 DMR and PC in Igf2 DMR2 on SD7 X B6 tissue reveals this again to be the paternal Igf2 allele by the presence of the EcoNI domesticus polymorphism.
  • Fig. 10D shows the orientation of primers (arrows) and their distances to HindIII sites in H19 DMR and Igf2 DMR1. Polymorphisms are depicted as dots.
  • PCR products after ligation between HindIII fragments in the H19 DMR and Igf2 DMR are shown. Primer combinations are shown above each gel. These products did not span any polymorphisms. All individual PCR experiments were replicated at least twice.
  • Eukaryotic gene expression is controlled over long distances by regulatory regions such as enhancers, promoters, boundaries, insulators, and silencers.
  • DNA winds around nucleosomal proteins, mainly histones, to form chromatin.
  • the chromatin also interacts with other non-nucleosomal proteins such as transcription factors, cofactors and enzymes and folds into a higher order structure that is important in the overall nuclear architecture of the cell.
  • This higher order structure has an important role in gene expression, since the regulatory regions of genes that on a linear template can be tens to hundreds of kilobase pairs apart may be brought into close proximity by specific changes in the DNA conformation.
  • a "tracer target binding assay” as a method in which for example DNA-protein, protein-protein, and/or DNA-DNA interactions can be studied in vivo.
  • This method involves introducing a DNA binding sequence into a regulatory region of interest by gene targeting in mice and breeding the mice with transgenic tracer mice that express a DNA binding tracer protein.
  • the tracer protein bound to its target is isolated along with its interacting regulatory DNA templates and proteins by chromatin immuno-precipitation with an antibody specific for the tracer protein.
  • the method can be applied to broader investigations of chromatin structure.
  • the method involves generating mice that express a transgenic tracer protein that can bind DNA (Tracer mice) and mice that have a specific binding site for the tracer protein (TS) targeted to the specific gene region of interest (Target mice).
  • the tracer protein can be any small DNA binding protein with a known binding sequence that is not normally expressed in mammals.
  • the tracer protein should merely bind to the DNA and have no DNA activation or modifying properties.
  • it should have strong epitopes for antibody binding. This can be accomplished by fusing a known epitope tag onto the tracer protein.
  • TS-vector Tracer binding site
  • the TS vector has the TS, Lox-P-neo cassette and two multiple-cloning sites, to enable the insertion of homology arms by direct or shuttle vector cloning techniques.
  • the TS cassette can be simply excised from the vector and blunt end ligated into any convenient site in another construct (see methods below).
  • the targeted mice with the TS inserted into the gene of interest are bred with Tracer mice. In the offspring, the Tracer protein binds to the single unique TS site.
  • the appropriate tissues are then isolated for chromatin immuno-precipitation with a tracer specific antibody.
  • Since the tracer protein is unique, only proteins and DNA (chromatin) that interact at the targeted gene region will be isolated. However, the immuno-precipitation reaction will pull down the target protein plus any other proteins and DNA that have been fixed at this region by the cross-linking process. This will enable the detection of DNA sequences, other than the target binding site, that are present in the cross-linked complex due to protein-DNA interactions at this site. Western blot or mass spectrometry analyses can be used to identify proteins. Additional DNA sequence elements can be identified using quantitative PCR, cloning techniques or hybridisation to specific micro-arrays. The tracer mouse lines need only be generated once, and can be bred with various target mice.
  • DMRs: differentially methylated regions
  • Igf2: insulin-like growth factor 2
  • a paternally methylated germline DMR is located 2-4kb upstream of H19, which contains CTCF binding sites and acts as a methylation sensitive insulator between the Igf2 promoters and shared enhancers downstream of H19 (see Figs 3A; 5A,B; 7A).
  • the CTCF zinc finger protein binds and sets up a boundary preventing the Igf2 promoters from accessing the enhancers.
  • Methylation of the DMR on the paternal allele prevents CTCF from binding and the Igf2 promoters can access the enhancers.
  • CTCF binding at the H19 DMR also maintains the unmethylated state of the H19 DMR in somatic cells. Mutation of CTCF binding sites in the female germline has no effect on methylation, but ablation of CTCF protein in the female germline results in de novo methylation at the H19 DMR, suggesting that CTCF may also influence the H19 DMR indirectly. Chromatin looping has been proposed as a mechanism whereby CTCF boundary elements separate silent and active domains.
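
The insulator logic described above can be written as a two-step boolean rule: CTCF binds only an unmethylated H19 DMR, and the Igf2 promoters reach the shared enhancers only when CTCF is absent. The sketch below is a deliberately reduced illustration of that rule; the class and function names are hypothetical, and it ignores the indirect effects and chromatin looping mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    parent: str               # "maternal" or "paternal"
    h19_dmr_methylated: bool  # methylation state of the H19 DMR


def ctcf_bound(allele):
    """CTCF binds the H19 DMR only when the DMR is unmethylated."""
    return not allele.h19_dmr_methylated


def igf2_reaches_enhancers(allele):
    """The CTCF boundary blocks the Igf2 promoters from the enhancers."""
    return not ctcf_bound(allele)


# The DMR is described as paternally methylated, so the maternal copy is
# modelled here as unmethylated.
MATERNAL = Allele("maternal", h19_dmr_methylated=False)
PATERNAL = Allele("paternal", h19_dmr_methylated=True)

access = {a.parent: igf2_reaches_enhancers(a) for a in (MATERNAL, PATERNAL)}
```

In this simplified model only the paternal allele gives the Igf2 promoters access to the enhancers, consistent with the description of the DMR as a methylation sensitive insulator.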
  • the mouse Igf2 has three DMRs: the maternally methylated, placenta specific DMR0 located at exon U1, and the paternally methylated DMR1 and DMR2 located upstream of promoter 1 and within exon 6, respectively (see Fig. 7A).
  • Deletion of the maternal H19 DMR results in loss of imprinting of Igf2 with biallelic expression in most tissues studied, while deletion of the maternal DMR1 results in biallelic expression of Igf2 in mesodermal tissues.
  • Deletion of DMR2 has no effect on imprinting but reduces transcriptional activation of Igf2.
  • DMR1 has a silencer function and DMR2 has an activator function.
  • vectors were constructed and in vitro assays used to validate the methodology in a mammalian cell line at high levels. Thereafter, target and tracer mice were produced, and the invention was tested in transgenic hybrid mice.
  • a small protein from a bacterial or yeast system with known DNA binding properties can be used.
  • the protein is most preferably inert, i.e. it preferably has no enzymatic or transcriptional activation properties.
  • To make our tracer protein we used the binding domain of the yeast Gal4 protein, fused to the myc epitope tag (Gal4BDmyctag).
  • the Gal4BD sequence was amplified from the pGBKT7 vector (available from BD Clontech, Cat K1612-B), using the following primers: Fwd: 5' - CCT CCT GAA AGA TGA AGC [SEQ ID NO: 1]; Rev: 5' - TCG CCC TAT AGT GAG TCG [SEQ ID NO: 2].
  • the PCR product was cloned into the Hinc II site in MCS of the pCMV/myc/nuc vector (Invitrogen pShooter system), so that the Gal4 binding domain was in frame with 3 nuclear localisation signals and the myc epitope tag.
  • the tracer construct (Figure 2a) was verified by sequencing and prepared for pronuclear injection by restriction enzyme digestion to remove most of the plasmid backbone and ethanol precipitation. Pronuclear injection into fertilised mouse oocytes was done at the Babraham Institute Transgenic facility. Transgenic mice were genotyped by PCR and Southern blotting (Figure 2b). Detection of tracer protein expression was done with Western blotting with anti-myc and anti-Gal4 antibodies (Upstate). Proteins of 22 kDa were detected at high levels in extracts from liver and kidney (Figure 2c).
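As a rough sanity check on the Gal4BD amplification primers quoted above, the sketch below estimates their melting temperatures with the Wallace rule (2 °C per A/T, 4 °C per G/C). The rule is a generic approximation for short oligonucleotides and is an assumption made for this example; the source does not state how the primers were designed or which annealing temperature was used.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as quoted in the text (SEQ ID NO: 1 and SEQ ID NO: 2).
GAL4BD_FWD = "CCT CCT GAA AGA TGA AGC"
GAL4BD_REV = "TCG CCC TAT AGT GAG TCG"

if __name__ == "__main__":
    for name, primer in (("Fwd", GAL4BD_FWD), ("Rev", GAL4BD_REV)):
        print(name, wallace_tm(primer), "deg C (Wallace estimate)")
```

The two estimates fall within a few degrees of each other, which is what one would want from a primer pair used in the same reaction.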
  • Gal4 binds to the following consensus sequence, known as the UAS site: 5' - CGG AGG ACT GTC CT [SEQ ID NO: 3].
  • the following oligos Sense: 5' - AGC TTA TGG ATC GGA GGA CTG TCC TCC GG [SEQ ID NO: 4] and Complement: 5' - ATA CCT AGC CTC CTG ACA GGA GGC CTA G [SEQ ID NO: 5] (obtained from Genosys) were annealed and ligated into the SmaI site of the pRAY 1 vector (GenBank accession number CVU63018 [SEQ ID NO: 6], Stork et al, 1996, Nucleic Acid Research.
  • This vector has two MCS on either side of a lox P-neo-ura cassette. A new BglII site was added into the second MCS so that the UAS lox P-neo cassette could be excised with BglII to make the TS targeting cassette (Figure 3).
  • the cassette can be excised with BglII, blunt ended and blunt-end ligated into any cloned gene of interest.
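The UAS oligos described above can be checked computationally. The sketch below confirms that the sense oligo contains the quoted UAS site and finds the region over which the two oligos base-pair. It assumes, for illustration only, that the "complement" oligo is written base-for-base against the sense strand (that is, in the 3' to 5' direction), which is the reading under which the two sequences pair; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(seq: str) -> str:
    """Strip the triplet spacing used in the text and upper-case the sequence."""
    return seq.replace(" ", "").upper()


UAS_SITE = clean("CGG AGG ACT GTC CT")                    # SEQ ID NO: 3
SENSE = clean("AGC TTA TGG ATC GGA GGA CTG TCC TCC GG")   # SEQ ID NO: 4, 5'->3'
COMP = clean("ATA CCT AGC CTC CTG ACA GGA GGC CTA G")     # SEQ ID NO: 5, assumed 3'->5'


def paired_region(sense: str, comp_3to5: str) -> tuple:
    """Return (offset in sense, length) of the longest run of Watson-Crick
    pairs that starts at the first base of the complement oligo."""
    best = (0, 0)
    for offset in range(len(sense)):
        length = 0
        for s, c in zip(sense[offset:], comp_3to5):
            if COMPLEMENT[s] != c:
                break
            length += 1
        if length > best[1]:
            best = (offset, length)
    return best


if __name__ == "__main__":
    offset, length = paired_region(SENSE, COMP)
    print("UAS site starts at sense position", SENSE.find(UAS_SITE))
    print("duplex region:", length, "bp, starting at sense offset", offset)
    print("unpaired sense bases:", SENSE[:offset])
    print("unpaired complement bases:", COMP[length:])
```

Under this reading the annealed duplex carries short single-stranded ends on both sides, which is consistent with the blunt-ending step mentioned for the excised cassette but is not something the source states about the oligos themselves.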
  • the H19TSUAS construct was linearised, purified by ethanol precipitation and electroporated into ES cells. Positive clones were identified by Southern blots and the ES cells were then transiently transfected with recombinant CRE protein to delete the neo and URA genes. ES cell clones that had successfully deleted the lox P cassette contained the UAS site inserted into the upstream region of the H19 gene. These were injected into blastocysts. The resulting chimeras were bred and germline transmission has been achieved. These mice were then mated with the Gal4 tracer mice.
  • both the tracer and the target constructs were incorporated into HEK 293 cells with the Qiagen Effectene system. Selection for stable transfectants with random integration of both constructs was done with G418 (1000 mg/ml) over a period of two weeks. Clones of G418-resistant cells were examined for the integration of both the tracer (Gal4BDmyctag) and the target construct (H19TSUAS), by genomic PCR.
  • Chromatin extraction and ChIP assays: for the ChIP analysis, 1% formaldehyde was added to the cells for 10 minutes. The cells were then washed twice in ice cold PBS and suspended in 1 ml lysis buffer with protease inhibitor (1% SDS; 10 mM EDTA; 50 mM Tris-HCl (pH 8.1); 1 mM PMSF; 1 µg/ml aprotinin) for 5 minutes. The lysed cells were sonicated (3-4 x 30 sec, to reduce DNA to between 200 and 1000 bp) and then centrifuged for 10 minutes at 13000 rpm at 4°C to remove cellular debris.
  • the chromatin suspension was diluted 10 fold in ChIP dilution buffer (0.01% SDS; 1.1% Triton-X-100; 1.2 mM EDTA; 16.7 mM Tris-HCl pH 8.1; 167 mM NaCl + protease inhibitors) and pre-cleared with 80 µl Salmon sperm/Protein A agarose slurry (UPSTATE, Cat 16-157) for 30 minutes at 4°C. 10 µl of anti-myc polyclonal antibody (UPSTATE) was added and incubated overnight at 4°C on a rotation platform. Then 60 µl Salmon sperm/Protein A agarose slurry was added and the reaction was allowed to incubate for 1 hour at 4°C.
  • the ProteinA/agarose-antibody-chromatin complex was collected with gentle centrifugation (8000 rpm at 4°C for 1 min) and the supernatant containing the unbound chromatin was discarded.
  • the complex was washed for 5 minutes in 1 ml ChIP lysis low salt buffer (1% Triton-X-100; 140 mM NaCl; 50 mM HEPES pH 7.5; 0.1% Sodium deoxycholate + protease inhibitors), followed by a 5 minute wash in 1 ml ChIP lysis high salt buffer (1% Triton-X-100; 500 mM NaCl; 50 mM HEPES pH 7.5; 0.1% Sodium deoxycholate + protease inhibitors) and then finally washed for 5 minutes in 1 ml ChIP Lithium immune complex buffer (250 mM LiCl; 10 mM HEPES pH 7.5; 1 mM EDTA; 0.5% IGEPAL CA-630 (Sigma I 8896); 0.5% Sodium deoxycholate + prote
  • the chromatin-antibody complexes were eluted twice off the Protein-A beads with 250 µl freshly made elution buffer (1% SDS; 0.1 M NaHCO3).
  • the cross-links were reversed by adding 20 µl 5 M NaCl and incubating for 5 hours at 65°C.
  • DNA was extracted by phenol-chloroform, after an hour of 45°C incubation in 10 µl 0.5 M EDTA; 20 µl 1 M Tris-HCl, pH 6.5 and 2 µl of 10 mg/ml Proteinase K. DNA was suspended in TE buffer.
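The cross-link reversal step above adds a small volume of concentrated NaCl to the eluate, so the working salt concentration follows from simple dilution arithmetic (C1·V1 = C2·V2). The sketch below works it through under one assumption that the source does not state explicitly: that the two 250 µl elutions are pooled to give 500 µl before the 20 µl of 5 M NaCl is added. The elution buffer's own salts are ignored.

```python
def final_concentration(stock_conc: float, stock_vol: float, sample_vol: float) -> float:
    """Concentration of a solute after adding stock_vol of stock to sample_vol
    of a sample that contains none of that solute (same volume units)."""
    return stock_conc * stock_vol / (stock_vol + sample_vol)


# Assumption for illustration: two 250 µl elutions pooled before adding NaCl.
ELUATE_UL = 2 * 250
NACL_STOCK_M = 5.0
NACL_ADDED_UL = 20.0

nacl_final_m = final_concentration(NACL_STOCK_M, NACL_ADDED_UL, ELUATE_UL)

if __name__ == "__main__":
    print(f"final NaCl: {nacl_final_m:.3f} M")
```

Under that assumption the reversal takes place at roughly 0.19 M NaCl; a different eluate volume would change the figure proportionally.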
  • Real time PCR was performed with primers optimised according to the guidelines given in the ABI Prism 7700 sequence detection User's Manual (http://www.appliedbiosystems.com/support). Real time PCR was performed on an ABI Prism 7700, using the SYBR Green PCR mastermix (Applied Biosystems Cat 4309155 ) according to the manufacturer's instructions (Protocol cat 4310251).
  • Table 1 Primers sequences and optimal concentrations for quantitative real time PCR across the IGF2-H19 region.
  • mice containing the UAS binding site upstream of the H19 gene were bred with tracer mice transgenic for Gal4 (Gal4BDmyctag) and the offspring were genotyped by PCR.
  • the genotype results of two litters are given in Table 2 (see Results, below).
  • Chromatin preparations were made from individual livers taken from 9-day old pups in each litter. PCR primers were standardised against known concentrations of high molecular weight mouse DNA. The results showed that DNA made from chromatin after formaldehyde fixing and sonication did not produce homogeneous results for all the primer pairs tested, with some regions amplifying more readily than others in the input DNA. This is presumably due to irregular and variable shearing of the chromatin during sonication. The amount of amplification obtained from the bound DNA was normalised to the input DNA. Each chromatin prep was cross linked with 1% formaldehyde and litter I was frozen at -70°C.
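Because the input DNA itself amplified unevenly across primer pairs, the bound signal for each region only becomes comparable once it is divided by the input signal for the same region. A minimal sketch of that normalisation is given below; the function name, the region labels and the numbers in the usage example are hypothetical and are not data from the experiments described here.

```python
def normalise_to_input(bound: dict, input_dna: dict) -> dict:
    """Return the bound/input ratio for every primer pair measured in both samples."""
    ratios = {}
    for region, bound_amount in bound.items():
        if region not in input_dna:
            continue  # no input measurement, so no ratio can be formed
        if input_dna[region] <= 0:
            raise ValueError(f"input amount for {region} must be positive")
        ratios[region] = bound_amount / input_dna[region]
    return ratios


if __name__ == "__main__":
    # Hypothetical amounts (arbitrary units) for two primer pairs.
    bound = {"regionA": 2.0, "regionB": 0.5}
    input_dna = {"regionA": 4.0, "regionB": 5.0}
    print(normalise_to_input(bound, input_dna))
```

A region that amplifies poorly from input DNA is thereby not mistaken for a region that was poorly precipitated.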
  • a second ChIP assay was performed on 2 samples (genotypes: WG and UAS) of chromatin prepared from livers taken from 9 day old pups from litter II. In these chromatin preparations, the sonication step was replaced by a restriction digestion using AluI.
  • a slotblot assay was used to assess any long range DNA protein interactions. Probes on duplicated slotblots were PCR products (copy number of amplicons in excess of 1 x 10^13) amplified from normal mouse DNA using primers ranging from the upstream of the Ins2 gene to the downstream Nctc1 gene as shown on Fig. 5.
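To give a feel for what a probe copy number of 1 x 10^13 means in mass terms, the sketch below converts copies of a double-stranded amplicon to mass. It assumes an average of about 650 g/mol per base pair, a common approximation, and uses a hypothetical 200 bp amplicon because the amplicon lengths are not given in this section.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0  # assumed average for double-stranded DNA


def mass_for_copies(copies: float, length_bp: int) -> float:
    """Mass in grams of `copies` molecules of a dsDNA amplicon of `length_bp` base pairs."""
    return copies * length_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO


if __name__ == "__main__":
    # Hypothetical amplicon length; the copy number is the threshold quoted in the text.
    grams = mass_for_copies(1e13, 200)
    print(f"{grams * 1e6:.2f} micrograms")
```

For the hypothetical 200 bp product this works out to a little over 2 µg; longer amplicons scale linearly.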
  • the tracer protein (Gal4BDmyctag) binds to its specific target (H19TSUAS site) in transfected cells
  • the expression of the tracer protein Gal4BDmyctag was detected in transfected cells by immunostaining with anti-myc antibodies.
  • the Gal4BDmyctag protein was localised within the nucleus, near the nuclear membrane, as shown by DAPI staining.
  • a chromatin immunoprecipitation assay (Fig. 4). The success of this assay depends on whether the expression levels of Gal4BDmyctag protein binding to its target UAS binding site was high enough to be immunoprecipitated. Two clones of stably transfected HEK cells, that were positive for both the tracer (Gal4BDmyctag) and the target (H19TSUAS), as well as a negative control (untransfected HEK cells) and a clone that was positive for the target construct (H19TSUAS), but not the tracer protein, were tested.
  • The transgenic tracer mice appear to be healthy and the Gal4BDmyctag protein seems to be widely expressed and can be immunoprecipitated from brain, liver and kidney (Fig. 2).
  • Our H19TSUAS targeted mice showed high levels of chimerism and we have germline transmission.
  • the tracer mice were bred with the H19TSUAS target mice, and the hybrids appear to be healthy.
  • the results of the two litters from a female H19TSUAS target mouse mated with a male Gal4BDmyctag tracer mouse are tabulated in Table 2.
  • Table 2: Genotype results of H19TSUAS x Gal4BDmyctag.
  • n: number of pups in litter
  • WT: wild type, i.e. no Gal4BDmyctag or H19TSUAS
  • WG: Gal4BDmyctag and H19TSUAS
  • Gal4BDmyctag no H19TSUAS
  • UAS Gal4BDmyctag, and H19TSUAS.
  • Gal4 protein was present (Fig. 6A, WT). Despite a higher background, there was no enrichment when the Gal4 was present in the absence of UAS binding sites (Fig. 6A, Gal4). This enrichment seems low, but if the reduced amount of DNA obtained after the ChIP assay is taken into consideration the enrichment may in fact be much higher.
  • DNA-protein and long distance interactions at the H19 locus are studied in the hybrid mice.
  • Our aim with this experiment is to look for the interaction of the H19 DMR with other differentially methylated regions at this locus. This region is an important imprinting control element and has been shown to have a boundary/insulator function.
  • the H19 DMR and the DMRs on the neighbouring Igf2 gene interact.
  • An important advantage of our transgenic assay is that it is possible to follow the interactions of the H19 gene at different stages of development, in different tissues and after various parental imprinting effects.
  • Genes are surrounded by regulatory sequences such as promoters, enhancers and locus control regions (LCRs). These are sites on the DNA that DNA binding proteins (transcription factors, activators, and repressors) recognise and where they attach to the DNA. Promoters activate gene expression and are generally situated directly upstream (in front of) a gene, while enhancers and LCRs have longer ranges and can be located upstream, downstream or even within a gene. Enhancers and LCRs fine-tune promoter activity and influence the levels and tissue specific patterns of gene expression. The nature and mechanism of these interactions is not understood and there is still much to be learnt about the similarities and differences between enhancers and LCRs. Recent studies have identified another transcriptional regulatory region, known as the insulator or boundary element. These elements, when placed between a promoter and an enhancer, prevent transcription, presumably by blocking promoter access to the enhancer.
  • Models that have been invoked to explain long-range interactions between promoters and enhancers include looping, tracking and linking.
  • the "looping model" for enhancer promoter interactions was first proposed in 1986. There has been mounting support for this model in the literature and in many bacterial genes, it is now accepted. Evidence for DNA looping between enhancers and promoters in higher eukaryotic cells is still indirect. Certain DNA sequences, such as those near the telomeres are more flexible and have internal homologies that make them more likely to form loop structures. In yeast, looping at the telomeres has been shown to influence gene expression directly (see for example de Bruin et al, 2001, Nature 409: 109-113).
  • the tracking model was developed on the basis that certain factors move processively along the DNA helix, while the "linking model” refers to the establishment of a chain of regulatory factors involving "facilitator” proteins that would link a distant nucleoprotein complex and a promoter.
  • Double stranded DNA is wrapped around histone proteins to form an array of nucleosomes that comprises the primary structure of chromatin, which is then folded and condensed in the nucleus of a cell, forming higher order structures.
  • Chromatin folding is mediated by interactions of the nucleosomes with regulatory proteins on specific DNA sequences (to form secondary structures) and also long distance contacts involving interactions between the secondary structures, possibly involving enhancers and promoters.
  • An integral part of this complex multilevel assembly is a class of proteins known as high mobility group (HMG) proteins.
  • HMG proteins bind to the nucleosomes between the histones and the DNA and have been shown to have an effect on the architecture of chromatin structure, since they bend the DNA.
  • chromatin can also be compacted by altering the charge balance between histones and DNA. These charge balances are brought about by chemical modifications, such as acetylation, methylation, phosphorylation and ubiquitination of the histones. Indeed, it is currently thought that there is a "histone code” that determines the accessibility of genes to transcription factors. Histone acetylation increases the accessibility of nucleosomal DNA to sequence specific DNA binding proteins and also perturbs higher order protein folding. Histone methylation influences chromatin structure transitions - histone H3, when methylated at lysine 9 interacts with a protein called HP1, which leads to the assembly of condensed, inactive heterochromatin. In contrast, H3 methylated on lysine 4, is associated with active euchromatin.
  • MARs: matrix attachment regions
  • MARs are A/T-rich DNA sequences, often containing topoisomerase II cleavage sites, that mediate the anchoring of the chromatin fibre to the nuclear matrix and that might delimit the boundaries of discrete and topologically independent higher-order domains.
  • Gal4BDmyctag tracer protein can be expressed in a mammalian cell line at high levels. We have also shown, by immunostaining and ChIP respectively, that it localises to the nucleus and binds specifically to the integrated TS site. Our in vitro assay was not specifically designed to check whether the Gal4BDmyctag tracer binds both active and inactive chromatin, since we selected for neomycin resistance gene expression that would have been conferred by the H19TSUAS target construct. The tracer protein does not appear to alter gene expression where it binds to its target, since there were as many H19TSUAS + Gal4BDmyctag as H19TSUAS - Gal4BDmyctag colonies. If Gal4BDmyctag influenced expression of neomycin resistance then we would have expected an altered ratio, depending on the direction of the effect.
  • Gal4BDmyctag tracer mice are bred onto a congenic SD7 mouse strain background so that the maternal and paternal chromosomes can be distinguished using known polymorphic markers.
  • the invention was further exemplified in a further series of offspring of tracer mice crossed with target mice. Results from the binding assay were verified using the 3C technique.
  • Chromatin was extracted from day 9 mouse livers as follows: Freshly dissected liver was mashed through a 70 µm nylon cell strainer into 25 ml DMEM medium. The cells were fixed in 2% formaldehyde for 10 minutes at room temperature and quenched with 0.125 M glycine. After centrifugation at 3500 rpm for 10 minutes the cells were suspended in lysis buffer (10 mM Tris-HCl, 10 mM NaCl, 0.2% NP40 and 1:500 Complete protease inhibitor cocktail (Roche)) for 90 minutes on ice.
  • the nuclei were pelleted by centrifugation for 15 minutes at 2500 rpm and resuspended in 500 µl of REact 3 buffer (Invitrogen) plus 0.3% SDS and incubated for 1 hour at 37°C. Triton-X (1.8%) was then added to sequester the SDS and the incubation was continued for a further 10 minutes. BSA (1%) and 1200 units of BamHI (Invitrogen) were added to digest the chromatin overnight at 37°C.
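When planning which primer pairs can report on a given restriction fragment, it helps to know where the enzyme cuts in the reference sequence. The sketch below performs a simple in-silico BamHI digest of a linear sequence, using the enzyme's recognition site GGATCC with the cut after the first G on the top strand. The function name and the toy sequence in the usage example are hypothetical; a real analysis would read the locus sequence from a database record.

```python
BAMHI_SITE = "GGATCC"
BAMHI_CUT_OFFSET = 1  # BamHI cuts G^GATCC on the top strand


def digest(seq: str, site: str = BAMHI_SITE, cut_offset: int = BAMHI_CUT_OFFSET) -> list:
    """Return the top-strand fragments of a linear sequence after complete digestion."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAGGATCCTTTTGGATCCAA"  # toy sequence with two BamHI sites
    for fragment in digest(toy):
        print(len(fragment), fragment)
```

The sketch models complete digestion of naked DNA; cross-linked chromatin is typically cut less efficiently, so real fragment populations will include partial digestion products.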
  • the suspension was then diluted 10 fold in ChIP dilution buffer (0.01% SDS, 1.1% Triton X-100, 16.7 mM Tris-HCl, 167 mM NaCl) and processed according to the UPSTATE ChIP kit protocol, using a mouse monoclonal IgG Anti-MYC Tag antibody (clone 9E10, Upstate, Cat 05-419). After reverse cross linking and proteinase K digestion DNA was extracted by isopropanol-ethanol precipitation.
  • the amount of bound DNA and input DNA was measured with a Picogreen dsDNA Quantitation kit (Molecular Probes) and, depending on the yield, 0.005-0.5 ng of DNA was used for SYBR green Q-PCR analyses with primers for H19 DMR (see "H19H3" primers in Table 1 above), the intervening sequence between Igf2 and H19 (INS - Fwd: AGA CAC ACT CCC ACC AAG G [SEQ ID NO: 47] and Rev: TCA TCT AGC TGT CAG CTC ACC [SEQ ID NO: 48]), DMR1 (see "DMR1-C1" primers in Table 1 above) and DMR2 (see "DMR2" primers in Table 1).
  • Input DNA was analysed by PCR, using primers that flanked the UAS lox P sites to detect whether there is an input bias of non-methylated maternally derived DNA at the H19 locus over methylated paternally derived DNA.
  • primers were optimised according to the ABI Prism 7700 user's manual. Standard curves were constructed for each primer using genomic DNA and the input and bound amounts of DNA were calculated from absolute values obtained from comparisons to standard curves.
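Absolute quantification against a standard curve can be sketched as follows: fit a straight line to threshold cycle (Ct) against log10 of the known standard amounts, then invert the line to read off the amount in an unknown sample. This is the usual way such curves are used, but the source does not give the fitting details, so the approach, the function names and the numbers in the usage example are all assumptions made for illustration.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity for a sample."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical genomic DNA standards (ng) and their Ct values.
    standards_ng = [0.01, 0.1, 1.0, 10.0]
    standards_ct = [31.64, 28.32, 25.00, 21.68]
    slope, intercept = fit_standard_curve(standards_ng, standards_ct)
    print(f"slope {slope:.2f}, intercept {intercept:.2f}")
    print(f"sample at Ct 26.5: {quantity_from_ct(26.5, slope, intercept):.3f} ng")
```

A separate curve per primer pair, as described above, absorbs differences in amplification efficiency between amplicons.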
  • ChIP assays were repeated 5 times for maternal transmission of the H19 DMR-UAS and 3 times for paternal transmission on different chromatin samples. The internal positive control was the successful precipitation of H19 DMR sequences.
  • Negative controls included chromatin extracted from littermates that were negative for GAL4 protein or the H19 DMR-UAS, as well as a no antibody control.
  • Chromosome conformation capture (3C)
  • PCR reaction was carried out on a PTC-200 DNA Engine (MJ Research) in a 50 µl volume containing 50 ng DNA, 25 pmol of each primer, 10 nmol dNTPs (Bioline), 1x Phusion HF Buffer and 1 U Phusion High-Fidelity DNA Polymerase (Finnzymes).
  • PCR parameters were initial denaturation at 98°C for 30 sec, followed by a 3 step cycle (98°C for 7 sec, gradient of 54-69°C for 15 sec, 72°C for 20 sec) for 35 cycles and a final extension at 72°C for 10 minutes.
  • the PCR products were purified using QIAquick PCR Purification Kit (Qiagen) to remove primers and dNTPs, and then sequenced by an external sequencing service (Lark Technologies Inc, GRI-Genomics).
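The reaction composition and cycling parameters above translate into final concentrations and a minimum run time by simple arithmetic. The sketch below reads the quoted dNTP amount as 10 nmol in total and treats the program time as hold time only (thermal ramping between steps is ignored); both points are assumptions made for this example, and the function names are hypothetical.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """pmol per microlitre is numerically equal to micromolar."""
    return amount_pmol / volume_ul


def program_seconds(initial: int, cycle_steps: tuple, cycles: int, final: int) -> int:
    """Total hold time of a PCR program, excluding ramping between temperatures."""
    return initial + cycles * sum(cycle_steps) + final


REACTION_UL = 50
primer_um = micromolar(25, REACTION_UL)       # 25 pmol of each primer
dntp_um = micromolar(10_000, REACTION_UL)     # 10 nmol = 10,000 pmol dNTPs

# 98 C 30 s; 35 x (98 C 7 s, anneal 15 s, 72 C 20 s); 72 C 10 min
hold_seconds = program_seconds(30, (7, 15, 20), 35, 600)

if __name__ == "__main__":
    print(f"each primer: {primer_um} uM, dNTPs: {dntp_um} uM")
    print(f"program hold time: {hold_seconds / 60:.0f} min")
```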
  • the 3C primers (shown in 5' to 3' orientation; see Fig. 7A) were:
  • H19 DMR-UAS mice were phenotypically normal and fertile.
  • Offspring with both the H19 DMR-UAS knock-in allele and the CMV-GAL4-MYC transgene had normal levels of Igf2 and H19 RNA (see Fig. 8D) and were phenotypically normal, demonstrating that the UAS knock-in and the CMV-GAL4-MYC transgene had no effect on the regulation of Igf2 or H19.
  • H19 DMR sequences adjacent to the UAS binding site were immunoprecipitated in a ChIP assay using chromatin from postnatal day 9 mouse livers and the anti-MYC antibody specific to the tagged transgenic GAL4 protein, confirming that GAL4 bound to its target site in the in vivo transgenic system (Fig. 9A,B).
  • the ChIP assays (coupled with Q-PCR) yielded up to tenfold higher signal in mice with maternal transmission of the H19 DMR-UAS than in those with paternal transmission (Fig. 9A,B).
  • Methyl-binding proteins (Mbds) attached to methylated CpGs in the paternal DMR could also contribute to reduced Gal4 binding.
  • Chromatin was extracted from livers of postnatal day 9 reciprocal F1 hybrids between SD7 (a congenic mouse strain that is Mus spretus for distal chromosome 7) and C57BL/6 (Mus m. domesticus) so that we could detect parent-of-origin-specific interactions between the DMRs.
  • the unmethylated active boundary element at the H19 DMR (which is bound by CTCF) is associated with the unmethylated silencer element on the Igf2 DMR1, while on the paternal chromosome the methylated inactive H19 DMR (no CTCF bound) associates with the methylated Igf2 DMR2.
  • DMR1 has potential CTCF binding sites, but we have been unable to detect binding by ChIP assay, which suggests that other protein factors as well as CTCF may be interacting at the base of this loop.
  • the H19 DMR interacts with DMR2, partitioning Igf2 into the active chromatin domain.
  • the location of DMR2 at the end of the Igf2 gene positions its promoters in remarkable proximity to the enhancers downstream of H19, while H19 itself is silent due to promoter methylation. Since the H19 DMR and DMR2 are both methylated, their interaction excludes CTCF and must therefore involve other factors.
  • Insulators or boundaries have been proposed to be key elements in the laying down of secondary chromatin structures. Higher order chromatin structure is also considered to play a functional role in nuclear architecture enabling transcription and replication. Direct physical interactions and looping have recently been shown for the beta-globin locus, where a remote enhancer interacts with the globin gene, but our results are the first that reveal epigenetic regulation of long-distance interactions in the genome.
  • the chromatin loop model derived from our results (Fig. 11) describes a simple epigenetic switch by which the Igf2 gene (whose promoters are not regulated by DNA methylation) is moved either into an inactive domain, or into an active domain close to enhancers. The present invention could be used to see whether other imprinted genes, or other epigenetically regulated genes located in clusters, possess similar epigenetic switches.
  • One of the tracer-target binding assay's main applications is to allow isolation and/or identification of DNA sequences or proteins that interact at a specific target site in vivo. The identification of the isolated DNA or proteins will become increasingly easy as the technologies for screening small amounts of DNA and proteins, such as microarrays, real time quantitative PCR and mass spectrometry coupled with bioinformatics, improve.
  • a purpose of the tracer-target binding assay is to look for protein-DNA, protein-protein or DNA-DNA interactions formed due to the higher order chromatin structure.
  • An advantage of using transgenic animals or cells thereof is that chromatin at the same locus can be studied in various tissues, and also different developmental stages, enabling a comparison between the expressed and silent state of the gene.
  • Protein-DNA interactions occur when transcription factors bind to promoters, enhancers, and other regulatory regions (LCRs, insulators, silencers).
  • By introducing a target site for a tracer protein adjacent (in cis) to a regulatory region of interest, proteins that bind to the specific regulatory region can be isolated and identified. This has an advantage over bandshift and footprinting assays in that it is an in vivo reaction. It has an advantage over conventional ChIP assays using antibodies to transcription factors, since it is more specific. Only proteins binding to a specific targeted region will be pulled down by the tracer-target assay.
  • Protein-protein interactions are required for transcription initiation, elongation, and secondary chromatin folding.
  • the tracer-target binding assay could provide information on the specific protein-protein interactions at a given locus and also be used as a co-immunoprecipitation step in the isolation of proteins in a complex.
  • DNA-DNA interactions are likely to be the result of secondary chromatin structures and are unlikely to be direct.
  • the tracer-target binding assay is ideal for detecting this type of interaction and is an excellent way of identifying and/or analysing additional regulatory regions of interest.
  • enhancer sequences that interact with a target sequence can be identified, without any prior knowledge or hypotheses of the enhancer. This will be particularly helpful for characterising different tissue specific enhancers influencing a single gene promoter.
  • SEQ ID NO: 6 (pRAY 1 vector; GenBank Accession No. CVU63018):

Abstract

The invention relates to methods for studying nucleic acid (for example DNA-DNA), protein-protein and/or nucleic acid-protein (for example, DNA-protein) interactions. In one aspect, the invention covers a method for identifying and/or analysing nucleic acid and/or protein associated with a chromosome location in a host, comprising: (i) providing a host including: (a) an exogenous DNA-binding protein (a 'tracer protein', for example containing the Gal4 Binding Domain); and (b) a DNA-binding site (for example, the UAS DNA-binding site) located at the specific chromosome location and capable of being bound specifically by the tracer protein; (ii) allowing the tracer protein to bind to the DNA binding site to form a bound complex in the host; (iii) identifying nucleic acid and/or protein associated with the bound complex.

Description

Binding Assay
The present invention relates to methods for studying nucleic acid (for example DNA-DNA), protein-protein and/or nucleic acid-protein (for example, DNA-protein) interactions. In particular, the invention is applicable to studying chromatin structure and its involvement in gene regulation.
Over the past decade research into the genomes of many diverse organisms has produced a vast amount of information on the location and structure of genes. Sequence data are now available for the nearly complete (99%) human genome map. A central challenge facing the biomedical community is how to derive knowledge about the function of these genes. Although every gene exists within every nucleated cell in the human body, only a small percentage of genes are active in any given cell. In the post-genomic era, we need to elucidate the mechanisms of gene expression (transcription), and also to analyse the control of this process.
DNA does not only encode the genes, but also contains instructions for the regulation of gene expression. Furthermore, chromatin higher-order structure formed by interaction of DNA with various proteins (for example, histones) plays a critical role in gene regulation, presumably because regulatory regions that are separated by long stretches of DNA can be brought into close proximity. Further understanding of gene regulation will require the identification of DNA binding proteins, their DNA recognition sites, and how they interact to activate or silence genes. This task is not straightforward, since DNA binding proteins require specific conditions for their binding to the DNA target sites, and the underlying chromatin structure also plays a major role in this process.
A mixture of descriptive, biochemical and genetic approaches has been used to study gene regulation. Genetic systems where transcription factors are over- or under-expressed to see the overall effect on the genome currently provide in vivo functional data. Transgenic technology has advanced to the stage where functional knockouts of specific genes encoding transcription factors or precise targeted deletions of regulatory regions of a gene can be made. However, making these transgenic animals is a labour-intensive process, and the proteins involved in gene regulation are often part of gene families, for which there may be functional redundancy. This can result in a lack of phenotype or in very subtle, easily missed phenotypes.
Descriptive methods include RNA profiling and nuclease hypersensitivity assays in various tissues, as well as direct visualisation of chromatin and protein structures by crystallography, single molecule imaging, and electron and light microscopy techniques together with fluorescence resonance energy transfer (FRET). Electron microscopy techniques are laborious and not easily applicable to specific gene loci. Light microscopy has a resolution of 100 to 200nm, which is insufficient to resolve higher order chromatin structure. DNA binding proteins fused to green fluorescent protein permit visualisation of independent loci, but only a few positions can be examined simultaneously. Macromolecular interactions between two or more proteins can be monitored by fusing green- and blue fluorescent proteins to potential interacting partners and directly visualising the association in a single cell (Mahajan et al, 1998, Nature Biotechnol. 16: 547-552). The strength of in situ fluorescence techniques is that these interactions can be examined in single cells.
Biochemical methods for looking at various features in isolation, such as localised protein-DNA interactions, nucleosomal spacing, and histone modifications, are reductionist, in vitro studies and only hint at secondary structure. These methods include electromobility shift, footprinting and various transcription reporter assays in cell free systems. Many DNA binding proteins interact with other proteins to form regulatory complexes. Yeast two-hybrid and one-hybrid assays (commercialised by BD Clontech) are tools for detecting protein-protein and protein-DNA interactions respectively. The drawback of this technology is that interactions detected in a yeast system need to be further tested in a mammalian system. Modifications of this technology include a mammalian two hybrid assay (BD Clontech) for studying protein-protein interactions in cell lines. Combinations of microchip array based methods, quantitative real time PCR technologies, immunochemical, and chromatography methods are providing alternative methods for identifying transcription regulatory proteins, DNA binding elements, and the genes affected.
A "Chromosome Conformation Capture" (3C) approach to directly analyse chromatin conformation has been described by Dekker et al. (2002, Science 295: 1306- 1311). In the 3C method, nuclei can be isolated from tissue and fixed with formaldehyde. The result of fixing is that physically touching segments of DNA and protein are cross-linked. The frequency of cross-linked DNA is quantified by digesting the DNA with a restriction enzyme and then ligating the fragments at low DNA concentrations. The cross-linked fragments will be in closer proximity and ligate more frequently than the random fragments. This technique requires extensive optimisation for use in mammalian cells, for example as attempted by Tolhuis et al. (2002, Mol. Cell 10: 1453-1465), in addition to a series of very sensitive control experiments.
Chromatin immuno-precipitation (ChIP) is a further technique which can be used for in vivo study of protein-DNA interactions. In that method, chromatin from a given tissue or cell line is treated with formaldehyde or another cross linking agent to fix protein-DNA interactions, and the whole complex of proteins bound to the DNA is then precipitated with an antibody specific for a known DNA binding protein. After reversal of the cross links, the genes that contain a binding site for the particular DNA binding protein can be identified. ChIP is a powerful and adaptable technique, but is limited by the availability of suitable antibodies to particular DNA binding proteins of interest. The ChIP technique and specific modifications that can be used to identify a nucleotide sequence recognised by a binding factor have been described by Wells and Farnham (2002, Methods 1: 48-56). These modifications include "Double antibody ChIP"; in addition, ChIP combined with Western blot analysis and mass spectrometry could be used to detect additional proteins that interact with a protein of interest.
Limitations of the ChIP technique include: low efficiency of target sequence recovery; low amounts of co-precipitated protein; and inability to distinguish between direct protein-DNA interactions and indirect interactions. Negative results can be obtained where factors present in large complexes are inaccessible, or where biochemical properties of the protein have been changed during cross-linking. An alternative to ChIP is to use in situ fluorescent techniques to detect interaction between two proteins in the same cell. This has the advantage of requiring fewer cells, but the method is a hypothesis-based approach, requiring some speculation as to the interacting proteins. The resolution of those techniques does not currently match that of ChIP.
Limitations in the prior art methods for studying nucleic acid-nucleic acid (for example, DNA-DNA), protein-protein and/or nucleic acid-protein (for example, DNA-protein) interactions are addressed by the present invention.
According to a first aspect of the present invention, there is provided a method for identifying and/or analysing nucleic acid and/or protein associated with a chromosome location in a host, comprising:
(i) providing a host including:
(a) an exogenous DNA-binding protein (a "tracer protein"); and
(b) a DNA-binding site located at the specific chromosome location and capable of being bound specifically by the tracer protein;
(ii) allowing the tracer protein to bind to the DNA binding site to form a bound complex in the host; and
(iii) identifying and/or analysing nucleic acid and/or protein associated with the bound complex.
We have developed an in vivo method to examine, for example, DNA-DNA, DNA-protein and/or protein-protein interactions at a chromosome location (such as a gene region) of interest. The method allows, for example, identification of nucleic acid and/or protein associated with the chromosome location in a wide range of tissues during different developmental stages of the host, allowing a comparison to be made between expressed and silent states of a gene. Thus, for example, maternal or paternal transmission effects ("imprinting") could be studied in hosts which undergo sexual reproduction. In one embodiment of this system, an exogenous (or transgenic) tracer protein binds to a preferably unique binding site, which has been placed adjacent to the region of interest in the DNA. The tracer protein bound to its target, together with surrounding proteins and DNA in the immediate vicinity, can be studied for specific protein-protein and/or protein-DNA and/or DNA-DNA interactions.
The method may further comprise the step of isolating the bound complex from the host or from a sample (for example, a cell or tissue) taken from the host before identifying and/or analysing nucleic acid and/or protein associated with the bound complex.
The nucleic acid and/or protein associated with the chromosome location may be endogenous. This allows in vivo associations to be studied.
The DNA-binding site may be exogenous, for example a unique binding site which is not recognised by endogenous DNA-binding proteins. The DNA binding site may be positioned at a gene regulatory element. The DNA binding site is preferably non-transcribed in vivo.
The tracer protein may be encoded by DNA introduced into the host. Preferably, the tracer protein binds DNA without activating and/or modifying the DNA, so that development of the host is not affected by the bound complex. The tracer protein is preferably ubiquitously expressed in the host.
In another aspect of the invention, the tracer protein may be inducible, for example by heat or chemical activation. The tracer protein may be fused to an activator or inactivator protein or polypeptide so that the tracer protein is expressed only when the activator or inactivator protein is expressed. In such systems, a bound complex may be formed only after the tracer protein has been expressed, allowing the timing of the formation of the bound complex to be controlled. The tracer protein may comprise an epitope for antibody binding. The bound complex may be isolated by immuno-precipitation. The bound complex may be analysed by immunofluorescence, antibody array, microarray and/or quantitative real time PCR.
The host is in one embodiment formed by crossing a first host containing the tracer protein and a second host containing the DNA binding site.
The host may be non-yeast, for example a mammal. The host may be an organism, for example a mouse, and preferably a non-human organism. The host may alternatively be cells or a cell line, for example mammalian (such as human) cells or a mammalian (such as human) cell line.
In a further aspect of the invention there is provided a method for identifying and/or analysing endogenous nucleic acid and/or protein associated with a chromosome location in a mouse, comprising:
(i) producing a transgenic tracer mouse comprising an exogenous DNA-binding protein ("tracer protein");
(ii) producing a transgenic target mouse comprising an exogenous DNA binding site at the chromosome location, in which the DNA binding site is capable of being bound specifically by the tracer protein;
(iii) crossing the tracer mouse with the target mouse to produce a hybrid mouse including the tracer protein and the DNA binding site;
(iv) allowing the tracer protein to bind to the DNA binding site to form a bound complex in the hybrid mouse; and
(v) identifying and/or analysing endogenous nucleic acid and/or protein associated with the bound complex.
The method may further comprise the step of isolating the bound complex from the hybrid mouse or from a sample (for example, a cell or tissue) taken from the hybrid mouse before identifying and/or analysing endogenous nucleic acid and/or protein associated with the bound complex. The tracer protein of the invention in a preferred embodiment comprises a yeast Gal4 DNA-binding domain, or a functional equivalent thereof, fused to a myc epitope tag. The DNA binding domain then preferably comprises the UAS binding site (SEQ ID NO: 3) or a functional equivalent thereof.
Alternatively, the tracer protein may comprise a zinc finger DNA-binding domain. The DNA binding domain then preferably comprises a zinc finger sequence specific for the zinc finger protein binding domain.
Alternatively, the tracer protein may comprise a green fluorescent protein fused to a DNA-binding domain. The DNA binding domain then preferably comprises a DNA sequence specific for the green fluorescent protein DNA-binding domain.
The DNA binding domains of further proteins which are known in the prior art to bind specifically to DNA binding sites may be used in combination with the respective DNA binding sites in the present invention. For example, the DNA binding site may be a transcription factor recognition site, a restriction enzyme recognition site, an enhancer, a silencer, a specifically engineered target site and/or a recombinase enzyme binding site.
In a further aspect of the invention there is provided a method for identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression, comprising:
(i) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with a gene at a chromosome location under a first condition;
(ii) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with the gene under a second condition;
(iii) comparing the results obtained in step (i) and step (ii); and
(iv) identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression.
In another aspect of the invention, there is provided a method for conducting a drug discovery business, comprising:
(i) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression;
(ii) generating a drug screening assay for identifying and/or analysing agents which inhibit or potentiate regulation of gene expression by the nucleic acid and/or protein identified in step (i);
(iii) conducting animal toxicity profiles on an agent identified or analysed in step (ii), or an analogue thereof;
(iv) manufacturing a pharmaceutical preparation of an agent having a suitable animal toxicity profile; and
(v) marketing the pharmaceutical preparation to healthcare providers.
In yet a further aspect of the invention, there is provided a method for conducting a bioinformatics business, comprising:
(i) by the method defined above, identifying and/or analysing nucleic acid and/or protein associated with a gene at a chromosome location under a given condition; and repeating step (i) thereby
(ii) generating a database comprising information identifying and/or analysing different nucleic acid and/or protein associated with one or more genes under one or more conditions.
Also provided according to the present invention is a host for use in the method defined herein, in which the host comprises a tracer protein and a DNA-binding site. The host may be a mouse or a human cell. The tracer protein may comprise a yeast Gal4 DNA-binding domain. The DNA-binding site may comprise the UAS binding site (SEQ ID NO: 3).
According to an alternative aspect of the invention, the tracer protein may be directed to a target site excluding DNA, for example a target protein or other non-DNA chemical structure. Such a target may, for example, be located at a specific location in a protein complex.
Further provided according to the present invention is a mammal or mammalian cell comprising a yeast Gal4 DNA-binding domain. The invention also provides a mammal or mammalian cell comprising the UAS binding site (SEQ ID NO: 3). The mammal or mammalian cell is preferably murine.
In a further aspect there is provided the use of a transgenic mouse in the method defined above.
In a preferred embodiment, the nucleic acid is DNA. The nucleic acid may further be RNA.
The invention further provides a nucleic acid construct, for example a DNA vector, for insertion of an insertion sequence into a specific location of a host, comprising, in the following order, a first cloning site for insertion of a first nucleic acid sequence homologous to a sequence on one side of the specific location, the insertion sequence, one copy of a direct repeat sequence, a selective marker, a second copy of the direct repeat sequence, and a second cloning site for insertion of a second nucleic acid sequence homologous to a sequence on the other side of the specific location. The insertion sequence may be a nucleic acid binding site such as the UAS binding site (SEQ ID NO: 3). The direct repeats may be LoxP sequences. The selective marker may be a neo gene. The nucleic acid construct may be designed for use in eukaryotic organisms or cells, including mammalian organisms or cells. As exemplified using a TS vector (for example the H19TSUAS construct described below), the nucleic acid construct may be used to target an insertion sequence into the genome of a host at a specific location.
Also provided is use of the nucleic acid construct in methods of the invention as defined herein.
Embodiments of the invention are now described by way of example with reference to the accompanying drawings in which:
Fig. 1 shows a graphical outline of a tracer target binding assay according to the invention;
Fig. 2 shows generation of transgenic Tracer mice, using the Gal4 binding domain as the tracer protein;
Fig. 3 illustrates a strategy for making targeted ES cells to target the UAS site for Gal4 binding into the H19 gene;
Fig. 4 shows ChIP assay results for a proof of principle in vitro experiment using HEK 293 cells;
Fig. 5 shows the H19 gene with respect to genes around it, and the location of primer sequences used in further experiments;
Fig. 6 depicts the results of two in vivo experiments, illustrating that the transgenic Gal4 can bind the UAS site in hybrid mice;
Fig. 7 illustrates the Igf2-H19 locus and the GAL4-UAS knock-in strategy to detect physical interactions between the DMRs;
Fig. 8 illustrates the targeting strategy to introduce three copies of the UAS binding motif into H19 DMR (part A) and northern analysis in transgenic mice of Igf2 and H19 RNA (part B);
Fig. 9 shows the results of ChIP with an anti-MYC antibody directed against the transgenic GAL4-MYC tagged protein following maternal or paternal transmission of the H19 DMR UAS;
Fig. 10 shows interactions between H19 and Igf2 DMRs identified using Chromosome Conformation Capture ("3C") assays; and
Fig. 11 depicts a model for parent specific interactions between H19 and Igf2 DMRs to provide an epigenetic switch for Igf2.
In more detail:
Fig. 1 shows a Cre-loxP transgenic strategy used to introduce a tracer-binding site ("TS") into a regulatory region of any gene of interest. A specific vector, named "TS-N", contains the tracer-binding site adjacent to a lox-P cassette flanking neomycin selection markers. Two multiple-enzyme cloning sites on either side of the lox-P cassette enable the insertion of homology arms by various cloning techniques. Target mice (labelled "T") have the TS targeted into their gene of interest. These mice are bred with tracer mice (labelled "TR") which have been transfected with a tracer vector ("TR-N") and consequently express high levels of tracer protein (preferably ubiquitous expression). The offspring (labelled "T-TR") can be subjected to biochemical analysis such as ChIP using Tracer-specific antibody, followed by protein and/or DNA analysis (for example, quantitative [Q]-PCR and microarray analysis) and/or immunofluorescence analysis using Tracer-specific antibodies. In the T-TR mice, the Tracer protein binds to its Tracer binding site. The tracer protein has strong epitopes for antibody binding and can be readily immunoprecipitated in ChIP assays. Since the tracer protein is unique, only proteins and DNA that interact at the targeted gene region will be isolated. Information that can be obtained from the biochemical analysis includes detection of proteins bound to sequences adjacent to the Tracer and detection of cross-linked DNA sequences bound to proteins.
Fig. 2A depicts the 1.5kb tracer construct: Gal4BD fused to a nuclear localisation signal (NLS) and the antigenic myc epitope (***, myc tag) under control of the CMV promoter.
Fig. 2B shows Southern blot analysis for genotyping the offspring of one of the founders. This founder had a high copy number of transgenes in two locations as detected by restriction of genomic DNA with EcoRI and probed with Gal4BD. Fig. 2C shows Western blot analyses on protein extracts from livers (upper panel) and kidneys (lower panel), from the same animals as in B, using human anti-myc monoclonal antibodies. The transgenic Gal4 protein is about 25kDa, while the endogenous myc would be 60kDa.
Fig. 3A illustrates homologous recombination between the target construct H19TSUAS (bottom) and the endogenous (wild type) H19 gene (top), after the target construct has been electroporated into ES cells. After neomycin selection, clones of ES cells that have been successfully targeted can be identified by Southern blot analysis, with SpeI digestion and using a downstream probe ("P") as indicated. "R" denotes repeat sequence. In Fig. 3B, Southern analysis shows the endogenous wt allele to be 13kb and the targeted allele (Uas neo) to be 15kb. In Fig. 3C, selected successfully targeted clones were transiently transfected with Cre recombinase to remove neo and ura, and after negative selection, these clones were screened by PCR with primers (shown as arrow heads) spanning either side of the UAS lox P sites. Fig. 3D shows how these primers were used to amplify a 0.7kb fragment from the endogenous gene, and a 1.2kb fragment from the Cre deleted targeted gene, due to the remaining UAS binding sites and lox P sites. Targeted genes that have not been Cre-deleted do not amplify under these PCR conditions.
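The PCR screen described for Fig. 3D can be read as a simple decision rule on band size. The following Python fragment is a minimal sketch of that rule only; the function name, the return labels and the 0.1kb tolerance are hypothetical and are not taken from the description, and only the 0.7kb and 1.2kb expected sizes come from the text.

```python
from typing import Optional

# Expected amplicon sizes in kb, as stated for Fig. 3D.
ENDOGENOUS_KB = 0.7   # product from the endogenous (wild type) gene
CRE_DELETED_KB = 1.2  # product from the Cre deleted targeted gene

def classify_band(size_kb: Optional[float], tolerance_kb: float = 0.1) -> str:
    """Map one observed PCR band size to an allele call.

    size_kb is None when no product is seen. tolerance_kb is an assumed
    gel-reading tolerance, not a value given in the description.
    """
    if size_kb is None:
        # The text states that targeted genes that still carry the
        # selection cassette do not amplify under these conditions.
        return "no_product"
    if abs(size_kb - ENDOGENOUS_KB) <= tolerance_kb:
        return "endogenous"
    if abs(size_kb - CRE_DELETED_KB) <= tolerance_kb:
        return "cre_deleted"
    return "unexpected"
```

A clone carrying one wild type and one Cre deleted allele would be expected to give both bands, so in practice each band in a lane would be classified separately.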
In Fig. 4, real time PCR results show that after ChIP analysis on HEK cells transformed with Gal4BDmyctag (tracer) and H19TSUAS (target) constructs, there is enrichment for mouse H19 sequences present in the target construct, but not for the endogenous human H19 gene or an unrelated gene such as Xist (Fig. 4A and B). In the absence of tracer protein, the target mouse H19 sequences are not detected (Fig. 4C and D). Fig. 4A,B, tracer and target; Fig. 4C, no tracer, no target; Fig. 4D, target, no tracer. Legend: (-♦-) DNA from Input Chromatin, to verify the PCR reaction and confirm the presence of H19TSUAS construct; and (-■-) DNA after ChIP indicating which sequences are enriched.
In Fig. 5, a schematic depiction of the H19 gene relative to its position to the Ins2, Igf2 and Nctc1 genes on the chromosome is shown. Fig. 5A shows the relative positions of the genes (boxes with names in), within the cluster and the position of the associated accession numbers for the Genbank sequences. Fig. 5B shows an expanded view of the region, with the exons of the genes depicted as open boxes and positions of primer pairs depicted as small triangles above the line. DNA sequences of the primers are given in Table 1. Fig. 5C is a further expansion of the H19DMR region showing where the UAS binding site is with relation to the four primer pairs in this region.
Fig. 6 provides in vivo results showing that transgenic Gal4 can bind to the UAS binding site in hybrid mice. Fig. 6A shows the results of a ChIP assay as the ratio of the bound sequences relative to the total input. Samples in which both Gal4 and the UAS binding site are present (WG), only Gal4 is present (Gal4) and wild type (WT) were compared. 13-14, at which the highest ratio was obtained, is the region immediately adjacent to the UAS binding site. Fig. 6B is a slot blot and the results of densitometry analyses in which the bound DNA from a sample in which both the UAS binding site and the Gal4 protein are present (WG) is directly compared to a sample in which the target site, but not the Gal4 protein, is present (UAS). The peak obtained at 13-14 (arrows) is that nearest to the UAS binding site. The peak at DMR1 is an artefact due to background signal on a blot.
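The bound to input ratio used in Fig. 6A can be sketched as follows. This is only an illustration of the arithmetic, assuming that a bound signal and an input signal are available per primer region in the same units; the function names, region labels and numbers are hypothetical placeholders and are not data from the experiments.

```python
from typing import Dict

def bound_to_input_ratios(bound: Dict[str, float],
                          total_input: Dict[str, float]) -> Dict[str, float]:
    """Return bound/input for every region present in `bound`."""
    ratios = {}
    for region, bound_signal in bound.items():
        input_signal = total_input[region]
        if input_signal <= 0:
            raise ValueError(f"input signal for {region!r} must be positive")
        ratios[region] = bound_signal / input_signal
    return ratios

def most_enriched(ratios: Dict[str, float]) -> str:
    """Return the region with the highest bound/input ratio."""
    return max(ratios, key=ratios.get)

# Placeholder values for illustration only (not measured data).
example_bound = {"adjacent": 8.0, "distal_a": 1.0, "distal_b": 0.5}
example_input = {"adjacent": 10.0, "distal_a": 10.0, "distal_b": 10.0}
example_ratios = bound_to_input_ratios(example_bound, example_input)
```

With these placeholder values the region labelled "adjacent" has the highest ratio, mirroring the way the highest ratio in Fig. 6A is reported for the primer pair next to the binding site.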
In Fig. 7, part A shows Igf2 and H19 genes separated by 90 kb intervening sequences. The Igf2 DMR1 and 2 and the H19 DMR regions are expanded to show the location of CTCF binding sites (*), restriction sites (B (BamHI), H (HindIII), K (KpnI)), the knocked-in UAS/loxP site, and polymorphic restriction sites (sp, spretus; dom, domesticus) occurring in Bs (BsaAI), BsX (BstXI), D (DraI), E (EcoNI) and Sp (SphI). Q-PCR primer locations are marked (Q) in DMR1, DMR2, the INS and H19 DMR, and 3C PCR primers are marked with roman numerals and arrow heads to indicate their direction. Part B shows the principle of the GAL4-UAS strategy, illustrated with a hypothetical secondary chromatin structure. The targeted UAS binding site upstream of H19 is bound by the transgenic GAL4-MYC tag fusion protein. Upon fixation with formaldehyde, sequences that are in close physical proximity to GAL4 are cross-linked together (shaded area). In part C, following immunoprecipitation of the cross-linked chromatin with an anti-MYC tag antibody, Q-PCR detects DNA sequences which are in physical proximity to GAL4-UAS (H19 DMR, DMR2) but not those that are remote (DMR1, INS).
Fig. 8A shows the targeting strategy, analogous to that shown in Fig. 3 above, to introduce three copies of the GAL4 binding site UAS into the H19 DMR. The endogenous H19 gene is shown on top, the targeting construct ("TC", which is inserted by homologous recombination downstream of the CTCF sites at a BglII restriction site) in the middle. A downstream probe for the endogenous H19 gene is denoted "P". Following Cre-mediated deletion, the H19 DMR-UAS construct shown on the bottom results. In Fig. 8B, northern analysis of Igf2 and H19 RNA levels in mice containing both the H19 DMR-UAS knock-in allele and the CMV-GAL4-MYC transgene is shown on top, while total ribosomal RNA is shown below.
In Fig. 9A, ChIP and Q-PCR were carried out on day 9 livers after maternal transmission of the H19 DMR-UAS. Note enrichment for H19 DMR and DMR1, but not INS and DMR2. Similar results were obtained in five independent experiments. A representative ChIP experiment is shown, with error bars indicating the variation in duplicate Q-PCR experiments. Fig. 9B shows ChIP and Q-PCR results on day 9 livers after paternal transmission of the H19 DMR-UAS. Note enrichment for H19 DMR and DMR2 sequences, but not for INS and DMR1. Similar results were obtained in three independent experiments. A representative ChIP experiment is shown, with error bars indicating the variation in duplicate Q-PCR experiments. (The scales of the Y-axes in A and B differ due to decreased yield of bound chromatin after ChIP with paternal transmission of the H19 DMR-UAS.) In Fig. 9C, comparison is made of bound to input ratios for maternally and paternally transmitted H19 DMR-UAS, showing relative enrichment for the H19 DMR upon paternal and maternal transmission of the H19 DMR-UAS, enrichment for DMR1 after maternal transmission, and enrichment for DMR2 after paternal transmission. I = input; B = bound.
In Fig. 10A, the orientation of primers (arrows) and their distances to KpnI sites in H19 DMR and Igf2 DMR2 are shown. Polymorphisms are depicted as dots. In Fig. 10B, PCR products from primer combinations after ligation between KpnI fragments in the H19 DMR and Igf2 DMR2 region are shown. Primer combinations are shown above each gel. Experiments were carried out on reciprocal hybrids between SD7 and C57/Bl6 (B6). All individual PCR experiments were replicated at least twice. In Fig. 10C, the sequence of the 768 bp PCR product obtained with primers III in H19 DMR and XII in Igf2 DMR2 on B6 X SD7 tissue reveals this to be the paternal Igf2 allele by the presence of the BsaAI spretus polymorphism.
The sequence of the 1.5kb PCR product obtained with primer PA in H19 DMR and PC in Igf2 DMR2 on SD7 X B6 tissue reveals this again to be the paternal Igf2 allele by the presence of the EcoNI domesticus polymorphism. Fig. 10D shows orientation of primers (arrows) and their distances to HindIII sites in H19 DMR and Igf2 DMR1. Polymorphisms are depicted as dots. In Fig. 10E, PCR products after ligation between HindIII fragments in the H19 DMR and Igf2 DMR are shown. Primer combinations are shown above each gel. These products did not span any polymorphisms. All individual PCR experiments were replicated at least twice.
In Fig. 11, on the maternal allele ("Mat"), the unmethylated H19 DMR (which is bound by CTCF and possibly other proteins (stippled ovals)) and Igf2 DMR1 interact, resulting in two chromatin domains, with the H19 gene in an active domain with its enhancers (small circles), and the Igf2 gene in an inactive domain away from the enhancers (shaded area). On the paternal allele ("Pat"), the methylated H19 DMR associates with the methylated Igf2 DMR2 through putative protein factors (filled ovals), moving Igf2 into the active chromatin domain. The location of DMR2 at the end of the Igf2 gene positions its promoters in close vicinity to the enhancers downstream of H19. H19 remains in the active domain but is silenced by DNA methylation.
Experimental
Eukaryotic gene expression is controlled over long distances by regulatory regions such as enhancers, promoters, boundaries, insulators, and silencers. DNA winds around nucleosomal proteins, mainly histones, to form chromatin. The chromatin also interacts with other non-nucleosomal proteins such as transcription factors, cofactors and enzymes and folds into a higher order structure that is important in the overall nuclear architecture of the cell. This higher order structure has an important role in gene expression, since the regulatory regions of genes that on a linear template can be tens to hundreds of kilobase pairs apart may be brought into close proximity by specific changes in the DNA conformation. Thus if two or more regulatory regions were brought into close proximity by regulatory proteins, then the intervening DNA would loop out around these interacting regions. To date there is limited direct data that demonstrate DNA looping, or the bringing together of regulatory regions, although there is enough indirect evidence to make this a plausible theory.
In this experimental section, we describe a "tracer target binding assay" as a method in which for example DNA-protein, protein-protein, and/or DNA-DNA interactions can be studied in vivo. This method involves introducing a DNA binding sequence into a regulatory region of interest by gene targeting in mice and breeding the mice with transgenic tracer mice that express a DNA binding tracer protein. The tracer protein bound to its target is isolated along with its interacting regulatory DNA templates and proteins by chromatin immuno-precipitation with an antibody specific for the tracer protein. We are applying this method to detect whether there are differences in the secondary DNA conformation between the maternal and paternal alleles of an imprinted gene region. However, the method can be applied to broader investigations of chromatin structure.
Outline of the proposed tracer-target binding in vivo assay
In a preferred embodiment depicted in Fig. 1, the method involves generating mice that express a transgenic tracer protein that can bind DNA (Tracer mice) and mice that have a specific binding site for the tracer protein (TS) targeted to the specific gene region of interest (Target mice). For the Tracer mice, the tracer protein can be any small DNA binding protein with a known binding sequence that is not normally expressed in mammals. The tracer protein should merely bind to the DNA and have no DNA activation or modifying properties. In addition it should have strong epitopes for antibody binding. This can be accomplished by fusing a known epitope tag onto the tracer protein. We have generated mice that ubiquitously express high levels of tracer protein that localises to the nuclear compartment of cells, under control of the CMV promoter.
To generate the Target mice, a Cre-loxP strategy is used to insert the tracer binding site (TS) precisely in the gene region of interest. We have a vector construct (TS-vector) that will be useful for inserting TS elements into various genes. The TS vector has the TS, Lox-P-neo cassette and two multiple-cloning sites, to enable the insertion of homology arms by direct or shuttle vector cloning techniques. Alternatively the TS cassette can be simply excised from the vector and blunt end ligated into any convenient site in another construct (see methods below). The targeted mice with the TS inserted into the gene of interest are bred with Tracer mice. In the offspring, the Tracer protein binds to the single unique TS site. Depending on the expression pattern of the gene of interest, the appropriate tissues are then isolated for chromatin immuno-precipitation with a tracer specific antibody.
Since the tracer protein is unique, only proteins and DNA (chromatin) that interact at the targeted gene region will be isolated. However, the immuno-precipitation reaction will pull down the target protein plus any other proteins and DNA that have been fixed at this region by the cross-linking process. This will enable the detection of DNA sequences, other than the target binding site, that are present in the cross-linked complex due to protein-DNA interactions at this site. Western blot or mass spectrometry analyses can be used to identify proteins. Additional DNA sequence elements can be identified using quantitative PCR, cloning techniques or hybridisation to specific micro-arrays. The tracer mouse lines need only be generated once, and can be bred with various target mice.
By way of background, most imprinted genes contain differentially methylated regions (DMRs) that regulate their parent of origin specific expression. In the examples below, we studied DMRs located in the Insulin like growth factor 2 (Igf2) gene, which is separated by about 100 kb from the maternally expressed non-coding H19 gene on mouse distal chromosome 7. A paternally methylated germline DMR is located 2-4kb upstream of H19, which contains CTCF binding sites and acts as a methylation sensitive insulator between the Igf2 promoters and shared enhancers downstream of H19 (see Figs 3A; 5A,B; 7A). Thus on the unmethylated maternal allele of the DMR, the CTCF zinc finger protein binds and sets up a boundary preventing the Igf2 promoters from accessing the enhancers. Methylation of the DMR on the paternal allele prevents CTCF from binding and the Igf2 promoters can access the enhancers. CTCF binding at the H19 DMR also maintains the unmethylated state of the H19 DMR in somatic cells. Mutation of CTCF binding sites in the female germline has no effect on methylation, but ablation of CTCF protein in the female germline results in de novo methylation at the H19 DMR, suggesting that CTCF may also influence the H19 DMR indirectly. Chromatin looping has been proposed as a mechanism whereby CTCF boundary elements separate silent and active domains.
The mouse Igf2 gene has three DMRs: the maternally methylated, placenta specific DMR0 located at exon U1, and the paternally methylated DMR1 and DMR2 located upstream of promoter 1, and within exon 6, respectively (see Fig. 7A). Deletion of the maternal H19 DMR results in loss of imprinting of Igf2 with biallelic expression in most tissues studied, while deletion of the maternal DMR1 results in biallelic expression of Igf2 in mesodermal tissues. Deletion of DMR2 has no effect on imprinting but reduces transcriptional activation of Igf2. Thus DMR1 has a silencer function and DMR2 has an activator function. Methylation studies have demonstrated that on the maternal chromosome the H19 DMR protects the Igf2 DMR1 and DMR2 from methylation, while DMR1 protects the DMR2 from methylation in a hierarchical manner. These results suggested the possibility that the DMRs interact directly such that the H19 DMR protects the Igf2 DMRs from becoming methylated in somatic tissues.
EXAMPLE 1
In a first set of experiments, vectors were constructed and in vitro assays were used to validate the methodology in a mammalian cell line, in which the tracer protein was expressed at high levels. Thereafter, target and tracer mice were produced, and the invention was tested in transgenic hybrid mice.
Materials and Methods
Generation of transgenic Tracer mice
In order to make a tracer protein, a small protein from a bacterial or yeast system with known DNA binding properties can be used. The protein is most preferably inert, i.e. it preferably has no enzymatic or transcriptional activation properties. To make our tracer protein, we used the binding domain of the yeast Gal4 protein, fused to the myc epitope tag (Gal4BDmyctag). The Gal4BD sequence was amplified from the pGBKT7 vector (available from BD Clontech, Cat K1612-B), using the following primers: Fwd: 5' - CCT CCT GAA AGA TGA AGC [SEQ ID NO: 1]; Rev: 5' - TCG CCC TAT AGT GAG TCG [SEQ ID NO: 2]. The PCR product was cloned into the HincII site in the MCS of the pCMV/myc/nuc vector (Invitrogen pShooter system), so that the Gal4 binding domain was in frame with 3 nuclear localisation signals and the myc epitope tag. The tracer construct (Figure 2a) was verified by sequencing and prepared for pronuclear injection by restriction enzyme digestion, to remove most of the plasmid backbone, and ethanol precipitation. Pronuclear injection into fertilised mouse oocytes was done at the Babraham Institute Transgenic facility. Transgenic mice were genotyped by PCR and Southern blotting (Figure 2b). Tracer protein expression was detected by Western blotting with anti-myc and anti-Gal4 antibodies (Upstate). Proteins of 22kDa were detected at high levels in extracts from liver and kidney (Figure 2c).
Generation of targeted TS mice ("knock-in" mice)
Gal4 binds to the following consensus sequence, known as the UAS site: 5' - CGG AGG ACT GTC CT [SEQ ID NO: 3]. To make the TS construct, the following oligos: Sense: 5' - AGC TTA TGG ATC GGA GGA CTG TCC TCC GG [SEQ ID NO: 4] and Complement: 5' - ATA CCT AGC CTC CTG ACA GGA GGC CTA G [SEQ ID NO: 5] (obtained from Genosys) were annealed and ligated into the SmaI site of the pRAY1 vector (GenBank accession number CVU63018 [SEQ ID NO: 6], Stork et al, 1996, Nucleic Acids Research. 24: 4594-4596). This vector has two MCS on either side of a loxP-neo-ura cassette. A new BglII site was added into the second MCS so that the UAS loxP-neo cassette could be excised with BglII to make the TS targeting cassette (Figure 3). In order to introduce the TS cassette into any gene of interest, the cassette can be excised with BglII, blunt ended and blunt-end ligated into any cloned gene of interest.
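By way of illustration only, the relationship between the oligos listed above can be checked with a short Python sketch. It confirms that the sense oligo [SEQ ID NO: 4] contains the UAS consensus [SEQ ID NO: 3] and locates the register in which the second oligo [SEQ ID NO: 5] pairs with it. The sketch assumes that the second oligo is written in the same left-to-right register as the sense strand (i.e. as its base-by-base complement rather than its reverse complement); the function and variable names are hypothetical and are not part of the described method.

```python
# Sequences as listed in the text, with spaces removed.
UAS_CONSENSUS = "CGGAGGACTGTCCT"            # SEQ ID NO: 3
SENSE = "AGCTTATGGATCGGAGGACTGTCCTCCGG"     # SEQ ID NO: 4
SECOND = "ATACCTAGCCTCCTGACAGGAGGCCTAG"     # SEQ ID NO: 5

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, without reversing the sequence."""
    return "".join(PAIR[base] for base in seq)


def duplex_offset(top, bottom, min_overlap=10):
    """Return the first offset in `top` at which `bottom` pairs with it.

    Assumption: `bottom` is written in the same direction as `top`.
    Returns None if no register gives at least `min_overlap` paired bases.
    """
    for offset in range(len(top)):
        overlap = min(len(top) - offset, len(bottom))
        if overlap < min_overlap:
            break
        if complement(top[offset:offset + overlap]) == bottom[:overlap]:
            return offset
    return None


uas_start = SENSE.find(UAS_CONSENSUS)       # 0-based position of the UAS site
offset = duplex_offset(SENSE, SECOND)       # where pairing begins on the sense oligo
paired = min(len(SENSE) - offset, len(SECOND))
sense_overhang = SENSE[:offset]             # unpaired bases on the sense oligo
second_overhang = SECOND[paired:]           # unpaired bases on the second oligo
print(uas_start, offset, sense_overhang, second_overhang)
```

Under the stated assumption, the two oligos pair over 25 bases, leaving a four-base single-stranded extension on the sense oligo and a three-base extension on the second oligo.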
We decided to target the upstream imprinting control region of the H19 gene, and used an alternative cloning strategy, since we did not have the region of interest already cloned into a single vector. Briefly, we first inserted a 2384 bp BglII fragment corresponding to nucleotides 884 to 3268 in the 5' non-transcribed H19 GenBank sequence MMU19619 [SEQ ID NO: 7] into the BglII site of the UAS-modified pRAY1 vector, so that the UAS site was adjacent to the BglII site at 2348. Next we inserted into the SalI site of the second MCS a 5.5kb fragment [SEQ ID NO: 8] corresponding to nucleotides 3438 to 8436 in the H19 GenBank sequence AF049091, as a SalI fragment from a Bluescript-based construct made from an XhoI genomic clone from a lambda mouse 129/sv library. The construct was verified by restriction digest and sequencing of the homology arms. We called our TS target construct H19TSUAS. The targeting strategy to introduce a UAS binding site into the H19 gene is shown in Figure 3.
The H19TSUAS construct was linearised, purified by ethanol precipitation and electroporated into ES cells. Positive clones were identified by Southern blots and the ES cells were then transiently transfected with recombinant CRE protein to delete the neo and URA genes. ES cell clones that had successfully deleted the loxP cassette contained the UAS site inserted into the upstream region of the H19 gene. These were injected into blastocysts. The resulting chimeras were bred and germline transmission has been achieved. These mice were then mated with the Gal4 tracer mice.
In vitro testing of the tracer-target binding assay
In order to test the system, both the tracer and the target constructs were incorporated into HEK 293 cells with the Qiagen Effectene system. Selection for stable transfectants with random integration of both constructs was done with G418 (1000mg/ml) over a period of two weeks. Clones of G418-resistant cells were examined for the integration of both the tracer (Gal4BDmyctag) and the target construct (H19TSUAS) by genomic PCR.
Clones positive for both the Gal4BDmyctag and the target construct H19TSUAS were expanded and cells were collected for immunostaining and ChIP analysis.
Chromatin extraction and ChIP assays
For the ChIP analysis, 1% formaldehyde was added to the cells for 10 minutes. The cells were then washed twice in ice-cold PBS and suspended in 1 ml lysis buffer with protease inhibitors (1% SDS; 10mM EDTA; 50mM Tris-HCl (pH 8.1); 1mM PMSF; 1ug/ml aprotinin) for 5 minutes. The lysed cells were sonicated (3-4 x 30 sec, to reduce the DNA to between 200 and 1000 bp) and then centrifuged for 10 minutes at 13000 rpm at 4°C to remove cellular debris. The chromatin suspension was diluted 10 fold in ChIP dilution buffer (0.01% SDS; 1.1% Triton-X-100; 1.2 mM EDTA; 16.7 mM Tris-HCl pH8.1; 167mM NaCl + protease inhibitors) and pre-cleared with 80ul Salmon sperm/Protein A agarose slurry (UPSTATE, Cat 16-157) for 30 minutes at 4°C. 10ul of anti-myc polyclonal antibody (UPSTATE) was added and incubated overnight at 4°C on a rotation platform. Then 60ul Salmon sperm/Protein A agarose slurry was added and the reaction was allowed to incubate for 1 hour at 4°C. The Protein A/agarose-antibody-chromatin complex was collected with gentle centrifugation (8000 rpm at 4°C for 1 min) and the supernatant containing the unbound chromatin was discarded. The complex was washed for 5 minutes in 1ml ChIP lysis low salt buffer (1% Triton-X-100; 140mM NaCl; 50mM HEPES pH7.5; 0.1% Sodium deoxycholate + protease inhibitors), followed by a 5 minute wash in 1ml ChIP lysis high salt buffer (1% Triton-X-100; 500mM NaCl; 50mM HEPES pH7.5; 0.1% Sodium deoxycholate + protease inhibitors), and then finally washed for 5 minutes in 1ml ChIP lithium immune complex buffer (250mM LiCl; 10mM HEPES pH7.5; 1mM EDTA; 0.5% IGEPAL CA-630 (Sigma I 8896); 0.5% Sodium deoxycholate + protease inhibitors).
The chromatin-antibody complexes were eluted twice off the Protein A beads with 250ul freshly made elution buffer (1% SDS; 0.1M NaHCO3). The cross-links were reversed by adding 20ul 5M NaCl and incubating for 5 hours at 65°C. DNA was extracted with phenol-chloroform, after an hour of incubation at 45°C in 10ul 0.5 M EDTA, 20ul 1M Tris-HCl pH 6.5 and 2 μl of 10mg/ml Proteinase K. DNA was suspended in TE buffer.
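The effect of the tenfold dilution step in the ChIP protocol above on the buffer composition can be sketched as follows. The sketch assumes that "diluted 10 fold" means one volume of lysate combined with nine volumes of ChIP dilution buffer, and that concentrations combine by simple volume-weighted averaging; the function name is hypothetical.

```python
def mix_concentration(c1, v1, c2, v2):
    """Concentration after combining volume v1 at c1 with volume v2 at c2."""
    return (c1 * v1 + c2 * v2) / (v1 + v2)


# One part lysis buffer plus nine parts ChIP dilution buffer (assumed ratio).
LYSATE_PARTS, DILUENT_PARTS = 1.0, 9.0

# SDS: 1% in the lysis buffer, 0.01% in the dilution buffer.
sds_percent = mix_concentration(1.0, LYSATE_PARTS, 0.01, DILUENT_PARTS)

# EDTA: 10 mM in the lysis buffer, 1.2 mM in the dilution buffer.
edta_mM = mix_concentration(10.0, LYSATE_PARTS, 1.2, DILUENT_PARTS)

print(f"SDS after dilution:  {sds_percent:.3f} %")
print(f"EDTA after dilution: {edta_mM:.2f} mM")
```

On these assumptions the SDS concentration falls roughly tenfold, from 1% to about 0.1%, before the antibody is added.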
DNA was examined using quantitative real time PCR analysis with SYBR green and primers specific for the target construct (the mouse H19 DMR gene sequence). Negative controls were primers for mouse Igf2 sequences and human H19 and Xist sequences. Primer regions are as shown in Fig. 5; primer sequences and optimal concentrations are given in Table 1. Real time PCR was performed with primers optimised according to the guidelines given in the ABI Prism 7700 sequence detection User's Manual (http://www.appliedbiosystems.com/support). Real time PCR was performed on an ABI Prism 7700, using the SYBR Green PCR mastermix (Applied Biosystems Cat 4309155) according to the manufacturer's instructions (Protocol cat 4310251).
Table 1: Primer sequences and optimal concentrations for quantitative real time PCR across the IGF2-H19 region.
[Table 1 is provided as images in the original publication (imgf000023_0001, imgf000024_0001, imgf000025_0001).]
In vivo testing of tracer protein (Gal4BDmyctag) binding to target (H19TSUAS) site in mice with maternal transmission of the UAS binding site: the effects of sonication and restriction enzyme digestion of chromatin
Female target mice containing the UAS binding site upstream of the H19 gene (H19TSUAS) were bred with tracer mice transgenic for Gal4 (Gal4BDmyctag) and the offspring were genotyped by PCR. The genotype results of two litters (litter I and litter II) are given in Table 2 (see Results, below).
Chromatin preparations were made from individual livers taken from 9-day-old pups in each litter. Each chromatin prep was cross-linked with 1% formaldehyde and litter I was frozen at -70°C.
After genotyping litter I, a single preparation of each of the following genotypes was selected for ChIP analysis: WT, WG and Gal4. These samples were subjected to sonication. An aliquot of formaldehyde-fixed, sonicated DNA was taken from each sample for the "input" control and the remaining chromatin was then immunoprecipitated with anti-myc antibody. After immunoprecipitation the crosslinks were reversed and "bound" DNA was extracted. For quantitative PCR analyses, four primer pairs to sequences adjacent to the UAS binding site in the H19 sequence and 3 primer pairs to sequences within the Igf2 gene located about 10kb upstream of the H19 gene were used (see Fig. 5 for primer positions and annotation).
PCR primers were standardised against known concentrations of high molecular weight mouse DNA. The results showed that DNA made from chromatin after formaldehyde fixing and sonication did not produce homogeneous results for all the primer pairs tested, with some regions amplifying more readily than others in the input DNA. This is presumably due to irregular and variable shearing of the chromatin during sonication. The amount of amplification obtained from the bound DNA was normalised to the input DNA.
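The standardisation and normalisation described above can be sketched in Python as follows. The sketch assumes a conventional log-linear standard curve (threshold cycle against log10 of the amount of standard DNA) for each primer pair; the dilution series, threshold cycles and function names are hypothetical values chosen for illustration and are not data from the experiments.

```python
import math


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain an absolute amount (ng)."""
    return 10 ** ((ct - intercept) / slope)


def bound_over_input(bound_ct, input_ct, slope, intercept):
    """Bound DNA normalised to input DNA for one primer pair."""
    bound = quantity_from_ct(bound_ct, slope, intercept)
    inp = quantity_from_ct(input_ct, slope, intercept)
    return bound / inp


# Hypothetical dilution series of high molecular weight mouse DNA.
std_ng = [0.01, 0.1, 1.0, 10.0]
std_ct = [33.0, 29.7, 26.4, 23.1]

slope, intercept = fit_standard_curve(std_ng, std_ct)
ratio = bound_over_input(bound_ct=29.7, input_ct=26.4,
                         slope=slope, intercept=intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, bound/input={ratio:.3f}")
```

Because each primer pair is compared with its own input value, regions that amplify more readily from sonicated chromatin do not by themselves appear enriched.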
A second ChIP assay was performed on 2 samples (genotypes: WG and UAS) of chromatin prepared from livers taken from 9-day-old pups from litter II. In these chromatin preparations, the sonication step was replaced by a restriction digestion using AluI. A slot blot assay was used to assess any long-range DNA-protein interactions. Probes on duplicated slot blots were PCR products (copy number of amplicons in excess of 1 x 10^13) amplified from normal mouse DNA using primers ranging from upstream of the Ins2 gene to the downstream Nctc1 gene, as shown in Fig. 5. Two extra sequences were added to the slot blot: the luciferase gene, which is not present in mouse, and the Tdg gene, which comes from a different locus. The entire DNA sample obtained from each bound fraction after immunoprecipitation was end-labelled with γ-32P ATP and hybridised to the slot blot (Fig. 6B). The filters were then read on a phosphorimager and the total amount of radioactivity was counted for each band. The samples were compared directly, since the UAS sample had no Gal4 and any signal from it therefore could not have been the result of immunoprecipitation.
Results
1. The tracer protein (Gal4BDmyctag) binds to its specific target (H19TSUAS site) in transfected cells
The expression of the tracer protein Gal4BDmyctag was detected in transfected cells by immunostaining with anti-myc antibodies. In these cells, the Gal4BDmyctag protein was localised within the nucleus, near the nuclear membrane, as shown by DAPI staining.
The ability of the tracer protein, Gal4BDmyctag, to bind specifically to the target H19TSUAS site was tested in a chromatin immunoprecipitation assay (Fig. 4). The success of this assay depends on whether the level of Gal4BDmyctag protein bound to its target UAS binding site is high enough to be immunoprecipitated. Two clones of stably transfected HEK cells that were positive for both the tracer (Gal4BDmyctag) and the target (H19TSUAS), as well as a negative control (untransfected HEK cells) and a clone that was positive for the target construct (H19TSUAS) but not the tracer protein, were tested. Real time PCR analysis showed two- and six-fold enrichment for the target sequences (mouse H19 sequences) in the two clones that were positive for both the tracer and the target construct, and was negative for endogenous human H19 and other unrelated sequences. In the absence of Gal4BDmyctag protein, no enrichment for target sequence was found, indicating that the endogenous human MYC protein did not bind to the H19TSUAS target sequence.
2. Transgenic mice
The transgenic tracer mice appear to be healthy and the Gal4BDmyctag protein seems to be widely expressed and can be immunoprecipitated from brain, liver and kidney (Fig. 2). Our H19TSUAS targeted mice showed high levels of chimerism and we have germline transmission.
The tracer mice were bred with the H19TSUAS target mice, and the hybrids appear to be healthy. The results of the two litters from a female H19TSUAS target mouse mated with a male Gal4BDmyctag tracer mouse are tabulated in Table 2.
Table 2: Genotype results of H19TSUAS x Gal4BDmyctag.
[Table 2 is provided as an image in the original publication (imgf000028_0001).]
n, number of pups in litter; WT, wild type, i.e. no Gal4BDmyctag or H19TSUAS; WG, Gal4BDmyctag and H19TSUAS; Gal4, Gal4BDmyctag, no H19TSUAS; and UAS, H19TSUAS, no Gal4BDmyctag.
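The four genotype classes used in Table 2 follow from the presence or absence of the two transgenes, which can be sketched as a simple lookup. In this sketch "UAS" is taken to denote pups carrying H19TSUAS without the tracer, consistent with the statement elsewhere in this example that the UAS sample had no Gal4. The litter shown is a hypothetical set of PCR calls for illustration only and does not reproduce the results in Table 2.

```python
from collections import Counter


def classify_genotype(has_tracer, has_target):
    """Map PCR calls for Gal4BDmyctag (tracer) and H19TSUAS (target) to a class."""
    if has_tracer and has_target:
        return "WG"    # both tracer and target present
    if has_tracer:
        return "Gal4"  # tracer only
    if has_target:
        return "UAS"   # target only
    return "WT"        # neither transgene


# Hypothetical PCR calls for one litter: (tracer present, target present).
litter = [(True, True), (False, True), (True, False), (False, False), (True, True)]

counts = Counter(classify_genotype(tracer, target) for tracer, target in litter)
print(dict(counts))
```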
3. In vivo testing of tracer protein (Gal4BDmyctag) binding to target (H19TSUAS) site in mice with maternal transmission of the UAS binding site: the effects of sonication and restriction enzyme digestion of chromatin - real time PCR and slot blot analyses.
After real time PCR analyses of chromatin previously fixed with formaldehyde and subjected to sonication, the following results were obtained: when both the UAS binding site and the Gal4 protein were present (WG), up to two-fold enrichment was seen for the region adjacent to the UAS binding sites in H19. No enrichment was detected for the regions tested in the Igf2 gene in this sample (Fig. 6A, WG). No enrichment was seen for any of these regions in the wildtype sample where neither the UAS binding site nor the Gal4 protein was present (Fig. 6A, WT). Despite a higher background, there was no enrichment when the Gal4 was present in the absence of UAS binding sites (Fig. 6A, Gal4). The two-fold enrichment seems low, but if the reduced amount of DNA obtained after the ChIP assay is taken into consideration the enrichment may in fact be much higher.
After slot blot analyses of chromatin previously fixed with formaldehyde and then subjected to restriction digestion with AluI, a high background was obtained in the DMR2 and DMR1 regions in both the WG and the UAS samples (Fig. 6B). Thus this region was probably resistant to digestion by AluI. A three-fold higher amount of signal was obtained for H19 13-14 in the WG sample than for the other probes, indicating that this region was selectively pulled down in the WG sample. AluI restriction enables much smaller DNA fragments to be pulled down and, compared to sonication and real time PCR analyses, only regions immediately adjacent to the UAS binding site have been pulled down.
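A comparison of one slot blot band against the remaining probes can be expressed as a simple ratio, as sketched below. The text does not state how the signal from the other probes was summarised, so the use of the median here is an assumption; the counts and probe names other than H19 13-14 are hypothetical and are not the measured values.

```python
from statistics import median


def fold_over_other_probes(counts, probe):
    """Signal for `probe` divided by the median signal of all other probes."""
    others = [value for name, value in counts.items() if name != probe]
    return counts[probe] / median(others)


# Hypothetical phosphorimager counts for one bound (WG) sample.
wg_counts = {
    "H19 13-14": 900.0,
    "probe A": 310.0,
    "probe B": 290.0,
    "probe C": 300.0,
}

fold = fold_over_other_probes(wg_counts, "H19 13-14")
print(f"H19 13-14 signal relative to other probes: {fold:.1f}-fold")
```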
These results show that the UAS targeted mice and Gal4 mice produce viable hybrids and that Gal4 binding to the UAS binding site can be detected by ChIP analyses in these hybrids.
DNA-protein and long distance interactions at the H19 locus are studied in the hybrid mice. Our aim with this experiment is to look for the interaction of the H19 DMR with other differentially methylated regions at this locus. This region is an important imprinting control element and has been shown to have a boundary/insulator function. In addition there is evidence to suggest that the H19 DMR and the DMRs on the neighbouring Igf2 gene interact. An important advantage of our transgenic assay is that it is possible to follow the interactions of the H19 gene at different stages of development, in different tissues and after various parental imprinting effects.
Discussion
General
Genes are surrounded by regulatory sequences such as promoters, enhancers and locus control regions (LCRs). These are sites on the DNA that DNA binding proteins (transcription factors, activators, and repressors) recognise and where they attach to the DNA. Promoters activate gene expression and are generally situated directly upstream (in front of) a gene, while enhancers and LCRs have longer ranges and can be located upstream, downstream or even within a gene. Enhancers and LCRs fine-tune promoter activity and influence the levels and tissue specific patterns of gene expression. The nature and mechanism of these interactions is not understood and there is still much to be learnt about the similarities and differences between enhancers and LCRs. Recent studies have identified another transcriptional regulatory region, known as the insulator or boundary element. These elements, when placed between a promoter and an enhancer, prevent transcription, presumably by blocking promoter access to the enhancer.
Models that have been invoked to explain long-range interactions between promoters and enhancers include looping, tracking and linking. The "looping model" for enhancer promoter interactions was first proposed in 1986. There has been mounting support for this model in the literature and in many bacterial genes, it is now accepted. Evidence for DNA looping between enhancers and promoters in higher eukaryotic cells is still indirect. Certain DNA sequences, such as those near the telomeres are more flexible and have internal homologies that make them more likely to form loop structures. In yeast, looping at the telomeres has been shown to influence gene expression directly (see for example de Bruin et al, 2001, Nature 409: 109-113).
The tracking model was developed on the basis that certain factors move processively along the DNA helix, while the "linking model" refers to the establishment of a chain of regulatory factors involving "facilitator" proteins that would link a distant nucleoprotein complex and a promoter.
Double stranded DNA is wrapped around histone proteins to form an array of nucleosomes that comprises the primary structure of chromatin, which is then folded and condensed in the nucleus of a cell, forming higher order structures. Chromatin folding is mediated by interactions of the nucleosomes with regulatory proteins on specific DNA sequences (to form secondary structures) and also long distance contacts involving interactions between the secondary structures, possibly involving enhancers and promoters. An integral part of this complex multilevel assembly is a class of proteins known as high mobility group (HMG) proteins. The HMG proteins bind to the nucleosomes between the histones and the DNA and have been shown to have an effect on the architecture of chromatin structure, since they bend the DNA. In addition to folding, chromatin can also be compacted by altering the charge balance between histones and DNA. These charge balances are brought about by chemical modifications, such as acetylation, methylation, phosphorylation and ubiquitination of the histones. Indeed, it is currently thought that there is a "histone code" that determines the accessibility of genes to transcription factors. Histone acetylation increases the accessibility of nucleosomal DNA to sequence specific DNA binding proteins and also perturbs higher order protein folding. Histone methylation influences chromatin structure transitions - histone H3, when methylated at lysine 9 interacts with a protein called HP1, which leads to the assembly of condensed, inactive heterochromatin. In contrast, H3 methylated on lysine 4, is associated with active euchromatin.
Biochemical studies have shown that when histones and other chromosomal proteins are extracted from nuclei of interphase cells, loops of DNA containing negative unrestrained supercoils can be observed as nuclear halos. The bases of these loops are attached to a matrix or scaffold through sequences termed MARs (matrix attachment regions). MARs are A/T-rich DNA sequences, often containing topoisomerase II cleavage sites, that mediate the anchoring of the chromatin fibre to the nuclear matrix and that might delimit the boundaries of discrete and topologically independent higher-order domains. Although some of these sequences play a role in the expression of particular genes, the question of whether they are merely structural components or whether they play a functional role is still unanswered. At least some insulator elements seem to have properties that bridge those of MARs and of standard transcriptional regulatory elements, opening the possibility that the function of both types of sequence is related.
Validation of the method of the invention in cell lines
In vitro assays to validate the methodology confirmed that our Gal4BDmyctag tracer protein can be expressed in a mammalian cell line at high levels. We have also shown, by immunostaining and ChIP respectively, that it localises to the nucleus and binds specifically to the integrated TS site. Our in vitro assay was not specifically designed to check whether the Gal4BDmyctag tracer binds both active and inactive chromatin, since we selected for neomycin resistance gene expression that would have been conferred by the H19TSUAS target construct. The tracer protein does not appear to alter gene expression where it binds to its target, since there were as many H19TSUAS + Gal4BDmyctag as H19TSUAS - Gal4BDmyctag colonies. If Gal4BDmyctag influenced expression of neomycin resistance then we would have expected an altered ratio, depending on the direction of the effect.
Validation of the method of the invention in transgenic hybrid mice
Two litters from our hybrid mice have provided enough material to demonstrate that the Gal4 protein binds the UAS site in the hybrid mice and that it can be specifically precipitated with ChIP assays. We have looked at chromatin isolated from day 9 pups taken from two litters in which the H19TSUAS binding site has been maternally transmitted. The results obtained show that the hybrid litters are viable (at least from the maternal transmission of the UAS site) and that the Gal4 protein binds its target in vivo.
In order to look for longer range interactions, cross-linking conditions should be optimised. Also, Gal4BDmyctag tracer mice are bred onto a congenic SD7 mouse strain background so that the maternal and paternal chromosomes can be distinguished using known polymorphic markers.
EXAMPLE 2
The invention was further exemplified in a further series of offspring of tracer mice crossed with target mice. Results from the binding assay were verified using the 3C technique.
Materials and Methods
Transgenic CMV-GAL4 mice and H19 DMR-UAS "knock-in" mice were produced as described in Example 1.
Chromatin extraction and ChIP assays
Chromatin was extracted from day 9 mouse livers as follows: Freshly dissected liver was mashed through a 70μm nylon cell strainer into 25ml DMEM medium. The cells were fixed in 2% formaldehyde for 10 minutes at room temperature and quenched with 0.125M glycine. After centrifugation at 3500 rpm for 10 minutes the cells were suspended in lysis buffer (10mM Tris-HCl, 10mM NaCl, 0.2% NP40 and 1:500 Complete protease inhibitor cocktail (Roche)) for 90 minutes on ice. The nuclei were pelleted by centrifugation for 15 minutes at 2500rpm, resuspended in 500 μl of REact 3 buffer (Invitrogen) plus 0.3% SDS and incubated for 1 hour at 37°C. Triton-X (1.8%) was then added to sequester the SDS and the incubation was continued for a further 10 minutes. BSA (1%) and 1200 units of BamHI (Invitrogen) were added to digest the chromatin overnight at 37°C. The suspension was then diluted 10 fold in ChIP dilution buffer (0.01% SDS, 1.1% Triton X-100, 16.7 mM Tris-HCl, 167mM NaCl) and processed according to the UPSTATE ChIP kit protocol, using a mouse monoclonal IgG Anti-MYC Tag antibody (clone 9E10, Upstate, Cat 05-419). After reverse cross-linking and proteinase K digestion, DNA was extracted by isopropanol-ethanol precipitation.
The amount of bound DNA and input DNA was measured with a Picogreen dsDNA Quantitation kit (Molecular Probes) and, depending on the yield, 0.005-0.5ng of DNA was used for SYBR green Q-PCR analyses with primers for the H19 DMR (see "H19H3" primers in Table 1 above), the intervening sequence between Igf2 and H19 (INS - Fwd: AGA CAC ACT CCC ACC AAG G [SEQ ID NO: 47] and Rev: TCA TCT AGC TGT CAG CTC ACC [SEQ ID NO: 48]), DMR1 (see "DMR1-C1" primers in Table 1 above) and DMR2 (see "DMR2" primers in Table 1). Input DNA was analysed by PCR, using primers that flanked the UAS loxP sites, to detect whether there is an input bias of non-methylated maternally derived DNA at the H19 locus over methylated paternally derived DNA. For Q-PCR, primers were optimised according to the ABI Prism 7700 user's manual. Standard curves were constructed for each primer using genomic DNA, and the input and bound amounts of DNA were calculated from absolute values obtained from comparisons to standard curves. ChIP assays were repeated 5 times for maternal transmission of the H19 DMR-UAS and 3 times for paternal transmission, on different chromatin samples. The internal positive control was the successful precipitation of H19 DMR sequences. Negative controls included chromatin extracted from littermates that were negative for GAL4 protein or the H19 DMR-UAS, as well as a no-antibody control.
Chromosome conformation capture ("3C")
In a method modified from Dekker et al. (2002, supra) and Tolhuis et al. (2002), chromatin was extracted, fixed, and digested as for the above ChIP protocol using BamHI/KpnI or HindIII high concentration restriction enzymes and buffers from Invitrogen. Intramolecular ligation was with 2.5ng/μl chromatin in 800μl ligation buffer (Promega), plus 1% Triton X and 30 Weiss units of T4 ligase (Promega) for 4 hours at 4°C. DNA was extracted using isopropanol and ethanol precipitation after reverse cross-linking with Proteinase K at 65°C overnight. The thoroughness of restriction enzyme digestion and quality of religated DNA were inspected by gel electrophoresis.
The PCR reaction was carried out on a PTC-200 DNA Engine (MJ Research) in a 50μl volume containing 50ng DNA, 25pmol of each primer, 10nmol dNTPs (Bioline), 1x Phusion HF Buffer and 1U Phusion High-Fidelity DNA Polymerase (Finnzymes). PCR parameters were initial denaturation at 98°C for 30 sec, followed by a 3 step cycle (98°C for 7 sec, gradient of 54-69°C for 15 sec, 72°C for 20 sec) for 35 cycles and a final extension at 72°C for 10 minutes. The PCR products were purified using the QIAquick PCR Purification Kit (Qiagen) to remove primers and dNTPs, and then sequenced by an external sequencing service (Lark Technologies Inc, GRI-Genomics).
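For convenience, the amounts given for the 50μl PCR reaction above can be converted into concentrations, and the programmed hold times summed, as in the following sketch. The dNTP figure is converted as stated, without assuming whether it refers to each nucleotide or to the total; the summed time ignores ramping between temperatures. Variable and function names are hypothetical.

```python
def micromolar(amount_pmol, volume_ul):
    """pmol per microlitre is numerically equal to micromolar."""
    return amount_pmol / volume_ul


VOLUME_UL = 50.0

primer_uM = micromolar(25.0, VOLUME_UL)            # 25 pmol of each primer
dntp_uM = micromolar(10.0 * 1000.0, VOLUME_UL)     # 10 nmol = 10000 pmol
template_ng_per_ul = 50.0 / VOLUME_UL              # 50 ng DNA

# Programmed hold times in seconds (temperature ramping not included).
initial_denaturation = 30
cycle = 7 + 15 + 20                                # denature, anneal, extend
final_extension = 10 * 60
total_seconds = initial_denaturation + 35 * cycle + final_extension

print(f"primer {primer_uM} uM, dNTPs {dntp_uM} uM, template {template_ng_per_ul} ng/ul")
print(f"programmed hold time: {total_seconds / 60:.0f} min")
```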
The 3C primers (shown in 5' to 3' orientation; see Fig. 7A) were:
I CCATCGAAATGCAAATGAACC [SEQ ID NO: 49]
II CACAACTCCCGCGTATAAACC [SEQ ID NO: 50]
III GCTACATTCACACGAGCATCC [SEQ ID NO: 51]
IV TGCAATACATTCCATGATCACC [SEQ ID NO: 52]
V TTGACTCATTCCCTACACAGC [SEQ ID NO: 53]
VI CTATACAACCCCACCATGC [SEQ ID NO: 54]
IX TTTCTTACAGTTCAAAGCCACC [SEQ ID NO: 55]
X GCACCCCATGTTGTAGTCC [SEQ ID NO: 56]
XI CCACTCACTTCTTGATTTGG [SEQ ID NO: 57]
XII TAGTGTGGGACGTGATGG [SEQ ID NO: 58]
XVII GCAGAAAGCAGAGATTAGG [SEQ ID NO: 59]
XVIII TAGAGGATGGTATGCAGG [SEQ ID NO: 60]
PA TACACTGGTCTGGCCTTGCT [SEQ ID NO: 61]
PC GGCAATGCTGTGGGTCACC [SEQ ID NO: 62].
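Basic properties of the 3C primers listed above, such as length and GC content, can be tabulated with a short sketch like the one below. The melting temperature estimate uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is only a rough approximation for short oligonucleotides; the text does not state how annealing temperatures were chosen, and a gradient of 54-69°C was used in practice. Only three of the primers are shown, and the function names are hypothetical.

```python
PRIMERS = {
    "I": "CCATCGAAATGCAAATGAACC",   # SEQ ID NO: 49
    "X": "GCACCCCATGTTGTAGTCC",     # SEQ ID NO: 56
    "XII": "TAGTGTGGGACGTGATGG",    # SEQ ID NO: 58
}


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature estimate: 2 * (A + T) + 4 * (G + C)."""
    gc = sum(base in "GC" for base in seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:>4}  len={len(seq)}  GC={gc_fraction(seq):.2f}  "
          f"Tm(Wallace)={wallace_tm(seq)} C")
```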
Results and Discussion
In these experiments, we report the detection of direct interactions between the H19 DMR and Igf2 DMRs. Our method involved a "knock-in" strategy to introduce 3 copies of the GAL4 binding motif (UAS) into the H19 DMR, followed by chromatin immunoprecipitation (ChIP) of GAL4 protein bound to the modified H19 DMR-UAS, and assaying for co-precipitated Igf2 DMR sequences (see Fig. 7B,C). We then validated our observations independently using the recently described Chromatin Conformation Capture method (3C). The targeting strategy to introduce 3 copies of the UAS binding motif into the H19 DMR is shown in Fig. 8A, and is analogous to the strategy described in Example 1 with reference to Fig. 3A and C.
Targeted H19 DMR-UAS mice were phenotypically normal and fertile. We next made a transgenic mouse line ubiquitously expressing high levels of the GAL4-MYC fusion protein under a CMV promoter. Offspring with both the H19 DMR-UAS knock-in allele and the CMV-GAL4-MYC transgene had normal levels of Igf2 and H19 RNA (see Fig. 8D) and were phenotypically normal, demonstrating that the UAS knock-in and the CMV-GAL4-MYC transgene had no effect on the regulation of Igf2 or H19. H19 DMR sequences adjacent to the UAS binding site were immunoprecipitated in a ChIP assay using chromatin from postnatal day 9 mouse livers and the anti-MYC antibody specific to the tagged transgenic GAL4 protein, confirming that GAL4 bound to its target site in the in vivo transgenic system (Fig. 9A,B). The ChIP assays (coupled with Q-PCR) yielded up to tenfold higher signal in mice with maternal transmission of the H19 DMR-UAS than in those with paternal transmission (Fig. 9A,B). Since we found no bias for the maternal H19 allele in the input chromatin of maternally or paternally transmitting H19 DMR-UAS mice, we assume that a more closed chromatin conformation of the H19 DMR on the paternal allele reduces GAL4 binding, even though the UAS binding site is not normally methylation sensitive. Methyl-binding proteins (Mbds) attached to methylated CpGs in the paternal DMR could also contribute to reduced Gal4 binding.
Having verified that the transgenic system worked appropriately, we tested the hypothesis that Igf2 DMRs come into physical proximity with the H19 DMR. Indeed, upon maternal transmission of H19 DMR-UAS we pulled down Igf2 DMR1 sequences, in addition to the H19 DMR sequences (Fig. 9A). Specificity of this interaction was confirmed by the failure to find any enrichment for the intervening sequences between Igf2 and H19, or for the Igf2 DMR2 sequences (Fig. 9A). This experiment was carried out independently five times, with the same qualitative results. By contrast, upon paternal transmission of H19 DMR-UAS, there was enrichment for Igf2 DMR2 sequences in addition to the H19 DMR, but not for DMR1 or for the intervening sequences between Igf2 and H19 (Fig. 9B). This experiment was carried out independently three times, with the same qualitative results. An experiment using KpnI instead of BamHI to fragment the chromatin prior to ChIP produced analogous results, confirming that the enrichment for Igf2 DMRs was not a peculiarity of the restriction enzyme used. Controls that included chromatin extracted from littermates that were negative for the GAL4 protein or the UAS binding sites, as well as ChIP without antibody, showed no enrichment for the H19 DMR or the DMR1 and DMR2 sequences. No quantifiable controls exist for variation of crosslinking, digestion or antibody binding in a system where there is only a single binding site in the genome, and thus there was no reference sample to normalise the different experiments for statistical comparison. Displaying the ratios of bound chromatin relative to input chromatin shows clearly that the H19 DMR physically interacts with the Igf2 DMR1 on the maternal allele, and with the Igf2 DMR2 on the paternal allele (Fig. 9C).
We sought to qualitatively confirm these results by an independent and alternative method. We thus designed a series of compatible and interchangeable primers along the Igf2-H19 DMRs to be used in the Chromatin Conformation Capture assay (3C assay). This method relies on cutting formaldehyde-fixed chromatin with restriction enzymes, followed by intramolecular ligation, to detect physical interactions between DNA fragments held together by secondary chromatin structure. Ligation products are detected by PCR and their identity is determined by sequencing. The primers used and their locations with regard to the relevant restriction sites and DNA polymorphisms are shown in Fig. 7A and Fig. 10A, D. We chose not to optimise the PCR reactions using cloned Igf2 and H19 fragments so as to avoid the potential for contamination. Chromatin was extracted from livers of postnatal day 9 reciprocal F1 hybrids between SD7 (a congenic mouse strain that is Mus spretus for distal chromosome 7) and C57BL/6 (Mus musculus domesticus) so that we could detect parent-of-origin-specific interactions between the DMRs. Using chromatin digested with KpnI we consistently detected ligation products between DMR2 fragments and the H19 DMR in both B6 x SD7 and SD7 x B6 crosses (Fig. 10A,B). Sequencing established that these ligation products were exclusively between the H19 DMR and the paternal Igf2 allele (Fig. 10C). Using chromatin digested with HindIII, we consistently detected ligation products between the H19 DMR and Igf2 DMR1, which were confirmed by sequencing (Fig. 10D,E). We were unable to assign these to the parental alleles since they did not span polymorphisms. This qualitative 3C analysis thus confirms interactions between H19 and Igf2 DMRs.
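The principle of assigning a sequenced ligation product to a parental allele in reciprocal hybrids can be sketched as follows. The polymorphism positions, bases and reads are invented placeholders and do not correspond to the actual Igf2 polymorphisms; the sketch also assumes that a cross written as "B6 x SD7" lists the mother first. Function names are hypothetical.

```python
def assign_strain(read, snps):
    """Assign a read to "B6" or "SD7" from strain-specific bases.

    snps: list of (position, b6_base, sd7_base), positions 0-based in the read.
    Returns "unassigned" if no polymorphism is covered or the calls conflict.
    """
    votes = {"B6": 0, "SD7": 0}
    for pos, b6_base, sd7_base in snps:
        if pos < len(read):
            if read[pos] == b6_base:
                votes["B6"] += 1
            elif read[pos] == sd7_base:
                votes["SD7"] += 1
    if votes["B6"] and not votes["SD7"]:
        return "B6"
    if votes["SD7"] and not votes["B6"]:
        return "SD7"
    return "unassigned"


def parent_of_origin(strain, mother, father):
    """Translate a strain call into maternal or paternal origin for one cross."""
    if strain == mother:
        return "maternal"
    if strain == father:
        return "paternal"
    return "unassigned"


# Hypothetical polymorphisms and reads, for illustration only.
SNPS = [(3, "A", "G"), (8, "C", "T")]
read_1 = "TTGACCTTCGA"   # carries the B6 bases at both positions
read_2 = "TTGGCCTTTGA"   # carries the SD7 bases at both positions

# In a B6 x SD7 cross (mother listed first, by assumption), SD7 is paternal.
origin_1 = parent_of_origin(assign_strain(read_1, SNPS), mother="B6", father="SD7")
origin_2 = parent_of_origin(assign_strain(read_2, SNPS), mother="B6", father="SD7")
print(origin_1, origin_2)
```

A product that does not span any polymorphism returns "unassigned", which corresponds to the situation described above for the HindIII ligation products.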
Our results using the transgenic UAS-targeted mice and the 3C method reveal that the H19 DMR has differential interactions with the Igf2 DMRs on the maternal and paternal alleles (Fig. 11). This type of analysis can in the future be extended to sequences outside the cluster to see whether interactions occur between other sequences. The interactions are likely mediated by the epigenetic modifications of the DMRs and by the protein factors that consequently bind to them. On the maternal chromosome, the unmethylated active boundary element at the H19 DMR (which is bound by CTCF) is associated with the unmethylated silencer element on the Igf2 DMR1, while on the paternal chromosome the methylated inactive H19 DMR (no CTCF bound) associates with the methylated Igf2 DMR2. The interaction between Igf2 DMR1 and the H19 DMR on the maternal allele sets up two chromatin domains, placing the H19 gene in an active domain (enhancers present) and the Igf2 gene in an inactive domain (enhancers absent). DMR1 has potential CTCF binding sites, but we have been unable to detect binding by ChIP assay, which suggests that other protein factors as well as CTCF may be interacting at the base of this loop. On the paternal allele the H19 DMR interacts with DMR2, partitioning Igf2 into the active chromatin domain. The location of DMR2 at the end of the Igf2 gene positions its promoters in remarkable proximity to the enhancers downstream of H19, while H19 itself is silent due to promoter methylation. Since the H19 DMR and DMR2 are both methylated, their interaction excludes CTCF and must therefore involve other factors. A recent study of matrix attachment regions (MARs) demonstrated that DMR2 is adjacent to a MAR that is specifically attached on the paternal allele when the Igf2 gene is active. Deletion within DMR2 impedes this matrix attachment, suggesting that matrix attachment is necessary for Igf2 transcriptional activation. Our model (Fig. 11) is fully consistent with results from mice with deletions of the DMRs. Thus maternal deletion of the H19 DMR or Igf2 DMR1 results in reactivation of the maternal Igf2 gene, and paternal deletion of the H19 DMR or Igf2 DMR2 leads to reduced transcription of the paternal Igf2 gene.
Insulators or boundaries have been proposed to be key elements in the laying down of secondary chromatin structures. Higher order chromatin structure is also considered to play a functional role in nuclear architecture enabling transcription and replication. Direct physical interactions and looping have recently been shown for the beta-globin locus, where a remote enhancer interacts with the globin gene, but our results are the first that reveal epigenetic regulation of long-distance interactions in the genome. The chromatin loop model derived from our results (Fig. 11) describes a simple epigenetic switch by which the Igf2 gene (whose promoters are not regulated by DNA methylation) is moved either into an inactive domain, or into an active domain close to enhancers. The present invention could be used to see whether other imprinted genes, or other epigenetically regulated genes located in clusters, possess similar epigenetic switches.
General Discussion
Applications of tracer-target binding assay
One of the tracer-target binding assay's main applications is to allow isolation and/or identification of DNA sequences or proteins that interact at a specific target site in vivo. The identification of the isolated DNA or proteins will become increasingly easy as the technologies for screening small amounts of DNA and proteins, such as microarrays, real time quantitative PCR and mass spectrometry coupled with bioinformatics, improve. A purpose of the tracer-target binding assay is to look for protein-DNA, protein-protein or DNA-DNA interactions formed due to the higher order chromatin structure. An advantage of using transgenic animals or cells thereof is that chromatin at the same locus can be studied in various tissues, and also at different developmental stages, enabling a comparison between the expressed and silent states of the gene.
Protein-DNA interactions occur when transcription factors bind to promoters, enhancers, and other regulatory regions (LCRs, insulators, silencers). By introducing a target site for a tracer protein adjacent (in cis) to a regulatory region of interest, proteins that bind to the specific regulatory region can be isolated and identified. This has an advantage over bandshift and footprinting assays in that it is an in vivo reaction. It has an advantage over conventional ChIP assays using antibodies to transcription factors, since it is more specific: only proteins binding to a specific targeted region will be pulled down by the tracer-target assay.
Protein-protein interactions are required for transcription initiation, elongation, and secondary chromatin folding. The tracer-target binding assay could provide information on the specific protein-protein interactions at a given locus and also be used as a co-immunoprecipitation step in the isolation of proteins in a complex.
DNA-DNA interactions are likely to be the result of secondary chromatin structures and are unlikely to be direct. The tracer-target binding assay is ideal for detecting these types of interactions and is an excellent way of identifying and/or analysing additional regulatory regions of interest. Using the tracer-target binding assay, enhancer sequences that interact with a target sequence can be identified without any prior knowledge of, or hypotheses about, the enhancer. This will be particularly helpful for characterising different tissue-specific enhancers influencing a single gene promoter.
Listing of Sequences (All sequences in 5'-3' orientation):
SEQ ID NO: 1 CCT CCT GAA AGA TGA AGC
SEQ ID NO: 2 TCG CCC TAT AGT GAG TCG
SEQ ID NO: 3 CGG AGG ACT GTC CT
SEQ ID NO: 4 AGC TTA TGG ATC GGA GGA CTG TCC TCC GG
SEQ ID NO: 5 ATA CCT AGC CTC CTG ACA GGA GGC CTA G
SEQ ID NO: 6 (pRAY 1 vector; GenBank Accession No. CVU63018):
1 gacgaaaggg cctcgtgata cgcctatttt tataggttaa tgtcatgata ataatggttt
61 cttagacgtc aggtggcact tttcggggaa atgtgcgcgg aacccctatt tgtttatttt
121 tctaaataca ttcaaatatg tatccgctca tgagacaata accctgataa atgcttcaat
181 aatattgaaa aaggaagagt atgagtattc aacatttccg tgtcgccctt attccctttt
241 ttgcggcatt ttgccttcct gtttttgctc acccagaaac gctggtgaaa gtaaaagatg
301 ctgaagatca gttgggtgca cgagtgggtt acatcgaact ggatctcaac agcggtaaga
361 tccttgagag ttttcgcccc gaagaacgtt ttccaatgat gagcactttt aaagttctgc
421 tatgtggcgc ggtattatcc cgtattgacg ccgggcaaga gcaactcggt cgccgcatac
481 actattctca gaatgacttg gttgagtact caccagtcac agaaaagcat cttacggatg
541 gcatgacagt aagagaatta tgcagtgctg ccataaccat gagtgataac actgcggcca 601 acttacttct gacaacgatc ggaggaccga aggagctaac cgcttttttg cacaacatgg
661 gggatcatgt aactcgcctt gatcgttggg aaccggagct gaatgaagcc ataccaaacg
721 acgagcgtga caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg
781 gcgaactact tactctagct tcccggcaac aattaataga ctggatggag gcggataaag
841 ttgcaggacc acttctgcgc tcggcccttc cggctggctg gtttattgct gataaatctg 901 gagccggtga gcgtgggtct cgcggtatca ttgcagcact ggggccagat ggtaagccct
961 cccgtatcgt agttatctac acgacgggga gtcaggcaac tatggatgaa cgaaatagac
1021 agatcgctga gataggtgcc tcactgatta agcattggta actgtcagac caagtttact
1081 catatatact ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga
1141 tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt 1201 cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct 1261 gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc
1321 taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc
1381 ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc
1441 tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg 1501 ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt
1561 cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg
1621 agcattgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg
1681 gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt
1741 atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag 1801 gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt
1861 gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta
1921 ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt
1981 cagtgagcga ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc gcgcgttggc
2041 cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 2101 acgcaattaa tgtgagttag ctcactcatt aggcacccca ggctttacac tttatgcttc
2161 cggctcgtat gttgtgtgga attgtgagcg gataacaatt tcacacagga aacagctatg
2221 accatgatta cgccaagctg gccatactgg ccgtcgacaa gcttctcgag gaattccgat
2281 catattcaat aacccttaat ataacttcgt ataatgtatg ctatacgaag ttattaggtc
2341 tgaagaggag tttacgtcca gccaagctag cttggctgca ggtcgagcag tgtggttttc 2401 aagaggaagc aaaaagcctc tccacccagg cctggaatgt ttccacccaa tgtcgagcag
2461 tgtggttttg caagaggaag caaaaagcct ctccacccag gcctggaatg tttccaccca
2521 atgtcgagca aaccccgccc agcgtcttgt cattggcgaa ttcgaacacg cagatgcagt
2581 cggggcggcg cggtcccagg tccacttcgc atattaaggt gacgcgtgtg gcctcgaaca
2641 ccgagcgacc ctgcagccaa tatgggatcg gccattgaac aagatggatt gcacgcaggt 2701 tctccggccg cttgggtgga gaggctattc ggctatgact gggcacaaca gacaatcggc
2761 tgctctgatg ccgccgtgtt ccggctgtca gcgcaggggc gcccggttct ttttgtcaag
2821 accgacctgt ccggtgccct gaatgaactg caggacgagg cagcgcggct atcgtggctg
2881 gccacgacgg gcgttccttg cgcagctgtg ctcgacgttg tcactgaagc gggaagggac
2941 tggctgctat tgggcgaagt gccggggcag gatctcctgt catctcacct tgctcctgcc 3001 gagaaagtat ccatcatggc tgatgcaatg cggcggctgc atacgcttga tccggctacc
3061 tgcccattcg accaccaagc gaaacatcgc atcgagcgag cacgtactcg gatggaagcc
3121 ggtcttgtcg atcaggatga tctggacgaa gagcatcagg ggctcgcgcc agccgaactg
3181 ttcgccaggc tcaaggcgcg catgcccgac ggcgaggatc tcgtcgtgac ccatggcgat
3241 gcctgcttgc cgaatatcat ggtggaaaat ggccgctttt ctggattcat cgactgtggc 3301 cggctgggtg tggcggaccg ctatcaggac atagcgttgg ctacccgtga tattgctgaa
3361 gagcttggcg gcgaatgggc tgaccgcttc ctcgtgcttt acggtatcgc cgctcccgat
3421 tcgcagcgca tcgccttcta tcgccttctt gacgagttct tctgagggga tcggcaataa
3481 aaagacagaa taaaacgcac gggtgttggg tcgtttgttc ggatcggccg cgtatcacga 3541 ggccagcttt tcaattcatc attttttttt tattcttttt tttgatttcg gtttccttga
3601 aatttttttg attcggtaat ctccgaacag aaggaagaac gaaggaagga gcacagactt
3661 agattggtat atatacgcat atgtagtgtt gaagaaacat gaaattgccc agtattctta
3721 acccaactgc acagaacaaa aacatgcagg aaacgaagat aaatcatgtc gaaagctaca 3781 tataaggaac gtgctgctac tcatcctagt cctgttgctg ccaagctatt taatatcatg
3841 cacgaaaagc aaacaaactt gtgtgcttca ttggatgttc gtaccaccaa ggaattactg
3901 gagttagttg aagcattagg tcccaaaatt tgtttactaa aaacacatgt ggatatcttg
3961 actgattttt ccatggaggg cacagttaag ccgctaaagg cattatccgc caagtacaat
4021 tttttactct tcgaagacag aaaatttgct gacattggta atacagtcaa attgcagtac 4081 tctgcgggtg tatacagaat agcagaatgg gcagacatta cgaatgcaca cggtgtggtg
4141 ggcccaggta ttgttagcgg tttgaagcag gcggcggaag aagtaacaaa ggaacctaga
4201 ggccttttga tgttagcaga attgtcatgc aagggctccc tatctactgg agaatatact
4261 aagggtactg ttgacattgc gaagagcgac aaagattttg ttatcggctt tattgctcaa
4321 agagacatgg gtggaagaga tgaaggttac gattggttga ttatgacacc cggtgtgggt 4381 ttagatgaca agggagacgc attgggtcaa cagtatagaa ccgtggatga tgtggtctct
4441 acaggatctg acattattat tgttggaaga ggactatttg caaagggaag ggatgctaag
4501 gtagagggtg aacgttacag aaaagcaggc tgggaagcat atttgagaag atgcggccag
4561 caaaactaaa aaactgtatt ataagtaaat gcatgtatac taaactcaca aattagagct
4621 tcaatttaat tatatcagtt attacccgcg gccgatccgt cgaggaattc cgatcatatt 4681 caataaccct taatataact tcgtataatg tatgctatac gaagttatta ggtctgaaga
4741 ggagtttacg tccagccaag ctagcttggc tgcaggtcga gtaccccggg ttcgaaatcg
4801 atagatctgg taaccatgcg gccgcaattc actggccgtc gttttacaac gtcgtgactg
4861 ggaaaaccct ggcgttaccc aacttaatcg ccttgcagca catccccctt tcgccagctg
4921 gcgtaatagc gaagaggccc gcaccgatcg cccttcccaa cagttgcgca gcctgaatgg 4981 cgaatggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt cacaccgcat
5041 atggtgcact ctcagtacaa tctgctctga tgccgcatag ttaagccagc cccgacaccc
5101 gccaacaccc gctgacgcgc cctgacgggc ttgtctgctc ccggcatccg cttacagaca
5161 agctgtgacc gtctccggga gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg
5221 cgcga
SEQ ID NO: 7 (Mus musculus (H19) gene, 5' non-transcribed region sequence; GenBank Accession No. MMU19619)
1 agcactcctg tgtccatatc ccctatgtgt tagcactcct gtgtccatgt cccatctatg
61 tgacacttct gggtgcatat cccatctatg tgacactccc atgtccatgt cccagctatg 121 tcactctttt gtggtgttgt tttgttttgt tttttgagac agggtttctc tgtgtagcct 181 tggctgtcct ggaactcact ttatagacca ggctggcctc gaactcagaa atctgcctgc
241 ctctgcctcc ccaagtcgtg ggaccagcta tgtcactctt atgtaccaac ttatctaagt
301 caatatcata ttgtcatgtt agcatctaca agcttttgtc cagatactag tggcaccacc
361 atgctcacag tggtataaaa gacacattgt gtggttagtt ctatatgggg atggtgtcca 421 gaaggggaca acgggagctg gttcccaact gagagggcca tagtgtgagt ttaggacata
481 agtcttggcc atgaactcag aagagactga ggctcccctg gatgtttcac ttccagtctc
541 tctatgctca ttctttagga acgttactaa gaaggaatct gagaaagcag agctctttct
601 ccaccacttg tctaagttcc acagccaaca acagggctca gactaggctc tcactggata
661 ctagccccca ttagagtacc atgcttcagt taggttctgt ttgcccagga gctgctagcc 721 atcacctagt cctcaatgtc acgtactatt acaatggcca aaacagacta gacttgaggg
781 gaagagcccc cctgccccgc caaagcttac tgccctcatt gtactttctg ccctggcatc
841 cccagattga caaaggcatg ctaacttgtt tctggaagaa ggagatctag ctctatccca
901 tcgaaatgca aatgaaccac taggagttta gctgaccaag gaagctttct gctcactgtc
961 cattcaatgc agtcaaaagt gctgtgacta tacaggagga acatagcagt gctgtgacca 1021 tacaggagga acatagcaga ggctaaaggg ccatggtgca tgtaagtgtg ttctgtgcag
1081 caactgatga ccagacagta ctgagtctgc ctggagcttg agttaaaacc gagaaaatag
1141 ccattgccta cagttcccga atcaccacaa ggaaagaaaa aggttggtga gaaaaataga
1201 gattctattt tcatgtccgg gggatgagcg tgcagggcta tacacccagg actcaaagga
1261 acatgctaca ttcacacgag catccaggag gcataagaat tctgcaagga gaccatgcct 1321 attcttggac gtctgctgga atcagttgtg gggtttatac gcgggagttg tggcccggtg
1381 gcagcaaaat cgattgcgcc aaacctaaag agccccccca cccctggtat tggaattcac
1441 aaatggcaat gctgtgggtc acccaagttc agtacctcag gggggtcaca aatgccacta
1501 ggggggcagg acacatgcat tttctaggct ggtacctcgt ggactcggac tcccaaatca
1561 acaaggtcgg cttactctct gcaaagaatc ctttgtgtgt aaagaccagg gttgcccgca 1621 cggcggcagt gaagtctcgt acatcgcagt cctaaaacgg attgcaactg attgagtttt
1681 ctcccctatc accatctatg atcccatagt catgggcttc atgaggccag gggttcatgc
1741 tagtccttga taaaacgttc tcaagagcta tctcaggtat ctgacttata gggttctggg
1801 gccatcagcg ctattgtgtg aaaattgaag tatagctact atgtgtggtg atcatggaat
1861 gtattgcagt accataatgc agaccccact aagcatggtc ctcaaattct gcacatctat 1921 gaggacacct gacctcccta gctccccaaa gaatcactta aggaaccgcc aacaagaaag
1981 tctggtgatt tgcgctttcg tattctaatt ggggcgttca gatgctaatg atctccggct
2041 cagggctgca aacaattctg aaactgcatt ctctctcaat ggggctcagc tgccagcttc
2101 ggataggaat cctacatcaa gccttgggtg tcagccaaaa tcaattgaag aggcggcagg
2161 acccctggct ggtttgtggc agataatcgt acttatggca aagttggggg ttcacctgtt 2221 tttgcactgt cccggagagc cttgttaaaa ggccaggaag gcaggacacc tattagccct
2281 tccggaatgg tctcctttgg tctcagaccc ataaaccagt gcatgtggtt cgatgccaga
2341 aagcacaaaa gccgttgtga gtggaaagac caaattgctt ggcgctggtg actgtcatct
2401 taaacattat gttccagaga cagccaaagt taaggtttgc ccatgacaat gtccaagggc 2461 caaagttcgg gttcgcccac agcaatgtcc gaagccgcta tgcctcagtg gtcgatatgg 2521 tttataagag gttggaacac ttgtgtttct ggagggggtc ctttggtcac tgaaccccaa 2581 aaaccagcca gtgtggctca ctataggaag gcatagaagc tgttatgtgc aancaaggga 2641 acggarmtac cgcgcggtgg cagcatactc ctatatatcg tggcccaaat gctgccaact 2701 tggggggagc gattcattcc cagcaatatc ccagggtcac ccaaataggg attcataggg 2761 gtggtaagat gtgtgcacct ctggaatggt tcccttacac actgaaccag agaacttgac 2821 tcattcccta cacagcccga gatcgtcagt ggctggtaag accgaagttg ccgagcagcg 2881 aaccagtgca gtccacatac ttatcataga ggtgaccaaa aattgcggtt cacctatggc 2941 aaactcatgg gtcactcaag gcatagcatt caatgttcat aagggtcatg gggtggatac 3001 aacacacatt tcttgggcgc tccttcagtc ttgcgccctt cacgatcgat cggttcactc 3061 tccacgctgt gcagatttgg ctatagctaa atggacagac gatgccgcgt ggtggcagta 3121 caatactaca tattgctcgg cagacgcggc ataggttggg gtccgctgtg acaaagcttg 3181 agtaccccag gttcaacaaa gggatcaggc atttgtgaca ctttacggaa tggtcccctt 3241 ctgccttggg actcctcaag ctagccagat ctgttcaacc aaactcaata cagaatcggt 3301 tataggcggg agacatagaa actgcgcgtg cgtgcgtcca ccgaaacccc atagccataa 3361 aagcagaggc tgggttcaac cattgcaatg tcccaggtaa cctaggaact gtagcaagaa 3421 gttnaaagct gagattagag actaggagaa gtcaggatcc gcttagtttg aagtgaagca 3481 ggagggtggg tagcaagagg gagcataggc ggtgtggcat ggtggggttg tatagcgtgg 3541 gttatagcga gggggtataa tggggaggta tagtggaagt atagtgggag gtatagtagg 3601 ggtatagcgg gggatataga gggagttata gtgggggtat agcgggggca tagcgggggc 3661 ataatggggg acatagtggg gggccatggc aggaggggca tagtcggggg tataatgggg 3721 ggtgtagtgg ggcggtatag tgtgggtata gtggggacat agcgggggta tagtgggggt 3781 atagcaggga gggtatagtg ggggtatagc agagagtatg ggggccatag gggaagtatg 3841 gtggggggta tggcaagggg tatggtgggg aggatgacag gggggatata gcaggggtgt 3901 agtggggagg atatagtggg aagtataggg gagtatagca agggtgatgt ggggggtgat 3961 gggagacaga gaccaactgc caccacacca gtgaatagct tacatgtgat gccaaccctc 4021 ggcctttagt ctgctgatgc tgccagggtt cttgaggctc attagaagat ggggtcattc 4081 
ttttccatct tatataccga tctctgaaaa ggatcatttg tttaagagcg cagagatagc 4141 cacaaccacc acaaggtgac agcacagaca cacattacag cgccaagata atagaggctg 4201 gggtctattt atgaccaaat tggaggtgtc cgtcagtatc ataacttagg gtccacccac
4261 agcaatgacc agggcttttc acagagtaaa aatgaaggat cactaggagc cagaaatgag
4321 tactcagctg acatgacgca ctaggctgag gatctgccaa ggtgctattg cactcacata 4381 ggtgagaacc actgctgagt ggtcatgact ggtcagccct tgagtcctcc tcccatctgg 4441 agctccggtc acttccatct agagctgctc tactctctgg tgccctgatt tgtggatgct 4501 gactctgtgg atttcagctg cataactttc aacagaagct aagagcaatg gtatacgcac 4561 gcacgtggct ggagggacag ggcatctctc ctgtccccat tctatcactg atagtagtac 4621 ttcagtagga tagggagacc atactgtgcc ctccggttga ctggggtgac caagaccaga 4681 ctgtaagata actaataaaa ctcatcgcaa gagagtcctg acaaacgtga cgactcaccc 4741 agaaaaggga cggtgaggag tgcccaaatt agctactacc ttgttgaaac cactgσggcc 4801 agataaggtc agtcaagcag actaggacaa aggctggggg ttagccactc gtaggctgtt 4861 catactccgt gggatagtat gagacccatg ccctcaaatt cccattgtgg tcattataca 4921 aactgagtaa agggcagctt cattggcatc ctggggcacc accgttctat gaagggcttc 4981 agcaggctac ggggctatat gttctcgaca tcaagggaga tatctgggga caatgccagg 5041 ccctgtctaa ggattccaaa gtgggagttg tggtgaggct gtctttggag aatttcagga 5101 cgggtgcggg tgtggggtga ggcgaacgtg cgctggaacg atacagggtg gaggtcggcc 5161 ctcgcccgcg gggccatgta ctgattggtt gacagagtag gggcgggaat tc
SEQ ID NO: 8 (Nucleotides 3438 to 8436 of Mus musculus (H19) gene; GenBank Accession No. AF049091)
3438 agg gatcaggcat ttgtgcactt acggaatggt ccccttctgc
3481 cttgggactc ctcaagctag ccagatctgt tcaatccaaa ctcaatacag aatcggttat
3541 aggcgggaga catagaaact gccgcgtgcg tgcgtccacc gaaaccccat agccataaaa
3601 gcagaggctg gggttcaacc attgcaatgt cccaggtaac ctaggaactg tagcaagaag
3661 ttgcaaagct gagattagag actaggagaa gtcagggatc cgcttagttt gaagtgaagc 3721 aggagggtgg gtagcaagag ggagcatagc gggtgtggca tggtggggtt gtatagcgtg
3781 ggttatagcg agggggtata atggggaggt atagtggaag tatagtggga ggtatagtag
3841 gggtatagcg ggggatatag agggagttat agtgggggta tagcgggggc atagcggggg
3901 cataatgggg gacatagtgg ggggcatggc aggaggggca tagtgggggt ataatggggg
3961 tgtagtgggg cggtatagtg tgggtatagt gggggacata gcgggggtat agtgggggta 4021 tagcagggag ggtatagtgg gggtatagca gagagtatgg gggccatagg ggaagtatgg
4081 tggggggtat ggcaaggggt atggtgggga ggatgacagg ggggatatag caggggtgta
4141 gtggggagga tatagtggga agtatagggg agtatagcaa gggtgatgtg gggggtgatg
4201 ggagacagag accaactgcc accacaccag tgaatagctt acatgtgatg gccaaccctc
4261 ggcctttagt ctgctgatgc tgccagggtt cttgaggctc attagaagat ggggtcattc 4321 ttttccatct tatataccga tctctgaaaa ggatcatttg tttaagagcg cagagatagc
4381 cacaaccacc acaaggtgac agcacagaca cacattacag cgccaagata atagaggctg
4441 gggtctattt atgaccaaat tggaggtgtc cgtcagtatc ataacttagg gtccacccac
4501 agcaatgacc agggcttttc acagagtaaa aatgaaggat cactaggagc cagaaatgag
4561 tactcagctg acatgcagcc actaggctga ggatctgcca aggtgctatt gcacctcaca 4621 taggtgagaa ccactgctga gtggtcatga ctggtcagcc cttgagtcct cctcccatct
4681 ggagctccgg tcacttccat ctagagctgc tctactctct ggtgccctga tttgtggatg
4741 ctgactctgt ggatttcagc tgcataactt tcaacagaag ctaagagcaa tggtatacgc
4801 acgcacgtgg ctggagggac agggcatctc tcctgtcccc attctatcac tgatagtagt 4861 acttcagtag gatagggaga ccatactgtg cctccggttg actggggtga ccaagaccag 4921 actgtaagat aactaataaa actcatcgca agagagtcct gacaaacgtg acgactcacc 4981 cagaaaaggg acggtgagga gtgcccaaat tagctactac cttgttgaaa ccactgcggc 5041 cagataaggt cagtcaagca gactaggaca aaggctgggg gttagccact cgtaggctgt 5101 tcatactccg tgggatagta tgagacccat gccctcaaat tcccattgtg gtcattatac 5161 aaactgagta aagggcagct tcagtgtggc agggtgcctg gggcaccacc gttctatgaa 5221 gggcttcagc aggctacggg gctatatgtt ctcgacatca agggagatat ctggggacaa 5281 tgccaggccc tgtctaaggg attccaaagt gggagttgtg gtgaggctgt ctttggagaa 5341 tttcaggacg ggtgcgggtg tggggtgagg cgaacgtgcg ctggaacgat acagggtgga 5401 ggtgggccct gcgggggccc cggcggggcc atgtactgat tggttgacag agtaggggcg 5461 ggaattctgg gcggagccac tccagttaga aaaagcccgg gctagagggc ccgaagcacc 5521 gggtgtggga ggggggtggg gggtgggggt ggggggtatc ggggaaactg gggaagatgg 5581 gagagctgga ggagagtcgt ggggtccgag gagcacctcg gcatctggag tctggcagga 5641 atgttgaagg actgaggggc tagctcaggc agagcaaagg catcgcaaag gctggaaaac 5701 atcggagtga agctgaaggg cctgagctag ggttggagag gaatggggag ccagacattc 5761 atcccggtta cttttggtta caggacgtgg cggctggtcg gataaagggg agctgctggg 5821 aagggttcga ccccagacct gggcagtgaa ggtatagctg gcagcagtgg gcaggtgagg 5881 accgccgtct gctgggcagg tgagtctcct tcttctctct tggcctcgct ccactgacct 5941 tctaaacgaa ggtttagaga gggggcctgg tgagaagaag cggctggcct cgcagcagaa 6001 tggcacatag aaaggcagga tagttagcaa aggagacatc gtctcggggg gagccgagac 6061 agaaggaggc tgggggacca ttggcgaccc caggtggaaa gagctcttag agagaagaaa 6121 gaagaggtgc agggttgcca gtaaagactg aggccgctgc ctccagggag gtgataggag 6181 tccttggaga cagtggcaga gaccatggga tccagcaaga acagaagcat tctaggctgg 6241 ggtcaaacag ggcaagatgg ggtcacaaga cacagatggg tccccagccg ccacaacatc 6301 ccacccaccg taattcactt agaagaaggt tcaagagtgg ctctggcaaa gtcccaagtt 6361 tgccagagcc tcaataactg gagaatggaa aagaagggca gtgcagggtg tcaccagaag 6421 gggagtgggg gctgcaggta tcggactcca gagggatttt acagcaagga ggctgcagtg 6481 
ggtccagcct gcagacacac cattcccatg aggcactgcg gcccagggac tggtgcggaa 6541 agggcccaca gtggacttgg tacactgtat gccctaaccg ctcagtccct gggtctggca 6601 tgacagacag aacatttcca ggggagtcaa gggcacagga tgaagccaga cgaggcgagg 6661 caggcggggc agaatgaatg agtttctagg gagggaggtt gggtgcaggt agagcgagta 6721 gctggggtgg tgagccaggg aggcactggc ctccagagtc cgtggccaag gagggccttg 6781 cgggcggcga cggagcagtg atcggtgtct cgaagagctc ggactggaga ctagggtaag 6841 tgtctgtccc gctcgtggtc acccagtctc ctcccacgca agttcaatta actcatgtct 6901 tcatttctcc ctatagccag gtctccagca gaggtggatg tgcctgccag tcactgaagg 6961 cgaggatgac aggtgtggtc aatgtgacag aaagacatga catggtccgg tgtgatggag 7021 aggacagaag ggcagtcatc cagccttctt ggtgagcata ctccctgcca cagggctagt 7081 ccgctcaacc acctaattgt ccacccactc actcaggatt ctgtcctttg cagaacacca 7141 tgggctggcg ccttgtcgta gaagccgtct gttctttcac ttttcccaaa gagctaacac 7201 ttctctgctg ctctctggat cctcctcccc ctaccttgaa ccctcaagat gaaaggtgag 7261 ttctcttctg ccccatgtgg gtgggagagg gtgggatgcc aaggacaggg gtctcattct 7321 ctcccaccca tagaaatggt gctacccagc tcatgtctgg gcctttgaat ccggggactt 7381 ctttaagtcc gtctcgttct gaatcaagaa gatgctgcaa tcagaaccac tacactacct 7441 gcctcaggaa tctgctccaa ggtgagctgg ggcacccttt ggaagcttgc caagcccact 7501 ccccacccca ccccccgccc cacctcattt gtctttattc tctttgcagg tgaagctgaa 7561 agaacagatg gtgtcaacat tttgaaagag cagactcata gcacccaccc acccctgaga 7621 atccatcttc atggccaact ctgcctgacc cgggagacca ccacccacat catcctggag 7681 ccaagcctct accccgggat gacttcatca tctccctcct gtctttttct tcttcctcct 7741 ttcctgtaat tctgtttctt tccttttgtt ccttccttgc ttgagagact caaagcaccc 7801 gtgactctgt ttccccattt accccctttt gaatttgcac taagtcgatt gcactggttt 7861 ggagtcccgg agatagcttt gagtctctcc gtatgaatgt atacagcgag tgtgtaaacc 7921 tctttggcaa tgctgcccca gtacccacct gtcgtccatc tccgtctgag ggcaactggg 7981 tgtggccgtg tgcttgaggc ctcgccttcc cctcgcctag tctggaagca gttccatcat 8041 aaagtgttca acatgcccta cttcatcctt tgcccctcct caccagggcc tcaccagagg 8101 tcctgggtcσ atcaataaat acagttacag tcattggtct cgtggacttc aatataatgc 8161 gactcatggg 
gggaggttgg gggggcaggg gaaagaaatg agggaagggg aataaaaggg 8221 atagggtaca ccatccatga tgaaggggat aatgaaatgg ggagggcctc taggatggaa 8281 tggggaatag gagatggctt atgaaaaggt gggtggcact gggataggag gctcatggga 8341 tggcacacag cgaaaggcct catgggatgc caagaagcta tggtatggaa ggggcctatg 8401 aatatgaaat ggtgcctgaa gaacgggcaa agcagt
For SEQ ID NOs: 9-62, see Table 1 and description.
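As an aid to working with the listing above, the following Python sketch shows one way to recover a continuous sequence from GenBank-style rows, including rows where two numbered lines have run together, and to check how the short sequences relate to one another. The function name is hypothetical and the parser rests on an assumption about the row layout (position numbers followed by blocks of bases). It rejects any character that is not an IUPAC nucleotide code, so damaged rows are flagged rather than silently shortened.

```python
IUPAC = set("acgturykmswbdhvn")


def bases_from_listing(lines):
    """Join listing rows into one lowercase sequence, skipping position numbers."""
    bases = []
    for line in lines:
        for token in line.split():
            if token.isdigit():
                continue  # position number at the start of a row
            token = token.lower()
            if not set(token) <= IUPAC:
                raise ValueError(f"unexpected character in {token!r}")
            bases.append(token)
    return "".join(bases)


# SEQ ID NO: 3 and SEQ ID NO: 4 as given in the listing.
seq3 = bases_from_listing(["CGG AGG ACT GTC CT"])
seq4 = bases_from_listing(["AGC TTA TGG ATC GGA GGA CTG TCC TCC GG"])
offset = seq4.find(seq3)
print("SEQ ID NO: 3 starts at 0-based offset", offset, "of SEQ ID NO: 4")

# First two rows of SEQ ID NO: 6.
rows = [
    "1 gacgaaaggg cctcgtgata cgcctatttt tataggttaa tgtcatgata ataatggttt",
    "61 cttagacgtc aggtggcact tttcggggaa atgtgcgcgg aacccctatt tgtttatttt",
]
vector_start = bases_from_listing(rows)
print(len(vector_start), "bases parsed")
```

The containment check shows that the UAS binding site of SEQ ID NO: 3 lies within the longer sequence of SEQ ID NO: 4.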

Claims

1. A method for identifying and/or analysing nucleic acid and/or protein associated with a chromosome location in a host, comprising: (i) providing a host including:
(a) an exogenous DNA-binding protein (a "tracer protein"); and
(b) a DNA-binding site located at the specific chromosome location and capable of being bound specifically by the tracer protein;
(ii) allowing the tracer protein to bind to the DNA binding site to form a bound complex in the host;
(iii) identifying nucleic acid and/or protein associated with the bound complex.
2. The method according to claim 1, further comprising the step of isolating the bound complex from the host or from a sample taken from the host before identifying nucleic acid and/or protein associated with the bound complex.
3. The method according to either of claim 1 or claim 2, in which the nucleic acid and/or protein associated with the chromosome location is endogenous.
4. The method according to any preceding claim, in which the DNA-binding site is exogenous.
5. The method according to any preceding claim, in which the DNA binding site is positioned at a gene regulatory element.
6. The method according to any preceding claim, in which the tracer protein is encoded by DNA introduced into the host.
7. The method according to any preceding claim, in which the tracer protein binds DNA without activating and/or modifying the DNA.
8. The method according to any preceding claim, in which the tracer protein comprises an epitope for antibody binding.
9. The method according to any of claims 2 to 8, in which the bound complex is isolated by immuno-precipitation.
10. The method according to any preceding claim, in which the bound complex is analysed by immunofluorescence, antibody array, microarray, and/or quantitative real time PCR.
11. The method according to any preceding claim, in which the host is a hybrid formed by crossing a first host containing the tracer protein and a second host containing the DNA binding site.
12. The method according to any preceding claim, in which the host is non-yeast.
13. The method according to any preceding claim, in which the host is an organism, preferably a non-human organism.
14. The method according to claim 12 or claim 13, in which the host is a mouse.
15. A method for identifying and/or analysing endogenous nucleic acid and/or protein associated with a chromosome location in a mouse, comprising:
(i) producing a transgenic tracer mouse comprising an exogenous DNA-binding protein ("tracer protein");
(ii) producing a transgenic target mouse comprising an exogenous DNA binding site at the chromosome location, in which the DNA binding site is capable of being bound specifically by the tracer protein;
(iii) crossing the tracer mouse with the target mouse to produce a hybrid mouse including the tracer protein and the DNA binding site;
(iv) allowing the tracer protein to bind to the DNA binding site to form a bound complex in the hybrid mouse; and
(v) identifying endogenous nucleic acid and/or protein associated with the bound complex.
16. The method according to claim 15, further comprising the step of isolating the bound complex from the hybrid mouse or from a sample taken from the hybrid mouse before identifying endogenous nucleic acid and/or protein associated with the bound complex.
17. The method according to any preceding claim, in which the tracer protein comprises a yeast Gal4 DNA-binding domain fused to a myc epitope tag.
18. The method according to claim 17, in which the DNA binding domain comprises the UAS binding site (SEQ ID NO: 3).
19. The method according to any of claims 1 to 16, in which the tracer protein comprises a zinc finger DNA-binding domain.
20. The method according to claim 19, in which the DNA binding domain comprises a zinc finger sequence specific for the zinc finger protein binding domain.
21. The method according to any of claims 1 to 16, in which the tracer protein comprises a green fluorescent protein fused to a DNA-binding domain.
22. The method according to claim 21, in which the DNA binding domain comprises a DNA sequence specific for the green fluorescent protein DNA-binding domain.
23. The method according to any of claims 1 to 16, in which the DNA-binding site is a transcription factor recognition site, a restriction enzyme recognition site, an enhancer, a silencer, a specifically engineered target site and/or a recombinase enzyme binding site.
24. A method for identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression, comprising:
(i) by the method of any of claims 1 to 23, identifying and/or analysing nucleic acid and/or protein associated with a gene at a chromosome location under a first condition; (ii) by the method of any of claims 1 to 23, identifying and/or analysing nucleic acid and/or protein associated with the gene under a second condition; (iii) comparing the results obtained in step (i) and step (ii); and
(iv) identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression.
25. A method for conducting a drug discovery business, comprising:
(i) by the method of claim 24, identifying and/or analysing nucleic acid and/or protein associated with regulating gene expression; (ii) generating a drug screening assay for identifying and/or analysing agents which inhibit or potentiate regulation of gene expression by the nucleic acid and/or protein identified in step (i);
(iii) conducting animal toxicity profiles on an agent identified and/or analysed in step (ii), or an analogue thereof; (iv) manufacturing a pharmaceutical preparation of an agent having a suitable animal toxicity profile; and
(v) marketing the pharmaceutical preparation to healthcare providers.
26. A method for conducting a bioinformatics business, comprising: (i) by the method of any of claims 1 to 23, identifying and/or analysing nucleic acid and/or protein associated with a gene at a chromosome location under a given condition; and repeating step (i) thereby
(ii) generating a database comprising information identifying and/or analysing different nucleic acid and/or protein associated with one or more genes under one or more conditions.
27. The method according to any preceding claim, in which the nucleic acid is DNA.
28. A host for use in a method according to any of claims 1 to 27, comprising a tracer protein and a DNA-binding site.
29. The host according to claim 28, in which the host is a mouse.
30. The host according to claim 29, in which the host is a human cell.
31. The host according to any of claims 28 to 30, in which the tracer protein comprises a yeast Gal4 DNA-binding domain.
32. The host according to any of claims 28 to 31, in which the DNA-binding site comprises the UAS binding site (SEQ ID NO: 3).
33. A mammal or mammalian cell comprising a yeast Gal4 DNA-binding domain.
34. A mammal or mammalian cell comprising the UAS binding site (SEQ ID NO: 3).
35. The mammal or mammalian cell according to claim 33 or claim 34, in which the mammal or mammalian cell is murine.
36. Use of the host, mammal or mammalian cell according to any of claims 28 to 35 in a method according to any of claims 1 to 27.
37. A nucleic acid construct for insertion of an insertion sequence into a specific location of a host, comprising, in the following order, a first cloning site for insertion of a first nucleic acid sequence homologous to a sequence on one side of the specific location, the insertion sequence, one copy of a direct repeat sequence, a selective marker, a second copy of the direct repeat sequence, and a second cloning site for insertion of a second nucleic acid sequence homologous to a sequence on the other side of the specific location.
38. The nucleic acid construct according to claim 37, in which the insertion sequence is a nucleic acid binding site, for example the UAS binding site (SEQ ID NO: 3).
39. The nucleic acid construct according to claim 37 or claim 38, in which the direct repeats are LoxP sequences.
40. The nucleic acid construct according to any of claims 37 to 39, comprising a neo gene as a selective marker.
PCT/GB2004/002344 2003-06-02 2004-06-02 Identification and/or analysis of nucleic acids and/or proteins associated with a chromosome location WO2004106550A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0312676A GB0312676D0 (en) 2003-06-02 2003-06-02 Binding assay
GB0312676.0 2003-06-02

Publications (2)

Publication Number Publication Date
WO2004106550A2 (en) 2004-12-09
WO2004106550A3 (en) 2005-03-24

Family

ID=9959202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2004/002344 WO2004106550A2 (en) 2003-06-02 2004-06-02 Identification and/or analysis of nucleic acids and/or proteins associated with a chromosome location

Country Status (2)

Country Link
GB (1) GB0312676D0 (en)
WO (1) WO2004106550A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0953647A1 (en) * 1996-12-16 1999-11-03 Eisai Co., Ltd. Method for preparing retrovirus vector for gene therapy
WO2002040685A2 (en) * 2000-11-16 2002-05-23 Cornell Research Foundation, Inc. Vectors for conditional gene inactivation
WO2003076949A2 (en) * 2002-03-08 2003-09-18 The Babraham Institute Tagging and recovery of elements associated with target molecules
EP1384787A1 (en) * 2002-07-25 2004-01-28 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Screening method for the identification and characterization of DNA methyltransferase inhibitors

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CARTER D. ET AL.: "Long-range chromatin regulatory interactions in vivo" NATURE GENETICS, NEW YORK, US, vol. 32, no. 4, December 2002 (2002-12), pages 623-626, XP002258764 ISSN: 1061-4036 *
DEKKER J. ET AL.: "Capturing chromosome conformation" SCIENCE (WASHINGTON D C), vol. 295, no. 5558, 15 February 2002 (2002-02-15), pages 1306-1311, XP002301470 ISSN: 0036-8075 cited in the application *
MURRELL A. ET AL.: "Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops." NATURE GENETICS, vol. 36, no. 8, August 2004 (2004-08), pages 889-893, XP002301471 ISSN: 1061-4036 *
ORNITZ D.M. ET AL.: "BINARY SYSTEM FOR REGULATING TRANSGENE EXPRESSION IN MICE: TARGETING INT-2 GENE EXPRESSION WITH YEAST GAL4/UAS CONTROL ELEMENTS" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, WASHINGTON, US, vol. 88, no. 3, 1 February 1991 (1991-02-01), pages 698-702, XP000175360 ISSN: 0027-8424 *
TOLHUIS B. ET AL.: "Looping and interaction between hypersensitive sites in the active beta-globin locus." MOLECULAR CELL, vol. 10, no. 6, December 2002 (2002-12), pages 1453-1465, XP002301469 ISSN: 1097-2765 cited in the application *
VAN STEENSEL B. ET AL.: "Chromatin profiling using targeted DNA adenine methyltransferase" NATURE GENETICS, NEW YORK, US, vol. 27, no. 3, March 2001 (2001-03), pages 304-308, XP002258763 ISSN: 1061-4036 *
VAN STEENSEL B. ET AL.: "Identification of in vivo DNA targets of chromatin proteins using tethered Dam methyltransferase" NATURE BIOTECHNOLOGY, US, vol. 18, no. 4, April 2000 (2000-04), pages 424-428, XP002187507 ISSN: 1087-0156 *
WILD J. ET AL.: "Targeting and retrofitting pre-existing libraries of transposon insertions with FRT and oriV elements for in-vivo generation of large quantities of any genomic fragment" GENE, ELSEVIER BIOMEDICAL PRESS. AMSTERDAM, NL, vol. 223, no. 1-2, 26 November 1998 (1998-11-26), pages 55-66, XP004153577 ISSN: 0378-1119 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007128982A2 (en) 2006-04-07 2007-11-15 Cellcentric Ltd Compositions and methods for epigenetic modification of nucleic acid sequences in vivo
JP2009532053A (en) * 2006-04-07 2009-09-10 セルセントリック・リミテッド Compositions and methods for epigenetic modification of nucleic acid sequences in vivo
US8298529B2 (en) 2006-04-07 2012-10-30 CellCentric Limited Compositions and method for epigenetic modification of nucleic acid sequences in vivo
US8658393B2 (en) 2006-04-07 2014-02-25 CellCentric Limited Molecules and methods for demethylation of methylated nucleic acid sequences
US11466306B2 (en) 2013-02-14 2022-10-11 Osaka University Method for isolating specific genomic regions with use of molecule capable of specifically binding to endogenous DNA sequence

Also Published As

Publication number Publication date
WO2004106550A3 (en) 2005-03-24
GB0312676D0 (en) 2003-07-09

Similar Documents

Publication Publication Date Title
CN108570479B (en) Method for mediating down producing goat VEGF gene fixed-point knock-in based on CRISPR/Cas9 technology
Clark et al. Chromosomal position effects and the modulation of transgene expression
Tang et al. A Cre/loxP‐deleter transgenic line in mouse strain 129S1/SvImJ
CN107541525A (en) A kind of method knocked in based on CRISPR/Cas9 technologies mediation goat T Beta-4 gene fixed points
CN112852875B (en) Construction method of CD3e transgenic mouse model for tracing tumor T lymphocyte infiltration
CN111201317A (en) Modified Cas9 protein and uses thereof
US6773914B1 (en) PiggyBac transformation system
CN112522264B (en) CRISPR/Cas9 system causing congenital deafness and application thereof in preparation of model pig nuclear donor cells
CN112442515B (en) Application of gRNA target combination in construction of hemophilia model pig cell line
US20040068762A1 (en) Transgenic non-human mammals expressing a reporter nucleic acid under the regulation of androgen response elements
CN114107231B (en) Recombinant adeno-associated virus for realizing whole brain postsynaptic neuron cell body marking and application thereof
WO2004106550A2 (en) Identification and/or analysis of nucleic acids and/or proteins associated with a chromosome location
US6090629A (en) Efficient construction of gene targeting using phage-plasmid recombination
CN106978445A (en) The method of the goat EDAR gene knockouts of CRISPER Cas9 System-mediateds
CN112538497B (en) CRISPR/Cas9 system and application thereof in construction of alpha, beta and alpha & beta thalassemia model pig cell lines
CN113046388B (en) CRISPR system for constructing atherosclerosis pig nuclear transfer donor cells with double genes in combined knockout mode and application of CRISPR system
US6821759B1 (en) Methods of performing homologous recombination based modification of nucleic acids in recombination deficient cells and use of the modified nucleic acid products thereof
CN111118049B (en) Plasmid vector and application thereof
CN106676135A (en) Alb-uPA-teton lentiviral vector and preparation method and application thereof
KR101443937B1 (en) Method for production of TRACP5b
CN112442513B (en) Cas9 overexpression vector and construction method and application thereof
CN111499758B (en) Method for screening human potassium ion channel modulators
CN112877363A (en) Gene editing system for constructing high-quality pig nuclear transplantation donor cells with high lean meat percentage, fast growth and high reproductive capacity and application thereof
CN112522292B (en) CRISPR/Cas9 system for constructing congenital amaranth clone pig nuclear donor cells and application thereof
CN109811008A (en) The method of the mouse FGF5 gene knockout of CRISPR-Cas9 System-mediated

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase