WO2008027548A2 - Microarray-based chromatin structure mapping - Google Patents

Microarray-based chromatin structure mapping

Info

Publication number
WO2008027548A2
Authority
WO
WIPO (PCT)
Prior art keywords
dna
genomic dna
hybridization
level
oligonucleotide
Application number
PCT/US2007/019196
Other languages
English (en)
Other versions
WO2008027548A3 (fr)
Inventor
David E. Fisher
Xiaole S. Liu
Jun S. Song
Fatih Ozsolak
Original Assignee
Dana-Farber Cancer Institute, Inc.
Application filed by Dana-Farber Cancer Institute, Inc.
Publication of WO2008027548A2
Publication of WO2008027548A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6804: Nucleic acid analysis using immunogens

Definitions

  • The invention relates generally to the field of molecular biology. More specifically, the invention relates to gene expression profiling and microarrays. The invention further relates to mathematical methods used to analyze data obtained using microarrays.
  • HDAC: histone deacetylase
  • These drugs (HDAC inhibitors) are believed to inhibit the complexes that promote the formation of stable nucleosomes in promoter regions, thereby allowing the expression of certain genes that can force a tumor cell to assume a benign phenotype.
  • Microarray-based gene expression analysis and profiling has recently emerged as an important tool in molecular biology.
  • In a typical microarray, an array of single-stranded DNA oligonucleotide probes, each of known sequence and about 25 nucleotides long, is created on a solid substrate such as a glass slide using, e.g., spotting or photolithography.
  • Each spot on the microarray maps with a one-to-one correspondence to a particular nucleotide sequence.
  • Gene expression is then probed by generating a library of dye-labeled complementary DNAs (cDNAs) from total messenger RNA (mRNA, encoding protein products) in a sample of cells, contacting the mixture of cDNAs with the microarray under conditions that permit sequence-specific hydrogen bonding (hybridization) between the cDNA sequences and the immobilized probes, and then detecting and identifying expressed sequences by analyzing the pattern of dye bound to the microarray.
  • cDNAs: dye-labeled complementary DNAs
  • In a prior study, nucleosomal DNA was isolated by micrococcal nuclease digestion, labeled with Cy3 (green) fluorescent dye, mixed with Cy5 (red)-labeled total genomic DNA, and hybridized to microarrays printed with overlapping 50-mer oligonucleotide probes tiled every 20 base pairs across chromosomal regions of interest.
  • The authors of that study used a hidden Markov model to determine nucleosome/linker boundaries and thereby identified well-positioned (non-delocalized) nucleosomes. The identified nucleosome-free regions, about 150 base pairs long, were reported to include conserved promoter regions and to occur about 200 base pairs upstream of known coding sequences.
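The hidden Markov model mentioned above can be illustrated with a minimal two-state sketch (linker versus nucleosome) decoded with the Viterbi algorithm. The Gaussian emission means, standard deviation, and self-transition probability below are hypothetical values chosen for illustration; they are not the parameters of the cited study, whose model may differ in its states and emissions.

```python
import math

# Hypothetical two-state model: index 0 = linker (nucleosome-free), 1 = nucleosome.
STATES = ("linker", "nucleosome")


def log_gauss(x, mean, sd):
    """Log-density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(ratios, means=(-0.5, 0.5), sd=0.4, stay=0.9):
    """Most likely state for each probe, given per-probe log2 nucleosomal/genomic ratios.
    means, sd and stay (self-transition probability) are illustrative assumptions."""
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    n = len(STATES)
    # score[s]: best log-probability of any state path ending in state s at the current probe
    score = [math.log(1 / n) + log_gauss(ratios[0], means[s], sd) for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at probe t + 1
    for x in ratios[1:]:
        new_score, pointers = [], []
        for s in range(n):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in range(n)]
            best = max(range(n), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + log_gauss(x, means[s], sd))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

For example, `viterbi([-0.6] * 5 + [0.7] * 5 + [-0.5] * 5)` labels the middle five probes as nucleosome and the flanks as linker; the state changes along the path are the inferred nucleosome/linker boundaries.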
  • The invention is based at least in part on the applicant's development of a microarray-based method for mapping chromatin structure.
  • Features of the invention include a significantly higher resolution than previously achieved with microarray-based mapping, and the application of a data-analysis method not previously applied to optical microarray data. Particularly when used in combination, these features make possible, for the first time, microarray-based chromatin mapping of highly complex genomes, including the human genome.
  • Information derived from the methods of the invention can be used to map the location of certain protein-DNA complexes in genomic DNA. In particular, it is possible using methods of the invention to map the location of so-called positioned nucleosomes on genomic DNA, including human genomic DNA, and thereby determine important biological information such as gene expression and relationships among different types of cells.
  • Methods of the invention will find use whenever it is of interest to characterize a genome, cell, or tissue based on its chromatin structure. More particularly, methods of the invention can be used, without limitation, to perform gene expression profiling, to diagnose and characterize diseases such as cancer and genetic diseases, and to screen for agents useful in the treatment of diseases such as cancer and genetic diseases. Methods of the invention are also useful for identifying transcription start sites, transcriptionally active promoters, transcription factor binding sites, and transcription factors involved in the regulation of gene expression, including the regulation of microRNA (miRNA) expression.
  • miRNA: microRNA
  • Prior to the instant invention, the resolution afforded by genomic microarrays was sufficient for mapping positioned nucleosomes on relatively simple genomes, such as yeast, but insufficient for the same purpose on significantly more complex genomes, e.g., the human genome or other mammalian genomes.
  • This limitation has been overcome by the invention, at least in part, through the use of longer oligonucleotide probes and higher tiling density than previously used.
  • Wavelet denoising greatly reduced the high-frequency variance of random noise, thereby greatly improving the quality of data obtained with the microarrays according to the methods of the invention.
  • Alternatively or additionally, data analysis is improved by masking, which reduces noise associated with so-called repetitious elements and/or regions of high guanine-cytosine (GC) content within eukaryotic genomic regions of interest.
  • GC: guanine-cytosine
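A minimal sketch of probe masking follows. It assumes that repeats are soft-masked in lowercase in the probe sequences (the convention used in UCSC genome FASTA files) and uses hypothetical cutoffs for GC fraction and repeat fraction; the document does not state its masking criteria.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a probe sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def repeat_fraction(seq):
    """Fraction of lowercase (soft-masked repeat) bases in a probe sequence."""
    return sum(1 for base in seq if base.islower()) / len(seq) if seq else 0.0


def mask_probes(probes, max_gc=0.7, max_repeat=0.5):
    """Indices of probes kept after masking; both cutoffs are hypothetical."""
    keep = []
    for i, seq in enumerate(probes):
        if gc_fraction(seq) > max_gc or repeat_fraction(seq) > max_repeat:
            continue  # masked: GC-rich or repeat-derived probes tend to hybridize nonspecifically
        keep.append(i)
    return keep
```

Masked probes would simply be dropped before denoising and peak calling, so that their signal cannot create spurious peaks or troughs.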
  • In one aspect, the invention is a method of identifying the position of a protein-DNA complex on genomic DNA.
  • The method according to this aspect of the invention includes the steps of fragmenting genomic DNA to yield protein-DNA complexes; isolating DNA from the protein-DNA complexes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 15 base pairs across at least one genomic region of interest; determining a level of hybridization of the isolated DNA with the oligonucleotide microarray; and identifying the position of a protein-DNA complex when the level of hybridization is increased compared to a level of hybridization of randomly fragmented genomic DNA.
  • In another aspect, the invention is a method of identifying the position of a protein-DNA complex on genomic DNA.
  • The method according to this aspect of the invention includes the steps of fragmenting genomic DNA to yield protein-DNA complexes; isolating DNA from the protein-DNA complexes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 25 base pairs across at least one genomic region of interest; determining, using wavelet transform signal processing and denoising, a level of hybridization of the isolated DNA with the oligonucleotide microarray; and identifying the position of a protein-DNA complex when the level of hybridization is increased compared to a level of hybridization of randomly fragmented genomic DNA.
  • In another aspect, the invention is a method of identifying the position of a protein-DNA complex on human genomic DNA.
  • The method according to this aspect of the invention includes the steps of fragmenting human genomic DNA to yield protein-DNA complexes; isolating DNA from the protein-DNA complexes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 15 base pairs across at least one human genomic region of interest; determining a level of hybridization of the isolated DNA with the oligonucleotide microarray; and identifying the position of a protein-DNA complex when the level of hybridization is increased compared to a level of hybridization of randomly fragmented human genomic DNA.
  • In another aspect, the invention is a method of identifying the position of a protein-DNA complex on human genomic DNA.
  • The method according to this aspect of the invention includes the steps of fragmenting human genomic DNA to yield protein-DNA complexes; isolating DNA from the protein-DNA complexes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 25 base pairs across at least one genomic region of interest; determining, using wavelet transform signal processing and denoising, a level of hybridization of the isolated DNA with the oligonucleotide microarray; and identifying the position of a protein-DNA complex when the level of hybridization is increased compared to a level of hybridization of randomly fragmented human genomic DNA.
  • In another aspect, the invention is a method of identifying an expressed gene.
  • The method according to this aspect of the invention includes the steps of fragmenting genomic DNA to yield nucleosomes; isolating DNA from the nucleosomes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 15 base pairs across at least one genomic region of interest, wherein the genomic region of interest comprises at least a promoter region for a gene; determining a level of hybridization of the isolated DNA with the oligonucleotide microarray; identifying the position of a nucleosome when the level of hybridization is increased compared to a level of hybridization of randomly fragmented genomic DNA; and identifying the gene as an expressed gene when the promoter region for the gene is free of nucleosomes.
  • In another aspect, the invention is a method of identifying an expressed gene.
  • The method according to this aspect of the invention includes the steps of fragmenting genomic DNA to yield nucleosomes; isolating DNA from the nucleosomes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 25 base pairs across at least one genomic region of interest, wherein the genomic region of interest comprises at least a promoter region for a gene; determining, using wavelet transform signal processing and denoising, a level of hybridization of the isolated DNA with the oligonucleotide microarray; identifying the position of a nucleosome when the level of hybridization is increased compared to a level of hybridization of randomly fragmented genomic DNA; and identifying the gene as an expressed gene when the promoter region for the gene is free of nucleosomes.
  • In another aspect, the invention is a method of identifying an expressed human gene.
  • The method according to this aspect of the invention includes the steps of fragmenting human genomic DNA to yield nucleosomes; isolating DNA from the nucleosomes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 15 base pairs across at least one human genomic region of interest, wherein the human genomic region of interest comprises at least a promoter region for a gene; determining a level of hybridization of the isolated DNA with the oligonucleotide microarray; identifying the position of a nucleosome when the level of hybridization is increased compared to a level of hybridization of randomly fragmented human genomic DNA; and identifying the gene as an expressed human gene when the promoter region for the gene is free of nucleosomes.
  • In another aspect, the invention is a method of identifying an expressed human gene.
  • The method according to this aspect of the invention includes the steps of fragmenting human genomic DNA to yield nucleosomes; isolating DNA from the nucleosomes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 25 base pairs across at least one human genomic region of interest, wherein the human genomic region of interest comprises at least a promoter region for a gene; determining, using wavelet transform signal processing and denoising, a level of hybridization of the isolated DNA with the oligonucleotide microarray; identifying the position of a nucleosome when the level of hybridization is increased compared to a level of hybridization of randomly fragmented human genomic DNA; and identifying the gene as an expressed human gene when the promoter region for the gene is free of nucleosomes.
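The final steps of the methods above (flagging probes whose hybridization exceeds that of randomly fragmented genomic DNA, then calling a gene expressed when its promoter is free of nucleosomes) can be sketched as follows. The log2-ratio cutoff, the per-probe (rather than per-peak) nucleosome call, and the TSS window standing in for "the promoter region" are simplifying assumptions for illustration only.

```python
import math


def log2_ratios(nucleosomal, genomic):
    """Per-probe log2(nucleosomal / randomly fragmented genomic DNA) signal ratios."""
    return [math.log2(n / g) for n, g in zip(nucleosomal, genomic)]


def enriched_probes(ratios, cutoff=0.0):
    """Indices of probes whose nucleosomal signal exceeds the genomic signal."""
    return [i for i, r in enumerate(ratios) if r > cutoff]


def is_expressed(offsets, ratios, window=(-100, 100), cutoff=0.0):
    """True if no enriched probe lies inside `window` (bp relative to the TSS).
    offsets: probe positions relative to the TSS; the window is a hypothetical
    stand-in for the promoter region."""
    lo, hi = window
    return not any(lo <= offsets[i] <= hi for i in enriched_probes(ratios, cutoff))
```

In practice a nucleosome call would come from a peak spanning many probes rather than from single probes; the sketch only shows how the comparison against genomic DNA and the nucleosome-free promoter test fit together.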
  • In one embodiment, the isolated DNA and randomly fragmented genomic DNA are hybridized together with the microarray in a competitive hybridization reaction.
  • In one embodiment, the genomic DNA is eukaryotic genomic DNA. In one embodiment, the genomic DNA is mammalian genomic DNA. In one embodiment, fragmenting the genomic DNA to yield protein-DNA complexes comprises subjecting the genomic DNA to micrococcal nuclease digestion.
  • In one embodiment, the isolated DNA is 100 to 200 nucleotides long.
  • In one embodiment, the oligonucleotide probes are 50 to 85 nucleotides long. In one embodiment, the oligonucleotide probes are 50 to 60 nucleotides long.
  • In one embodiment, the oligonucleotide probes are 50 nucleotides long.
  • In one embodiment, the oligonucleotide probes are tiled every 10 to 20 base pairs.
  • In one embodiment, the oligonucleotide probes are tiled every 10 to 15 base pairs.
  • In one embodiment, the oligonucleotide probes are tiled every 10 base pairs.
  • In one embodiment, determining the level of hybridization comprises masking repetitious elements in eukaryotic DNA.
  • In one embodiment, the protein-DNA complexes are nucleosomes.
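The probe layout implied by these embodiments is simple arithmetic. The sketch below computes probe start coordinates for 50-nucleotide probes tiled every 10 bp across the [-1250, +250] bp promoter window used in the examples, treating the window as half-open (an assumption); adjacent probes then overlap by 40 nucleotides.

```python
def tile_probes(start, end, probe_len=50, step=10):
    """Start coordinates of overlapping probes that fit inside the half-open region [start, end)."""
    return list(range(start, end - probe_len + 1, step))


def overlap(probe_len=50, step=10):
    """Number of nucleotides shared by two adjacent probes."""
    return max(probe_len - step, 0)


starts = tile_probes(-1250, 250)
print(len(starts), starts[0], starts[-1], overlap())  # prints: 146 -1250 200 40
```

Under this layout, a 147-bp nucleosome spans on the order of 15 probe start positions, which is why peak widths of 10 to 20 probes are a natural filter at 10-bp tiling.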
  • In another aspect, the invention is a method of comparing a nucleosome positioning pattern of a test cell to a nucleosome positioning pattern of a reference cell.
  • The method according to this aspect of the invention includes the steps of obtaining a test nucleosome positioning pattern of a test cell using a method of identifying the position of at least one nucleosome according to any one of the foregoing aspects of the invention; obtaining a reference nucleosome positioning pattern of a reference cell using the method of identifying the position of at least one nucleosome according to any one of the foregoing aspects of the invention; and comparing the test nucleosome positioning pattern of the test cell with the reference nucleosome positioning pattern of the reference cell.
  • In one embodiment, the reference cell is a normal cell.
  • In one embodiment, the reference cell is a cell characteristic of a disease.
  • In one embodiment, the reference cell is a cancer cell.
  • In one embodiment, the test cell is a cell suspected of being a cancer cell and the reference cell is a normal cell.
  • In one embodiment, the test cell is a cancer cell of a first type and the reference cell is a cancer cell of a second type.
  • In one embodiment, the test cell is a cell of unknown type and the reference cell is a cell of a known type.
  • In one embodiment, the test cell is a cell from a first type of tissue and the reference cell is a cell from a second type of tissue. In one embodiment, the test cell is a cell from a first subject and the reference cell is a cell from a second subject.
  • In one embodiment, the test cell has been contacted with a test agent and the reference cell has not been contacted with the test agent.
  • In one embodiment, the test cell has been contacted with a pharmaceutical agent.
  • In one embodiment, the pharmaceutical agent is an anticancer agent.
  • In one embodiment, the pharmaceutical agent is a histone deacetylase inhibitor.
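One simple way to compare two nucleosome positioning patterns, sketched under the assumption that each pattern is reduced to a list of inferred nucleosome centers (bp relative to the TSS), is to count test nucleosomes that have a reference nucleosome within a tolerance. The 20-bp tolerance and the helper names are hypothetical; the document does not specify a comparison metric.

```python
def match_fraction(test_centers, ref_centers, tolerance=20):
    """Fraction of test nucleosome centers within `tolerance` bp of some reference center."""
    if not test_centers:
        return 0.0
    matched = sum(1 for t in test_centers
                  if any(abs(t - r) <= tolerance for r in ref_centers))
    return matched / len(test_centers)


def differential_nucleosomes(test_centers, ref_centers, tolerance=20):
    """Test centers with no reference center within `tolerance` bp (e.g. cell-type-specific)."""
    return [t for t in test_centers
            if all(abs(t - r) > tolerance for r in ref_centers)]
```

The differential list is the kind of output that would flag lineage-specific nucleosomes; a genome-wide comparison could also use correlation of the denoised probe profiles instead.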
  • In another aspect, the invention is a method of identifying a transcription start site for a microRNA.
  • The method according to this aspect of the invention includes the steps of fragmenting genomic DNA to yield nucleosomes; isolating DNA from the nucleosomes; hybridizing the isolated DNA with an oligonucleotide microarray, wherein the oligonucleotide microarray is characterized by a plurality of overlapping oligonucleotide probes each 40 to 85 nucleotides long, tiled every 1 to 25 base pairs across at least one genomic region of interest, wherein the genomic region of interest comprises a sequence encoding a known or predicted microRNA and 1 kb to 20 kb of sequence upstream therefrom; determining a level of hybridization of the isolated DNA with the oligonucleotide microarray; identifying the position of a nucleosome when the level of hybridization is increased compared to a level of hybridization of randomly fragmented genomic DNA; performing chromatin immunoprecipitation (ChIP)
  • In one embodiment, hybridizing the isolated DNA with the oligonucleotide microarray further comprises hybridizing randomly fragmented genomic DNA with the microarray in a competitive hybridization reaction.
  • In one embodiment, the genomic DNA is eukaryotic genomic DNA. In one embodiment, determining the level of hybridization comprises masking repetitious elements in the eukaryotic DNA.
  • In one embodiment, the genomic DNA is mammalian genomic DNA.
  • In one embodiment, the genomic DNA is human genomic DNA.
  • In one embodiment, fragmenting the genomic DNA to yield nucleosomes comprises subjecting the genomic DNA to micrococcal nuclease digestion.
  • In one embodiment, the isolated DNA is 100 to 200 nucleotides long.
  • In one embodiment, the oligonucleotide probes are 50 to 85 nucleotides long.
  • In one embodiment, the oligonucleotide probes are 50 nucleotides long. In one embodiment, the oligonucleotide probes are tiled every 10 to 15 base pairs.
  • In one embodiment, the oligonucleotide probes are tiled every 10 base pairs.
  • In one embodiment, determining the level of hybridization comprises masking GC-rich sequences.
  • In one embodiment, determining the level of hybridization of the isolated DNA with the oligonucleotide microarray comprises using wavelet transform signal processing and denoising.
  • FIG. 1A is a graph depicting the c-FOS promoter in A375.
  • Light (yellow) ovals represent known nucleosome positions in TIG-3 cells. Dark (red) ovals represent inferred positioned nucleosome locations. Black line is denoised data and lighter (red) line is raw data.
  • Pre-initiation complex binding site (TATA box) and SIE/SRE element locations are indicated by arrows.
  • X-axis indicates relative distance to transcription start site (TSS, black arrow).
  • Y-axis indicates the log2 ratios of Cy5 (nucleosomal DNA) to Cy3 (input DNA) signals. Note the high degree of concordance between literature positions and inferred positions.
  • FIG. 1B is a graph depicting the nucleosome positioning pattern on the Endothelin-1 promoter in seven cell lines as indicated. The heavy black line represents the average signal from all cell lines studied. X-axis, y-axis, dark (red) ovals, and black arrow are as described for FIG. 1A, except with reference to the Endothelin-1 promoter.
  • FIG. 1C is a graph depicting the BRCA1-NBR2 promoter locus in A375. Data from two replicates are shown. Ovals represent inferred positioned nucleosome locations. Transcription start sites for BRCA1 and NBR2 are indicated.
  • FIG. 1D is a bar graph depicting the abundance of histone H3 protein within identified peaks relative to troughs for the A375 BRCA1-NBR2 promoter.
  • FIG. 2A is a graph depicting average signals aligned at transcription start sites for 1181 expressed genes and 1177 unexpressed genes in IMR90.
  • FIG. 2B is a graph depicting nucleosome-free region around transcription start site in IMR90 of expressed genes and genes having a pre-initiation complex (PIC) in the promoter, compared to random distribution of nucleosomes in 1018 unexpressed genes without PIC in promoter.
  • PIC: pre-initiation complex
  • FIG. 2C is a pair of bar graphs depicting the distribution of positioned nucleosomes in four promoter classes in IMR90.
  • Y-axis indicates the number of positioned nucleosomes found at a certain distance away from TSS normalized by the total number of genes in each class.
  • FIG. 3 is a graph depicting the MALME GABARAP promoter, showing MITF binding sites are mostly nucleosome-free. Black solid line is denoised data. Light (red) line is raw data. ChIP-chip and PhastCons conservation scores are indicated.
  • FIG. 4A is a graph depicting lineage-specific clustering of samples based on nucleosome positioning.
  • FIG. 4B is a graph depicting nucleosome positioning at the CDK-SILVER promoter locus in IMR90, MCF7, T47D, and MEC, non-melanocyte, non-melanoma cells that do not express the SILVER gene. SILVER and CDK2 TSS are indicated.
  • FIG. 4C is a graph depicting nucleosome positioning of CDK-SILVER promoter locus in A375, MALME, and primary melanocytes (PM), cell lines and cells that do express the SILVER gene.
  • Ovals (purple) centered at about -1200 and -900 represent melanocyte-specific nucleosomes; ovals (orange) centered at about -600 and -400 represent nucleosomes present in one or two of the cell lines shown; oval (black) centered at about 0 represents A375-specific nucleosome.
  • SILVER and CDK2 TSS are indicated.
  • FIG. 5A is a graph depicting raw Cy5/Cy3 ratios of all probes in the Input/Input negative control hybridization.
  • FIG. 5B is a graph depicting wavelet-denoised log2 Cy5/Cy3 ratios of all probes in the Input/Input negative control hybridization.
  • FIG. 6 is a graph depicting quantile normalized and scaled Cy5/Cy3 signals for eight cell types indicated.
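Quantile normalization followed by global scaling to a median ratio of 1.0 (the processing applied to the denoised data from all samples) can be sketched as below. Ties are broken by position and the functions operate on plain equal-length lists; these are implementation choices for the sketch, not details taken from the document.

```python
import statistics


def quantile_normalize(samples):
    """Quantile-normalize equal-length lists: each value is replaced by the mean,
    across samples, of the values that share its rank (ties broken by position)."""
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [statistics.mean(col[i] for col in sorted_cols) for i in range(n)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices from lowest to highest value
        norm = [0.0] * n
        for rank, idx in enumerate(order):
            norm[idx] = rank_means[rank]
        out.append(norm)
    return out


def scale_to_median(values, target=1.0):
    """Globally scale probe ratios so that their median equals `target`."""
    factor = target / statistics.median(values)
    return [v * factor for v in values]
```

After quantile normalization every sample shares the same value distribution, so signals for the different cell types can be plotted on a common scale.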
  • FIG. 7 is a graph depicting raw Cy5/Cy3 (black, darker) and wavelet denoised
  • FIG. 8 is a graph depicting percentage of promoter DNA with well positioned nucleosomes for eight cell types indicated.
  • X-axis represents minimum peak-to-trough ratios.
  • FIG. 9 is a graph depicting the distribution of peak-to-trough ratios for all cell lines tested and for Input/Input (curve shown with nodes).
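The document does not define the peak-to-trough ratio precisely. A minimal sketch, assuming it is the peak signal divided by the mean of the nearest local minima on either side, computed on linear-scale (median-scaled) probe ratios:

```python
def peak_to_trough(signal, peak):
    """Signal at index `peak` divided by the mean of the nearest local minima on each side.
    signal: linear-scale (not log) probe ratios; this definition is an illustrative assumption."""
    left = peak
    while left > 0 and signal[left - 1] <= signal[left]:
        left -= 1  # walk downhill to the left trough
    right = peak
    while right < len(signal) - 1 and signal[right + 1] <= signal[right]:
        right += 1  # walk downhill to the right trough
    trough = (signal[left] + signal[right]) / 2
    return signal[peak] / trough
```

Under this definition, a peak would qualify as a positioned-nucleosome call when its ratio reaches the 1.4 cutoff used for the inferred nucleosomes.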
  • FIG. 10A is a graph depicting autocorrelation of Cy5/Cy3 signals on 100 random promoters in IMR90. Horizontal lines at +0.2 and -0.2 on the y-axis indicate the 95% confidence interval.
  • FIG. 10B is a graph depicting autocorrelation of Cy5/Cy3 signals on 100 random promoters in the Input/Input experiment. Horizontal lines at +0.2 and -0.2 on the y-axis indicate the 95% confidence interval.
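The autocorrelation curves described above can be computed as below. The ±1.96/√n band is the usual approximate 95% interval for a white-noise series; it is included as an assumption about how such bands are typically drawn, not as the document's stated derivation of the ±0.2 lines.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2, k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def white_noise_band(n, z=1.96):
    """Approximate 95% band (+/- z / sqrt(n)) for sample autocorrelations of white noise."""
    return z / math.sqrt(n)
```

An autocorrelation that oscillates outside the band at a regular lag would suggest periodic, nucleosome-like structure, whereas an Input/Input control would be expected to stay mostly inside it.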
  • FIG. 11 is a graph depicting known nucleosome positions (light oval, yellow) and inferred nucleosomes (dark ovals, red) for CDC25C in A375.
  • FIG. 12 is a graph depicting known nucleosome positions (light ovals, yellow) and inferred nucleosomes (dark ovals, red and orange) for GADD45A in A375. Peaks with a peak-to-trough ratio greater than 1.4 correspond to inferred nucleosomes at positions between -200 and -1200. The peak with a peak-to-trough ratio less than 1.4 corresponds to the inferred nucleosome centered near 0.
  • FIG. 13 is a graph depicting known nucleosome positions (light ovals, yellow) and inferred nucleosomes (dark ovals, red and orange) for IFNB1 in A375. Peaks with a peak-to-trough ratio greater than 1.4 correspond to inferred nucleosomes at positions between about 0 and +100 and between about -800 and -1200. The peak with a peak-to-trough ratio less than 1.4 corresponds to the inferred nucleosome centered near -200.
  • FIG. 14 is a graph depicting known nucleosome positions (light ovals, yellow) and inferred nucleosomes (dark ovals, red and orange) for IL12A in IMR90.
  • Peaks with a peak-to-trough ratio greater than 1.4 correspond to inferred nucleosomes at positions between about -500 and -600 and between about -700 and -800. Peaks with a peak-to-trough ratio less than 1.4 correspond to inferred nucleosomes centered near -900 and -1000.
  • FIG. 15 is a graph depicting known nucleosome positions (light oval, yellow) and inferred nucleosomes (dark ovals, red) for IL2RA in IMR90.
  • FIG. 16 is a graph depicting known nucleosome positions (light oval, yellow) and inferred nucleosomes (dark ovals, red) for IL2 in MEC.
  • FIG. 17 is a graph depicting known nucleosome positions (light ovals, yellow) and inferred nucleosomes (dark ovals, red and orange) for PF4 in IMR90. Peaks with a peak-to-trough ratio greater than 1.4 correspond to inferred nucleosomes at positions between about 0 and +150, between about -400 and -550, between about -600 and -750, and between about -900 and -1050. The peak with a peak-to-trough ratio less than 1.4 corresponds to the inferred nucleosome centered near -100. ETS1 binding sites are indicated.
  • FIG. 18A is a graph depicting the CCNI promoter in IMR90. The dark (blue) line represents denoised data and the light (red) line the PhastCons scores obtained from the UCSC genome database. Ovals are identified positioned nucleosome locations. The black bar indicates locations of PICs from Kim TH et al. (2005) Nature 436:876-80.
  • FIG. 18B is a bar graph depicting concentration of histone H3 ChIPs in identified peak regions compared to trough regions.
  • FIG. 18C is a bar graph depicting presence of PICs in the CCNI promoter region in IMR90.
  • FIG. 18D is a bar graph depicting enrichment of histone H3 immunoprecipitation relative to anti-HA (negative control) pulldown from chromatin isolated from IMR90 cells that were fixed following MNase treatment.
  • FIG. 18E is a bar graph depicting enrichment of TAF1 immunoprecipitation relative to anti-HA (negative control) pulldown from chromatin isolated from IMR90 cells that were fixed following MNase treatment.
  • FIG. 19A is a graph depicting the CBLL1 promoter in IMR90. The black bar indicates the location of the PIC binding site described by Kim TH et al. (2005) Nature 436:876-80.
  • FIG. 19B is a bar graph depicting histone H3 ChIPs at nucleosome resolution in the CBLL1 promoter in IMR90.
  • FIG. 20A is a graph depicting the UGDH promoter in IMR90.
  • FIG. 20B is a bar graph depicting histone H3 ChIPs at nucleosome resolution in the UGDH promoter in IMR90.
  • FIG. 20C is a bar graph depicting anti-TAF1 ChIPs at nucleosome resolution in the UGDH promoter in IMR90.
  • FIG. 21A is a graph depicting average signals aligned at transcription start sites for 1611 expressed genes and 976 unexpressed genes in A375.
  • FIG. 21B is a graph depicting average signals aligned at transcription start sites for 957 expressed genes and 1555 unexpressed genes in MALME.
  • FIG. 21C is a graph depicting average signals aligned at transcription start sites for 1181 expressed genes and 1177 unexpressed genes in IMR90.
  • FIG. 22A is a bar graph depicting the distribution of peak locations on expressed (filled, black) and unexpressed (hatched, red) genes in A375.
  • FIG. 22B is a bar graph depicting distribution of peak locations on expressed (filled, black) and unexpressed (hatched, red) genes in MALME.
  • FIG. 22C is a bar graph depicting distribution of peak locations on expressed
  • FIG. 23A is a graph depicting p-values indicating the significance of finding more troughs near transcription start sites on expressed genes than on unexpressed genes in three cell lines as indicated.
  • The horizontal line represents a significance level of 0.05.
  • FIG. 23B is a graph depicting percentage of expressed genes with a qualifying trough at transcription start site in three cell lines as indicated.
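The p-values described above are reported without naming the test used. One common choice for asking whether expressed genes have troughs at the TSS more often than unexpressed genes is a one-sided Fisher's exact test on a 2x2 table; the sketch below is an assumption along those lines, written with the standard library only.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    rows = expressed / unexpressed genes, columns = with / without a TSS trough.
    Tests whether expressed genes have troughs more often than expected by chance."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities of all tables with the same margins
    # and at least `a` expressed genes with a trough.
    extreme = sum(comb(col1, k) * comb(n - col1, row1 - k)
                  for k in range(a, min(row1, col1) + 1))
    return extreme / comb(n, row1)
```

For example, a table with three expressed genes that all have troughs and three unexpressed genes that all lack them gives p = 1/20 = 0.05.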
  • FIG. 24A is a series of three bar graphs depicting histograms of peak-to-trough ratios for expressed (filled, black) and unexpressed (hatched, red) genes, for each of three cell lines as indicated.
  • The y-axis represents the number of peaks with a given peak-to-trough ratio, normalized by the total number of expressed or unexpressed genes, and the x-axis represents peak-to-trough ratios.
  • FIG. 24B is a series of three graphs depicting cumulative histograms of peak-to-trough ratios for expressed (filled, black) and unexpressed (hatched, red) genes, for each of three cell lines as indicated.
  • FIG. 25A is a graph depicting a scatter plot of peak-to-trough ratios of all peaks on expressed genes against the genes' expression values. The Lowess curve is shown.
  • FIG. 25B is a graph depicting a scatter plot of peak-to-trough ratios of all peaks on unexpressed genes against the genes' "expression values". The Lowess curve is shown.
  • FIG. 26 is a graph depicting Lowess curves for scatter plots of peak-to-trough ratios of all peaks and corresponding locations in IMR90 demonstrating random distribution of TFII complexes and nucleosomes.
  • FIG. 27 is a graph depicting the EGFR promoter in MCF7 and T47D. The dotted line shows the PhastCons conservation scores. Transcription factor binding sites are indicated as rectangles as follows (left to right): VDRE, SP1, and unknown.
  • FIG. 28 is a graph depicting GC content and PhastCons scores averages over all
  • FIG. 29A is a graph depicting average GC content for 17,993 RefSeq genes.
  • FIG. 29B is a graph depicting separate average A, T, G, and C counts for 17,993 RefSeq genes.
  • FIG. 29C is a graph depicting a zoomed-in version of FIG. 29B.
  • FIG. 30 is a graph depicting cumulative histograms of PhastCons scores for 2000 peaks and troughs in A375.
  • FIG. 31 is a graph depicting nucleosome positioning and RNA Polymerase II (Pol 2), H3K4me3, and H3K4/9Ac occupancy surrounding microRNA miR-21 promoters.
  • A nucleosome-depleted area (bottom panel, arrow at 55270243) approximately 700 base pairs (bp) upstream of the published position (bottom panel, arrow at 55270965) is likely to be the miR-21 TSS.
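Locating a candidate TSS as in this miR-21 example amounts to finding the most nucleosome-depleted stretch of a profile. A sketch assuming a sorted list of probe coordinates, a matching list of scaled nucleosome signals, and a hypothetical 150-bp window (roughly one nucleosome):

```python
def most_depleted_window(positions, signal, window=150):
    """Center (bp) of the window with the lowest mean nucleosome signal.
    positions: sorted probe coordinates; signal: matching signal values.
    Windows that would run past the last probe are skipped."""
    best_center, best_mean = None, float("inf")
    for i, start in enumerate(positions):
        if start + window > positions[-1]:
            break  # positions are sorted, so every later window also runs past the end
        vals = [s for p, s in zip(positions[i:], signal[i:]) if p < start + window]
        mean = sum(vals) / len(vals)
        if mean < best_mean:
            best_center, best_mean = start + window // 2, mean
    return best_center
```

For the miR-21 locus, the offset between the published position and the depleted area is 55270965 - 55270243 = 722 bp, consistent with the approximately 700 bp stated above.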
  • Nucleosome positioning in promoter regions is thought to underlie regulation of gene expression.
  • A high-resolution tiling microarray approach was developed according to the instant invention and used to identify the translational positions of nucleosomes in the promoters of 3692 genes, including all genes in the Affymetrix G110 Cancer Array, in human primary fibroblasts, primary melanocytes, mammary epithelial cells, two melanoma cell lines, and two breast cancer cell lines. Expressed genes, and genes having a transcription pre-initiation complex (PIC), were found to have a characteristic nucleosome-free region at the transcription start site, flanked by positioned nucleosomes.
  • PIC: transcription pre-initiation complex
  • nucleosome The fundamental unit of chromatin is the nucleosome, a dynamic bead-like structure consisting of 147 base pairs (bp) of DNA wrapping around a histone core. It is established that nucleosomal DNA is less accessible to regulatory factors than protein- free DNA. Anderson JD et al. (2000) JM?/ Biol 296:979-87. Many cellular processes converge on nucleosomes and affect their composition and localization. Mellor J (2005) MoI Cell 19:147-57. These observations suggest the existence of a transcriptional regulation mechanism dependent on nucleosome structure and positioning. Workman JL et al. (1998) Annu Rev Biochem 67:545-79.
  • nucleosomes are probably most critical in the promoter and enhancer regions, where gene expression, and hence the cellular phenotype, is regulated. Identifying nucleosome positions in promoter regions is therefore important for understanding how chromatin structure relates to its function.
  • a recent study reported a majority of nucleosomes to be positioned and most of the occupied transcription factor binding sites to be nucleosome-free in Saccharomyces cerevisiae. Yuan GC et al. (2005) Science 309:626-30. In the human genome, however, there are only a handful of characterized positioned nucleosomes within promoters, where they were implicated in regulating gene expression (FIG. 1A).
  • these microarrays contained overlapping 50-mer probes tiled every 10 bp to cover the promoter [-1250, +250] bp relative to the transcription start site (TSS) of 3692 human RefSeq genes, which included all genes on the Affymetrix G110 cancer microarray plus 2346 randomly selected RefSeqs. Specialized signal processing techniques were employed to identify positioned nucleosomes. Coiflet4 wavelet decomposition with soft-thresholding at level 2 followed by outlier averaging was used to remove much of the high-frequency noise (Example 3). The denoised data from all samples were quantile normalized and globally scaled to have a median probe ratio of 1.0.
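The normalization step described above can be sketched as follows. This is a minimal plain-Python illustration, assuming each sample is a list of already denoised probe ratios over the same probes in the same order; the Coiflet4 wavelet denoising and outlier averaging of Example 3 are not reproduced, and the function names are hypothetical.

```python
from statistics import median


def quantile_normalize(samples):
    """Give every sample the same distribution of probe ratios.

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by probe order, a simplification.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must cover the same probes")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the i-th smallest value across all samples.
    rank_means = [sum(col[i] for col in sorted_samples) / len(samples) for i in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def scale_to_median(values, target=1.0):
    """Rescale globally so that the median probe ratio equals `target`."""
    m = median(values)
    return [v * target / m for v in values]


qn = quantile_normalize([[1.0, 3.0, 2.0], [4.0, 6.0, 5.0]])
scaled = [scale_to_median(s) for s in qn]
```

After quantile normalization both toy samples share one distribution, and the global scaling then fixes the median ratio at 1.0 as stated above.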
  • a Laplacian of Gaussian (LoG) edge detection method was used to detect peaks on the promoters. Because nucleosomal DNA is often 147 bp in size, which at 10-bp tiling corresponds to roughly 15 probes, a positioned nucleosome was called only if its peak spanned 10 to 20 probes and had a peak-to-trough ratio of 1.4 (Example 4). Among the cell lines tested, positioned nucleosomes occupied on average 24 ± 3% of the promoter regions investigated. To determine the reproducibility of this approach, biological replicate experiments were conducted on the A375 cell line. The correlation coefficient between the raw probe ratios in the two biological replicates was 0.94, and the peaks identified from the two replicates agreed by at least 79%, a high level of agreement given the limits of current microarray technology (Example 6).
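A minimal sketch of LoG-based peak calling with the width and peak-to-trough filters described above follows. Stretches where the LoG response is negative correspond to locally concave, peak-like parts of the signal. The Gaussian scale `sigma`, the edge handling, the use of flanking windows as wide as the candidate peak, and all function names are illustrative assumptions, not the parameters of Example 4.

```python
import math


def log_kernel(sigma):
    """Zero-sum discrete Laplacian-of-Gaussian kernel (radius 3*sigma)."""
    radius = int(math.ceil(3 * sigma))
    k = [(x * x / sigma**4 - 1 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]  # a flat signal now gives zero response


def convolve(signal, kernel):
    """Convolve with a symmetric kernel, replicating the edge values."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(w * signal[min(max(i + j - r, 0), n - 1)] for j, w in enumerate(kernel))
            for i in range(n)]


def call_peaks(ratios, sigma=3.0, min_probes=10, max_probes=20, min_ratio=1.4):
    """Return (first, last) probe indices of peaks passing both filters."""
    response = convolve(ratios, log_kernel(sigma))
    runs, start = [], None
    for i, v in enumerate(response + [0.0]):  # sentinel closes a final run
        if v < 0 and start is None:
            start = i
        elif v >= 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    peaks = []
    for a, b in runs:
        width = b - a + 1
        if not min_probes <= width <= max_probes:
            continue  # too narrow or too wide for one nucleosome
        flanks = ratios[max(0, a - width):a] + ratios[b + 1:b + 1 + width]
        if not flanks or min(flanks) <= 0:
            continue
        if max(ratios[a:b + 1]) / min(flanks) >= min_ratio:
            peaks.append((a, b))
    return peaks


# Synthetic promoter: flat baseline with one nucleosome-like bump at probe 30.
signal = [1.0 + math.exp(-((i - 30) ** 2) / (2 * 7.0**2)) for i in range(60)]
print(call_peaks(signal))
```

Making the kernel zero-sum means a flat baseline produces no response, so only curvature in the probe ratios can create candidate peaks.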
  • To biochemically validate the observed nucleosomes, chromatin immunoprecipitations were performed at nucleosome resolution using anti-histone H3 antibody. Site-directed quantitative polymerase chain reactions (qPCR) were then conducted on different regions of the BRCA1, CCNl, UGDH, and CBLL1 promoters. In all cases, histone-associated DNA appeared with higher abundance in the identified peaks compared to the neighboring troughs and paralleled the patterns observed on microarrays (FIG. 1D and Example 9).
  • Nucleosomes are important regulators of transcription because they affect DNA accessibility. Recently reported studies in yeast showed that active regulatory regions and TSS are relatively depleted of nucleosomes. Yuan GC et al. (2005) Science
  • if nucleosome positioning is critical for regulating gene expression, and differences in gene expression underlie cellular differentiation, then nucleosome positioning would be expected to exhibit a cell-type-specific (stereotypical) pattern that could "define" the lineage of origin.
  • hierarchical clustering of samples was performed based on nucleosome locations (Example 17); the cells clustered by tissue of origin, even though four of the samples were cancer cell lines (FIG. 4A). This clustering was robust to different peak-filtering cutoffs and held whether only the randomly selected genes or all the genes on the array were used. Although most of the identified positioned nucleosomes were conserved across several samples, the differences might be cell-lineage specific and have biological significance.
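The clustering step can be illustrated with a small average-linkage sketch. It assumes, hypothetically, that each sample is summarized as a set of binned positions of called nucleosomes and that samples are compared with a Jaccard distance; Example 17 may use different features, distances, and linkage rules. The toy data are made up solely to exercise the code.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of nucleosome positions."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def average_linkage(samples, k):
    """Merge the closest clusters (mean pairwise distance) until k remain."""
    clusters = [[name] for name in sorted(samples)]

    def distance(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(samples[x], samples[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Toy data: each set holds binned nucleosome positions (made-up values).
toy = {
    "melanocyte": {1, 2, 3, 4},
    "A375": {1, 2, 3, 5},
    "MALME": {1, 2, 4, 5},
    "MCF7": {10, 11, 12, 13},
    "T47D": {10, 11, 12, 14},
}
groups = average_linkage(toy, k=2)
```

With these toy sets, the melanocytic samples share most positions and end up in one cluster, separate from the two breast lines.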
  • the SILVER gene is active in melanocytes and melanoma where MITF is expressed, but not in breast epithelial cells and breast cancer cells where MITF is absent.
  • the TSS of SILVER was found to be covered by a positioned nucleosome in IMR90, MCF7, T47D, and MEC (FIG. 4B), and nucleosome-free in A375, MALME, and melanocytes (FIG. 4C).
  • the observation that the cancer lines clustered with early-passage normal cells according to their tissue of origin is significant. Despite all the changes required to transform into a malignant state, the cells may preserve most of their parental nucleosome locations. While certain differences were observed between the nucleosome positions of benign and cancer cells from the same lineage, the clustering of tumor lines with their normal counterparts supports the existence of multiple pathways to malignant transformation in a given lineage. Analysis of the "classifying" genes (based on nucleosome positioning differences) may provide key insights into pathways of transformation and their drug targeting. These approaches may have important applications in cancer diagnosis and subclassification.
  • a framework was developed that combines experimental and computational approaches to map chromatin structure at high resolution in the human genome. According to results obtained using the methods of the invention, functional cis-regulatory elements such as TSS and transcription factor binding sites (at least for MITF) tend to be nucleosome-free. In addition, results obtained using these methods suggest that nucleosome positioning may be a lineage-specific marker that is largely preserved in carcinogenesis.
  • the methods of the invention can be used to extend nucleosome mapping to the whole human genome, examining these patterns in more cell lineages and tumors, and determining how changes in nucleosome positioning mechanistically contribute to carcinogenesis and development.
  • FIG. 1 illustrates that positioned nucleosomes are detected in a reproducible manner and identified positioned nucleosomes correlate well with the literature.
  • FIG. 1A shows results for the c-FOS promoter in A375. Light (yellow) ovals represent known nucleosome locations in the TIG-3 cells (Fivaz J et al. (2000) Gene 255:169-84) and dark (red) ovals indicate inferred nucleosome positions.
  • nucleosome locations were identified which agreed with published observations in the following genes: IL12A, IFNB1, GADD45, IL2, CDC25C, PF4, and IL2RA. Goriely S et al.
  • FIG. 1B shows that the nucleosome positioning pattern on the Endothelin-1 promoter is similar in all lines studied.
  • FIG. 1C shows how the BRCA1-NBR2 promoter locus exemplifies the reproducibility of the methods of the invention.
  • FIG. 1D shows results from histone H3 chromatin immunoprecipitations (ChIP) at nucleosome resolution performed for the A375 BRCA1-NBR2 promoter. Primers designed to the identified peaks and troughs were used to determine the relative levels of signal amplification in ChIP samples relative to genomic DNA. Primers for the troughs centered at locations -900 and -500 could not be designed due to the local sequence characteristics of the regions. The figure indicates abundant histone H3 protein within the identified peaks relative to troughs.
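Relative signal amplification in a ChIP sample versus genomic DNA is commonly expressed with a ΔCt calculation. The sketch below assumes equal template amounts and a fixed amplification factor per cycle (2.0 for perfect doubling); it is an illustration, not the quantification procedure of Example 9, and the Ct values and function name are invented.

```python
def relative_enrichment(ct_chip, ct_genomic, efficiency=2.0):
    """Signal in ChIP DNA relative to genomic DNA for one primer pair.

    Assumes equal template amounts and a constant per-cycle amplification
    factor (`efficiency`, 2.0 = perfect doubling): fewer cycles to threshold
    in the ChIP sample means more template, hence higher enrichment.
    """
    return efficiency ** (ct_genomic - ct_chip)


# Invented Ct values for one peak primer pair and one trough primer pair.
peak = relative_enrichment(ct_chip=22.0, ct_genomic=24.0)
trough = relative_enrichment(ct_chip=25.0, ct_genomic=24.0)
peak_to_trough = peak / trough
```

Normalizing each primer pair to genomic DNA first cancels primer-specific amplification differences, so peaks and troughs can be compared directly.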
  • FIG. 2 illustrates that expressed genes or genes having a PIC located in their promoters have a nucleosome-free region around TSS.
  • the promoters were aligned based on TSS and average signals were calculated.
  • Expressed genes have a characteristic depletion of nucleosomes around TSS in IMR90 and other cell lines (Examples 11-14).
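Aligning promoters on the TSS and averaging their signals might look like the following sketch. It assumes, hypothetically, that each promoter is given as its TSS coordinate, its strand, and a list of (genomic position, probe ratio) pairs; offsets on minus-strand genes are mirrored so that negative offsets are always upstream.

```python
from collections import defaultdict


def average_tss_profile(promoters, bin_size=10):
    """Average probe ratios by position relative to the TSS.

    `promoters` holds (tss, strand, probes) tuples, where `probes` is a list
    of (genomic_position, ratio) pairs. Offsets are flipped for '-' strand
    genes so that negative offsets are always upstream of the TSS.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for tss, strand, probes in promoters:
        for pos, ratio in probes:
            offset = pos - tss if strand == "+" else tss - pos
            b = (offset // bin_size) * bin_size  # start of the offset bin
            sums[b] += ratio
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


profile = average_tss_profile([
    (1000, "+", [(990, 2.0), (1000, 0.5), (1010, 2.0)]),
    (5000, "-", [(5010, 1.0), (5000, 0.5), (4990, 1.0)]),
])
```

A dip in the averaged profile at offset 0 is the kind of TSS-centered nucleosome depletion described above.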
  • FIG. 2B shows that IMR90 promoters were partitioned into four classes depending on the presence of a PIC and their expression status.
  • Expressed genes or genes having a PIC in the promoter have a nucleosome-free region around TSS in IMR90, whereas unexpressed genes that do not have PICs have a random distribution of nucleosomes in their promoters.
  • FIG. 2C shows the distribution of positioned nucleosomes in the four promoter classes in IMR90.
  • the distribution of positioned nucleosomes in unexpressed genes not having a PIC (lower panel, light (yellow) bars) is uniform, whereas expressed genes or genes having a PIC tend to have far fewer positioned nucleosomes at the TSS and more in the regions flanking it.
  • FIG. 3 illustrates that MITF binding sites are mostly nucleosome-free.
  • the figure represents the MALME GABARAP promoter.
  • Black solid line is the denoised data, red the raw data, blue with dots the ChIP-chip signal (Example 15) and thin blue the PhastCons conservation scores.
  • the ChIP-chip signal covers a broad region including three E-boxes (CACGTG at -606, CATGTG at -633 and at -1100).
  • the maximum ChIP-chip signal comes from location -600, where two E-box elements are present in a trough. Troughs on this promoter tend to be highly conserved, and this seems to be a general pattern (p < 10^-10; Example 19).
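E-box elements of the kind listed above (CACGTG, and CATGTG together with its reverse complement CACATG) can be located with an overlapping pattern scan. The sketch assumes the promoter sequence string starts at offset -1250 relative to the TSS, matching the tiled window, and reports the offset of each motif's first base; the exact coordinate convention used in FIG. 3 is not stated, and the names are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping motifs are all reported.
EBOX = re.compile(r"(?=(CACGTG|CATGTG|CACATG))")


def find_eboxes(seq, start_offset=-1250):
    """Return (offset relative to TSS, motif) for each E-box in `seq`.

    `start_offset` is the TSS-relative coordinate of seq[0].
    """
    return [(m.start() + start_offset, m.group(1)) for m in EBOX.finditer(seq.upper())]


hits = find_eboxes("AAAACACGTGAACATGTGAA", start_offset=-10)
```

Hits can then be compared against called peaks and troughs to ask whether each site falls in nucleosome-free DNA.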
  • FIG. 4 illustrates that nucleosome positioning in human promoters may be lineage-specific.
  • FIG. 4A shows that samples were clustered based on the translational positioning of nucleosomes using several peak-finding cutoffs and clustering methods (Example 17). The samples clustered in distinct groups correlating robustly with their tissue of origin. The difference in nucleosome positioning of CDK2-SILVER promoter locus among the samples in FIG. 4B and FIG. 4C agrees with biological expectations, since the SILVER gene is expressed selectively in melanocytes and melanomas. Du J et al. (2004) Cancer Cell 6:565-76.
  • The presence of a positioned nucleosome on the SILVER TSS and the surrounding compact nucleosome pattern may be important in keeping SILVER silent in IMR90, MCF7, T47D and MEC.
  • Melanocyte-specific nucleosomes, A375-specific nucleosomes, and nucleosomes present only in one or two of the cell lines are shown.
  • in A375, there is a positioned nucleosome at the CDK2 TSS even though CDK2 is expressed.
  • This TSS might have been incorrectly annotated: in all samples there is a low-signal region centered at -99, suggesting that the actual TSS may be located about 100 bp upstream of the annotated site.
  • Methods of the invention relate generally to identifying non-delocalized position of a protein-DNA complex on genomic DNA.
  • a protein-DNA complex refers to any complex formed between DNA and at least one protein.
  • the protein-DNA complex typically involves non-covalent interaction between the DNA and the protein components of the complex.
  • Protein-DNA complexes include but are not limited to nucleosomes, pre-initiation complexes, complexes formed between DNA and a transcription factor, complexes formed between DNA and a DNA repair enzyme, and complexes formed between DNA and a centromere protein.
  • a protein-DNA complex is a nucleosome.
  • the invention also relates in part to methods useful for understanding transcriptional regulation of microRNA (miRNA).
  • MicroRNAs are small ~22-nt RNAs derived from large primary miRNA (pri-miRNA) precursors, which are subsequently processed to the mature form by endonucleases. Bartel DP et al. (2004) Cell 116:281; Cullen BR et al. (2004) Mol Cell 16:861; Lee Y et al. (2002) EMBO J 21:4663. Although a growing number of studies implicates miRNAs as key posttranscriptional repressors of gene expression important in normal human physiology and disease (Esquela-Kerscher A et al.
  • H3K4me3: trimethylation of lysine 4 of histone H3
  • H3K4/9Ac: acetylation of lysine 4/9 of histone H3
  • the instant invention combines global nucleosome positioning patterns with ChIP-chip screens locating the histone modification markers of active promoters to yield the TSSs of transcriptionally-active pri-miRNAs at high resolution in a global manner.
  • the inventors have mapped the transcription start sites (TSS) of 175 human miRNAs by combining global nucleosome positioning analyses with chromatin signatures for promoters.
  • of the miRNA promoters identified using methods of the invention, 90 were intergenic miRNAs with novel TSSs, 65 were intronic miRNAs with TSSs shared with their host genes, and 30 were intronic miRNAs with novel TSSs. Some miRNAs are organized in clusters located close to each other, and the expression of miRNAs within a cluster has been reported to be correlated. Baskerville S et al. (2005) RNA 11:241; Liang Y et al. (2007) BMC Genomics 8:166. Furthermore, for a few human and fruit-fly miRNA clusters, the presence of polycistronic transcripts has been shown. Lee Y et al. (2002) EMBO J 21:4663; Aravin AA et al.
  • genomic DNA refers to DNA as it occurs in a cell, e.g., chromatin.
  • genomic DNA refers to isolated genomic DNA. Isolated genomic DNA is genomic DNA that is removed from the cell environment in which it is found in nature. Isolated genomic DNA can but need not necessarily be purified genomic DNA, where purified refers to being free of or at least substantially free of other material. In one embodiment genomic DNA can refer to the entirety of DNA in a genome.
  • genomic DNA can refer to less than the entirety of DNA in a genome, e.g., it can refer to a single chromosome or even just a portion of a single chromosome spanning at least 1 kilobase, but more typically spanning at least hundreds or thousands of kilobases. Except as otherwise specified, genomic DNA refers to unfragmented genomic DNA.
  • the genomic DNA is eukaryotic genomic DNA, i.e., genomic DNA that is derived from a eukaryotic organism.
  • Eukaryotic organisms include both eukaryotic animals and eukaryotic plants.
  • a eukaryotic organism is a eukaryotic animal.
  • derived from a eukaryotic organism includes being derived from a eukaryotic cell or from a population of eukaryotic cells, e.g., a cell line derived from a eukaryotic organism. Examples of eukaryotic cell lines are widely known in the art and many are commercially available from, for example, American Type Culture Collection, Manassas, VA.
  • genomic DNA is mammalian genomic DNA, i.e., genomic DNA that is derived from a mammal.
  • derived from a mammal includes being derived from a mammalian cell or from a population of mammalian cells, e.g., a cell line derived from a mammalian organism. Examples of mammalian cell lines are widely known in the art and many are commercially available from, for example, American Type Culture Collection, Manassas, VA.
  • genomic DNA is human genomic DNA, i.e., genomic DNA that is derived from a human.
  • derived from a human includes being derived from a human cell or from a population of human cells, e.g., a cell line derived from a human. Examples of human cell lines are widely known in the art and many are commercially available from, for example, American Type Culture Collection, Manassas, VA. Human cell lines specifically include, but are not limited to, IMR90, A375, MALME, T47D, and MCF7.
  • fragmenting genomic DNA and fragmented DNA refer to the process and product, respectively, of breaking genomic DNA into fragments that are smaller (shorter) than the starting material genomic DNA.
  • the fragmenting generally can be accomplished using any suitable method.
  • fragmenting is accomplished using at least one enzymatically active enzyme that, alone or used in combination with at least one other enzymatically active enzyme, is capable of cleaving double-stranded DNA.
  • fragmenting involves subjecting genomic DNA to micrococcal nuclease digestion.
  • fragmenting is accomplished using a non-enzymatic method.
  • Such methods are known in the art and can include chemical digestion (for example, hydroxyl radical-based reaction), x-ray and/or other high energy electromagnetic irradiation, and/or sonication. These non-enzymatic methods, in particular, generally are random or nonspecific.
  • the sizes of fragments can be controlled in certain embodiments by the duration of exposure of the genomic DNA to the fragmenting principle, be it enzymatic or non-enzymatic. More particularly, all else being the same, shorter duration exposure to a fragmenting principle generally results in larger fragments than longer duration exposure to the fragmenting principle.
  • fragment sizes can be subsequently selected by using any suitable molecular sizing method, e.g., agarose gel electrophoresis (AGE), capillary gel electrophoresis (CGE), or polyacrylamide gel electrophoresis (PAGE).
  • fragmenting involves subjecting genomic DNA to micrococcal nuclease digestion. Micrococcal nuclease digestion is particularly useful whenever it is desired to fragment genomic DNA to yield protein-DNA complexes.
  • micrococcal nuclease digestion is particularly useful whenever it is desired to fragment genomic DNA to yield nucleosomes.
  • Nucleosomes are reported to bind DNA particularly tightly, and micrococcal nuclease (MNase) treatment of genomic DNA generally results in digestion of linker (internucleosomal or uncomplexed) DNA without digesting DNA present within nucleosomes.
  • MNase digestion of genomic DNA can yield protein-DNA complexes that are nucleosomes, the DNA of which is typically sized at about 147 base pairs.
  • randomly fragmented genomic DNA refers to DNA derived from genomic DNA that is fragmented independent of DNA sequence or chromatin structure.
  • input genomic DNA can be fragmented by first stripping it of proteins, e.g., by proteinase K treatment followed by phenol/chloroform extraction and ethanol precipitation, and then treating the resulting DNA with any suitable fragmentation principle that produces randomly fragmented genomic DNA.
  • the input genomic DNA fragment sizes can be subsequently selected by using appropriate conditions of duration of contact with the fragmentation principle and the strength of the fragmentation principle.
  • input genomic DNA fragment sizes can be selected using any suitable molecular sizing method, e.g., AGE, CGE, or PAGE.
  • the input genomic DNA fragments are generated and/or selected so as to correspond in size to DNA fragments isolated from protein-DNA complexes.
  • the DNA is isolated from the protein-DNA complexes.
  • the isolation of DNA from the protein-DNA complexes can be accomplished using any suitable method.
  • the protein-DNA complexes are treated with proteinase K followed by phenol/chloroform extraction and ethanol precipitation.
  • genomic DNA is first fragmented into protein-DNA complexes and the DNA subsequently isolated from the complexes, and the resulting isolated DNA is compared to DNA that has first been isolated from input genomic DNA and then subsequently (and randomly) fragmented.
  • These two populations of isolated DNA differ inasmuch as the former is enriched for DNA originally associated with protein-DNA complexes while the latter includes but is not enriched for DNA originally associated with protein-DNA complexes.
  • the isolated DNA can be labeled so that it can be detected upon hybridization with probes on the microarray. While any suitable labeling method can be used, typically the isolated DNA is labeled with at least one fluorescent dye, e.g., Cy3 (green) or Cy5 (red). In one embodiment isolated DNA from the protein-DNA complexes is labeled with Cy5. In one embodiment isolated DNA from the protein-DNA complexes is labeled with Cy3. Particularly but not exclusively for use in two-dye scanning, in one embodiment isolated DNA from randomly fragmented genomic DNA is labeled with one or more labels that are distinct from any label or labels on the isolated DNA from the protein-DNA complexes. In the absence of amplification the sample can be labeled via enzymatic or biochemical methods for labeling nucleic acid samples readily available in the art (e.g., Molecular Probes, Eugene, OR).
  • 1-5 µg of sample DNA is labeled with Cy5 using the BioPrime Klenow labeling kit (Invitrogen) following the manufacturer's instructions, an equal mass of reference or comparison DNA is similarly labeled with Cy3, and the two differentially labeled DNAs are then mixed and concentrated.
  • Additional green fluorescent dyes useful for microarray applications include, without limitation, Alexa 532, POPO-3, PO-PRO-3, Alexa 546, Alexa 555, Alexa 568, and Cy3.5.
  • Additional red fluorescent dyes useful for microarray applications are known and include, without limitation, BODIPY 630/650, Alexa 633, Alexa 647, BODIPY 650/665, Alexa 660, Cy5.5, and Alexa 680.
  • Isolated DNA is hybridized with an oligonucleotide microarray.
  • hybridizing refers to contacting sample DNA and/or control DNA with a microarray, under suitable conditions and for a sufficient amount of time, to permit formation of sequence-specific hydrogen bonding between complementary sequences present in the sample and/or control DNA, on the one hand, and immobilized probe DNA on the other hand.
  • Methods for selecting suitable conditions and times are generally well known in the art and include, for example, optimization of factors including melting temperature (Tm), pH, and concentration of sample DNA.
  • one approach to optimizing Tm in array design is to fix it at a single value (e.g., with 50-mers, the Tm is usually chosen as 76 °C).
  • high resolution needs to be maintained at all regions covered, which can be achieved by fixed spacing of probes.
  • strict Tm optimization can be achieved by changing the length of the probes. The spacing between the probes can be constant, but the probe length might be variable to fix the Tm.
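Fixed spacing with variable probe length can be sketched as a greedy search: at each tiled start position, the probe length within an allowed range whose Tm is closest to the target is kept. The sketch uses the Tm formula given later in this description; the 40-60 nt length bounds, the 76 °C target, and the function names are assumptions for illustration.

```python
def tm(seq):
    """Tm (deg C) from the formula 64.9 + 41(GC - 14.9)/(GC + AT)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 64.9 + 41 * (gc - 14.9) / (gc + at)


def design_probes(region, step=10, target_tm=76.0, min_len=40, max_len=60):
    """Fixed spacing, variable length: pick the length closest to target_tm.

    Only start positions with room for a max_len probe are used, so the
    last few bases of the region may be left uncovered.
    """
    probes = []
    for start in range(0, len(region) - max_len + 1, step):
        best = min(range(min_len, max_len + 1),
                   key=lambda n: abs(tm(region[start:start + n]) - target_tm))
        probes.append((start, region[start:start + best]))
    return probes


gc_rich = design_probes("GGCA" * 50)
balanced = design_probes("GCAT" * 50)
```

Because Tm rises with GC content, probes over GC-rich sequence come out shorter than probes over balanced sequence, while the spacing between probe starts stays constant.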
  • the probe is a 50 nt sequence and the Tm is between 72 and 79 °C.
  • a probe will generally have a Tm between 72 and 79 °C.
  • mixed labeled DNAs are pipetted onto a single microarray and incubated at 65 °C for 4 hours.
  • sample and reference or comparison DNAs are combined or otherwise allowed to hybridize simultaneously with a microarray.
  • This embodiment is particularly suited for methods where sample and reference or comparison DNAs are differentially labeled.
  • sample and reference or comparison DNAs are differentially labeled.
  • differentially labeled sample and reference or comparison DNAs compete for hybridization to probes on a single microarray, and then, following removal of unbound material, a laser scanner reads the fluorescence intensities and wavelengths at each spot on the microarray. Presence of a signal at a given wavelength at a given position on the microarray can then be associated with the presence of a particular sequence in either or both of the sample and the reference or comparison DNAs.
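Once the scanner has read both channels, per-probe ratios of sample to reference signal are typically computed. The sketch assumes hypothetical inputs: Cy5 (sample) and Cy3 (reference) intensities in the same probe order, a single scalar background per channel, and a floor that keeps the logarithm defined.

```python
import math


def log2_ratios(cy5, cy3, background=(0.0, 0.0), floor=1.0):
    """Per-probe log2(sample / reference) from two-channel intensities.

    cy5: sample-channel intensities; cy3: reference-channel intensities,
    in the same probe order. Background-subtracted values are floored so the
    logarithm is always defined.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of probes")
    bg5, bg3 = background
    return [math.log2(max(s - bg5, floor) / max(r - bg3, floor))
            for s, r in zip(cy5, cy3)]


ratios = log2_ratios([420.0, 120.0, 30.0], [120.0, 120.0, 60.0], background=(20.0, 20.0))
```

A log2 scale makes enrichment and depletion symmetric around zero, which suits later peak and trough comparisons.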
  • in one embodiment, separate microarrays are used for each of the sample DNA and the reference or comparison DNA.
  • This particular embodiment can use the same or different labels as between the sample DNA and the reference or comparison DNA, but it must use separate microarrays.
  • Methods of the invention are based on the use of oligonucleotide microarrays, or, equivalently as used herein, simply microarrays.
  • microarrays have been described in the art and refer, generally, to high-density two-dimensional arrays of oligonucleotide probes printed or otherwise fixed onto solid substrate, e.g., a glass slide, such that each position can be assigned to a particular probe sequence.
  • Typical microarrays each include thousands to hundreds of thousands of individual probes.
  • spotted microarrays are used.
  • the probes are oligonucleotides, cDNA, or small fragments of PCR products corresponding to mRNAs.
  • This type of array is typically hybridized with DNA from two samples to be compared (e.g., enriched and random) that are labeled with two different fluorophores.
  • the samples can be mixed and hybridized to a single microarray that is then scanned, allowing hybridization of both samples to be determined in a single step.
  • Spotted microarrays are commercially available from Eppendorf (Eppendorf Biochip Systems, Westbury, NY), NimbleGen Systems (Madison, WI), and other manufacturers.
  • the oligonucleotide microarrays are used.
  • the probes are designed to match parts of the sequence of known or predicted mRNAs.
  • GE Healthcare (Waukesha, WI)
  • Affymetrix (Santa Clara, CA)
  • NimbleGen Systems (Madison, WI)
  • Agilent (Palo Alto, CA)
  • Oligonucleotide arrays can be produced either by piezoelectric deposition with full-length oligonucleotides or by in-situ synthesis.
  • Long oligonucleotide arrays typically are composed of 60-mers and are produced by ink-jet printing on a silica substrate.
  • Short oligonucleotide arrays typically are composed of 25-mer or 30-mer oligonucleotides and are produced by photolithographic synthesis (Affymetrix) on a silica substrate or by piezoelectric deposition on an acrylamide matrix (GE Healthcare).
  • the probes are oligonucleotides.
  • an oligonucleotide has its usual meaning and refers to a linear polymer of nucleotides that is 2 to 100 nucleotides long.
  • an oligonucleotide generally refers to a linear polymer of deoxyribonucleotides, i.e., DNA, that is 2 to 100 nucleotides long.
  • the term nucleotide as used herein refers to a phosphate ester of a nucleoside, and, optionally, a derivative thereof.
  • Naturally occurring deoxyribonucleosides include deoxyadenosine, deoxyguanosine, deoxythymidine, and deoxycytidine. These may be formally represented by dA, dG, dT, and dC, respectively, but are commonly represented simply as A, G, T, and C, respectively.
  • Probes according to particular embodiments of the invention are typically 40 to 85 nucleotides long, although they may be shorter than 40 or longer than 85.
  • the probes are sized to correspond to the length of DNA wound around a nucleosome, i.e., approximately 150 nucleotides long.
  • the probes are 40 to 80 nucleotides long, i.e., any one or more of the following lengths: 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, and 80.
  • the probes are 50 to 85 nucleotides long, i.e., any one or more of the following lengths: 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, and 85.
  • the probes are 40 to 60 nucleotides long, i.e., any one or more of the following lengths: 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, and 60.
  • the probes are 50 to 60 nucleotides long, i.e., any one or more of the following lengths: 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, and 60.
  • the probes are 50 nucleotides long.
  • the probes on a microarray are uniform in length, but in one embodiment they are non-uniform in length, either by design or not.
  • Microarrays useful according to the invention are characterized by a plurality of overlapping oligonucleotide probes.
  • overlapping oligonucleotide probes refer to oligonucleotide probes which share some degree of sequence identity.
  • the probes are designed such that for any given probe P there are two "nearest neighbors" that differ from P by an amount of sequence T of length n that is either added to the 5' end and removed from the 3' end of P or, conversely, added to the 3' end and removed from the 5' end of P.
  • each of two “nearest neighbor” probes shares 40 consecutive bases in common with P.
  • one "nearest neighbor" P-1 is 50 nucleotides long and includes the first 40 nucleotides of P.
  • the other "nearest neighbor" P+1 is also 50 nucleotides long but includes the last 40 nucleotides of P.
  • P-1 and P+1 share 30 nucleotides in common, corresponding to nucleotides 11-40 of P.
  • “nearest neighbor” refers to degree of sequence identity and can but need not also refer to physical proximity on the microarray.
  • the overlapping probes are also designed such that the sequence or sequences of one or more genomic regions of interest generally are spanned by combined overlapping sequences of oligonucleotide probes on the microarray.
  • for example, suppose a genomic region of interest spans 100 base pairs, denoted 1-100.
  • the following 50-nucleotide probes with 10-nucleotide tiling could span the entire genomic region of interest: P1, including 1-10; P2, including 1-20; P3, including 1-30; P4, including 1-40; P5, including 1-50; P6, including 11-60; P7, including 21-70; P8, including 31-80; P9, including 41-90; P10, including 51-100; P11, including 61-100; P12, including 71-100; P13, including 81-100; and P14, including 91-100.
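The tiling in this example can be generated programmatically. The sketch assumes 1-based inclusive coordinates and clips each probe to the region of interest. With exact 10-bp steps, a full 50-mer that starts at position 11 covers 11-60, and adjacent full probes share 40 bases. The function name is hypothetical.

```python
def tile_region(region_len, probe_len=50, step=10):
    """Clipped (start, end) intervals, 1-based inclusive, of tiled probes.

    Starts are `step` apart; the first probe overlaps the region by `step`
    bases, and no probe starts after position region_len - step + 1.
    """
    first = 1 - (probe_len - step)
    last = region_len - step + 1
    return [(max(s, 1), min(s + probe_len - 1, region_len))
            for s in range(first, last + 1, step)]


probes = tile_region(100)
```

For a 100-bp region this yields 14 probes, the first four and last four of which extend beyond the region and are clipped to it.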
  • Overlapping oligonucleotide probes of specified lengths are tiled every certain number of base pairs across at least one genomic region of interest.
  • the tiling is generally every 1 to 25 base pairs, but it can be shorter or longer.
  • the probes are tiled every 1 to 25 base pairs, i.e., any one or more of the following number of base pairs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, and 25.
  • the probes are tiled every 1 to 15 base pairs, i.e., any one or more of the following number of base pairs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.
  • the probes are tiled every 5 to 15 base pairs, i.e., any one or more of the following number of base pairs: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. In one embodiment the probes are tiled every 10 to 20 base pairs, i.e., any one or more of the following number of base pairs: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20. In one embodiment the probes are tiled every 10 to 15 base pairs, i.e., any one or more of the following number of base pairs: 10, 11, 12, 13, 14, and 15. In one embodiment the probes are tiled every 10 base pairs. Generally the probes on a microarray are uniform in tiling, but in one embodiment they are non-uniform in tiling, either by design or not.
  • a genomic region of interest refers to at least a portion of a genome that is desired to be studied.
  • a genomic region of interest is a complete genome.
  • a genomic region of interest is a portion of a genome.
  • a genomic region of interest is a complete chromosome.
  • a genomic region of interest is a portion of a chromosome.
  • the genomic region of interest will generally have a nucleotide sequence that is known.
  • the genomic region of interest can in addition be known to have or be suspected to have at least one protein-DNA complex that is desired to be studied.
  • the genomic region of interest can be known to have or be suspected to have at least one positioned nucleosome or at least one nucleosome-free region that is desired to be studied.
  • a genomic region of interest can include at least one known or suspected promoter region for a gene.
  • the TSS distance to miRNAs is variable, ranging from a few hundred bases to at least as much as 20 kilobases (20 kb) upstream. Some promoters are believed to be at least 30 kb upstream, and yet others are believed to be as much as 1 megabase (1 Mb) upstream. While a majority of distances from TSS to miRNA are believed to fall within a 20 kb window, methods of the invention embrace the use of overlapping oligonucleotide probes tiled every 1 to 25 base pairs across at least one genomic region of interest, wherein the genomic region of interest includes a sequence encoding a known microRNA or a predicted microRNA and 1 kb downstream to 1 Mb upstream of the microRNA.
  • methods of the invention embrace the use of overlapping oligonucleotide probes tiled every 1 to 25 base pairs across at least one genomic region of interest, wherein the genomic region of interest includes a sequence encoding a known microRNA or a predicted microRNA and 1 kb to 20 kb upstream of the microRNA.
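Coordinates for such a region depend on the strand of the miRNA, since "upstream" lies at lower coordinates for plus-strand genes and at higher coordinates for minus-strand genes. A minimal sketch, assuming 1-based inclusive coordinates, 1 kb downstream and 20 kb upstream as default bounds, and hypothetical function names:

```python
def mirna_region(mirna_start, mirna_end, strand, upstream=20_000, downstream=1_000):
    """(start, end) of the miRNA plus flanks, 1-based inclusive genome coords.

    Upstream means lower coordinates on '+' and higher coordinates on '-'.
    The start is clipped at 1; chromosome ends are not checked.
    """
    if strand == "+":
        start, end = mirna_start - upstream, mirna_end + downstream
    elif strand == "-":
        start, end = mirna_start - downstream, mirna_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


def probes_needed(start, end, step=10):
    """Number of probe start positions, `step` bp apart, from start to end."""
    return (end - start) // step + 1


plus = mirna_region(100_000, 100_080, "+")
minus = mirna_region(100_000, 100_080, "-")
```

At 10-bp tiling, such a window of about 21 kb needs on the order of two thousand probe positions per miRNA.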
  • genomic DNA can also include stretches of sequence with high GC content which can reduce sequence-specificity of hybridization owing to the relatively high stability of G-C base pairing compared to A-T base pairing. Areas of high GC content, i.e., GC-rich sequences, thus are generally poor in terms of the information content they yield, and they can contribute to noise in data derived from microarray methods. It can be advantageous to mask GC-rich sequences when practicing the methods of the invention.
  • a GC-rich sequence refers to a nucleotide sequence for which G and C together represent greater than 50 percent of all the nucleotides present in the sequence.
  • a GC-rich sequence refers to a nucleotide sequence having at least 10 consecutive G and C nucleotides.
  • a GC-rich sequence refers to a nucleotide sequence having at least 15 consecutive G and C nucleotides.
  • a GC-rich sequence refers to a nucleotide sequence having at least 20 consecutive G and C nucleotides.
  • a probe 50 nucleotides long that contains 30 G and 20 C is said to have a GC-rich sequence because 100 percent of its nucleotides are G and C.
  • a probe 50 nucleotides long that contains 25 G, 20 C, and 5 A and/or T is said to have a GC-rich sequence because 90 percent of its nucleotides are G and C.
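The alternative definitions of a GC-rich sequence given above (G+C fraction above 50 percent, or a run of at least 10, 15 or 20 consecutive G/C nucleotides) can be expressed as simple predicates. This is a sketch; the helper names are hypothetical and not taken from the source.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C nucleotides in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_gc_run(seq):
    """Length of the longest stretch of consecutive G/C nucleotides."""
    runs = re.findall(r"[GC]+", seq.upper())
    return max((len(r) for r in runs), default=0)


def is_gc_rich(seq, min_fraction=0.5, min_run=None):
    """GC-rich by fraction (> min_fraction) or, when min_run is given,
    by a run of at least min_run consecutive G/C nucleotides."""
    if min_run is not None:
        return longest_gc_run(seq) >= min_run
    return gc_fraction(seq) > min_fraction


# The 50-mer from the text: 25 G, 20 C and 5 A/T gives 90 percent G+C.
print(gc_fraction("G" * 25 + "C" * 20 + "A" * 5))
```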
  • a GC-rich sequence is a sequence that has a Tm that is greater than a predetermined desired Tm. Tm may be calculated using the formula
  • $T_m\,(^\circ\mathrm{C}) = 64.9 + 41\,(GC - 14.9)/(GC + AT)$, where GC is the number of G and C nucleotides and AT is the number of A and T nucleotides.
  • a 50-mer with a Tm greater than 76 °C is a GC-rich sequence when GC is at least 29.
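The Tm rule can be checked numerically. The sketch below uses the formula with the constant exactly as printed in the text and searches for the smallest G+C count at which a probe exceeds a Tm cutoff; the function names are hypothetical.

```python
def tm_celsius(gc, at):
    """Tm (°C) = 64.9 + 41 * (GC - 14.9) / (GC + AT), as given in the text."""
    return 64.9 + 41.0 * (gc - 14.9) / (gc + at)


def min_gc_for_tm(length, tm_threshold):
    """Smallest G+C count at which a probe of this length exceeds tm_threshold."""
    for gc in range(length + 1):
        if tm_celsius(gc, length - gc) > tm_threshold:
            return gc
    return None


print(min_gc_for_tm(50, 76.0))  # → 29
```

With GC = 28 the formula gives about 75.6 °C and with GC = 29 about 76.5 °C, which matches the threshold of 29 stated for a 50-mer.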
  • genomic regions of interest with GC-rich sequences can be masked simply by omitting from the microarray probes having GC-rich sequence.
  • masking of GC-rich sequences can alternatively or additionally be accomplished by omitting from analysis the data obtained from probes having GC-rich sequences.
  • the microarray is prepared so as to include GC-rich sequences, but data from such probes are excluded at the data-analysis step.
  • Masking can alternatively or in addition be applied to repetitious elements, also known as repetitive DNA sequences. It has been known for many years that many genomes, including the human genome, contain stretches of highly repetitive DNA sequences. An unusually high percent of the human genome, at least 50 percent, is believed to be repetitive in nature. By comparison, repeats account for just 1.5 percent of bacterial genomes and only about 3 percent of fly genomes.
  • Repetitive DNA sequences generally can include simple repeats, tandem repeats, segmental duplications, and interspersed repeats.
  • Simple repeats are duplications of simple sets of DNA bases (typically 1-5 bp), such as A, CA, CGG, etc. Tandem repeats typically occur at centromeres and telomeres and involve duplications of 100-200 base sequences.
  • Segmental duplications are large blocks of DNA, e.g., 10-300 kb, which are copied in another region of the genome.
  • Interspersed repeats include processed pseudogenes, retrotranscripts, DNA transposons, retrovirus retrotransposons, and non-retrovirus retrotransposons.
  • the microarray is prepared so as to include repeated sequences, but data from such probes are excluded at the data-analysis step.
  • Repetitious elements can, but need not necessarily, be GC-rich sequences, and conversely GC-rich sequences can, but need not necessarily, be repetitious elements.
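Masking at the analysis step, for both GC-rich probes and repeats, amounts to dropping the affected probes' data while leaving the array itself unchanged. The sketch below assumes probe sequences are soft-masked, with repeat bases in lowercase (the convention used in UCSC genome FASTA files); the function name and both cutoffs (0.5) are hypothetical choices for illustration.

```python
def mask_probes(probes, max_gc_fraction=0.5, max_repeat_fraction=0.5):
    """probes: list of (probe_id, sequence, log_ratio).
    Sequences are assumed soft-masked (repeat bases in lowercase).
    Returns (probe_id, log_ratio) for probes retained for analysis."""
    kept = []
    for probe_id, seq, value in probes:
        upper = seq.upper()
        gc = (upper.count("G") + upper.count("C")) / len(seq)
        repeat = sum(ch.islower() for ch in seq) / len(seq)
        if gc > max_gc_fraction or repeat > max_repeat_fraction:
            continue  # probe stays on the array; its data are simply not analyzed
        kept.append((probe_id, value))
    return kept


example = [("p1", "ATGCATGCAT", 0.3),   # 40% G+C, no repeat: kept
           ("p2", "GGGCCCGGGC", 1.2),   # GC-rich: masked
           ("p3", "atatatatGC", 0.1)]   # mostly repeat: masked
print(mask_probes(example))
```

Because the two criteria are tested independently, a probe is masked if it is GC-rich, repetitive, or both, consistent with the point that the two categories overlap only partially.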
  • Methods of the invention include the step of determining a level of hybridization of isolated DNA with the oligonucleotide microarray. This step can be performed using any suitable method, including use of methods and devices known in the art for scanning microarrays.
  • a microarray scanner typically includes at least two lasers tuned to suitable dye excitation wavelengths, an image acquisition system, and a computer for data storage. Laser scanning at two wavelengths can be performed simultaneously or sequentially. The resolution of the scanner is compatible with the microarray printing resolution. The scanner produces a JPEG file, where each pixel indicates the intensity of signal for that pixel on the microarray. The exact position of the spot is determined and signal at that spot is associated with the particular probe sequence at that position.
  • the microarray scanner is a GenePix 4000B Scanner available from Molecular Devices Corporation (Sunnyvale, CA).
  • the strength of a fluorescence signal at a given wavelength at a given position thus can be used to measure the level of hybridization of isolated DNA with a given probe, and hence a given sequence, included in the microarray.
  • the strength of a fluorescence signal at a given wavelength at each and every given position thus can be used to measure the level of hybridization of isolated DNA with each and every probe, and hence each and every sequence, included in the microarray.
  • the information so derived can be used to identify the position of a protein-DNA complex on genomic DNA.
  • sample DNA enriched for DNA specifically isolated from protein-DNA complexes will hybridize to a greater extent to probes specific to that DNA than will randomly fragmented genomic DNA.
  • if a protein-DNA complex is well positioned on genomic DNA across a population of cells, that is, if the complex is localized to the same or essentially the same position in a large number or percentage of cells within the population, then hybridization is increased to probes that correspond to DNA sequences found in the well-positioned protein-DNA complexes.
  • the position of the protein-DNA complex is then determined by converting the hybridization pattern information into sequence information using the one-to-one correspondence between probe position and probe sequence.
  • DNA enriched for DNA specifically isolated from protein-DNA complexes will be indistinguishable from the hybridization pattern for randomly fragmented DNA. It should also be noted that DNA enriched for DNA specifically isolated from protein-DNA complexes will hybridize to probes not specific to that DNA to a lesser extent than will randomly fragmented genomic DNA. This is because DNA that was not originally associated with protein-DNA complexes is removed during the isolation step, so that, compared with randomly fragmented input genomic DNA, the enriched DNA contains little DNA that was not originally associated with protein-DNA complexes.
  • a logarithm of this same ratio is expressed for each of these probes, the logarithm for probe N will be a positive number and the logarithm for probe NFR will be a negative number.
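The sign convention above follows directly from taking log2 of the enriched-to-input signal ratio for each probe. A minimal sketch; the intensity values and probe labels are hypothetical.

```python
import math


def log2_ratios(enriched, input_dna):
    """Per-probe log2(enriched / input): positive where the complex-enriched sample
    hybridizes more strongly than input (e.g., probe N), negative where it
    hybridizes less (e.g., probe NFR)."""
    return {probe: math.log2(enriched[probe] / input_dna[probe]) for probe in enriched}


print(log2_ratios({"N": 800.0, "NFR": 200.0}, {"N": 400.0, "NFR": 400.0}))
# → {'N': 1.0, 'NFR': -1.0}
```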
  • Methods according to the invention can be used to identify an expressed gene. Using methods as described above to identify positioned nucleosomes on genomic DNA, an expressed gene can be identified when the promoter region for a gene that is included within the genomic region of interest is determined to be free of nucleosomes.
  • a promoter region is a continuous segment of DNA that includes at least one promoter. Promoters are components of sequence-specific gene control regions which are involved in the control of transcription initiation.
  • Promoters generally include sequences at which general transcription factors and RNA polymerase assemble to initiate transcription. In contrast to enhancers, which may lie either 5′ or 3′ to the coding region of a gene and tens of thousands of base pairs away from it, promoters are typically positioned within about one thousand base pairs 5′ to the coding region of a gene. Promoters typically include a TATA box, a short sequence of T-A and A-T base pairs that is recognized by the general transcription factor TFIID. Once TFIID binds to the TATA sequence, other general transcription factors, along with RNA Pol II, assemble into a complex with TFIID in the promoter region in preparation for gene transcription.
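One way to turn "the promoter region is free of nucleosomes" into a per-gene call is to average the log ratios of probes in a window around the TSS and compare the mean with a cutoff. This is only a sketch: the window (-150 to +50 bp), the cutoff (-0.5) and the function name are hypothetical and are not values given in the source.

```python
from statistics import mean


def promoter_is_nucleosome_free(positions, log_ratios, window=(-150, 50), cutoff=-0.5):
    """positions: probe centers relative to the TSS (bp).
    log_ratios: log2(nucleosome-enriched / input) for the same probes.
    Returns True/False, or None if no (unmasked) probe falls inside the window."""
    inside = [r for p, r in zip(positions, log_ratios) if window[0] <= p <= window[1]]
    if not inside:
        return None
    return mean(inside) < cutoff  # depleted signal suggests no positioned nucleosome


print(promoter_is_nucleosome_free([-300, -200, -100, 0, 100, 200],
                                  [0.8, 0.6, -1.2, -1.0, 0.5, 0.7]))
```

A call of True would then be read as evidence that the gene is expressed; any threshold of this kind would need to be calibrated against independent expression data such as RT-PCR.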
  • the expressed gene is an expressed human gene.
  • an expressed human gene is an expressed gene in a human genome. Expression can be confirmed using any suitable technique, many examples of which are common in molecular biology. Methods useful for measuring gene expression can include, for example and without limitation, measurement of transcribed message, e.g., using reverse transcriptase-polymerase chain reaction (RT-PCR), and measurement of translated polypeptide product, e.g., using polypeptide-specific enzyme-linked immunosorbent assay (ELISA).
  • Methods according to the invention can be used to generate a gene expression profile for a cell or for a population of cells.
  • a gene expression profile refers to an inventory concerning expression status of one or more genes, usually a plurality of genes, associated with a particular cell or population of cells.
  • microarray to its potential in terms of the quantity of individual sequences that can be assayed in parallel, it is possible not only to determine the expression status of single genes but also to generate a profile of a plurality of genes that are expressed. As described in the Examples below, thousands of genes can be assessed in parallel using a single microarray. Having determined the expression status for each gene represented on the microarray, the expression information taken as a whole can be used to characterize or profile, in a global fashion, the cell or population of cells from which the genomic DNA derives.
  • the invention in some aspects makes use of the gene expression profile obtained using methods according to the invention in order to compare a cell or a population of cells to another cell or another population of cells.
  • methods of the invention can be used to determine if cells from two samples are the same types of cells, e.g., both melanoma cells, or different types of cells, e.g., melanoma cells and breast cancer cells.
  • primary cells and cancer cell lines cluster in distinct classes depending on their tissue of origin, suggesting that nucleosome positioning is lineage-specific and that parental chromatin state is conserved during malignant transformation.
  • a test cell in one embodiment refers to a single cell that is subjected to a test condition. In an alternative embodiment, a test cell refers to a population of cells that is subjected to a test condition.
  • a reference cell in one embodiment refers to a single cell that is not subjected to a test condition. In an alternative embodiment, a reference cell refers to a population of cells that is not subjected to a test condition. Methods according to the invention can be used to generate a nucleosome positioning pattern for a cell or for a population of cells.
  • a nucleosome positioning pattern refers to an inventory concerning positions for one or more nucleosomes, usually a plurality of nucleosomes, associated with a particular cell or population of cells.
  • the positioning information taken as a whole can be used to characterize or profile, in a global fashion, the cell or population of cells from which the genomic DNA derives.
  • the invention in some aspects makes use of the nucleosome positioning pattern obtained using methods according to the invention in order to compare a cell or a population of cells to another cell or another population of cells. For example, it has been discovered that methods of the invention can be used to determine if cells from two samples are the same types of cells, e.g., both melanoma cells, or different types of cells, e.g., melanoma cells and breast cancer cells.
  • nucleosome positioning is lineage-specific and that parental chromatin state is conserved during malignant transformation.
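A simple way to compare nucleosome positioning patterns between samples, for example to ask which known cell type an unknown sample most resembles, is to correlate their per-probe log ratios. The source does not state which similarity measure was used; Pearson correlation is swapped in here as an assumption, and the sample names and data are synthetic.

```python
import numpy as np


def most_similar_reference(query, references):
    """query: 1-D array of per-probe log ratios for one sample.
    references: dict name -> 1-D array over the same probes.
    Returns (best_name, correlations) using Pearson correlation."""
    q = np.asarray(query, dtype=float)
    corrs = {name: float(np.corrcoef(q, np.asarray(ref, dtype=float))[0, 1])
             for name, ref in references.items()}
    best = max(corrs, key=corrs.get)
    return best, corrs


rng = np.random.default_rng(1)
refs = {"melanoma": rng.normal(size=500), "breast": rng.normal(size=500)}
query = refs["melanoma"] + 0.3 * rng.normal(size=500)  # noisy copy of one pattern
best, corrs = most_similar_reference(query, refs)
print(best)
```

The same correlation matrix computed over all samples could feed a hierarchical clustering, which is one way to reproduce the observation that samples group by tissue of origin.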
  • at least one population of cells represents normal cells.
  • a normal cell refers to a cell obtained from a nondiseased tissue, e.g., a noncancerous tissue.
  • the normal cell is a nontransformed cell.
  • the normal cell is a transformed cell, e.g., an immortalized cell derived from nondiseased tissue.
  • Normal cells can be obtained using any suitable method and can include, for example, biopsy, primary tissue culture, and proprietary or commercial cell lines.
  • At least one population of cells represents cells characteristic of a disease.
  • a cell characteristic of a disease is a cell that has one or more features that are characteristic of the disease.
  • a feature that is characteristic of a disease can include, without limitation, expression of a cell surface molecule not present on a normal cell, reduced or no expression of a cell surface molecule usually present on a normal cell, expression of an intracellular molecule not present in a normal cell, reduced or no expression of an intracellular molecule usually present in a normal cell, a morphologic feature not present in a normal cell, and absence of a morphologic feature usually present in a normal cell.
  • a reference cell is a cancer cell.
  • a cancer cell is any cell that has unregulated cell growth.
  • Cancer cells include cells from solid cancers (cancerous tumors, e.g., breast cancer and melanoma) as well as cells from hematologic cancers (e.g., leukemia and lymphoma).
  • a cancer cell is a cell that is obtained or derived from a cancerous tumor, including from a metastasis of a malignant tumor.
  • a cancer cell is derived from a cancer cell line.
  • Cancers include, but are not limited to, basal cell carcinoma; biliary tract cancer; bladder cancer; bone cancer; brain and CNS cancer; breast cancer; cervical cancer; choriocarcinoma; colon and rectum cancer; connective tissue cancer; cancer of the digestive system; endometrial cancer; esophageal cancer; eye cancer; cancer of the head and neck; gastric cancer; intra-epithelial neoplasm; kidney cancer; larynx cancer; leukemia; liver cancer; lung cancer; lymphoma including Hodgkin's and non-Hodgkin's lymphoma; melanoma; myeloma; neuroblastoma; oral cavity cancer (e.g., lip, tongue, mouth, and pharynx); ovarian cancer; pancreatic cancer; prostate cancer; retinoblastoma; rhabdomyosarcoma; rectal cancer; renal cancer; cancer of the respiratory system; sarcoma; skin cancer; stomach cancer; testicular cancer; thyroid cancer; uterine cancer; cancer of the urinary system; as well as other carcinomas and sarcomas.
  • a test cell is a cell suspected of being a cancer cell.
  • a cell suspected of being a cancer cell is a cell that has not yet been identified to be a cancer cell but for which there is some clinical reason to suspect the cell may be a cancer cell.
  • a number of genes have been associated with specific types of cancer. Examples include PTC, BRCA1, BRCA2, p16, APC, RB, WT1, EXT1, p53, NF1, NF2, TSC2, ret, and VHL, to name but a few.
  • the methods of the invention can be used to diagnose other genetic diseases.
  • Methods of the invention can also be used to compare cells of an unknown type to cells of a known type. For example, it is not uncommon for patients to present with so-called metastatic cancer of unknown origin. These are frequently, but not exclusively, adenocarcinomatous tumors presenting in lung, brain, breast, liver, or other tissue, where no primary tumor has been identified. Using methods of the invention it is possible to compare such cancer cells with either normal cells of various known tissue type origins or cancer cells of known origin, thereby to identify the likely type of tissue of origin. Methods of the invention can also be used to compare cells from a first subject to cells from a second subject. Such methods may be useful in determining genetic relatedness between the first subject and the second subject.
  • Methods of the invention can also be used to screen for an effect of a test agent or a pharmaceutical agent.
  • a test cell is contacted with a test agent, and a gene expression profile of the contacted test cell is compared to a gene expression profile of a reference cell that has not been contacted with the test agent. An observed difference between the gene expression profile of the contacted test cell and the reference cell is then associated with an effect of the test agent.
  • a test cell is contacted with a pharmaceutical agent, and a gene expression profile of the contacted test cell is compared to a gene expression profile of a reference cell that has not been contacted with the pharmaceutical agent. An observed difference between the gene expression profile of the contacted test cell and the reference cell is then associated with an effect of the pharmaceutical agent.
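Associating an observed difference with the effect of a test or pharmaceutical agent starts from listing the genes whose expression status differs between the contacted test cell and the reference cell. The sketch below assumes each profile is a mapping from gene to a boolean expression call (for example, derived from nucleosome-free promoters); the function name and gene labels are hypothetical.

```python
def profile_differences(test_profile, reference_profile):
    """Profiles map gene -> expression status (True = expressed).
    Returns genes, present in both profiles, whose status differs."""
    shared = test_profile.keys() & reference_profile.keys()
    switched_on = sorted(g for g in shared if test_profile[g] and not reference_profile[g])
    switched_off = sorted(g for g in shared if reference_profile[g] and not test_profile[g])
    return {"switched_on": switched_on, "switched_off": switched_off}


reference = {"A": True, "B": False, "C": True}
treated = {"A": True, "B": True, "C": False, "D": True}
print(profile_differences(treated, reference))
```

Genes measured in only one profile (here "D") are ignored, since no difference can be attributed to the agent without a matching reference call.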
  • the test cell is contacted with an anti-cancer therapy.
  • Anti-cancer therapies include cancer medicaments, radiation, and surgical procedures.
  • an anti-cancer therapy specifically includes cancer medicaments and radiation.
  • a "cancer medicament" refers to an agent that is administered to a subject for the purpose of treating a cancer.
  • treating cancer includes preventing the development of a cancer, reducing the symptoms of cancer, and/or inhibiting the growth of an established cancer.
  • cancer medicaments are classified as chemotherapeutic agents, histone deacetylase inhibitors, immunotherapeutic agents, cancer vaccines, hormone therapy, and biological response modifiers.
  • Chemotherapeutic agents include, without limitation, methotrexate, vincristine, adriamycin, cisplatin, non-sugar containing chloroethylnitrosoureas, 5-fluorouracil, mitomycin C, bleomycin, doxorubicin, dacarbazine, taxol, fragyline, Meglamine GLA, valrubicin, carmustine and polifeprosan, MMI270, BAY 12-9566, RAS farnesyl transferase inhibitor, farnesyl transferase inhibitor, MMP, MTA/LY231514, LY264618/Lometrexol, Glamolec, CI-994, TNP-470, Hycamtin/Topotecan, PKC412, Valspodar/PSC833, Novantrone/Mitoxantrone, Metaret/Suramin, Batimastat, E7070, BCH-4556, CS-682, 9-AC, AG
  • Lomustine (CCNU), Mechlorethamine HCl (nitrogen mustard), Mercaptopurine, Mesna, Mitotane (o,p'-DDD), Mitoxantrone HCl, Octreotide, Plicamycin, Procarbazine HCl, Streptozocin (streptozotocin), Tamoxifen citrate, Thioguanine, Thiotepa, Vinblastine sulfate, Amsacrine (m-AMSA), Azacitidine, Erythropoietin, Hexamethylmelamine (HMM), Interleukin 2, Mitoguazone (methyl-GAG; methyl glyoxal bis-guanylhydrazone; MGBG), Pentostatin (2'-deoxycoformycin), Semustine (methyl-CCNU), Teniposide (VM-26), and Vindesine sulfate.
  • Immunotherapeutic agents include, without limitation, Ributaxin, Herceptin, Quadramet, Panorex, IDEC-Y2B8, BEC2, C225, Oncolym, SMART M195, ATRAGEN, Ovarex, Bexxar, LDP-03, ior t6, MDX-210, MDX-11, MDX-22, OV103, 3622W94, anti-VEGF, Zenapax, MDX-220, MDX-447, MELIMMUNE-2, MELIMMUNE-1, CEACIDE, Pretarget, NovoMAb-G2, TNT, Gliomab-H, GNI-250, EMD-72000, LymphoCide, CMA 676, Monopharm-C, 4B5, ior egf.r3, ior c5, BABS, anti-FLK-2, MDX-260, ANA Ab, SMART ID10 Ab, SMART ABL 364 Ab, and ImmuRAIT-CEA.
  • Cancer vaccines include, without limitation, EGF, anti-idiotypic cancer vaccines, Gp75 antigen, GMK melanoma vaccine, MGV ganglioside conjugate vaccine, Her2/neu, Ovarex, M-Vax, O-Vax, L-Vax, STn-KHL theratope, BLP25 (MUC-1), liposomal idiotypic vaccine, Melacine, peptide antigen vaccines, toxin/antigen vaccines, MVA-based vaccine, PACIS, BCG vaccine, TA-HPV, TA-CIN, DISC-virus, and ImmuCyst/TheraCys.
  • the pharmaceutical agent is a histone deacetylase (HDAC) inhibitor.
  • HDAC inhibitors include, without limitation, sodium butyrate, trichostatin A, suberoylanilide hydroxamic acid (SAHA), 3-(4-dimethylaminophenyl)-N-hydroxy-2-propenamide (IN-2001), valproic acid, and apicidin.
  • T47D, MCF7, A375, MALME, and IMR90 cells were grown and maintained according to the directions from American Type Culture Collection.
  • Human mammary epithelial cells (MECs) were obtained (Cambrex) and cultured according to the instructions provided by the company.
  • An optimized micrococcal nuclease (MNase) digestion protocol, based on that of Chen et al. (Chen C et al. (2001) Mol. Cell. Biol. 21(22):7682-95), was used.
  • the cells (grown to 60-70% confluence) were trypsinized and washed gently once with Solution A (300 mM sucrose, 60 mM KCl, 35 mM HEPES [pH 7.4], 5 mM K2HPO4, 5 mM MgCl2, 0.5 mM CaCl2).
  • the cells were then resuspended in 1.5 ml of Solution B (300 mM sucrose, 60 mM KCl, 15 mM NaCl, 35 mM HEPES [pH 7.4], 5 mM K2HPO4, 5 mM MgCl2, 3 mM CaCl2).
  • NP-40 was added to 0.05% and mixed by pipetting. 25 U or 200 U of micrococcal nuclease (Worthington Biochemicals) was added to each reaction and incubated at room temperature for 5 minutes. The reaction was stopped by adding 0.5 ml of Solution C (100 mM EDTA, 4% SDS). The samples were then treated with RNase A (0.1 mg/ml) for 1 hour at 37 °C and with proteinase K (0.1 mg/ml) overnight at 50 °C. DNA was purified by phenol/chloroform extraction and ethanol precipitation.
  • the isolated DNA was run on a 1.5% agarose gel, and the band corresponding to mononucleosomes (~150 base pairs) was gel-extracted using the QiaQuick Gel Extraction Kit (Qiagen).
  • Low-level digestion converted 0.015% of the total DNA into mononucleosomal DNA, whereas high-level digestion converted 0.09%.
  • Equal amounts of mononucleosomal DNA from low and high level digestions were combined in order to have an equal representation of genomic regions that have variable sensitivity to MNase digestion.
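Because the two digestion levels differ about six-fold in yield (0.09% / 0.015% = 6), pooling equal masses of mononucleosomal DNA means digesting roughly six times more chromatin at the low level. The arithmetic is sketched below; the 100 ng target is a hypothetical example value, and losses during gel extraction are ignored.

```python
LOW_FRACTION = 0.015 / 100   # low-level digestion: 0.015% of DNA becomes mononucleosomal
HIGH_FRACTION = 0.09 / 100   # high-level digestion: 0.09%


def total_dna_needed(target_mono_ng, mono_fraction):
    """ng of total chromatin DNA to digest to recover target_mono_ng of
    mononucleosomal DNA (ignores losses during gel extraction)."""
    return target_mono_ng / mono_fraction


low = total_dna_needed(100, LOW_FRACTION)
high = total_dna_needed(100, HIGH_FRACTION)
print(round(low), round(high), round(low / high, 6))
```

For 100 ng from each digestion this comes to about 0.67 mg of total DNA for the low-level digestion versus about 0.11 mg for the high-level digestion.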
  • Genomic DNA was isolated using the DNeasy Tissue Kit (Qiagen) and fragmented in a sequence-independent manner with a hydroxyl radical-based reaction as described by Zhang et al. (Zhang Y et al. (2001) Nucleic Acids Res. 29(13):E66-6). Reaction timing was adjusted to give a final DNA size distribution of 100-200 base pairs (bp). The reactions were cleaned with the QiaQuick Gel Extraction Kit (Qiagen).
  • the data also contained some high-frequency noise, possibly caused by the non-isothermal nature of the probes.
  • nonlinear wavelet denoising with soft-thresholding was used, with statistical estimation of the threshold depending on the standard deviation of the noise.
  • Wavelet decomposition followed by data-driven thresholding of empirical wavelet coefficients provides an effective non-parametric regression approach to noise reduction in signals.
  • Coiflet wavelets and scaling functions (Daubechies I (1988) Comm. Pure Appl. Math. 41:909-96; Daubechies I (1992) SIAM) are particularly well suited for this purpose, because they are nearly symmetric and have good convergence properties when sampled data are used in discrete-time discrete wavelet transforms.
  • the noise in the hybridization signal appeared to depend on the GC content of the probes and on the random size of the DNA fragments; because neighboring probes overlapped by 10 to 40 bp, the noise was not white but showed weak autocorrelation.
  • soft-thresholding with level-dependent threshold estimates obtained from a hybrid model of minimizing Stein's Unbiased Risk Estimate (SURE; Stein CM (1981) Annals of Statistics 9:1135-51) and universal threshold was used.
  • Coiflet4 wavelet decomposition and thresholding at level 2 effectively removed much of the high frequency noise, while maintaining the salient features of the signal under study; denoising at higher levels distorted the signal too much.
  • the Wavelet Toolbox in Matlab was used to process the data.
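The denoising step can be illustrated with a self-contained NumPy sketch of level-2 wavelet decomposition followed by soft thresholding. To keep it dependency-free, two substitutions are made and should be kept in mind: the Haar wavelet replaces Coiflet4, and a level-dependent universal threshold (noise scale from the median absolute detail coefficient) replaces the hybrid SURE/universal rule. The function names are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out


def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; those with |c| <= t become 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def wavelet_denoise(signal, level=2):
    """Soft-threshold detail coefficients up to `level`, estimating a separate
    (universal) threshold for each level."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = x, []
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)
    for j, d in enumerate(details):
        sigma = np.median(np.abs(d)) / 0.6745  # robust noise scale for this level
        details[j] = soft_threshold(d, sigma * np.sqrt(2.0 * np.log(n)))
    for d in reversed(details):  # coarsest level first
        approx = haar_idwt(approx, d)
    return approx
```

With PyWavelets installed, the Coiflet4 variant would use `pywt.wavedec(signal, "coif4", level=2)`, `pywt.threshold(..., mode="soft")` and `pywt.waverec`; that path is not shown here.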
  • FIG. 5 shows the effect of wavelet denoising on the Input/Input data. It can be seen that wavelet denoising greatly reduced the variance of random noise.
  • Further analysis of the Input/Input experiment is presented in Example 7. To compare the experiments in different cell lines, the wavelet-denoised data from the 8 samples were first log2-transformed and quantile normalized (Bolstad BM et al. (2003) Bioinformatics 19(2):185-93), and then globally scaled so that the median of the probe ratios was set to 1.0. Average signals of the final processed data are shown in FIG. 6.
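The normalization sequence described above (log2 transform, quantile normalization across samples, then global scaling so the median ratio is 1.0) is sketched below for a probes-by-samples matrix. In log2 space, a median ratio of 1.0 corresponds to a median of 0. The helper names are hypothetical, and ties are broken arbitrarily by `argsort`.

```python
import numpy as np


def quantile_normalize(m):
    """Give every column (sample) the same distribution: the mean of the
    sorted columns, assigned back by rank."""
    m = np.asarray(m, dtype=float)
    order = np.argsort(m, axis=0)
    reference = np.sort(m, axis=0).mean(axis=1)  # mean value at each rank
    out = np.empty_like(m)
    for j in range(m.shape[1]):
        out[order[:, j], j] = reference
    return out


def normalize_arrays(ratios):
    """ratios: probes x samples matrix of Cy5/Cy3 ratios (after denoising)."""
    logged = np.log2(np.asarray(ratios, dtype=float))
    qn = quantile_normalize(logged)
    return qn - np.median(qn, axis=0)  # median log2 ratio 0, i.e. median ratio 1.0


ratios = np.array([[2.0, 0.5, 4.0],
                   [1.0, 1.0, 1.0],
                   [4.0, 2.0, 0.25],
                   [0.5, 8.0, 2.0]])
print(normalize_arrays(ratios))
```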
  • Let $G(0, \sigma)$ denote a normalized Gaussian function with mean 0 and standard deviation $\sigma$, in units of base pairs. One first convolves the signal $f$ with the Gaussian filter $G(0, \sigma)$ before taking derivatives, so that $(f * G)' = f * G'$.
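Smoothing with a Gaussian before differentiating is equivalent to convolving the signal with the derivative of the Gaussian, and candidate peaks sit where that smoothed slope changes from positive to non-positive. The sketch below works in probe-index units: converting σ from base pairs would mean dividing by the probe spacing, which is an assumption here. The kernel radius of 4σ and the function names are also illustrative choices, not taken from the source.

```python
import numpy as np


def gaussian_derivative_kernel(sigma):
    """Sampled derivative of a normalized Gaussian G(0, sigma), radius 4*sigma."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()                   # normalize the Gaussian
    return -x / sigma ** 2 * g     # dG/dx


def smoothed_derivative(f, sigma):
    """(f * G)' computed as f * G' (derivative of a convolution)."""
    return np.convolve(np.asarray(f, dtype=float),
                       gaussian_derivative_kernel(sigma), mode="same")


def peak_positions(f, sigma):
    """Indices where the smoothed slope changes from positive to non-positive."""
    d = smoothed_derivative(f, sigma)
    return [i + 1 if abs(d[i + 1]) < abs(d[i]) else i
            for i in range(len(d) - 1) if d[i] > 0 and d[i + 1] <= 0]


x = np.arange(101)
print(peak_positions(np.exp(-(x - 50) ** 2 / 32.0), sigma=2.0))
```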
  • Peak-to-trough ratios were computed as follows: the peak height was taken as the average of the three highest probes, and the trough height as max(average of the 3 left-most probes, average of the 3 right-most probes). This gave a very conservative definition of peak and trough heights and eliminated many false positives. Since the length of DNA wrapped around a nucleosome is 147 bp, each peak was required to contain at least 10 and at most 20 probes. The peaks detected under this requirement had on average 13.5 probes (standard deviation 2.5), corresponding to 175 ± 25 bp after taking the probe length into account.
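The peak-to-trough definition above translates almost line for line into code. This sketch assumes the input is the ordered list of probe signals (ratios, not logs) spanning one candidate peak; the function name is hypothetical.

```python
import numpy as np


def peak_to_trough(values, min_probes=10, max_probes=20):
    """values: probe signals across one candidate peak, ordered by position.
    Returns None if the peak width is outside the allowed range."""
    v = np.asarray(values, dtype=float)
    if not min_probes <= v.size <= max_probes:
        return None
    peak = np.sort(v)[-3:].mean()               # average of the three highest probes
    trough = max(v[:3].mean(), v[-3:].mean())   # conservative: the higher flank
    return float(peak / trough)


print(peak_to_trough([1, 1, 1, 2, 3, 4, 4, 3, 2, 1, 1, 1]))
```

Taking the higher of the two flanks as the trough makes the ratio smaller for asymmetric peaks, which is what makes the definition conservative.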
  • FIG. 7 shows an example of raw and wavelet denoised signals, together with all the peaks detected by the algorithm.
  • Example 5: Percentage of Promoter DNA with Positioned Nucleosomes. To approximate the percentage of promoter DNA with positioned nucleosomes, all peaks with a minimum height of 0.5 (Cy5/Cy3 ratio) and a width of 10 to 20 probes were found. The percentage of tiled sequence inside those peaks was then computed as a function of the minimum peak-to-trough ratio (see FIG. 8). FIG. 8 shows the percentage of promoter DNA with positioned nucleosomes.
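The curve in FIG. 8 can be approximated by computing, for each minimum peak-to-trough ratio, the fraction of tiled base pairs covered by qualifying peaks. The sketch assumes peaks are given as half-open bp intervals with their ratios and merges overlaps so no base is counted twice; the function name and the example peaks are hypothetical.

```python
def percent_in_peaks(peaks, tiled_bp, min_ratio):
    """peaks: list of (start, end, peak_to_trough), half-open bp intervals.
    Returns % of tiled_bp covered by peaks with ratio >= min_ratio."""
    kept = sorted((s, e) for s, e, r in peaks if r >= min_ratio)
    covered, cur_s, cur_e = 0, None, None
    for s, e in kept:
        if cur_e is None or s > cur_e:        # disjoint: close the current block
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                                 # overlapping: extend it
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return 100.0 * covered / tiled_bp


peaks = [(0, 150, 2.0), (100, 250, 1.5), (1000, 1150, 1.2)]
for threshold in (1.0, 1.4, 3.0):
    print(threshold, percent_in_peaks(peaks, 10_000, threshold))
```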
  • FIG. 10 shows the auto-correlation coefficients for 100 randomly selected promoters. As shown by FIG. 10, the randomness of genomic input was demonstrated by an insignificant auto-correlation in the Input/Input experiment.
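The auto-correlation check can be reproduced with the standard lag-k sample autocorrelation of a probe signal. In the sketch below, independent noise stands in for a random Input/Input profile, and a moving average stands in for a signal with structure; both series are synthetic.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


rng = np.random.default_rng(0)
noise = rng.normal(size=10_000)                             # random-like profile
smooth = np.convolve(noise, np.ones(5) / 5, mode="valid")   # correlated neighbors
print(autocorrelation(noise, 1), autocorrelation(smooth, 1))
```

For the 5-point moving average the expected lag-1 autocorrelation is 4/5, whereas for independent noise it is close to 0.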
  • the real-time PCR primers used are provided in Table 1.
  • the primers were designed to amplify either the regions identified as peaks or linker regions.
  • the amplicon sizes were kept between 60 and 110 bp to give sufficient resolution to distinguish nucleosome and linker DNA regions.
  • Quantitative real-time PCR was performed with 1 ng ChIPed DNA and 10 ng of total genomic DNA using iCycler and SYBR green iQ reagent (Bio-Rad).
  • the threshold cycle values calculated automatically by the iCycler iQ Real-Time Detection System Software (Bio-Rad) were used to estimate the fold enrichment of the tested peak or trough region in immunoprecipitated DNA over the unenriched genomic DNA as described.
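The source does not spell out the enrichment calculation ("as described"). A common approach is the ΔCt method, sketched here under two stated assumptions: template abundance is proportional to efficiency^(-Ct), with ideal doubling (efficiency 2) by default, and the result is normalized for the 1 ng ChIPed versus 10 ng genomic input used above. The function name is hypothetical.

```python
def fold_enrichment(ct_chip, ct_input, chip_ng=1.0, input_ng=10.0, efficiency=2.0):
    """Target abundance per ng in ChIPed DNA relative to input DNA, assuming
    abundance is proportional to efficiency ** (-Ct)."""
    chip_per_ng = efficiency ** (-ct_chip) / chip_ng
    input_per_ng = efficiency ** (-ct_input) / input_ng
    return chip_per_ng / input_per_ng


# Equal Ct from 1 ng versus 10 ng already implies a 10-fold enrichment per ng.
print(fold_enrichment(20, 20), fold_enrichment(22, 25))
```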
  • as shown in FIG. 18A, trough regions are significantly more conserved than the regions identified as peaks in the CCNI promoter.
  • histone H3 ChIPs at nucleosome resolution showed the presence of more histone proteins in the identified peak regions compared to trough regions.
  • FIG. 18C shows that while a chromatin immunoprecipitation experiment was able to confirm the presence of PICs in the CCNI promoter region in IMR90, this particular experiment did not have sufficient resolution to determine which region in the promoter was bound by the PIC.
  • FIG. 18D indicates that there was a uniform distribution of histone H3. This is expected because the treatment with 200 U of MNase resulted in partial digestion of chromatin, and mononucleosomes were only a small fraction of the total chromatin. Then, anti-HA and anti-TAF1 ChIPs were performed and the enrichment of TAF1 immunoprecipitation measured (FIG.
  • y-axis represents the relative enrichment of the indicated peak/trough region in anti-TAF1 pulldown relative to anti-HA pulldown).
  • more TAF1 enrichment was consistently observed in the linker region at -600 than in the peak at -700.
  • more histone occupancy was observed in the peak at -700 than in the trough at -600.
  • local sequence analysis indicated a candidate TATA box element at -736 (no TATA box element was identified in the PIC binding site identified by Kim et al.).
  • FIG. 19A shows the CBLLl promoter in IMR90.
  • FIG. 19B shows histone H3 ChIPs at nucleosome resolution. The black bar indicates the location of the PIC binding site found by Kim et al. (2005) Nature 436:876-80.
  • Chromatin immunoprecipitations confirmed the presence of a PIC in this promoter in IMR90 (data not shown). However, no PIC association was detected to this region by fixing the cells after MNase digestion (as described in FIG. 18E), possibly because the PIC in this promoter might not have remained associated with chromatin during the nucleosome isolation procedure.
  • FIG. 20A shows the UGDH promoter in IMR90.
  • FIG. 20B shows histone H3 ChIPs at nucleosome resolution.
  • FIG. 20C shows anti-TAF1 ChIPs performed as described for FIG. 18E by fixing the cells after MNase digestion.
  • the PIC was associated with either the binding site identified by Kim et al. (2005) Nature 436:876-80 (black bar), which overlaps with one of our peaks (centered at -950), or the neighboring trough region centered at -820.
  • RNA from A375 was extracted in biological duplicates with Trizol® reagent.
  • the IMR90 expression profile was obtained from Kim TH et al. (2005) Nature 436:876-80 and MALME from Garraway LA et al. (2005) Nature 436:117.
  • the Affymetrix probes were remapped to RefSeq genes using the new annotation described in Dai M et al. (2005) Nucleic Acids Res. 33(20):e175. After discarding genes having multiple probesets with conflicting present/absent calls, the expression status of about 2/3 of the genes could be determined in the in vivo footprinting assay of the invention.
  • the binding sites of PIC in IMR90 (Kim TH et al. (2005) Nature 436:876-80) were also mapped to the promoters on the tiling array. In IMR90, 610 of the 1181 expressed genes and 159 of the 1177 unexpressed genes had PIC binding sites. The precise numbers of expressed and unexpressed genes are given in Table 3.
  • Table 3A: Number of tiled promoters of expressed and unexpressed genes.
  • Table 3B Number of tiled promoters with unmasked transcription start sites.
  • the promoters of expressed and unexpressed genes were aligned at transcription start sites, and the log2 ratios of Cy5 (treatment) and Cy3 (input) signals were averaged at each location.
  • the average signals from expressed genes all displayed a dip at transcription start sites, indicating the existence of free chromatin structure accessible to MNase. See FIG. 21, which shows that the transcription start sites of expressed genes were clearly more sensitive to MNase digestion than those of unexpressed genes.
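The TSS-aligned averaging behind FIG. 21 can be sketched as follows: orient each promoter so that transcription runs left to right, express probe positions relative to the TSS, and average log2 ratios across promoters. The source averages "at each location"; binning by 50 bp is a simplification assumed here so that probes from different promoters line up. The input dictionary keys and the function name are hypothetical.

```python
import numpy as np


def tss_aligned_average(promoters, flank=1250, bin_bp=50):
    """promoters: list of dicts with 'tss', 'strand', 'positions' (genomic probe
    centers) and 'log_ratios'. Returns (bin centers, mean log ratio per bin);
    bins with no probes are NaN."""
    edges = np.arange(-flank, flank + bin_bp, bin_bp)
    sums = np.zeros(len(edges) - 1)
    counts = np.zeros(len(edges) - 1)
    for p in promoters:
        rel = np.asarray(p["positions"], dtype=float) - p["tss"]
        if p["strand"] == "-":
            rel = -rel  # flip minus-strand promoters so downstream is positive
        vals = np.asarray(p["log_ratios"], dtype=float)
        idx = np.digitize(rel, edges) - 1
        ok = (idx >= 0) & (idx < len(sums))
        np.add.at(sums, idx[ok], vals[ok])
        np.add.at(counts, idx[ok], 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        avg = sums / counts
    return edges[:-1] + bin_bp / 2, avg


promoters = [
    {"tss": 1000, "strand": "+", "positions": [975, 1100], "log_ratios": [-1.0, 0.5]},
    {"tss": 5000, "strand": "-", "positions": [5025, 4900], "log_ratios": [-0.6, 0.3]},
]
centers, avg = tss_aligned_average(promoters)
```

Running this separately for expressed and unexpressed genes and plotting `avg` against `centers` gives the kind of comparison shown in FIG. 21, where expressed genes show a dip at the TSS.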
  • the p-values are shown in FIG. 23, along with the percentage of expressed genes having qualifying troughs.
  • the peaks on expressed and unexpressed promoters are characterized.
  • FIG. 25A shows the scatter plot of peak-to-trough ratios versus expression values of the A375 expressed genes on which the peaks were found.
  • FIG. 25B shows the scatter plot of peak-to-trough ratios versus expression values of the A375 unexpressed genes on which the peaks were found.
  • the lowess curves in each of these figures show that there was no significant correlation between peak-to-trough ratios and expression values.
  • FIG. 26 shows the lowess curves for scatter plots of peak-to-trough ratios of all peaks and corresponding locations in IMR90. There was a noticeable hump at -250 on expressed promoters, which suggested that the peaks there were more pronounced and thus that the corresponding nucleosomes were very well positioned. While FIG. 22 shows that more peaks are likely to occur near transcription start sites, FIG. 26 shows that the peaks near TSS are also more pronounced.
  • MALME cells were grown to 60-70% confluence. Chromatin immunoprecipitation was performed following the protocol used by Du et al. (2004) Cancer Cell 6(6):565-76. Then, 1 ng of ChIPed sample and 1 ng of unenriched ChIP input DNA were amplified side by side using the ligation-mediated PCR methodology described in Kim TH et al. (2005) Nature 436:876-80. Amplified DNAs were biotin-labeled (Carroll JS et al. (2005) Cell 122:33-43) and submitted to the Dana-Farber Cancer Institute Microarray Core, where they were hybridized to Affymetrix Promoter Tiling Arrays.
  • FIG. 27 shows the epidermal growth factor receptor (EGFR) promoter in MCF7 and T47D.
  • the dotted blue line shows the PhastCons conservation scores. Transcription factor binding sites are shown as small rectangles: VDRE (blue), SP1 (orange), and an unknown transcription factor (violet). The SP1 binding site is located in the central trough region, while the VDRE is located 8 bp to the left. The binding sites are relatively well conserved and nucleosome-free. It has been shown that the vitamin D receptor (VDR) and an unknown protein can displace SP1 and thereby down-regulate EGFR (McGaffin KR et al. (2005) J. Mol. Endocrinol. 35(1):117-33).
  • EGFR Epidermal Growth Factor Receptor
  • VDR vitamin D receptor
  • FIG. 27 shows the binding sites of three transcription factors on the EGFR promoter in MCF7 and T47D cells: the VDRE, the SP1 site, and a site for an unknown factor.
  • VDRE vitamin D response element
  • The most salient features of the in vivo footprinting data, namely the well-defined peaks, were extracted.
  • All detectable peaks with widths between 10 and 20 probes were ranked within each sample according to their peak-to-trough ratios; the higher the peak-to-trough ratio, the higher the quality of the peak.
  • Let $S_i = \{p_{ik}\}_{k=1}^{N_i}$ denote the set of $N_i$ peaks under consideration in the $i$-th sample.
  • Let $S_i^{x} = \{p_{ik} \in S_i : \text{peak-to-trough ratio of } p_{ik} > \text{the } x\text{-quantile in sample } i\}$ for the values of $x$ used, corresponding to 2300 to 3300 peaks. 3. Similarly defined sets using peaks with a minimum height of 0.5.
  • The 3692 tiled promoters were oriented and aligned at their transcription start sites, and the region [-1250, 1250] was considered. Using unmasked sequences for those regions, the average GC count was obtained at each position. The average GC content of 18,000 RefSeq promoters was quite similar. Separate A, T, G, and C counts were also determined; interestingly, these differed markedly in the regions near and after transcription start sites. See FIG. 28 and FIG. 29. The same phenomenon was recently reported by Saxonov S et al. (2006) Proc. Natl. Acad. Sci. USA 103:1412-7.
  • FIG. 28 shows GC content and PhastCons scores averaged over all 3692 promoters. As the figure shows, the average GC content increases dramatically approaching the TSS but drops sharply right at the TSS.
  • FIG. 29A shows average GC content for 17993 RefSeq genes.
  • FIG. 29B shows separate average A, T, G, and C contents.
  • FIG. 29C shows a zoomed-in version of FIG. 29B.
  • The 3692 promoters were oriented and aligned at their transcription start sites, and the PhastCons scores (Siepel A et al. (2005) Genome Res. 15:1034-50) were averaged at each position. See FIG. 28.
  • FIG. 30 shows the cumulative histograms of PhastCons scores for the 2000 best peaks and troughs in A375; the difference was significant ($p < 10^{-10}$, Wilcoxon rank-sum test).
  • RNAPII RNA polymerase II
  • RNAPIII RNA polymerase III
  • The TSSs of 175 miRNAs were determined to a resolution of 150 bp.
  • The pri-miR-21 transcript has been cloned from HeLa cells. Cai X et al. (2004) RNA 10:1957. The identified TSS is located in a nucleosome-depleted region positive for RNAPII, H3K4me3, and H3K4/9 acetylation (FIG. 31). However, based on our analyses, a nucleosome-depleted region 800 bp further upstream of this position was more likely to be the miR-21 TSS. To test this, RT-PCR analyses were performed using primers designed to the 3' end of the cloned pri-miR-21 transcript for the RT step, followed by PCR with six primer pairs. Primer pairs 1 and 2 were designed to the region downstream of the published TSS position at 55270965.
  • Primer pairs 3 and 4 were designed to the region between the two potential TSSs.
  • Primer pairs 5 and 6 were designed to the region upstream of the candidate TSS at 55270243.
  • RNA products were detected upstream of the published position but not upstream of the candidate position, suggesting that the pri-miR-21 transcript may indeed extend further upstream than the published position (FIG. 31).
  • Non-overlapping promoter constructs covering the two potential TSSs were cloned and placed upstream of a luciferase reporter. Luciferase expression was detected from the construct including the candidate position, but not from the one including the published position.
  • The miR-21 TSS may therefore lie 800 bp upstream of the published position.
  • The observed difference may also reflect alternative TSS usage in different cell lines.
  • A second example concerns the miR-17-92 cluster, a potential human "oncomir" amplified in B-cell lymphomas. He L et al. (2005) Nature 435:828; Ota A et al. (2004) Cancer Res 64:3087.
  • Expression of the miR-17 cluster is regulated by c-myc bound to the first intron of the host gene.
  • The promoter for the miR-17 cluster is 2 kb downstream of the host promoter.
  • To find the TSS for the miR-17 cluster, non-overlapping 800 bp regions surrounding the host and novel TSSs were cloned and placed upstream of a luciferase reporter. Both constructs activated the reporter, suggesting that both have promoter activity.
  • 1.2 kb fragments including the c-myc binding region (wild-type or mutated) and the novel TSS were then generated. Mutation of the c-myc binding site decreased luciferase expression 4-fold, suggesting that c-myc indeed regulates the activity of this novel promoter.
  • c-myc was knocked down with siRNA, and transcript levels were measured in the region between the novel and host TSSs and in the region downstream of the novel TSS. There appeared to be no significant change in the region between the two promoters, but there was an approximately 2.5-fold decrease in the region downstream of the novel TSS. These results suggest a novel TSS residing 2 kb downstream of the host TSS.
  • The miR-17 cluster may thus be encoded by transcripts originating from both TSSs, and c-myc can regulate the miR-17 cluster through the intragenic TSS.
  • To confirm that the miRNA promoters identified using the methods of the invention represent true miRNA promoters, the following checks were performed.
  • Published genome-wide binding data available for a few regulatory factors were examined, revealing that their binding sites in the 20 kb regions upstream of miRNAs fell within 1 kb of the miRNA TSSs identified using the methods of the invention.
  • Four additional miRNA TSSs were validated with RT-PCR analyses.
  • miRNA TSS locations identified were reproducible across these cell lines.
  • Example 24: Identification of Transcription Factors That Regulate MicroRNA Expression. Since most functionally important transcription factors (TFs) bind close to TSSs, generally in nucleosome-free regions, knowing miRNA TSS locations together with nucleosome-positioning information makes it possible to identify the TFs that regulate miRNA expression.
  • TFs transcription factors
  • E-box elements recognized by MITF. Hemesath TJ et al. (1994) Genes Dev 8:2770.
  • The finding that MITF binding can be detected in regions within 1 kb of miRNA TSSs, and that MITF bound to these regions can indeed regulate miRNA expression, further validates the miRNA TSS mapping strategy of the invention.
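The peak-selection procedure described above (keep peaks 10 to 20 probes wide, rank them by peak-to-trough ratio within a sample, then keep those above the $x$-quantile to form $S_i^x$) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: the `Peak` record, its fields, and all function names are hypothetical, and the linear-interpolation quantile is an assumption because the source does not state which quantile definition was used.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Peak:
    start_probe: int       # index of the first probe in the peak (hypothetical field)
    end_probe: int         # index of the last probe, inclusive
    peak_to_trough: float  # peak-to-trough ratio used as the quality score

    @property
    def width(self) -> int:
        return self.end_probe - self.start_probe + 1

def quantile(values: List[float], q: float) -> float:
    """q-quantile by linear interpolation between order statistics (assumed definition)."""
    s = sorted(values)
    if not s:
        raise ValueError("no values")
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def rank_peaks(peaks: List[Peak], min_width: int = 10, max_width: int = 20) -> List[Peak]:
    """Keep peaks 10 to 20 probes wide and sort them by peak-to-trough ratio, best first."""
    kept = [p for p in peaks if min_width <= p.width <= max_width]
    return sorted(kept, key=lambda p: p.peak_to_trough, reverse=True)

def top_quantile_set(peaks: List[Peak], x: float) -> List[Peak]:
    """S_i^x: ranked peaks whose ratio exceeds the x-quantile of the ratios in S_i."""
    ranked = rank_peaks(peaks)
    if not ranked:
        return []
    cutoff = quantile([p.peak_to_trough for p in ranked], x)
    return [p for p in ranked if p.peak_to_trough > cutoff]

if __name__ == "__main__":
    sample = [Peak(0, 14, 2.5), Peak(20, 31, 1.2), Peak(40, 44, 9.0),
              Peak(50, 67, 3.1), Peak(70, 84, 1.8)]
    # Peak(40, 44) is only 5 probes wide, so it is excluded despite its high ratio.
    print([p.peak_to_trough for p in top_quantile_set(sample, 0.5)])  # prints [3.1, 2.5]
```

Filtering by width before ranking means that $S_i$, and therefore the quantile cutoff, is computed only over peaks of plausible nucleosome width.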
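The TSS-aligned averaging behind the GC-content, per-base composition, and PhastCons profiles (FIG. 28 and FIG. 29) amounts to orienting each promoter window by strand and then averaging column by column. A minimal Python sketch, assuming each promoter is supplied as a fixed-length genomic window (for example [-1250, 1250] around the TSS) plus its strand; the input format and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Complement table; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def orient(seq: str, strand: str) -> str:
    """Return seq in transcriptional orientation: minus-strand windows are reverse-complemented."""
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq

def positional_composition(promoters: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Per-position mean frequency of A, C, G, T and GC across TSS-aligned promoter windows.

    Each promoter is (sequence, strand); all windows are assumed to cover the same
    span around the TSS in genomic orientation, so they have equal length.
    """
    oriented = [orient(seq, strand).upper() for seq, strand in promoters]
    length = len(oriented[0])
    if any(len(s) != length for s in oriented):
        raise ValueError("all promoter windows must have the same length")
    n = len(oriented)
    counts = {b: [0] * length for b in "ACGT"}
    for s in oriented:
        for i, base in enumerate(s):
            if base in counts:  # N bases are not counted but still contribute to n
                counts[base][i] += 1
    profile = {b: [c / n for c in counts[b]] for b in "ACGT"}
    profile["GC"] = [(g + c) / n for g, c in zip(counts["G"], counts["C"])]
    return profile

def positional_mean(tracks: List[Tuple[List[float], str]]) -> List[float]:
    """Per-position mean of numeric tracks (e.g. conservation scores); minus-strand tracks reversed."""
    oriented = [track[::-1] if strand == "-" else track for track, strand in tracks]
    return [sum(column) / len(oriented) for column in zip(*oriented)]
```

Separating the A, C, G, and T counts, rather than only the GC total, is what exposes the strand asymmetry near and after the TSS noted in the text.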
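The peak-versus-trough conservation comparison in FIG. 30 uses a Wilcoxon rank-sum test. As an illustrative stand-in for a statistics library, the sketch below implements the two-sided test with the large-sample normal approximation (average ranks for ties, tie-corrected variance). The function name is hypothetical, and the source does not say which variant of the test was used.

```python
import math
from typing import List, Tuple

def ranksum_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test using the large-sample normal approximation.

    Tied values get average ranks and the variance is tie-corrected.
    Returns (z, p), where z > 0 means x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("variance is zero (all values tied)")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, 2 * (1 - Phi(|z|))
    return z, p
```

With samples of 2000 peaks and 2000 troughs the normal approximation is well justified, and `math.erfc` keeps very small p-values (such as those below $10^{-10}$) numerically accurate.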
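The consistency check on published binding data (binding sites in the 20 kb regions upstream of miRNAs falling within 1 kb of the identified miRNA TSSs) reduces to a nearest-TSS distance query. A minimal sketch using binary search over sorted TSS coordinates, assuming all coordinates are integers on the same chromosome; `within_distance` is a hypothetical helper, not part of the described method.

```python
import bisect
from typing import List

def within_distance(sites: List[int], tss: List[int], max_dist: int = 1000) -> List[bool]:
    """For each binding-site coordinate, report whether any TSS lies within max_dist bp.

    Coordinates are assumed to be integer positions on the same chromosome.
    """
    ordered = sorted(tss)
    result = []
    for s in sites:
        i = bisect.bisect_left(ordered, s)
        # The nearest TSS is either the first one >= s (index i) or the last one < s (index i - 1).
        neighbours = ordered[max(i - 1, 0):i + 1]
        result.append(any(abs(s - t) <= max_dist for t in neighbours))
    return result
```

The fraction of sites near a mapped TSS is then `sum(flags) / len(flags)`; sorting once and bisecting keeps each query logarithmic in the number of TSSs.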
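Scanning for E-box elements near a mapped miRNA TSS, as in the MITF example above, can likewise be sketched with a regular expression. This assumes the generic E-box consensus CANNTG and does not model MITF's flanking-sequence preferences; the function name and its inputs are hypothetical. Because CANNTG is its own reverse complement, scanning one strand finds sites on both strands.

```python
import re
from typing import List, Tuple

# Lookahead makes overlapping matches visible; group 1 captures the 6-bp motif.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def ebox_hits_near_tss(seq: str, seq_start: int, tss: int,
                       window: int = 1000) -> List[Tuple[int, str]]:
    """Return (start coordinate, motif) for every CANNTG E-box starting within +/- window bp of tss.

    seq is a genomic sequence whose first base is at coordinate seq_start.
    """
    hits = []
    for match in EBOX.finditer(seq.upper()):
        pos = seq_start + match.start()
        if abs(pos - tss) <= window:
            hits.append((pos, match.group(1)))
    return hits
```

Candidate E-boxes found this way would still need binding data (for example ChIP) and functional tests to show that MITF actually regulates the miRNA, as the text describes.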

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to microarray-based methods for identifying the positions of protein-DNA complexes on genomic DNA, in particular human genomic DNA. The methods can be used to locate positioned nucleosomes, pre-initiation complexes, and other protein-DNA complexes on a global scale, enabling gene expression profiling based on microarray analysis of genomic DNA. The present invention further relates to methods for characterizing and comparing different cell types, including cancer cells. The present invention further relates to methods for screening the effects of test agents and pharmaceutical agents, including histone deacetylase inhibitors. The present invention further relates to methods for identifying transcription start sites, transcriptionally active promoters, transcription factor binding sites, and transcription factors involved in regulating gene expression, including the expression of microRNAs.
PCT/US2007/019196 2006-09-01 2007-08-31 Microarray-based chromatin structure mapping WO2008027548A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US84188506P 2006-09-01 2006-09-01
US60/841,885 2006-09-01

Publications (2)

Publication Number Publication Date
WO2008027548A2 true WO2008027548A2 (fr) 2008-03-06
WO2008027548A3 WO2008027548A3 (fr) 2009-04-16

Family

ID=39136622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/019196 WO2008027548A2 (fr) 2006-09-01 2007-08-31 Microarray-based chromatin structure mapping

Country Status (1)

Country Link
WO (1) WO2008027548A2 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050054826A1 (en) * 2003-05-19 2005-03-10 Rosetta Inpharmatics Llc Human diaphanous-3 gene and methods of use therefor
US20050069931A1 (en) * 2002-02-20 2005-03-31 Allis C. David Non-invasive diagnostic test utilizing histone modification markers
US20050136395A1 (en) * 2003-05-08 2005-06-23 Affymetrix, Inc Methods for genetic analysis of SARS virus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106987629A (zh) * 2017-03-31 2017-07-28 上海市第妇婴保健院 Method for detecting nucleosome arrangement on the genome at the single-cell level
EP3649257B1 (fr) * 2017-07-07 2022-03-30 Nipd Genetics Public Company Limited Enrichment of targeted genomic regions for multiplexed parallel analysis
EP3649258B1 (fr) * 2017-07-07 2022-05-04 Nipd Genetics Public Company Limited Target-enriched multiplexed parallel analysis for assessment of fetal DNA samples
EP3649260B1 (fr) * 2017-07-07 2022-05-11 Nipd Genetics Public Company Limited Target-enriched multiplexed parallel analysis for assessment of tumor biomarkers
EP3649259B1 (fr) * 2017-07-07 2022-05-25 Nipd Genetics Public Company Limited Target-enriched multiplexed parallel analysis for assessment of risk for genetic disorders
EP4116432A1 (fr) * 2017-07-07 2023-01-11 Nipd Genetics Public Company Limited Target-enriched multiplexed parallel analysis for assessment of fetal DNA samples
EP4151750A1 (fr) * 2017-07-07 2023-03-22 Nipd Genetics Public Company Limited Target-enriched multiplexed parallel analysis for assessment of genetic disease risk

Also Published As

Publication number Publication date
WO2008027548A3 (fr) 2009-04-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07837619

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07837619

Country of ref document: EP

Kind code of ref document: A2