WO2009078939A1 - Methods for identifying nucleotide ligands - Google Patents

Methods for identifying nucleotide ligands


Publication number
WO2009078939A1
Authority
WO
WIPO (PCT)
Prior art keywords
target molecule
library
random
oligonucleotides
binding
Prior art date
Application number
PCT/US2008/013605
Other languages
French (fr)
Inventor
William Fairbrother
Original Assignee
Brown University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brown University
Publication of WO2009078939A1


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6811: Selection methods for production or design of target specific oligonucleotides or binding molecules
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1048: SELEX
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Definitions

  • the present invention generally relates to methods of identifying nucleotide ligands, variants of alleles and differences in binding affinities of nucleotide ligands.
  • the invention is a method of identifying a nucleotide ligand that associates with a target molecule, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule to form an association between the non-random oligonucleotides and the target molecule; and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
  • the invention is a method of identifying a variant of an allele that binds a target molecule, comprising the steps of contacting at least one target molecule with a first pool of non-random oligonucleotides to form a first library of non-random oligonucleotides; and a second pool of non-random oligonucleotides to form a second library of non-random oligonucleotides, wherein each non-random oligonucleotide of the second library of non-random oligonucleotides is an allelic variant of the first library of non-random oligonucleotides; and wherein the first library of non-random oligonucleotides is optionally combined with the second library of non-random oligonucleotides prior to contact with the target molecule; and comparing binding of the target molecule to the first library of non-random oligonucleotides and the second library of non
  • the invention is a method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule, wherein the first nucleotide ligand is an allelic variant of the second nucleotide ligand comprising the steps of contacting a first non-random oligonucleotide library of the first nucleotide ligand and a second library of the second nucleotide ligand with the target molecule; and comparing a proportion of the first non-random oligonucleotide library bound to the target molecule with the proportion of the second non-random oligonucleotide library bound to the target molecule, wherein a difference in the proportion of the first non-random oligonucleotide library bound to the target molecule compared to the proportion of the second non-random oligonucleotide library bound to the target molecule indicates a difference in the
  • the invention is a method of determining a binding affinity of a target molecule for a nucleotide ligand, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule; detecting a nucleic acid sequence in the library of non-random oligonucleotides that binds the target molecule with a first nucleic acid probe, wherein the first nucleic acid probe binds the target molecule and includes a first detectable label; contacting the first nucleic acid probe with a second nucleic acid probe, wherein the second nucleic acid probe binds the target molecule with an affinity different than the first nucleic acid probe and includes a second detectable label that is distinct from the first detectable label, thereby forming a mixture of the first nucleic acid probe and the second
  • the methods of the invention can be employed to identify a nucleotide ligand that associates with a target molecule, to identify a variant of an allele that binds a target molecule, to determine a difference in a binding affinity of at least two nucleotide ligands, and to determine a binding affinity of a target molecule for a nucleotide ligand.
  • Advantages of the claimed invention include, for example, cost effective methods to more accurately identify specific nucleic acid sequences that associate with target molecules that may reflect in vivo interactions between nucleic acid sequences and target molecules.
  • Figure 2 depicts a summary of post transcriptional modifications and the molecular effects of induction of the transcription factor p53.
  • Figure 3 generally illustrates examples of how binding data can be integrated into a model of promoter function. The result of each manipulation can be vertically integrated, showing factor binding, competition and the role of modifications in the overall regulation of a promoter.
  • Figure 4 depicts an in vivo reporter assay of oligo function.
  • the oligo library is ligated to a promoter-truncated gfp reporter and transfected into 293 cells.
  • PCR amplified reporter expresses an intermediate level of gfp whilst digestion eliminates signal.
  • 293 cells are formaldehyde crosslinked and transcriptionally active fragments are retrieved from lysed cells by immunoprecipitation. After reversing crosslinks, DNA is amplified with Ambion T7 linear amplification kit, target is color labeled and is analyzed by two color array.
  • Figure 5 depicts a method of identifying a nucleotide ligand that associates with a target molecule (also referred to herein as a "MEGASHIFT" protocol).
  • Step 1) All orthologous genomic regions enriched in both human and mouse Oct4 ChIP experiments were aligned and step 2) resynthesized as a tiled contig of 35 mers flanked by universal primer binding sites. The human genomic region was extended to cover the union of this overlap.
  • Step 3) This pool, amplified with labeled primers, migrates as a single band and was then used in an EMSA with recombinant Oct4. The shifted fraction was excised, reamplified and reshifted, or analyzed by cloning or array.
  • Step 4) Array analysis: the shifted and unselected fractions were reamplified and the T7-containing template was used to generate Cy3 (shifted) or Cy5 (unselected) target for the custom oligo array.
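Step 4 compares a Cy3 (shifted) channel against a Cy5 (unselected) channel on a two-color array. The text does not define how the enrichment score is computed, so the sketch below assumes one common convention, a log2 ratio of the two channel intensities with a small pseudocount. The function name, the pseudocount and the intensity values are all hypothetical.

```python
import math

def log2_enrichment(cy3_shifted, cy5_unselected, pseudocount=1.0):
    """Return log2(shifted / unselected) for one array feature.

    The pseudocount (an assumption) avoids division by zero for empty features.
    """
    return math.log2((cy3_shifted + pseudocount) / (cy5_unselected + pseudocount))

# Hypothetical background-subtracted intensities: (Cy3 shifted, Cy5 unselected)
features = {
    "oligo_A": (4095.0, 511.0),
    "oligo_B": (255.0, 255.0),
    "oligo_C": (63.0, 1023.0),
}

scores = {name: log2_enrichment(cy3, cy5) for name, (cy3, cy5) in features.items()}

# Rank features from most to least enriched in the shifted fraction
ranked = sorted(scores, key=scores.get, reverse=True)
```

A positive score marks an oligo that is over-represented in the shifted fraction relative to the unselected pool; a score near zero marks no change.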
  • Figures 6A, 6B, 6C and 6D depict enrichment for Oct4 binding sites.
  • Lane 8-9 EMSA with fraction shifted by recombinant Oct4 ("round 2" in Figure 6A, lane 8) used as a probe in whole cell extract.
  • Figure 6C wt, mutant and multiply shifted fraction of the oligo pool, excised (undetectable) from Figure 6A, lane 8 reamplified and used as a probe in Oct4 EMSA.
  • Figure 6D shift analysis performed in increasing concentration gradient of recombinant Oct4 protein.
  • Figures 7A, 7B, 7C, 7D and 7E depict changes in oligonucleotide enrichment throughout an Oct4 SELEX experiment.
  • Figure 7A Enrichment scores for each round of SELEX were binned and graphed as a histogram.
  • Figure 7B Average enrichment scores were ranked with relevant oligonucleotides marked on the percentile bar.
  • Gel shift assay was repeated for isolates cloned from selected Figure 7C and unselected Figure 7D fractions.
  • Figure 7E Array images corresponding to pool/pool and pool/round 1 are drawn below the gel lane for each oligonucleotide.
  • Figures 8A and 8B depict MEGASHIFT tracks for the UCSC Genome Browser. Annotation is stacked vertically along the chromosomal coordinate axis (x-axis). Starting from the top and proceeding down, the mouse (top set) sequences are annotated for the following: oligonucleotides cloned out of the Oct4 selected fraction (short, stacked bars), ChIP-PET regions (wide bars), and normalized enrichment scores in grayscale for each duplicate probe pair for each array experiment [multiply bound, round 3, round 2, round 1]. Enriched oligonucleotides are shaded darkly. The human (bottom set) is identical, save that ChIPped material is analyzed by array (ChIP-chip). Predicted Oct binding sites are annotated below. Conservation was determined by eight vertebrate blastz alignments.
  • Figures 9A, 9B and 9C depict a comparison of Oct4 site prediction and Oct4 binding.
  • Oct4 sites were scored for each oligonucleotide as the log probability that a random sequence would fit the Oct4 binding model better than the highest scoring window (y-axis) in the oligonucleotide and plotted against enrichment (x-axis). Vertical line represents mean enrichment for each experiment.
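The scoring described above can be sketched as follows. The actual Oct4 binding model is not given in this text, so the example builds a toy log-odds matrix from the quoted consensus (ATGCAAAT); the matrix, the match probability of 0.85 and all function names are assumptions. The probability is computed by enumerating every possible 8-mer under a uniform background, and "at least as well as" is used in place of "better than" so that a perfect consensus match does not give a probability of zero.

```python
import itertools
import math

CONSENSUS = "ATGCAAAT"  # Oct4 consensus quoted in the text (SEQ ID NO: 1)
BASES = "ACGT"

def toy_pwm(consensus, p_match=0.85):
    """Toy log2-odds matrix (assumption): the consensus base gets p_match,
    the remaining probability is split evenly; background is uniform (0.25)."""
    p_other = (1.0 - p_match) / 3.0
    return [{b: math.log2((p_match if b == c else p_other) / 0.25) for b in BASES}
            for c in consensus]

def window_score(pwm, window):
    """Sum of per-position log-odds for one window."""
    return sum(col[b] for col, b in zip(pwm, window))

def best_window_score(pwm, oligo):
    """Highest score over all windows of the oligo (oligo must be >= motif length)."""
    w = len(pwm)
    return max(window_score(pwm, oligo[i:i + w]) for i in range(len(oligo) - w + 1))

def log10_p_random_at_least(pwm, oligo):
    """log10 of the fraction of all w-mers scoring at least as high as the
    oligo's best window, assuming all w-mers are equally likely."""
    best = best_window_score(pwm, oligo)
    w = len(pwm)
    hits = sum(1 for kmer in itertools.product(BASES, repeat=w)
               if window_score(pwm, kmer) >= best - 1e-9)
    return math.log10(hits / 4 ** w)

pwm = toy_pwm(CONSENSUS)
with_site = log10_p_random_at_least(pwm, "GGGATGCAAATCCC")
without_site = log10_p_random_at_least(pwm, "CCCCCCCCCCCC")
```

A more negative value means fewer random sequences fit the model as well as the oligo does, so the oligo carries a stronger predicted site. Exhaustive enumeration is only practical for short motifs such as this 8-mer.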
  • Figure 9B De novo motif identification was performed using Gibbs sampling trials with varying amounts of input that was ranked according to enrichment in round 1 or the multiply bound fraction. Successful trials that converged on motifs with the Oct4 consensus (ATGCAAAT; SEQ ID NO: 1) were recorded on the y-axis.
  • Figure 9C Using the top 3% of enriched oligonucleotides, the effect of motif length was examined.
  • Figures 10A, 10B and 10C depict de novo motif identification.
  • Figure 10A Singly bound enrichment (y-axis) was plotted against multiply bound enrichment (x-axis) for each oligonucleotide in the dataset.
  • Motif discovery was performed with a Gibbs sampler using the dataset of oligonucleotides biased towards the multiply bound state (circles) as the input dataset.
  • Figure 10B Three motifs of length twenty were returned and represented in web logo format.
  • FIG. 10C Half sites (ATGC (SEQ ID NO: 2), GCAT (SEQ ID NO: 3), AAAT (SEQ ID NO: 4), and ATTT (SEQ ID NO: 5)) are counted in the entire set of oligonucleotides, and also in the set biased towards the singly bound state (above the line in Figure 10A) and also in the set biased towards multiply bound (below the line in Figure 10A). Relative enrichment statistics for oligos containing zero, single and higher multiples of Oct4 half sites in the multiply shifted fraction are recorded. For each multiple of half sites, lightly shaded histogram bars mark relative risk (RR) in the multiply shifted fraction and the more darkly shaded bars mark RR for the singly shifted fraction.
  • FIG. 11 depicts de novo motif identification. Oct4 contains two POU domains which recognize a bipartite signal (diagram).
  • Half sites (ATGC (SEQ ID NO: 2), GCAT (SEQ ID NO: 3), AAAT (SEQ ID NO: 4), and ATTT (SEQ ID NO: 5)) are counted in the entire set of oligonucleotides, the set biased towards the singly bound state and the set biased towards multiply bound.
  • Each permutation of half sites with more than two-fold relative risk of being found in the multi-bound state versus the entire set is graphed.
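The half-site counting and relative risk comparison described for Figures 10C and 11 can be sketched as below. The text does not spell out the exact RR formula, so this example assumes RR is the share of oligos with a given number of half sites in a selected fraction divided by the same share in the entire set. The function names and the toy sequences are hypothetical.

```python
HALF_SITES = ("ATGC", "GCAT", "AAAT", "ATTT")  # SEQ ID NOs: 2-5 in the text

def count_half_sites(oligo):
    """Count occurrences (overlaps allowed) of any half site in an oligo."""
    return sum(1 for i in range(len(oligo) - 3) if oligo[i:i + 4] in HALF_SITES)

def relative_risk(all_oligos, fraction, n_sites):
    """Assumed RR: share of oligos with exactly n_sites half sites in
    `fraction`, divided by the same share in the entire set."""
    def share(oligos):
        return sum(1 for o in oligos if count_half_sites(o) == n_sites) / len(oligos)
    background = share(all_oligos)
    return share(fraction) / background if background else float("nan")

# Hypothetical toy data: the entire pool and a multiply bound subset
all_oligos = ["ATGCAAATGG", "ATGCCCCCCC", "CCCCCCCCCC", "GGGGGGGGGG"]
multiply_bound = ["ATGCAAATGG"]

rr_two_sites = relative_risk(all_oligos, multiply_bound, n_sites=2)
```

An RR above 1 for a given half-site count means oligos with that count are over-represented in the selected fraction compared with the full pool.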
  • Figures 12A and 12B depict an Agilent zebrafish oligonucleotide microarray before (Figure 12A) and after (Figure 12B) scouring.
  • Figures 13A, 13B and 13C depict analysis of feature recovery 90 features all
  • FIG. 13A Array x and y coordinates are represented graphically. Locations of successfully amplified features are depicted with an "o" and failed amplifications with an "x."
  • Figure 13B spatial relationship between primers and probe orientation.
  • Figure 13C The number of successful amplifications was recorded for each primer pair combination.
  • Figure 14 depicts selection of Bound Ligand by Immunoprecipitation.
  • Anti-Oct4 antibody supershifts the positive control (lane 3) and the immuno-selected pool (lane 8), but not the initial pool (lane 6).
  • Figure 15 depicts phosphatase treatment decreases Oct4 binding DNA.
  • Supershifted Oct4 indicates a loss of binding activity upon dephosphorylation of Oct4 on the wt probe (lane 3 vs 5) and also on the selected pool (lanes 11 vs 13).
  • Figure 16 depicts enrichment of anonymous complexes from an ES cell extract.
  • EMSA performed with indicated probes. The protein-bound fraction was isolated from the shift of the initial pool (lane 5), reamplified and used to reprobe extract (lanes 7, 8). This process was repeated for a total of two cycles of enrichment (lanes 9, 10).
  • the invention is a method of identifying a nucleotide ligand that associates with a target molecule, comprising the steps of amplifying a pool of non-random oligonucleotides (also referred to herein as "oligos") to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule to form an association between the non-random oligonucleotides and the target molecule; and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
  • Non-random refers to a pool of oligonucleotides that do not contain all possible permutations of nucleotides.
  • a pool of non-random oligonucleotides can be generated from information regarding a nucleotide sequence, for example, a genome of a nucleic acid sequence associated with the target molecule. For example, genomic sequences associated with Oct4 can be employed to design a pool of non-random oligonucleotides.
  • At least one of the non-random oligonucleotides can include a detectable label (e.g., Cy3, Cy5).
  • the non-random oligonucleotides can be deoxyribo-oligonucleotides (single-stranded deoxyribo-oligonucleotides or double-stranded deoxyribo-oligonucleotides).
  • the deoxyribo-oligonucleotides can include at least one genomic nucleotide sequence, such as at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence.
  • the non-random oligonucleotides can be ribo-oligonucleotides.
  • the non-random oligonucleotides can be synthetic (e.g., made by oligonucleotide synthesis methods) non-random oligonucleotides.
  • Each of the non-random oligonucleotides can have an identical number of nucleotides.
  • the number of nucleotides in the non-random oligonucleotides can be less than about 100 nucleotides (e.g., between about 50 nucleotides to about 100 nucleotides, about 20 nucleotides, about 25 nucleotides, about 50 nucleotides, about 75 nucleotides).
  • At least a portion of at least one non-random oligonucleotide can overlap with at least a portion of another non-random oligonucleotide.
  • the portion of the non-random oligonucleotide and the portion of another non-random oligonucleotide overlap in a range of between about 19 nucleotides to about 35 nucleotides.
  • At least one non-random oligonucleotide in the pool can include at least one primer binding site, such as at least one universal primer binding site.
  • the association of at least one non-random oligonucleotide with the target molecule can be detected by at least one member selected from the group consisting of a mobility shift assay (e.g., gel mobility shift assay), a hybridization array and an immunoprecipitation assay.
  • the association of at least one non-random oligonucleotide with the target molecule can be performed iteratively.
  • a mobility shift assay can be performed iteratively (repeatedly) with at least one non-random oligonucleotide that associates with a target molecule.
  • the method of identifying a nucleotide ligand that associates with a target molecule can further include assessing a binding affinity of the nucleotide ligand for the target molecule.
  • the method of identifying a nucleotide ligand that associates with a target molecule can further include adding an agent (e.g., a drug) at one or more time points selected from the group consisting of before, concomitantly and after contacting the library of non-random oligonucleotides with the target molecule.
  • the agent can disrupt the association of at least one non-random oligonucleotide and the target molecule.
  • the target molecule may associate with the nucleotide ligand; however, upon exposure to the agent, the association between the target molecule and the nucleotide ligand may be disrupted.
  • the agent can inhibit (also referred to as prevent) the association of at least one non-random oligonucleotide and the target molecule.
  • the agent may promote the association of at least one non-random oligonucleotide and the target molecule.
  • the agent employed in the methods described herein may include a phosphatase inhibitor, or at least one member selected from the group consisting of a drug, an enzyme (e.g., phosphatase) and a nucleic acid (e.g., a small interfering ribonucleic acid).
  • the target molecule employed in the methods of the invention described herein can be a component of an extract of a cell.
  • the methods described herein can further include exposing at least one member selected from the group consisting of the cell and the extract to at least one member selected from the group consisting of an agent, a stress condition and an ultraviolet radiation before the extract of the cell containing the target molecule is prepared.
  • the target molecule employed in the methods described herein can be at least one member selected from the group consisting of a protein, a transcription factor and a splicing factor.
  • the transcription factor can activate a nucleotide sequence that is near or close to the location of binding of the nucleotide ligand in the genome, for example, between about 400 nucleotides to about 2000 nucleotides within a location of where the nucleotide ligand binds a genomic nucleotide sequence.
  • the methods described herein can include employing the amplifying, contacting and separating steps at least twice prior to identifying the nucleotide ligand.
  • At least one non-random oligonucleotide can further include at least one promoter sequence (e.g., a T7 promoter sequence).
  • the method of identifying a nucleotide ligand that associates with a target molecule can further include the step of repeating the steps of amplifying the pool of non-random oligonucleotides to form the library of non-random oligonucleotides, contacting the library of non-random oligonucleotides with the target molecule to form the association between the non-random oligonucleotides and the target molecule, and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
  • the methods described herein can further include performing each amplifying step in the presence of a distinct detectable label.
  • Distinct detectable labels are labels that are different one from the other.
  • one amplifying step can be performed in the presence of a Cy3 label and the subsequent amplifying step can be performed in the presence of a Cy5 label, which is a label distinct from a Cy3 label.
  • the invention is a method of identifying a variant of an allele that binds a target molecule (e.g., a component of an extract of a cell), comprising the steps of contacting at least one target molecule with a first pool of non-random oligonucleotides to form a first library of non-random oligonucleotides and a second pool of non-random oligonucleotides to form a second library of non-random oligonucleotides, wherein each non-random oligonucleotide of the second library of non-random oligonucleotides is an allelic variant of the first library of non-random oligonucleotides; and wherein the first library of non-random oligonucleotides is optionally combined with the second library of non-random oligonucleotides prior to contact with the target molecule; and comparing binding of the target molecule to the first library of non-random oligon
  • the method of identifying a variant of an allele that binds a target molecule can further include the step of contacting at least one common nucleotide ligand with the first library and second library; or may further include comparing the binding of the common nucleotide ligand between the first library and the second library.
  • the non-random oligonucleotides in the first pool and/or second pool can be deoxyribo-oligonucleotides, such as deoxyribo-oligonucleotides that include genomic nucleotide sequences (e.g., at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence) or ribo-oligonucleotides.
  • At least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides can include at least one primer binding site, such as a universal primer binding site.
  • Binding of the target molecule to at least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides can be detected by at least one member selected from the group consisting of a mobility shift assay, a hybridization array and an immunoprecipitation assay.
  • the non-random oligonucleotides of the first library and the non-random oligonucleotides of the second library can be differentially labeled.
  • the non-random oligonucleotides of the first library can include a Cy3 label and the non-random oligonucleotides of the second library can include a Cy5 label, which would result in the second library being differentially labeled compared to the first library (i.e., Cy5 is a different label than Cy3).
  • the invention is a method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule (e.g., a component of an extract of a cell), wherein the first nucleotide ligand is an allelic variant of the second nucleotide ligand comprising the steps of contacting a first non-random oligonucleotide library of the first nucleotide ligand and a second library of the second nucleotide ligand with the target molecule; and comparing a proportion of the first non-random oligonucleotide library bound to the target molecule with the proportion of the second non-random oligonucleotide library bound to the target molecule, wherein a difference in the proportion of the first non-random oligonucleotide library bound to the target molecule compared to the proportion of the second non-random oligonu
  • the method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule can further include the step of contacting at least one common nucleotide ligand with the first library and second library; and can further include comparing the binding of the common nucleotide ligand between the first library and the second library.
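The comparison of bound proportions between two allelic libraries can be illustrated with a small calculation. This is only a sketch: the signal values are hypothetical, and summarizing the comparison as both a difference and a log2 ratio is an assumption, since the text only states that the two proportions are compared.

```python
import math

def bound_proportion(bound_signal, total_signal):
    """Fraction of a library's signal recovered in the bound fraction."""
    if total_signal <= 0:
        raise ValueError("total signal must be positive")
    return bound_signal / total_signal

def compare_alleles(first, second):
    """Compare bound proportions for two allelic libraries.

    Each argument is a (bound_signal, total_signal) pair.
    """
    p1 = bound_proportion(*first)
    p2 = bound_proportion(*second)
    ratio = math.log2(p1 / p2) if p1 > 0 and p2 > 0 else float("nan")
    return {"first": p1, "second": p2, "difference": p1 - p2, "log2_ratio": ratio}

# Hypothetical signals for a reference allele library and its variant library
result = compare_alleles((400.0, 800.0), (100.0, 800.0))
```

A non-zero difference (or a log2 ratio away from zero) would suggest that the two alleles differ in binding to the target molecule; deciding what size of difference is meaningful would need replicates and is outside this sketch.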
  • the invention is a method of determining a binding affinity of a target molecule for a nucleotide ligand, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule; detecting a nucleic acid sequence in the library of non-random oligonucleotides that binds the target molecule with a first nucleic acid probe, wherein the first nucleic acid probe binds the target molecule and includes a first detectable label; contacting the first nucleic acid probe with a second nucleic acid probe, wherein the second nucleic acid probe binds the target molecule with an affinity different than the first nucleic acid probe and includes a second detectable label that is distinct from the first detectable label, thereby forming a mixture of the first nucleic acid probe and the
  • One method for identifying transcription factor binding specificity is Systematic Evolution of Ligands by Exponential Enrichment (SELEX), an iterative method of selecting high-affinity binding ligands of known activators. For each cycle of selection, the fraction of an oligo pool that is bound to the target is eluted from the filter and re-amplified. With each round of SELEX, high-affinity ligands of the target protein become more enriched within the pool. Ligands are cloned and weight matrices are calculated from an alignment of the selected sequences.
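The last step, calculating a weight matrix from aligned selected sequences, can be sketched as follows. The text does not say which weighting scheme is used, so this example assumes a log2-odds matrix with a pseudocount and a uniform background. The aligned sequences are hypothetical, not data from the document.

```python
import math
from collections import Counter

BASES = "ACGT"

def weight_matrix(aligned, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix from equal-length aligned sequences.

    Pseudocount and uniform background are assumptions of this sketch.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    total = len(aligned) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        matrix.append({b: math.log2(((counts[b] + pseudocount) / total) / background)
                       for b in BASES})
    return matrix

# Hypothetical aligned ligands cloned after selection
selected = ["ATGCAAAT", "ATGCAAAT", "ATGCTAAT", "ATGCAAAG"]
pwm = weight_matrix(selected)

# Highest-weight base at each position
consensus = "".join(max(col, key=col.get) for col in pwm)
```

Positions where every selected ligand agrees get a high weight for that base and negative weights for the others; variable positions get flatter weights.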
  • these methods often leave questions about the in vivo relevance of the output sequences, as natural selection may not always favor the highest binding affinity sites.
  • other factors such as chromatin accessibility greatly limit the usefulness of in vitro binding specificities for predicting sites in vivo (a more complete discussion of this phenomenon can be found in Wasserman and Sandelin 2004).
  • ChIP-chip and ChIP-PET locate binding sites in vivo by immunoprecipitating the factor of interest after it has been crosslinked to chromosomal DNA (Orlando and Paro 1993). Binding regions are identified either by array (ChIP-chip) or by sequencing (ChIP-PET). Both these techniques have been applied to the identification of p53-bound regions in human and murine ES cells (Wei, Wu, Vega, Chiu, Ng, Zhang, Shahab, Yong, Fu, Weng et al. 2006).
  • telomere binding does not always enhance transcription.
  • telomere binding is correlated with a repressed transcriptional state (Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000). This duality is not uncommon for transcription factors and is probably explained by the local context of each site (the identity of neighboring factors on the DNA). High resolution maps of transcription factor binding sites (TFBS) in promoters are required to understand these nuances of individual transcription factor function.
  • p53 binds specifically to DNA. p53 (a target molecule) binds a bipartite sequence composed of two half sites, NNNCWWGNNN (SEQ ID NO: 6), arranged in a head to head orientation separated by 0-13 nucleotides of spacer. Motifs derived from in vivo ChIP results are strongly biased towards having no spacer or a single nucleotide of spacer sequence (Wei, Wu, Vega, Chiu, Ng, Zhang, Shahab, Yong, Fu, Weng et al. 2006). Bona fide targets of p53 typically have at least two of these response elements (RE) within a few thousand nucleotides of the transcription start site.
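The bipartite site described above can be searched for with a simple pattern scan. The sketch below looks for two NNNCWWGNNN half sites (W is A or T) separated by 0 to 13 nucleotides; the function name and the test sequences are hypothetical. Note that this degenerate pattern matches its own reverse complement, so the scan by itself cannot tell a head to head arrangement from a head to tail one; a stricter half-site model would be needed for that.

```python
import re

# One half site: NNNCWWGNNN, with N = any base and W = A or T
HALF = re.compile(r"[ACGT]{3}C[AT]{2}G[ACGT]{3}")

def find_response_elements(seq, max_spacer=13):
    """Return (start, spacer_length) for every pair of half sites
    separated by 0..max_spacer nucleotides."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 19):  # two 10-nt half sites must fit
        if not HALF.fullmatch(seq, start, start + 10):
            continue
        for spacer in range(max_spacer + 1):
            second = start + 10 + spacer
            if second + 10 > len(seq):
                break
            if HALF.fullmatch(seq, second, second + 10):
                hits.append((start, spacer))
    return hits

# Hypothetical sequences: adjacent half sites, and half sites with a 2-nt spacer
no_spacer = find_response_elements("AAACATGTTTGGGCTAGCCC")
two_nt_spacer = find_response_elements("AAACATGTTTACGGGCTAGCCC")
```

Because the flanking positions are fully degenerate, this pattern matches very often in real sequence, so it is best read as an illustration of the spacing rule rather than a usable site predictor.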
  • MDR1 repressed target
  • MDR1 contains an unusual RE where the half sites are arranged in a head to tail orientation. Converting this head to tail orientation to the canonical head to head orientation transforms this p53 responsive silencer into a p53 responsive enhancer.
  • Mutations in p53's DBD may decrease binding strength and alter specificity.
  • p53 exists as a dimer and binds DNA as a tetramer, contacting DNA at an internal DNA binding domain (DBD).
  • p53 mutations are recovered from nearly half of all tumors, with almost all the mutations altering amino acids in the DBD. While a change in p53 binding activity is clearly the sine qua non of a cell's progression towards cancer, the exact nature of this change in activity has been difficult to characterize.
  • the few DBD mutations that have been examined behave as dominant mutations; however, DNA binding to a canonical RE appears to proceed even with tetramers composed of three mutant alleles (Chan, Siu, Lau and Poon 2004).
  • Oct4 belongs to a well studied class of transcriptional activators and is a critical regulator of stem cells implicated in maintaining the pluripotent state. Like p53, slight changes in Oct4 activity can dramatically alter cellular events. For example a two fold increase in Oct4 activity induces differentiation into endoderm and mesoderm fates while less than normal levels results in differentiation into trophectoderm (Niwa, Miyazaki and Smith 2000). Furthermore slight elevation (about 1.5 fold) in Oct4 in the adult germline is capable of inducing gonadal tumors (Gidekel, Pizov, Bergman and Pikarsky 2003). One interesting aspect of Oct4 function is its unusual modes of binding DNA.
  • Oct4 can bind as a monomer to canonical Oct4 sites or to a variety of non-canonical combinations of half sites. These various configurations of Oct4 sites have been hypothesized to interact with different co-regulators, thereby adding a great deal of complexity to the interpretation of factor binding at the promoters. Three such classes of sites have been discovered to complement the MORE and PORE sites reported and studied previously. p53 also binds multimerically and there is also some indication that non-canonical combinations of p53 half sites may play a role in p53 biology (Menendez, Inga, Jordan and Resnick 2007).
  • Biochemical mapping of protein nucleic complexes to 3.9 mb of p53 regulated genomic regions p53 is a transcription factor that is found mutated in over 50 % of human tumors. The vast majority of all p53 missense mutations recovered from tumors fall within the DNA binding domain. These two facts underscore the importance of p53 . to cancer and also the importance of p53's DNA binding activity to its tumor suppressor function (Olivier, Eeles, Hollstein, Khan, Harris and Hainaut 2002; Soussi and Beroud 2003).
  • p53 binds to DNA and the consequence of its regulation. On some promoters p53 acts as transcriptional activator in some cases as a repressor. p53 function is probably determined by some combination of binding conformation, post translational modification and the identity of interacting factors and nearby factors.
  • oligo library that will cover large regions of the genome that contain cis- modules that are linked to cancer by their function as regulatory control elements for apoptosis, cell growth, DNA or damage response is designed.
  • the network of genes that are controlled by p53 or control p53 expression are widely regarded as the central players in cancer and tumor progression and the promoters and enhancers of such genes capture much of the biology described above.
  • the genomic regions from genes in the p53 network will be assembled from three sources: a list of genes central to p53 regulation, transcriptional targets of p53 detected by microarray in the colorectal cell line HCT116, and ChIP-PET studies of p53 binding sites, also in the HCT116 cell line.
  • each oligo is a 35-mer, and the library is designed by shifting a window of about 35 nucleotides in increments of about 10 nucleotides across a genomic region of interest.
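The tiling scheme described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `tile_windows` and the 100-nt test region are hypothetical, and the defaults simply mirror the "about 35-mer" window and "about 10 nucleotide" increment stated in the text.

```python
def tile_windows(sequence, window=35, step=10):
    """Return (start, oligo) pairs for fixed-length windows tiled across a region.

    Windows that would run past the end of the region are not emitted,
    so a region shorter than `window` yields an empty list.
    """
    last_start = len(sequence) - window
    return [(start, sequence[start:start + window])
            for start in range(0, last_start + 1, step)]


# Hypothetical 100-nt region used only to exercise the function.
region = "ACGT" * 25
tiles = tile_windows(region)
```

With a 10-nt step each interior base is covered by three or four overlapping 35-mers, which is in line with the roughly three fold coverage quoted for the full library.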
  • profiling studies indicate anywhere from a few hundred to a few thousand genes are under p53 regulation.
  • Targets of p53 generally have at least two of these response elements (RE) within a few thousand nucleotides of the transcription start site. Adding about 2 kb flanks to the transcriptional start site of about 230 of these downstream targets adds about another 91,195 oligos.
  • RE response elements
  • the limited number of genes at the center of the p53 network will be covered in their entirety with about 80,000 oligos. These regions of interest will be covered by about 396,339 oligos, each about a 35-mer, which translates to about three fold coverage of about 3.9 megabases.
  • This library will be synthesized on two Agilent custom oligo arrays. The remaining about 91,661 spots will be filled with a few individual well-characterized p53 response elements reported in the literature but mostly randomly selected genomic 35-mers as a negative binding control and also to generate background statistics for the array hybridizations.
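The library-size figures quoted above can be checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Figures taken from the description above.
n_oligos = 396_339        # genomic 35-mers covering the regions of interest
oligo_length = 35         # nucleotides per oligo
region_bp = 3_900_000     # about 3.9 megabases of p53-regulated sequence
control_spots = 91_661    # remaining array spots used for controls

# Average number of oligos covering each base of the tiled regions.
coverage = n_oligos * oligo_length / region_bp

# Total features across the two custom arrays.
total_spots = n_oligos + control_spots
```

The coverage works out to a little over 3.5, consistent with the "about three fold" figure and with a 35-nt window moved in 10-nt steps. The total of 488,000 spots matches the 488,000 oligos referred to later in the description, or 244,000 features per array if the two arrays are filled evenly.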
  • genomic aptamers are commercially synthesized (Atactic Inc) in a solid phase format, cleaved and shipped as a mixed pool.
  • the quality of the resulting oligos is high but the total complexity is limited to about 4000 oligos/order.
  • One option that can be used is to design and order a custom oligo array, grind it into small pieces and PCR amplify the resulting particles (Oleinikov, Zhao and Gray 2005).
  • the surface of an Agilent custom oligo array can be scraped with a razor and amplified as a pool.
  • the primer pair used in the successful amplification is TAACATATGCCTGCAGTGTAC (SEQ ID NO: 7) and
  • Bacterially-expressed protein does not contain the necessary modifications to faithfully reproduce DNA binding; however, translation in rabbit reticulocyte lysate results in a competent protein prep.
  • This protein prep will be made by the combined in vitro transcription/translation reaction off the pRsetb-p53 plasmid constructs, kindly provided by the Prives lab, in the reticulocyte lysate.
  • Baculovirus-expressed his tagged p53 proteins are also commercially available and will be purchased if problems with synthesis are encountered (ProteinOne, Bethesda MD).
  • This p53 lysate will be incubated with a radiolabelled probe containing a perfect p53 response element to establish the mobility of the p53 shifted band in a native PAGE gel.
  • the identity of the p53 complex will be verified by blocking/supershifting experiments with DO1 monoclonal antibody. For reasons of continuity, DO1 is the preferred antibody.
  • the p53 ChIP enriched regions that constitute the majority of the oligo pool will be immunoprecipitated with this antibody. However, if DO1 does not supershift or has poor specificity, other antibodies such as 1801, which have been demonstrated to shift p53, can be used.
  • pAb421 will be avoided because it affects the binding affinity of p53 (Olivier, Eeles, Hollstein, Khan, Harris and Hainaut 2002; Soussi and Beroud 2003). If cross-reactivity is observed with multiple antibodies, translation will be performed in wheat germ extract. Molecular selection of bound oligos
  • the oligo pool will be amplified by the universal primer pair with labeled dATP.
  • the shifted fraction will be extracted from the gel following each round of EMSA. It is well established that p53 binds as two dimers forming a tetramer complex on DNA (McLure and Lee 1999). p53 complexes could occur with other stoichiometries during the course of our experiments, and if these additional complexes are distinguishable by EMSA, the bound oligos could be extracted from the gel. This qualitative information from electrophoretic separation is lost in alternate protocols.
  • the second application of the portioning measurements is to normalize the red and green channels of array.
  • the eluted oligos are PCR amplified and labeled with Cy3 (bound) and Cy5 (starting pool) and then hybridized to the array. After this normalization the ratio of red to green then represents the bound fraction to unbound fraction for each oligo in the pool.
  • binding curves will be generated from a concentration series of protein. The accuracy of the binding curve will be checked and calibrated by control oligos that are introduced into the starting pool.
  • the binding enrichment scores will be correlated with in vivo activity for the set that coincides with DNase hypersensitive regions and its complement.
  • p53 is modified extensively following induction in living cells, and it is unclear to what extent this program of modification acts on baculovirus-expressed or in vitro translated proteins. p53 has been reported to be acetylated, ubiquitinated, sumoylated, phosphorylated and methylated in response to a diverse array of cellular insults.
  • In vitro translated proteins are replaced with a panel of p53 preparations that have been affinity purified from whole HCT116 cell extracts with DO1 antibody attached to magnetic beads (protein A/G Dynabeads).
  • Each preparation of extract will be induced via different mechanisms including but not necessarily limited to: UV irradiation, γ-radiation, hypoxia, heat shock, cisplatin, 5-Fluorouracil and other genotoxic agents.
  • the transcriptional profile of p53 is cell line specific and interactions with co-regulators may influence binding in a positive or negative manner (Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000). These may reflect characterized interactions such as gp300 or negative interactions such as competition with another factor for the same binding site.
  • immunoprecipitation is performed as described herein after incubation of the target with ligand and the resulting spectrum of oligos is contrasted with specificity worked out for p53 binding in the presence of co-factors by allowing complexes to form before the IP. Role of co-regulators and competing factors in p53 binding
  • the whole cell extract will be made from HCT116 cells growing in log phase. Under these conditions p53 will be low and the promoters represented in the oligo pool would be inactive; however, these cells will be treated with 5-fluorouracil for about 6 hours and also mock induced. These conditions are sufficient to upregulate p53 and its downstream targets (Kho, Wang, Zhuang, Li, Chew, Ng, Liu and Yu 2004). The treated and mock treated extract will serve to represent the induced versus the uninduced state in the binding assays performed with the oligo pool. Other means of inducing p53 are described herein. Co-regulators
  • p53 will be affinity-purified from whole cell extract and the binding assay will be performed on the bead.
  • p53 exists in solution as a dimer, and so the spectrum of ligands that remain fixed on the column will probably reflect the inherent affinity of p53 for the response element without accounting for the role of positive co-regulators or negative competitors.
  • p53 will be eluted from the beads, desalted and rebound in aqueous phase. These complexes will then be isolated by a second immunoprecipitation and analyzed by two color microarray.
  • the p53/DNA complexes will be immunoprecipitated directly from the complex extract.
  • Known interacting partners such as p300/CREB and hsp70 (p53-HSP70 complexes in oral dysplasia), as well as other factors, may alter p53 binding specificity.
  • the results of these assays will be compared to the binding reactions performed with p53 alone. This co-regulator hypothesis predicts that purified p53 will not bind the composite response elements alone but would form complexes only in the presence of extract. Distinct complexes enabled by candidate interaction partners or genetically interacting partners will be queried separately with RNAi depletion experiments. Competitors
  • the extract may contain factors that compete with p53 for DNA binding.
  • the paralogs, p63 and p73 have overlapping binding specificity with p53 and may compete for some response elements. The results of these assays will be compared to the binding reactions performed with p53 alone. This competitor hypothesis would predict that purified p53 would bind the response element alone but not in the presence of extract.
  • binding assays described herein will be repeated with and without drug treatments that induce global shifts towards the phosphorylated isoforms in the proteome.
  • pervanadate about 0.1 to about 3 mM
  • a tyrosine phosphatase inhibitor that should block the dephosphorylation of p53 at residue 55 that occurs in all inductions and otherwise mimics induction by shifting the distribution of p53 to the T88 phosphoform. This site is close to the DBD and alterations in specificity will be assayed as described herein.
  • HCT116 cells. Determining the DNA binding specificity of particular p53 phosphoisoforms
  • the efficacy of okadaic acid treatment will be analyzed with the large selection of commercially available p53 specific antibodies.
  • HCT116 cells. To determine the specific effect of individual phosphorylation events, HCT116 cells will be treated as described herein and also will be induced with the genotoxic drugs and conditions described in Figure 2.
  • immunoprecipitations will be performed with a panel of IgG antibodies on untreated extract that are specific for the following phosphoepitopes (ser 6, 9, 15, 20, 37, 46, 315, 378, 392 and thr 18, 55, 155, and 377) from the following companies (VWR, PhosphoSolutions, and Santa Cruz). In this fashion, specific, natural phosphoisoforms will be selected from extract and assayed for binding.
  • a high throughput binding assay will also be used to test the Piano hypothesis, an idea advanced by Michael Resnick that many different DBD missense mutants lose affinity for DNA in a sequence specific fashion that is idiosyncratic to the mutation. This hypothesis predicts that these mutants may bind DNA with an altered specificity - an outcome that would have important clinical implications.
  • extracts will be made from p53 deficient Saos-2 cells transfected with mutant p53 expression vectors and assayed as described herein.
  • the paralogs p63 and p73 perform parallel, overlapping roles with p53 in driving transcription, where p63's consensus binding site differs from the p53 binding site at only two positions (Perez, Ott, Mays and Pietenpol 2007). Binding will be assayed as described herein using antibodies specific to p63 and p73.
  • the gel shift assay will be used to identify functional complexes based on mobility and attempt to enrich the sequence thus bound without initially identifying factors involved in the complex (Figure 16). Sequences that are capable of being enriched will be analyzed via array. Having identified the motifs bound in particular complexes, the genome can be annotated for complexes. Bound fractions will be grouped by complex mobility and each of the 488,000 oligos will be associated with an enrichment score for each group. For each group, the results will be ranked according to enrichment and the top 3% used as input for the Gibbs motif sampler as described herein to discover binding specificity. When this method is developed with Oct4, the Gibbs sampler converges on the correct motif 100% of the time.
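The ranking step described above, in which the most enriched oligos for a mobility group are selected as motif-finder input, might look like the following sketch. The function name, the dictionary layout and the toy scores are hypothetical; only the 3% cutoff comes from the text, and the Gibbs sampler itself is not shown.

```python
def top_fraction(enrichment, fraction=0.03):
    """Return oligo identifiers ranked by enrichment, keeping the top fraction.

    `enrichment` maps an oligo identifier to its enrichment score for one
    mobility group. At least one oligo is always returned.
    """
    ranked = sorted(enrichment, key=enrichment.get, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# Toy scores for 100 hypothetical oligos; oligo_99 is the most enriched.
scores = {f"oligo_{i}": float(i) for i in range(100)}
motif_input = top_fraction(scores)
```

For a full 488,000-oligo array, a 3% cutoff would pass about 14,640 sequences per group to the motif sampler.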
  • the method described herein to predict the transcription factor identity of the complexes that form on p53 responsive regions compares motifs discovered within oligos that are enriched in a complex to the database of curated TF motifs in TRANSFAC. Predictions will be validated by RNAi depletion of the candidate transcription factor in HCT116 cell lines. In addition to using the TF depleted cell line in a gel shift to confirm the disappearance of the factor, the line will be analyzed by ChIP for factor binding at the site in question. In addition, the genes that are located proximal to the predicted factor binding site will be examined for alteration in transcriptional response to various DNA damage.
  • the interactions will be tested with a UV cross-linking method.
  • the extract will be allowed to bind a body labeled oligo probe followed by immunoprecipitation with the appropriate antibody. If this method fails to establish the identity of the interacting factor, the oligo will be used as a probe to affinity purify the complex. Briefly, passing extract from larger scale (20 L) tissue culture preps over a column of packed Sepharose beads covalently linked to the oligo probe will retain proteins that assemble on the probe. These factors will be analyzed by silver stain, excised and identified by mass spectrometry.
  • Map of regulatory cis-elements on p53 regulated genomic regions can employ the following. Design of oligo library. Proximal promoter element reporter
  • the candidate enhancer will be ligated upstream of a transcriptional reporter.
  • a PCR based methodology for constructing the in vivo transcriptional reporter library can be utilized. Description of a Pol II ChIP in-vivo reporter system
  • a panel of cmv gfp constructs truncated at the 5' end, for the purpose of creating a minimal promoter transcription reporter, will be ligated.
  • the ideal construct would partially express gfp, allowing room for the detection of both above background and below background expression levels. All experiments will be performed in HCT116 to maintain consistency with earlier portions of this grant.
  • the oligo library described herein will be amplified, digested at the unique BamHI site in the universal primer and ligated immediately upstream of the minimal promoter transcription reporter.
  • the ligated product will be subject to several final rounds of amplification with outside primers, purified and transfected into cells as a PCR product.
  • transient expression of a PCR fragment containing this cmv promoter upstream of gfp in 293 cells is demonstrated. If a particular oligo in the library is capable of functioning as an enhancer or a proximal promoter element, that reporter should express more gfp than the average reporter in the pool.
  • This linear PCR product will be introduced via lipofectamine transfection into the HCT116 cell line.
  • Standard ChIP protocol will be utilized to isolate the transcriptionally bound fraction of linear PCR products.
  • RNA Polymerase II will be crosslinked to active promoters through exposure to formaldehyde (1 % solution).
  • the chromatin will be sheared and lysed by brief bursts of sonication. Pol II will be immunoprecipitated as described herein and the cross-linkages will be reversed by one hour incubation at 65°C.
  • transfections will be performed on logarithmic phase HCT116 cells and in cells induced by a variety of insults (5-FU, UV radiation and γ-radiation).
  • a fraction of the oligos will be returned as constitutive enhancers or constitutive silencers while others will be regulatory elements that are induced by certain treatments. Results obtained by different methods of induction will be crosschecked against the results of the binding assays performed on similarly treated extracts. This may subdivide the active fraction of the oligo pool further (i.e. an enhancer that forms a low mobility complex that is dependent on phosphorylation). Many oligos map to fewer promoter regions. After separating the oligos in this fashion, an understanding of how these elements work together to drive the regulation observed in the promoter will be sought. Specifically, different combinations of regulatory elements that are over-represented (or under-represented) in promoters that are p53 activated will be searched.
  • the same analysis will be performed in promoters that are p53 repressed.
  • the dataset can be divided by several other functional criteria (chromatin accessible/inaccessible) or by treatment. This approach will be used to find suggested synergies important in the different treatments.
  • Transcript level response to particular treatments will be downloaded from publicly available GEO dataset. Significance will be determined by performing the analysis on a thousand shuffles of the oligo/enrichment data and recording the number of permutations with combinations equal to or more extreme than the observed data.
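The shuffling procedure described above can be sketched as a simple permutation test. This is an illustration under stated assumptions: the test statistic used here (mean enrichment of a group of oligos) and all names are hypothetical stand-ins, since the text specifies only that the oligo/enrichment data are shuffled a thousand times and that permutations equal to or more extreme than the observed data are counted.

```python
import random


def permutation_p_value(scores, in_group, n_shuffles=1000, seed=0):
    """Fraction of shuffles whose group mean is >= the observed group mean.

    `scores` holds one enrichment value per oligo and `in_group` flags the
    oligos that belong to the element combination being tested. Shuffling
    the scores breaks any real link between oligo and enrichment.
    """
    rng = random.Random(seed)
    k = sum(in_group)
    observed = sum(s for s, g in zip(scores, in_group) if g) / k
    shuffled = list(scores)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # After shuffling, the first k values act as a random group of size k.
        if sum(shuffled[:k]) / k >= observed:
            extreme += 1
    return extreme / n_shuffles


# Toy data: the three flagged oligos carry the three highest scores.
toy_scores = [10, 9, 8] + [0] * 17
toy_group = [True] * 3 + [False] * 17
p_enriched = permutation_p_value(toy_scores, toy_group)
```

A fixed seed is used so that the sketch is reproducible; in practice the statistic would be whatever measure of combination over- or under-representation is being tested.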
  • GTCTTCAATTGCATG (SEQ ID NO: 10)) and amplified.
  • the snp regions are selected based on their location and also based on their validation status.
  • SNPs in dbSNP, build 125, are used as the source of human variation. SNPs are screened and deemed valid if the entry 1) is found more than once, 2) includes frequency information, or 3) is validated by the HapMap project or the submitter. From these remaining 5 million SNPs a smaller set of promoter SNPs is created. The about 119,426 SNPs that fall within about 2 kb upstream or about 1 kb downstream of a RefSeq transcriptional start site are classified as promoter SNPs and designed into the oligo library. Binding assay / reporter assay
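The SNP screening rules above can be expressed as two small predicates. This is a sketch only: the `Snp` record and its field names are hypothetical and do not reflect the dbSNP schema, and the three validity criteria are treated as alternatives, which is how the text reads.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rs_id: str             # hypothetical identifier field
    position: int          # chromosomal coordinate
    times_observed: int    # number of independent submissions
    has_frequency: bool    # entry includes frequency information
    validated: bool        # validated by HapMap or by the submitter


def is_valid(snp):
    """Valid if seen more than once, or has frequency data, or is validated."""
    return snp.times_observed > 1 or snp.has_frequency or snp.validated


def is_promoter_snp(snp, tss, strand="+", upstream=2000, downstream=1000):
    """True if the SNP lies within 2 kb upstream or 1 kb downstream of the TSS."""
    offset = snp.position - tss if strand == "+" else tss - snp.position
    return -upstream <= offset <= downstream


example = Snp("rs_example", 9_000, 1, True, False)
```

The strand argument matters because "upstream" is defined relative to the direction of transcription: a SNP 1.5 kb past the start site in genomic coordinates is downstream for a plus-strand gene but upstream for a minus-strand gene.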
  • the pool will be ligated into the gfp reporter fragment via a MunI site.
  • HCT116 cells will be transfected with the pool, crosslinked and assayed for function. Additional cycles of amplification will be performed as necessary.
  • various perturbations will be applied to the line. Downstream analysis will focus on variations that demonstrate allelic differences in function or binding when measured under p53 inducing stress conditions.
  • genes in close proximity to SNPs with allelic difference in regulatory potential will be retrieved from the OMIM database. Variations will be reassociated with their haplotype SNPs in silico. This family of variations will be used to search published reports of association studies with human disease. Similarly, SNPs will be searched in haplotypes that show evidence of recent positive selection in the human population. Alleles that are functionally distinct in p53 dependent responses will be genotyped in Coriell lymphoblastoid cell lines and in patient populations. Each year more than one million Americans are diagnosed with cancer.
  • EXEMPLIFICATION EXAMPLE 1 A SIMULTANEOUS EMSA ANALYSIS OF THOUSANDS OF OLIGOS TO CHARACTERIZE IN VIVO OCT4 BINDING REGIONS
  • Oct4 also known as Oct3 and Pou5f1
  • ES embryonic stem
  • Oct4 was isolated from ES cells on the basis of its ability to bind an octamer sequence, ATGCAAAT (SEQ ID NO: 1) (Scholer, Balling et al. 1989).
  • Oct4 expression may also mark adult germline compartments and certain classes of tumors (Gidekel, Pizov et al. 2003; Yamaguchi, Yamazaki et al. 2005; Atlasi, Mowla et al. 2007 — also Kehler J Tolkunova, E et al 2004 EMBO Rep).
  • SELEX a method of identifying high affinity binding sites from random sequence through iterative steps of binding selection and enrichment (Nishimoto, Miyagi et al. 2003).
  • Weight matrices are calculated from an alignment of the selected sequences and used to score the "closeness" of real sequences to the high affinity sites.
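A weight matrix of the kind described above can be built from aligned sites as a table of log-odds scores per position. The sketch below is one common construction, not necessarily the one used in the cited work: the pseudocount, the uniform background and the toy alignment (variants of the ATGCAAAT octamer) are all assumptions made for illustration.

```python
import math


def weight_matrix(aligned_sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length aligned binding sites."""
    n = len(aligned_sites)
    matrix = []
    for i in range(len(aligned_sites[0])):
        column = [site[i] for site in aligned_sites]
        matrix.append({
            base: math.log2(
                (column.count(base) + pseudocount) / (n + 4 * pseudocount) / background
            )
            for base in "ACGT"
        })
    return matrix


def best_score(matrix, sequence):
    """Highest-scoring window of the matrix width within an A/C/G/T sequence."""
    width = len(matrix)
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


# Hypothetical alignment of selected sites around the Oct4 octamer.
sites = ["ATGCAAAT", "ATGCAAAT", "ATGCATAT", "ATGCAAAT"]
pwm = weight_matrix(sites)
```

Scoring a genomic window with `best_score` gives the "closeness" measure referred to above: windows resembling the selected sites score high, unrelated sequence scores low.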
  • these methods often leave questions about the in vivo relevance of the output sequences, as natural selection may not always favor the highest binding affinity sites.
  • other factors such as chromatin accessibility greatly limit the usefulness of in vitro binding specificities in predicting sites in vivo (Wasserman and Sandelin 2004).
  • ChIP-chip and ChIP-PET locate binding sites in vivo by immunoprecipitating the factor of interest after it has been crosslinked to chromosomal DNA (Orlando and Paro 97). Binding regions are identified either by array (ChIP-chip) or by sequencing (ChIP-PET). Both these techniques have been applied to the identification of Oct4-bound regions in human and murine ES cells (Boyer, Lee et al. 2005; Loh, Wu et al. 2006).
  • However, high throughput localization studies such as ChIP-chip and ChIP-PET have several important limitations. Crosslinking is not quantitative and sometimes not even immunoprecipitation, as well as the array densities and sequencing depths (Buck and Lieb 2004). Most of the published location studies return regions that span about 1 kb of genomic space. In how many places within these regions or even whether there is direct interaction between the factor and the DNA cannot immediately be determined. Recent advances in sequencing technology and the practice of size selecting the recovered DNA fragments have improved this situation, allowing for the inference of binding sites to a resolution of 50 base pairs (Johnson et al Science 2007). However, this approach is costly and this inference becomes problematic when recognition elements cluster in closely spaced groups along the DNA. A more direct determination of Oct4 binding sites within these regions will help identify variants that disrupt binding and also shed light on the mechanism of Oct4 function.
  • Oct4 binding does not always enhance transcription. In some cases, Oct4 binding is correlated with a repressed transcriptional state (Boyer LA Supplementary). This duality is not uncommon for transcription factors and is probably explained by the local context of each site and the identity of neighboring factors on the DNA. High resolution maps of transcription factor binding sites (TFBS) in promoters are required to understand these nuances of transcription factor function. To date, functionally defined target gene regulation by Oct4 binding has been described for only a handful of genes at base pair resolution.
  • EMSA can distinguish single from multiple molecules of bound protein.
  • EMSA is a highly quantitative, well established method that can be performed in whole cell extracts, allows for the physical isolation of the bound product and does not require an array to analyze the result.
  • the feasibility of high throughput EMSA will be demonstrated with an analysis of Oct4 binding capacity throughout the Oct4 ChIP enriched regions (Boyer et al).
  • the MEGAshift method utilizes solid-phase DNA synthesis to remake regions of interest as large pools of short oligos. The reverse strands of these oligos are also synthesized as probes on a custom oligo array. The oligos are incubated with recombinant Oct4 and the bound and unbound fractions are analyzed by array. From this, a binding affinity can be directly measured for each oligo and so by proxy for each window along the Oct4-enriched chromosomal region.
  • RESULTS Library design, oligonucleotide synthesis, cloning and sequencing
  • the human sequence was extended to completely cover the mouse sequences.
  • a complex pool of 2,468 oligonucleotides was synthesized in picoarray microfluidic µParaflo™ devices. The pool was amplified en masse and end-labeled using γ-32P-ATP. Each oligonucleotide was designed as a tiled genomic 35-mer flanked by the common sequences CCAGTAGATCTGCCA (SEQ ID NO: 11) and ATGGAGTCCAGGTTG (SEQ ID NO: 12) that were used as the universal primer binding pair.
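The oligonucleotide design described above, a genomic 35-mer between two common flanking sequences, can be illustrated as follows. The flank sequences are those given in the text; the function name is hypothetical, and appending the second flank exactly as written (rather than as its reverse complement) is an assumption.

```python
FLANK_5 = "CCAGTAGATCTGCCA"  # SEQ ID NO: 11
FLANK_3 = "ATGGAGTCCAGGTTG"  # SEQ ID NO: 12


def build_library_oligo(genomic_35mer):
    """Return the full oligo: 5' flank + genomic 35-mer + 3' flank."""
    if len(genomic_35mer) != 35:
        raise ValueError("expected a 35-nt genomic window")
    return FLANK_5 + genomic_35mer + FLANK_3


# Hypothetical 35-nt window used only to exercise the function.
oligo = build_library_oligo("ATGCAAAT" * 4 + "ACG")
```

Under this layout every library member is 65 nt long (15 + 35 + 15), which is consistent with the pool migrating as a single band on a gel.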
  • Selected oligonucleotides were cloned into TOPO vectors (Invitrogen) under conditions favoring multiple inserts and sequenced.
  • TOPO vectors that contained single inserts were digested with XhoI and HindIII (New England Biolabs) and ligated into the pGL3basic luciferase reporter vector (Promega) using the same sites.
  • the control wild-type and mutant sequences were cloned into pGL3basic as phosphorylated duplex oligonucleotides using a unique SmaI site.
  • Cells were lysed using SoluLyse (Genlantis) in 50 mM Tris-Cl pH 8.0, 1 % NP-40, 2 mM ethylene-diamine-tetraacetic acid (EDTA), 150 mM NaCl, 0.5 mM dithiothreitol (DTT), plus a protease inhibitor cocktail (Roche).
  • the lysate was clarified by centrifugation, and incubated with glutathione-sepharose (GE Healthcare) for 30 minutes at 4°C. Sepharose beads were collected by centrifugation and washed 3 times with the lysis buffer. Washed beads were eluted with 20 mM glutathione in lysis buffer.
  • the purified GST-Oct4 in the eluate was dialyzed into buffer D (20 mM Hepes, pH 7.9, 100 mM KCl, 0.1 mM EDTA, 20% glycerol, 1 mM DTT, and 0.5 mM phenylmethylsulphonyl fluoride (PMSF)).
  • Samples were prepared in 20 µL (0.6X Buffer D, 50 ng/µL poly dI·dC, 1 µg/µL BSA, 1 mM DTT, and 20 ng of probe). Samples were incubated at room temperature for 30 minutes. Native 4% polyacrylamide gels (29:1 acrylamide:bisacrylamide, 1% glycerol, 0.5X TBE) were pre-run for 1 hour at 80V, samples were loaded, and run for 1.75 hours at 80V. Hybridization and Arrays
  • Custom oligonucleotide microarrays (8 x 15K) were produced by Agilent Technologies Inc. Microarrays were hybridized following a modified version of the Agilent Two-Color Microarray-Based Gene Expression Analysis Protocol. Microarrays were hybridized for 3 hours at 50°C and then washed for 1 minute with 2X SSC with 0.2% SDS, followed by two 1-minute washes with 1X SSC, and finally one 10 second wash with EtOH (e.g., 95%). Microarrays were then centrifuged dry and scanned using a GenePix 4000B scanner from Molecular Devices.
  • RNA probes were produced and labeled with Cy3 and Cy5 using the MEGAshortscript™ High-Yield Transcription kit (Ambion) after appending a T7 promoter to the oligonucleotides.
  • ES cells were cultured in DMEM + HEPES supplemented with, for example, 1 mM glutamine, sodium pyruvate, and 1 mM MEM non-essential amino acids (Invitrogen) plus 15% ES cell-qualified heat-inactivated fetal bovine serum (HyClone), 50 µM 2-mercaptoethanol (Sigma) and leukemia inhibitory factor (LIF/ESGRO, Chemicon). Cultured ES cells were transfected by electroporation (Amaxa Biosystems) using protocols supplied by the vendor.
  • RA retinoic acid
  • Each transfection contained 1.3 µg carrier plasmid, 200 ng pRL-TK (Promega), and 500 ng of either pGL3basic empty vector or cloned derivatives. Photinus and Renilla luciferase activities were quantified using a dual luciferase assay system (Promega). Web resources
  • Genome browser snapshots of all the gene loci with Oct4 binding data can be viewed at http://fairbrother.biomed.brown.edu/data/Oct4.
  • the raw data from the array experiments are stored as text files on the server, as is a legend diagramming each experiment. Custom tracks will be submitted to the UCSC Genome Browser and are also available for download.
  • Transcription factor Oct4 is a key regulator of embryonic stem cell pluripotency and a known oncoprotein. Genome-wide location studies have been performed in vivo with mouse and human embryonic cells to identify genomic regions that crosslink to and immunoprecipitate with Oct4.
  • MEGAshift: microarray evaluation of genomic aptamers by shift
  • Although Oct4 is generally known as an activator in the stem cell state and is turned off during differentiation, only about 39% of Oct4 targets are transcriptionally active in human ES cells (Pan, Chang, Scholer and Pei 2002).
  • Of the Oct4 targets that overlap with mouse ChIP regions, about 73% were in close proximity to genes that were expressed in ES cells.
  • the ChIP-enriched regions were analyzed in both human and mouse.
  • the binding regions in human and mouse did not always align perfectly and so to extend the region of comparison, the union of this overlap was synthesized in human (Figure 5, step 1).
  • Each of these 40 regions was then used to generate a contig of 35-mers tiled in 19-nucleotide increments across the genomic region enriched in the ChIP assay (Figure 5, step 2).
  • the radiolabeled oligonucleotide library represents 2,468 different genomic 35-mer windows flanked by the universal primer pair but migrated as a single band in the polyacrylamide gel (Figure 6A, lane 6) with no appreciable shift when incubated with recombinant Oct4 (Figure 6A, lane 7).
  • the region of the gel where the Oct4 shift would be expected to migrate was excised, re-amplified and used to reprobe Oct4 in round 2 of the selection (Figure 6A, lanes 8-9). In round 2 an appreciable signal, consistent with an Oct4-bound probe, was detected.
  • the EMSA was repeated in whole cell lysate derived from ES cells using the enriched fraction from round 2 as a probe. Because of the tendency of Oct4 to form various hetero and homo complexes, probes containing octamer sequences have been observed to display complex shifting patterns in extract (Remenyi, Lins, Nissen, Reinbold, Scholer and Wilmanns 2003; Remenyi, Tomilin, Pohl, Lins, Philippsen, Reinbold, Scholer and Wilmanns 2001 ). In this experiment the wild-type probe displayed at least three shifted products that could represent Oct4 containing complexes.
  • a two color labeling strategy was used in conjunction with an Agilent custom oligonucleotide array to compare the representation of each oligonucleotide in the shifted band to the enrichment of that oligonucleotide in the starting pool.
  • the hybridization intensity of the selected targets was normalized to the hybridization of the pool by a process of multiplying each probe intensity in the selected channel by a constant such that log of the ratio of selected/pool intensities (e.g., color spot ratios) summed to zero across all the probes on the array.
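The normalization described above amounts to choosing a single scaling constant for the selected channel so that the log ratios average to zero. A minimal sketch follows; the function name, the choice of base-2 logarithms and the toy intensities are assumptions, and all intensities are assumed to be positive.

```python
import math


def normalize_selected(selected, pool):
    """Scale the selected channel so log2(selected/pool) sums to zero.

    Returns the scaling constant and the normalized log2 ratios, one per probe.
    """
    log_ratios = [math.log2(s / p) for s, p in zip(selected, pool)]
    mean_log = sum(log_ratios) / len(log_ratios)
    # Multiplying every selected intensity by this constant subtracts
    # mean_log from every log ratio.
    constant = 2 ** (-mean_log)
    return constant, [ratio - mean_log for ratio in log_ratios]


# Toy intensities for three hypothetical probes.
constant, normalized = normalize_selected([200, 400, 800], [100, 100, 100])
```

After this step a positive log ratio marks a probe that is enriched in the shifted band relative to the array-wide average, and a negative value marks one that is depleted.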
  • the region upstream of GADD45G represents an example of poor agreement between the mouse ChIP-PET results and human ChIP-chip result (Figure 8B).
  • Although the ChIPped regions contain little overlap, MEGAshift finds multiple sites that appear to be functionally conserved - i.e. enriched in both species throughout the SELEX experiment.
  • the clustered distribution of binding sites that is common within the data appears to complicate the calling of enriched peaks in ChIP-chip analysis.
  • several closely spaced regions upstream of GADD45G are enriched in the ChIP-chip or ChIP-PET data. These regions overlap incompletely, yet according to MEGAshift, the most significant Oct4 binding is occurring in a non-overlapping region.
  • This oligonucleotide is located in a highly conserved block, comparable to exonic coding sequence, and contains a predicted Oct4 binding site. For these reasons it is likely that this site is a bona fide Oct4 binding site that was missed in the peak calling procedure. Training binding models with enrichment data
  • the oligonucleotides were ranked according to enrichment in the singly shifted fraction of the pool (round 1 ) and multiply shifted fraction (multiple). These ranked lists were then used to generate input sets for Gibbs Sampler trials which searched for motifs of lengths about 8 to about 20 nucleotides long. Using the top 20 most enriched probe signals (0.4 % of the data), the Gibbs sampler converged on a motif that contained the consensus Oct4 binding site (ATGCAAAT (SEQ ID NO: I)) in about 40 % of the trials ( Figure 9B).
  • both the singly shifted and the multiply shifted enrichment values include a wide range of motif lengths that return Oct4 consensus sequences about 100 % of the time (Figure 9C).
  • Because MEGAshift can determine the identity of sequences present in either the singly or the multiply bound state, it should be possible to learn sequence features that predispose a particular element to bind multiple molecules of Oct4.
  • Sequence determinants of dimerization motifs
  • Plotting enrichment of the singly versus multiply bound fraction indicates that a sequence enriched in the singly bound fraction is more likely to also be enriched in the multiply bound fraction (Figure 10A). For example, distinct sequence features favor dimer formation.
  • the Gibbs sampler converges on the core Oct4 binding sequence in the multiply shifted pool, but the behavior of the length parameter suggests slightly different motif characteristics.
  • palindromic combinations of half sites have been known to support homo and heterodimer formation (Remenyi, Tomilin et al. 2001). Although these designed examples are informative, MEGAshift allows for the discovery of real sequences that predispose Oct4 to bind as a multimer.
  • Another application is to discover or to refine binding motifs based on direct evidence and real sequence. Choosing the appropriate parameters and using a MEGAshift ranked input leads to a complete convergence of motif finders on the octamer sequence.
  • Oct4 has a known binding motif but this result demonstrates that binding motifs could be identified from unknown complexes that form on oligonucleotide pools in whole cell extracts. If upstream promoters of coordinately regulated genes were used for the design of oligonucleotide pools, these complexes could be mapped relative to each other allowing for the definition of regulatory modules.
  • Another benefit of being able to test specific sequences is the ability to design mutations into the oligo pool.
  • Several of the binding elements in promoters may harbor polymorphisms and be important functional variants that account for some biologically relevant phenotype.
  • MEGAshift could be used to assay both alleles of such a polymorphism in order to discover functional SNPs.
  • MEGAshift represents a hybrid biochemical, genomic and computational approach to address a question of binding specificity in gene expression. However, any application that involves a pool of nucleic acid and a method of molecular selection could be amenable to this protocol.
  • MEGAshift is inexpensive ($700 oligo synthesis - Atactic - $450 custom oligo array). The Agilent arrays can be stripped and re-used and the results could easily be analyzed by sequencing if array facilities are unavailable.
  • EXAMPLE 2 REMOVAL AND RECOVERY OF OLIGONUCLEOTIDES FROM A MICROARRAY
  • the oligonucleotides used for this experiment were 60-mers harvested from a custom oligonucleotide array, which is available in a variety of spot (feature) densities. Each feature consists of a homogenous population of about 60-mers that had been synthesized onto the microscope slide using Agilent SurePrint Technology.
  • a series of pilot experiments were designed to amplify individual features from a discarded Zebrafish exon microarray (G2519F). Three features were chosen at random from the about 45,220 spots present on the zebrafish exon microarray.
  • Primers fifteen nucleotides in length and corresponding to the beginning and end of the published feature sequence were designed to amplify these three features in three separate PCR reactions.
  • the oligonucleotides were separated from the solid phase attachment by mechanical abrasion (Agilent Corp. Patent No. 896572 filed on 2001-06-29). Scouring the printed face of the microscope slide with a 20 gauge needle efficiently removed both oligonucleotides and slide coating (Figures 12A and B). The resulting particles were resuspended in water and any adherent clumps were disrupted by sonication. Of the three features, only a single feature was successfully amplified.
  • each oligonucleotide was a unique sequence flanked by these 15 nucleotide primer sequences that had been demonstrated to amplify in the pilot experiment.
  • the resuspended oligonucleotide was tested for the presence of ten randomly chosen oligonucleotides. Amplifying this pool with primers specific to the internal unique sequences of ten features resulted in nine successful amplifications. This high rate of recovery indicates that the vast majority of features were successfully transferred and amplified from the solid phase to the aqueous phase.
  • EXAMPLE 3 IMMUNOPRECIPITATION - FOLLOWED BY MAGNETIC BEADS ALLOWS FOR GREATER ENRICHMENT OF OCT4 LIGANDS THAN EMSA.
  • Oct4 phosphorylation As treatment of the extract with phosphatase decreased the Oct4 supershifted band (Figure 15, lanes 5, 13 vs. lanes 3, 11). It is likely that Oct4 or p53 achieves some of its regulatory function in the context of synergistic modules.
  • synergistic partners such as Sox2 and Nanog that, with Oct4, bind DNA in a coordinated fashion. Both Oct4 and p53, once bound to DNA can act as either activators or repressors. While the mechanism of such diverse action is unknown it is likely due to interaction with nearby factors.
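The two-color array normalization described above (multiplying the selected channel by a single constant so that the log of the selected/pool ratios sums to zero across all probes) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the experiment: the function name, the choice of log base 2 and the example intensities are assumptions made here for clarity.

```python
import math

def normalize_selected(selected, pool):
    """Scale the selected channel by one constant so the log2 ratios sum to zero.

    `selected` and `pool` are equal-length lists of positive probe intensities.
    Returns the scaling constant and the normalized log2(selected/pool) ratios.
    """
    if len(selected) != len(pool) or not selected:
        raise ValueError("channels must be non-empty and of equal length")
    raw_logs = [math.log2(s / p) for s, p in zip(selected, pool)]
    mean_log = sum(raw_logs) / len(raw_logs)
    # Multiplying every selected intensity by 2**(-mean_log) subtracts the
    # mean from every log ratio, so the normalized log ratios sum to zero.
    scale = 2.0 ** (-mean_log)
    log_ratios = [math.log2((s * scale) / p) for s, p in zip(selected, pool)]
    return scale, log_ratios

# Hypothetical intensities for four probes (illustrative values only).
selected = [1200.0, 300.0, 950.0, 4000.0]
pool = [600.0, 600.0, 500.0, 500.0]
scale, log_ratios = normalize_selected(selected, pool)
```

After this step a positive log ratio marks a probe that is over-represented in the selected fraction relative to the array-wide average, and a negative value marks one that is under-represented.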

Abstract

Methods of identifying associations between nucleotide ligands and target molecules include amplifying a pool of non-random oligonucleotides to form libraries of oligonucleotides, contacting the library with a target molecule and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule. Methods of identifying a variant of an allele that binds a target molecule can include contacting at least one target molecule with a first and second library of non-random oligonucleotides and comparing the binding of the target molecule with the first and second library. Methods can further include determining a difference in binding affinity of a first nucleotide ligand and a second nucleotide ligand and determining a binding affinity of a target molecule for a nucleotide ligand.

Description

METHODS FOR IDENTIFYING NUCLEOTIDE LIGANDS
RELATED APPLICATION
This application claims the benefit of U.S. Provisional Application No. 61/007,863, filed on December 17, 2007. The entire teachings of the above application are incorporated herein by reference.
BACKGROUND OF THE INVENTION
Understanding interactions between target molecules and nucleic acid sequences, in particular, identifying nucleic acid sequences that associate with target molecules, can lead to important information regarding regulation of genes, the identification of variants of alleles that bind target molecules and relative binding of molecules to nucleic acid sequences. Currently, techniques to identify interactions between target molecules and nucleic acid sequences are relatively expensive, may not accurately identify specific nucleic acid sequences that associate with target molecules and may not reflect in vivo interactions between nucleic acid sequences and target molecules. Thus, there is a need to develop new, improved and effective methods to identify interactions between target molecules and nucleic acid sequences.
SUMMARY OF THE INVENTION
The present invention generally relates to methods of identifying nucleotide ligands, variants of alleles and differences in binding affinities of nucleotide ligands.
In an embodiment, the invention is a method of identifying a nucleotide ligand that associates with a target molecule, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule to form an association between the non-random oligonucleotides and the target molecule; and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
In another embodiment, the invention is a method of identifying a variant of an allele that binds a target molecule, comprising the steps of contacting at least one target molecule with a first pool of non-random oligonucleotides to form a first library of non-random oligonucleotides; and a second pool of non-random oligonucleotides to form a second library of non-random oligonucleotides, wherein each non-random oligonucleotide of the second library of non-random oligonucleotides is an allelic variant of the first library of non-random oligonucleotides; and wherein the first library of non-random oligonucleotides is optionally combined with the second library of non-random oligonucleotides prior to contact with the target molecule; and comparing binding of the target molecule to the first library of non-random oligonucleotides and the second library of non-random oligonucleotides, wherein binding of the target molecule identifies the variant of the allele that binds the target molecule.
In an additional embodiment, the invention is a method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule, wherein the first nucleotide ligand is an allelic variant of the second nucleotide ligand comprising the steps of contacting a first non-random oligonucleotide library of the first nucleotide ligand and a second library of the second nucleotide ligand with the target molecule; and comparing a proportion of the first non-random oligonucleotide library bound to the target molecule with the proportion of the second non-random oligonucleotide library bound to the target molecule, wherein a difference in the proportion of the first non-random oligonucleotide library bound to the target molecule compared to the proportion of the second non-random oligonucleotide library bound to the target molecule indicates a difference in the binding affinity for the first nucleotide ligand compared to the second nucleotide ligand for the target molecule.
In a further embodiment, the invention is a method of determining a binding affinity of a target molecule for a nucleotide ligand, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule; detecting a nucleic acid sequence in the library of non-random oligonucleotides that binds the target molecule with a first nucleic acid probe, wherein the first nucleic acid probe binds the target molecule and includes a first detectable label; contacting the first nucleic acid probe with a second nucleic acid probe, wherein the second nucleic acid probe binds the target molecule with an affinity different than the first nucleic acid probe and includes a second detectable label that is distinct from the first detectable label, thereby forming a mixture of the first nucleic acid probe and the second nucleic acid probe; hybridizing the mixture to a collection of nucleic acid sequences that are complementary to the first nucleic acid probe and the second nucleic acid probe; detecting the first nucleic acid probe and the second nucleic acid probe that hybridize to the collection; and determining a ratio of the first detectable label to the second detectable label, to thereby determine the binding affinity of the target molecule for a nucleotide ligand.
The methods of the invention can be employed to identify a nucleotide ligand that associates with a target molecule, to identify a variant of an allele that binds a target molecule, to determine a difference in a binding affinity of at least two nucleotide ligands and to determine a binding affinity of a target molecule for a nucleotide ligand. Advantages of the claimed invention include, for example, cost effective methods to more accurately identify specific nucleic acid sequences that associate with target molecules that may reflect in vivo interactions between nucleic acid sequences and target molecules.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 depicts visualization of Data Flow. From Top to Bottom. Transcription factors interact specifically at particular genomic loci and drive the transcription of nearby genes. Binding regions are resynthesized as a contig of 35-mer oligos tiled across a region of factor binding. The tiling path is represented to scale as the diagonal line and magnified within the circles. Specific interactions between protein factors and an oligo(s) are recapitulated in vitro and recorded by array. Oligo enrichment is written as custom annotation tracks in bed file format in the UCSC genome browser. The length of the oligo is annotated as grey bars on the linear chromosome and the enrichment in the bound fraction is annotated in greyscale shading (black = highly enriched and white = no enrichment).
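As a minimal sketch of how oligo enrichment might be written as a greyscale BED annotation track of the kind Figure 1 describes, the following maps each enrichment value onto the 0 to 1000 BED score range and emits a track line with `useScore=1`, which asks the UCSC browser to shade each item by its score. The function names, the linear clipping between `lo` and `hi`, and the example coordinates are assumptions introduced here, not details taken from the original work.

```python
def enrichment_to_score(value, lo, hi):
    """Linearly map an enrichment value onto the BED score range 0..1000."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)
    return round(1000 * (clipped - lo) / (hi - lo))

def bed_track(name, records, lo, hi):
    """Build a BED custom track; records are (chrom, start, end, id, enrichment).

    BED coordinates are 0-based and half-open, so a 35-mer starting at
    position 100 spans 100..135.
    """
    lines = [f'track name="{name}" useScore=1']
    for chrom, start, end, oligo_id, enrichment in records:
        score = enrichment_to_score(enrichment, lo, hi)
        lines.append(f"{chrom}\t{start}\t{end}\t{oligo_id}\t{score}")
    return "\n".join(lines)

# Hypothetical oligos and enrichment values (illustrative only).
records = [
    ("chr9", 100, 135, "oligo_1", 2.0),
    ("chr9", 110, 145, "oligo_2", -1.0),
]
track = bed_track("enrichment_round1", records, lo=-1.0, hi=2.0)
track_lines = track.split("\n")
```

With this mapping the most enriched oligo receives the darkest shade (score 1000) and the least enriched the lightest (score 0).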
Figure 2 depicts a summary of post transcriptional modifications and the molecular effects of induction of the transcription factor p53.
Figure 3 generally illustrates examples of how binding data can be integrated into a model of promoter function. Each manipulation can be vertically integrated result, showing factor binding, competition and the role of modifications in the overall regulation of a promoter.
Figure 4 depicts in vivo reporter assay of oligo function. The oligo library is ligated to a promoter-truncated gfp reporter and transfected into 293 cells. PCR amplified reporter expresses an intermediate level of gfp whilst digestion eliminates signal. 293 cells are formaldehyde crosslinked and transcriptionally active fragments are retrieved from lysed cells by immunoprecipitation. After reversing crosslinks, DNA is amplified with Ambion T7 linear amplification kit, target is color labeled and is analyzed by two color array.
Figure 5 depicts a method of identifying a nucleotide ligand that associates with a target molecule (also referred to herein as a "MEGASHIFT" protocol). Step 1) All orthologous genomic regions enriched in both human and mouse Oct4 ChIP experiments were aligned and step 2) resynthesized as a tiled contig of 35-mers flanked by universal primer binding sites. The human genomic region was extended to cover the union of this overlap. Step 3) This pool, amplified with labeled primers, migrates as a single band and was then used in an EMSA with recombinant Oct4. The shifted band was excised, reamplified and reshifted or analyzed by cloning or array. Step 4) Array analysis: the shifted and unselected fractions were reamplified and the T7-containing template was used to generate Cy3 (shifted) or Cy5 (unselected) target for the custom oligo array.
Figures 6A, 6B, 6C and 6D depict enrichment for Oct4 binding sites. Figure 6A Panel 1: Perfect octamer, ATGCAAAT (SEQ ID NO: 1), containing oligonucleotide from the immunoglobulin heavy chain promoter ("wt" lane 1-2) and its octamer scrambled control ("mut" lane 3-4) analyzed by EMSA with recombinant Oct4 (even number lanes) or no protein control (odd lanes). Mobility associated with singly bound oligonucleotide marked with an arrow; multiply bound oligonucleotide marked with a feathered arrow. Round 1 (lane 5-6) of the Oct4 enrichment was performed with the synthetic oligo pool as a probe. The singly shifted fraction was excised, re-amplified and used as a probe in lanes 7 and 8. The singly shifted fraction from round 2 was used as a probe in lane 9 and 10. Figure 6B EMSA performed in ES cell extract with unlabeled wt competitor. Extracts were pre-incubated with Oct4 antibody (lane 3, 7, 9) or an antibody against a closely related Oct1. Band consistent with Oct4 bound species indicated with arrows. Lane 6-7: EMSA with oligo pool as probe. Lane 8-9: EMSA with fraction shifted by recombinant Oct4 ("round 2" in Figure 6A, lane 8) used as a probe in whole cell extract. Figure 6C wt, mutant and multiply shifted fraction of the oligo pool, excised (undetectable) from Figure 6A, lane 8, reamplified and used as a probe in Oct4 EMSA. Figure 6D shift analysis performed in increasing concentration gradient of recombinant Oct4 protein.
Figures 7A, 7B, 7C, 7D and 7E depict changes in oligonucleotide enrichment throughout an Oct4 SELEX experiment. Figure 7A Enrichment scores for each round of SELEX were binned and graphed as a histogram. Figure 7B Average enrichment scores were ranked with relevant oligonucleotides marked on the percentile bar. Gel shift assay was repeated for isolates cloned from selected (Figure 7C) and unselected (Figure 7D) fractions. Figure 7E Array images corresponding to pool/pool and pool/round 1 are drawn below the gel lane for each oligonucleotide.
Figures 8A and 8B depict MEGASHIFT tracks for the UCSC Genome Browser. Annotation is stacked vertically along the chromosomal coordinate axis (x-axis). Starting from the top and proceeding down, the mouse (top set) sequences are annotated for the following molecules: oligonucleotides cloned out of Oct4 selected fraction (short, stacked bars), ChIP-PET regions (wide bars), normalized enrichment scores in grayscale for each duplicate probe pair for each array experiment [multiply bound, round 3, round 2, round 1]. Enriched oligonucleotides are shaded darkly. Human (bottom set) is identical save that ChIPped material is analyzed by array (ChIP-chip). Predicted Oct binding sites annotated below. Conservation determined by eight vertebrate blastz alignments.
Figures 9A, 9B and 9C depict comparison of Oct4 site prediction and Oct4 binding. For each oligonucleotide, Oct4 sites were scored as the log probability that a random sequence would fit the Oct4 binding model better than the highest scoring window in the oligonucleotide (y-axis) and plotted against enrichment (x-axis). Vertical line represents mean enrichment for each experiment. Figure 9B De Novo Motif Identification was performed using Gibbs sampling trials with varying amounts of input that was ranked according to enrichment in round 1 or the multiply bound fraction. Successful trials that converged on motifs with the Oct4 consensus (ATGCAAAT; SEQ ID NO: 1) were recorded on the y-axis. Figure 9C Using the top 3 % of enriched oligonucleotides the effect of motif length was examined.
Figures 10A, 10B and 10C depict de Novo Motif Identification. Figure 10A singly bound (y-axis) enrichment was plotted against multiply bound enrichment (x-axis) for each oligonucleotide in the dataset. Motif discovery was performed with Gibbs sampler using the dataset of oligonucleotides biased towards the multiply bound state (circles) as the input dataset. Figure 10B Three motifs of length twenty were returned and represented in web logo format. Figure 10C Half sites (ATGC (SEQ ID NO: 2), GCAT (SEQ ID NO: 3), AAAT (SEQ ID NO: 4), and ATTT (SEQ ID NO: 5)) are counted in the entire set of oligonucleotides, and also in the set biased towards the singly bound state (above line in Figure 10A) and also in the set biased towards multiply bound (below the line in Figure 10A). Relative enrichment statistics for oligos containing zero, single and higher multiples of Oct4 half sites in the multiply shifted fraction are recorded. For each multiple of half sites, lightly shaded histogram bars mark relative risk (RR) in the multiply shifted fraction and the more darkly shaded bars mark RR for the singly shifted fraction. Both measures are relative to the entire set and the dashed line marks no enrichment (RR=1).
Figure 11 depicts de Novo Motif Identification. Oct4 contains two POU domains which recognize a bi-partite signal (diagram). Half sites (ATGC (SEQ ID NO: 2), GCAT (SEQ ID NO: 3), AAAT (SEQ ID NO: 4), and ATTT (SEQ ID NO: 5)) are counted in the entire set of oligonucleotides, the set biased towards the singly bound state and the set biased towards multiply bound. Each permutation of half sites with more than two-fold relative risk of being found in the multi-bound state versus the entire set is graphed. Lightly shaded histogram bars mark relative risk (RR) of a particular combination occurring in the multiply shifted fraction. More darkly shaded bars correspond to the singly shifted fraction. Both measures are relative to the entire set and the blue dashed line marks zero enrichment (RR=1).
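The half-site counting and relative risk comparison described for Figures 10C and 11 can be illustrated with a short sketch. It assumes that relative risk is the fraction of oligonucleotides in a subset carrying a given number of half sites divided by the same fraction in the entire set, and that overlapping occurrences are counted. Both are assumptions made here, and the sequences are invented for the example.

```python
HALF_SITES = ("ATGC", "GCAT", "AAAT", "ATTT")  # SEQ ID NOs: 2-5 in the text

def count_half_sites(seq):
    """Count every 4-nt window that matches one of the Oct4 half sites."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] in HALF_SITES)

def relative_risk(subset, full_set, n_sites):
    """RR of an oligo having exactly n_sites half sites, subset vs. full set."""
    def fraction(oligos):
        hits = sum(1 for o in oligos if count_half_sites(o) == n_sites)
        return hits / len(oligos)
    baseline = fraction(full_set)
    if baseline == 0:
        return float("nan")  # undefined when the full set has no such oligo
    return fraction(subset) / baseline

# Hypothetical toy sequences (not from the experimental pool).
full_set = ["ATGCAAAT", "GGGGCCCC", "ATGCGGGG", "CCCCGGGG"]
multiply_bound = ["ATGCAAAT", "ATGCGGGG"]
rr_two_sites = relative_risk(multiply_bound, full_set, 2)
```

An RR above 1 indicates that oligonucleotides with that number of half sites are over-represented in the subset, while RR=1 corresponds to the dashed no-enrichment line.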
Figures 12A and 12B depict Agilent zebrafish oligonucleotide microarray before (Figure 12A) and after (Figure 12B) scouring.
Figures 13A, 13B and 13C depict analysis of feature recovery. 90 features, all 30 combinations of 6 different flanking sequences, were distributed randomly on an Agilent oligonucleotide array. Figure 13A Array x and y coordinates are represented graphically. Location of successfully amplified features are depicted with an "o" and failed amplifications with an "x." Figure 13B spatial relationship between primers and probe orientation. Figure 13C The number of successful amplifications was recorded for each primer pair combination.
Figure 14 depicts selection of Bound Ligand by Immunoprecipitation. EMSA analysis of positive (Wt), negative (Mt), unselected (Pool), mock selected (IP-) and two rounds of immuno selected pool fractions (IP2). Anti Oct4 antibody supershifts positive control (lane 3) and immuno selected pool (lane 8) but not initial pool (lane 6).
Figure 15 depicts phosphatase treatment decreases Oct4 binding DNA. Supershifted Oct4 (upper arrow) indicates a loss of binding activity upon dephosphorylation of Oct4 on the wt probe (lane 3 vs 5) and also on the selected pool (lanes 11 vs 13).
Figure 16 depicts enrichment of anonymous complexes from an ES cell extract. EMSA performed with indicated probes. Protein bound fraction was isolated from the shift of the initial pool (lane 5), reamplified and used to reprobe extract (lanes 7, 8). This process was repeated for a total of two cycles of enrichment (lanes 9, 10).
DETAILED DESCRIPTION OF THE INVENTION
The features and other details of the invention, either as steps of the invention or as combinations of parts of the invention, will now be more particularly described and pointed out in the claims. It will be understood that the particular embodiments of the invention are shown by way of illustration and not as limitations of the invention. The principal features of this invention can be employed in various embodiments without departing from the scope of the invention.
In an embodiment, the invention is a method of identifying a nucleotide ligand that associates with a target molecule, comprising the steps of amplifying a pool of non-random oligonucleotides (also referred to herein as "oligos") to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule to form an association between the non-random oligonucleotides and the target molecule; and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
The target molecule employed in the methods of the invention can be affixed to a solid support matrix. Likewise, the nucleotide ligands identified in the methods of the invention can be affixed to a solid support matrix. "Non-random," as used herein in reference to oligonucleotides, refers to a pool of oligonucleotides that do not contain all possible permutations of nucleotides. A pool of non-random oligonucleotides can be generated from information regarding a nucleotide sequence, for example, a genome of a nucleic acid sequence associated with the target molecule. For example, genomic sequences associated with Oct4 can be employed to design a pool of non-random oligonucleotides.
At least one of the non-random oligonucleotides can include a detectable label (e.g., Cy3, Cy5).
In an embodiment, the non-random oligonucleotides can be deoxyribo-oligonucleotides (single-stranded deoxyribo-oligonucleotides or double-stranded deoxyribo-oligonucleotides). The deoxyribo-oligonucleotides can include at least one genomic nucleotide sequence, such as at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence.
In another embodiment, the non-random oligonucleotides can be ribo-oligonucleotides.
The non-random oligonucleotides can be synthetic (e.g., made by oligonucleotide synthesis methods) non-random oligonucleotides. Each of the non-random oligonucleotides can have an identical number of nucleotides. The number of nucleotides in the non-random oligonucleotides can be less than about 100 nucleotides (e.g., between about 50 nucleotides and about 100 nucleotides, about 20 nucleotides, about 25 nucleotides, about 50 nucleotides, about 75 nucleotides). At least a portion of at least one non-random oligonucleotide can overlap with at least a portion of another non-random oligonucleotide. For example, the portion of the non-random oligonucleotide and the portion of another non-random oligonucleotide overlap in a range of between about 19 nucleotides and about 35 nucleotides. At least one non-random oligonucleotide in the pool can include at least one primer binding site, such as at least one universal primer binding site.
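To make the overlapping design concrete, the following minimal sketch tiles a genomic region into equal-length oligonucleotides and flanks each with primer binding sites. The 35-nucleotide length follows the tiled 35-mers described for the figures. The step of 10 nucleotides (giving a 25-nucleotide overlap, within the stated range), the function name and the two primer sequences are hypothetical placeholders chosen for illustration.

```python
def tile_region(region, oligo_len=35, step=10, left_primer="", right_primer=""):
    """Return (start, oligo) pairs covering `region` with overlapping windows.

    Adjacent windows overlap by oligo_len - step nucleotides. A final window
    is added, if needed, so that the end of the region is always covered.
    """
    if not 0 < step <= oligo_len:
        raise ValueError("step must be between 1 and oligo_len")
    if len(region) < oligo_len:
        raise ValueError("region is shorter than one oligo")
    last_start = len(region) - oligo_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, left_primer + region[s:s + oligo_len] + right_primer)
            for s in starts]

# Hypothetical 15-nt universal primer binding sites (placeholders only).
LEFT_PRIMER = "GTACGTACGTACGTA"
RIGHT_PRIMER = "CATGCATGCATGCAT"

region = ("ACGTTGCA" * 9)[:65]  # toy 65-nt region, not a real locus
tiles = tile_region(region, left_primer=LEFT_PRIMER, right_primer=RIGHT_PRIMER)
```

Because every oligonucleotide shares the same flanking sites, the whole pool can be amplified with a single primer pair while each insert remains unique.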
The association of at least one non-random oligonucleotide with the target molecule can be detected by at least one member selected from the group consisting of a mobility shift assay (e.g., gel mobility shift assay), a hybridization array and an immunoprecipitation assay. Such techniques are well-established and known to one of skill in the art.
The association of at least one non-random oligonucleotide with the target molecule can be assessed iteratively. For example, a mobility shift assay can be performed iteratively (repeatedly) with at least one non-random oligonucleotide that associates with a target molecule.
The method of identifying a nucleotide ligand that associates with a target molecule can further include assessing a binding affinity of the nucleotide ligand for the target molecule.
The method of identifying a nucleotide ligand that associates with a target molecule can further include adding an agent (e.g., a drug) at one or more time points selected from the group consisting of before, concomitantly and after contacting the library of non-random oligonucleotides with the target molecule. The agent can disrupt the association of at least one non-random oligonucleotide and the target molecule. For example, the target molecule may associate with the nucleotide ligand; however, upon exposure to the agent, the association between the target molecule and the nucleotide ligand may be disrupted. The agent can inhibit (also referred to as prevent) the association of at least one non-random oligonucleotide and the target molecule. The agent may promote the association of at least one non-random oligonucleotide and the target molecule.
The agent employed in the methods described herein may include a phosphatase inhibitor, or at least one member selected from the group consisting of a drug, an enzyme (e.g., phosphatase) and a nucleic acid (e.g., a small interfering ribonucleic acid).
The target molecule employed in the methods of the invention described herein can be a component of an extract of a cell. The methods described herein can further include exposing at least one member selected from the group consisting of the cell and the extract to at least one member selected from the group consisting of an agent, a stress condition and an ultraviolet radiation before the extract of the cell containing the target molecule is prepared.
The target molecule employed in the methods described herein can be at least one member selected from the group consisting of a protein, a transcription factor or a splicing factor. The transcription factor can activate a nucleotide sequence that is near or close to the location of binding of the nucleotide ligand in the genome, for example, between about 400 nucleotides to about 2000 nucleotides within a location of where the nucleotide ligand binds a genomic nucleotide sequence. The methods described herein can include employing the amplifying, contacting and separating steps at least twice prior to identifying the nucleotide ligand.
At least one non-random oligonucleotide can further include at least one promoter sequence (e.g., a T7 promoter sequence). The method of identifying a nucleotide ligand that associates with a target molecule can further include the step of repeating the steps of amplifying the pool of non-random oligonucleotides to form the library of non-random oligonucleotides, contacting the library of non-random oligonucleotides with the target molecule to form the association between the non-random oligonucleotides and the target molecule, and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
The methods described herein can further include performing each amplifying step in the presence of a distinct detectable label. Distinct detectable labels are labels that are different one from the other. For example, one amplifying step can be performed in the presence of a Cy3 label and the subsequent amplifying step can be performed in the presence of a Cy5 label, which is a label distinct from a Cy3 label.
In another embodiment, the invention is a method of identifying a variant of an allele that binds a target molecule (e.g., a component of an extract of a cell), comprising the steps of contacting at least one target molecule with a first pool of non-random oligonucleotides to form a first library of non-random oligonucleotides and a second pool of non-random oligonucleotides to form a second library of non-random oligonucleotides, wherein each non-random oligonucleotide of the second library of non-random oligonucleotides is an allelic variant of the first library of non-random oligonucleotides; and wherein the first library of non-random oligonucleotides is optionally combined with the second library of non-random oligonucleotides prior to contact with the target molecule; and comparing binding of the target molecule to the first library of non-random oligonucleotides and the second library of non-random oligonucleotides, wherein binding of the target molecule identifies the variant of the allele that binds the target molecule.
The method of identifying a variant of an allele that binds a target molecule can further include the step of contacting at least one common nucleotide ligand with the first library and second library; or may further include comparing the binding of the common nucleotide ligand between the first library and the second library.
The non-random oligonucleotides in the first pool and/or second pool can be deoxyribo-oligonucleotides, such as deoxyribo-oligonucleotides that include genomic nucleotide sequences (e.g., at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence) or ribo-oligonucleotides. At least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides can include at least one primer binding site, such as a universal primer binding site. Binding of the target molecule to at least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides can be detected by at least one member selected from the group consisting of a mobility shift assay, a hybridization array and an immunoprecipitation assay. The non-random oligonucleotides of the first library and the non-random oligonucleotides of the second library can be differentially labeled. For example, the non-random oligonucleotides of the first library can include a Cy3 label and the non-random oligonucleotides of the second library can include a Cy5 label, which would result in the second library being differentially labeled compared to the first library (i.e., Cy5 is a different label than Cy3).
In an additional embodiment, the invention is a method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule (e.g., a component of an extract of a cell), wherein the first nucleotide ligand is an allelic variant of the second nucleotide ligand, comprising the steps of contacting a first non-random oligonucleotide library of the first nucleotide ligand and a second library of the second nucleotide ligand with the target molecule; and comparing a proportion of the first non-random oligonucleotide library bound to the target molecule with the proportion of the second non-random oligonucleotide library bound to the target molecule, wherein a difference in the proportion of the first non-random oligonucleotide library bound to the target molecule compared to the proportion of the second non-random oligonucleotide library bound to the target molecule indicates a difference in the binding affinity for the first nucleotide ligand compared to the second nucleotide ligand for the target molecule. The method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule can further include the step of contacting at least one common nucleotide ligand with the first library and second library, and can further include comparing the binding of the common nucleotide ligand between the first library and the second library.
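The comparison of bound proportions described above can be illustrated with a minimal Python sketch. It assumes that a bound signal and a total signal have already been measured for each allelic library. The function names, the log2 scale, and the sample numbers are illustrative assumptions and are not part of the described method.

```python
import math

def bound_proportion(bound_signal, total_signal):
    """Proportion of a library recovered in the bound fraction."""
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return bound_signal / total_signal

def allelic_log2_ratio(bound_a, total_a, bound_b, total_b):
    """log2 of (proportion bound, first library) / (proportion bound, second library).

    A value near 0 suggests similar binding; a positive value suggests the
    first allelic variant is bound in a larger proportion than the second.
    """
    p_a = bound_proportion(bound_a, total_a)
    p_b = bound_proportion(bound_b, total_b)
    if p_a == 0 or p_b == 0:
        raise ValueError("both proportions must be non-zero")
    return math.log2(p_a / p_b)

# Illustrative numbers only (hypothetical, not data from this document).
ratio = allelic_log2_ratio(bound_a=400, total_a=1000, bound_b=100, total_b=1000)
```

A common nucleotide ligand measured in both libraries could be used to rescale the two proportions before the ratio is taken; that step is omitted here for brevity.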
In a further embodiment, the invention is a method of determining a binding affinity of a target molecule for a nucleotide ligand, comprising the steps of amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; contacting the library of non-random oligonucleotides with a target molecule; detecting a nucleic acid sequence in the library of non-random oligonucleotides that binds the target molecule with a first nucleic acid probe, wherein the first nucleic acid probe binds the target molecule and includes a first detectable label; contacting the first nucleic acid probe with a second nucleic acid probe, wherein the second nucleic acid probe binds the target molecule with an affinity different than the first nucleic acid probe and includes a second detectable label that is distinct from the first detectable label, thereby forming a mixture of the first nucleic acid probe and the second nucleic acid probe; hybridizing the mixture to a collection of nucleic acid sequences that are complementary to the first nucleic acid probe and the second nucleic acid probe; detecting the first nucleic acid probe and the second nucleic acid probe that hybridize to the collection; and determining a ratio of the first detectable label to the second detectable label, to thereby determine the binding affinity of the target molecule for a nucleotide ligand. Embodiments of the methods of the invention are described below.
One method for identifying transcription factor binding specificity is Systematic Evolution of Ligands by Exponential Enrichment (SELEX), an iterative method of selecting high affinity binding ligands of known activators. For each cycle of selection, the fraction of an oligo pool that is bound to the target is eluted from the filter and re-amplified. With each round of SELEX, high affinity ligands of the target protein become more enriched within the pool. Ligands are cloned and weight matrices are calculated from an alignment of the selected sequences. However, these methods often leave questions about the in vivo relevance of the output sequences, as natural selection may not always favor the highest binding affinity sites. In addition, it has long been observed that other factors such as chromatin accessibility greatly limit the usefulness of in vitro binding specificities for predicting sites in vivo (a more complete discussion of this phenomenon can be found in Wasserman and Sandelin 2004).
More recently, new methods have been developed to measure interactions between transcription factors and chromatin. These methods, termed ChIP-chip and ChIP-PET, locate binding sites in vivo by immunoprecipitating the factor of interest after it has been crosslinked to chromosomal DNA (Orlando and Paro 1993). Binding regions are identified either by array (ChIP-chip) or by sequencing (ChIP-PET). Both these techniques have been applied to the identification of p53-bound regions in human and murine ES cells (Wei, Wu, Vega, Chiu, Ng, Zhang, Shahab, Yong, Fu, Weng et al. 2006).
Genome-wide Chromatin Immunoprecipitation
High throughput localization studies such as ChIP-chip and ChIP-PET have several important limitations. Chemical crosslinking does not provide a quantitative measure and sometimes not even a measure of direct binding. The resolution is limited by the shearing size of the DNA prior to the immunoprecipitation, as well as the array densities and sequencing depths (Buck and Lieb 2004). Most of the published location studies return regions that span about 1 kb of genomic space. The number and location of binding sites within these regions cannot immediately be determined. Recent advances in sequencing technology and the practice of size selecting the recovered DNA fragments have improved this situation, allowing for the inference of binding sites to a resolution of 50 base pairs (Johnson, Mortazavi, Myers and Wold 2007; Robertson, Hirst, Bainbridge, Bilenky, Zhao, Zeng, Euskirchen, Bernier, Varhol, Delaney et al. 2007). However, this approach remains costly and the resolution is entirely dependent on the number of binding sites in the genome. Furthermore, the inference of binding sites from the distribution of sequenced tags becomes problematic when recognition elements cluster in closely spaced groups along the DNA. A more direct determination of p53 binding sites within these regions will help identify variants that disrupt binding and also shed light on the mechanism of p53 function. It has been shown that p53 binding does not always enhance transcription. There are many examples where p53 binding is correlated with a repressed transcriptional state (Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000). This duality is not uncommon for transcription factors and is probably explained by the local context of each site (the identity of neighboring factors on the DNA). High resolution maps of transcription factor binding sites (TFBS) in promoters are required to understand these nuances of individual transcription factor function.
p53 binds specifically to DNA
p53 (a target molecule) binds a bipartite sequence composed of two half sites, NNNCWWGNNN (SEQ ID NO: 6), arranged in a head to head orientation and separated by 0-13 nucleotides of spacer. Motifs derived from in vivo ChIP results are strongly biased towards having no spacer or a single nucleotide of spacer sequence (Wei, Wu, Vega, Chiu, Ng, Zhang, Shahab, Yong, Fu, Weng et al. 2006). Bona fide targets of p53 typically have at least two of these response elements (RE) within a few thousand nucleotides of the transcription start site. p53, like many transcription factors, can repress as well as enhance transcription of target genes. One such repressed target, MDR1, contains an unusual RE where the half sites are arranged in a head to tail orientation. Converting this head to tail orientation to the canonical head to head orientation transforms this p53 responsive silencer into a p53 responsive enhancer.
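The bipartite site described above (two NNNCWWGNNN half sites separated by 0-13 nucleotides) can be located in a sequence with a short scan. The following Python sketch is one possible illustration, not the method of the invention. The function names are hypothetical, and because the degenerate half site as written reads the same on both strands, the sketch does not distinguish head to head from head to tail arrangements.

```python
import re

# IUPAC codes used in the half site NNNCWWGNNN: N = any base, W = A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

HALF_SITE = "NNNCWWGNNN"  # SEQ ID NO: 6 as given in the text

def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression fragment."""
    return "".join(IUPAC[base] for base in motif)

def find_bipartite_sites(seq, max_spacer=13):
    """Return sorted (start, spacer_length, matched_text) tuples.

    Each tuple is one pair of half sites separated by 0..max_spacer
    nucleotides. A lookahead is used so overlapping candidates are reported.
    """
    half = iupac_to_regex(HALF_SITE)
    seq = seq.upper()
    hits = []
    for spacer in range(max_spacer + 1):
        pattern = re.compile("(?=(%s[ACGT]{%d}%s))" % (half, spacer, half))
        for match in pattern.finditer(seq):
            hits.append((match.start(), spacer, match.group(1)))
    return sorted(hits)

# Illustrative sequence (hypothetical): two half sites with a 3 nt spacer.
example = "AAACATGTTT" + "GGG" + "AAACATGTTT"
sites = find_bipartite_sites(example)
```

Because the flanking positions are fully degenerate, this pattern matches very frequently in genomic sequence; a practical scan would likely use a more restrictive consensus or a weight matrix.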
Mutations in p53's DBD may decrease binding strength and alter specificity.
p53 exists as a dimer and binds DNA as a tetramer, contacting DNA through an internal DNA binding domain (DBD). p53 mutations are recovered from nearly half of all tumors, with almost all the mutations altering amino acids in the DBD. While a change in p53 binding activity is clearly the sine qua non of a cell's progression towards cancer, the exact nature of this change in activity has been difficult to characterize. The few DBD mutations that have been examined behave as dominant mutations; however, DNA binding to a canonical RE appears to proceed even with tetramers composed of three mutant alleles (Chan, Siu, Lau and Poon 2004). This raises the possibility of a gain in function, possibly achieved via a shift in specificity (Chan, Siu, Lau and Poon 2004). Consistent with this hypothesis, a p53 minus cell line (Saos2) transfected with a panel of DBD mutants displays markedly different patterns of target gene activation (Menendez, Inga and Resnick 2006).
The manner in which p53 is induced may alter specificity.
p53 is induced by DNA damaging agents (UV, gamma, metal and various other chemical treatments) and other causes of cellular stress (nucleotide depletion, hypoxia and others). Expression studies demonstrate that p53 induced by different wavelengths of ionizing radiation and p53 induced by zinc treatment in the absence of DNA damage regulate different sets of genes (Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000). This difference could simply be a reflection of the broad spectrum of cellular changes accompanying irradiation or could be caused by a different state of p53 - either an altered capacity for transactivation or DNA binding. There is some evidence for the latter, as p53 undergoes differential modification during induction with gamma and UV radiation. One consequence of UV irradiation is the phosphorylation of residue 392 by RNA dependent protein kinase, PKR. While this and the seventeen other reported modifications lie outside the DNA binding domain, phosphorylation of residue 392 has been shown to alter DNA binding affinity. Acetylation of the C-terminal domain was also shown to increase the affinity of p53 for oligos containing REs.
Use of Oct4 for high-throughput analysis of TFBS.
Oct4 belongs to a well studied class of transcriptional activators and is a critical regulator of stem cells implicated in maintaining the pluripotent state. Like p53, slight changes in Oct4 activity can dramatically alter cellular events. For example, a two fold increase in Oct4 activity induces differentiation into endoderm and mesoderm fates, while less than normal levels result in differentiation into trophectoderm (Niwa, Miyazaki and Smith 2000). Furthermore, slight elevation (about 1.5 fold) in Oct4 in the adult germline is capable of inducing gonadal tumors (Gidekel, Pizov, Bergman and Pikarsky 2003). One intriguing aspect of Oct4 function is its unusual modes of binding DNA.
Oct4 can bind as a monomer to canonical Oct4 sites or to a variety of non-canonical combinations of half sites. These various configurations of Oct4 sites have been hypothesized to interact with different co-regulators, thereby adding a great deal of complexity to the interpretation of factor binding at the promoters. Three such classes of sites have been discovered to complement the MORE and PORE sites reported and studied previously. p53 also binds multimerically and there is also some indication that non-canonical combinations of p53 half sites may play a role in p53 biology (Menendez, Inga, Jordan and Resnick 2007).
Biochemical mapping of protein nucleic acid complexes to 3.9 Mb of p53 regulated genomic regions
p53 is a transcription factor that is found mutated in over 50% of human tumors. The vast majority of all p53 missense mutations recovered from tumors fall within the DNA binding domain. These two facts underscore the importance of p53 to cancer and also the importance of p53's DNA binding activity to its tumor suppressor function (Olivier, Eeles, Hollstein, Khan, Harris and Hainaut 2002; Soussi and Beroud 2003).
There have been conflicting reports describing where p53 binds to DNA and the consequence of its regulation. On some promoters p53 acts as a transcriptional activator and in other cases as a repressor. p53 function is probably determined by some combination of binding conformation, post translational modification and the identity of interacting factors and nearby factors.
Locations where proteins bind these regions of interest will be pinpointed and an unbiased approach to survey the proteins that associate with these regions will be described. Computational methods to elucidate factor identity will be employed. As the oligo pool covers p53 binding regions and p53 regulated promoters, it is reasonable to expect p53 to be well represented amongst the factors binding these regions. p53 binding specificity within these 3.9 megabases of sequence will be assessed.
Design of the oligo library
An oligo library is designed that will cover large regions of the genome containing cis-modules that are linked to cancer by their function as regulatory control elements for apoptosis, cell growth, or DNA damage response. The network of genes that are controlled by p53 or control p53 expression are widely regarded as the central players in cancer and tumor progression, and the promoters and enhancers of such genes capture much of the biology described above. The genomic regions from genes in the p53 network will be assembled from three sources: a list of genes central to p53 regulation, transcriptional targets of p53 detected by microarray in the colorectal cell line HCT116, and ChIP-PET studies of p53 binding sites, also in the HCT116 cell line. Previous studies have identified 65,572 individual p53 ChIP fragments by a paired end tag sequencing strategy from induced HCT116 cells (Wei, Wu, Vega, Chiu, Ng, Zhang, Shahab, Yong, Fu, Weng et al. 2006). While most of these fragments were non-overlapping, 1,416 clusters were defined by three fragments or more. These PET-3 clusters then represent a high confidence set of p53 in vivo binding sites. The average fragment size of p53 ChIP-PET clusters is 625 bp. However, previous studies suggest that the enrichment calls are frequently too restrictive and occasionally high affinity sites fall just outside of the annotation (see, for example, Figure 8B). Allowing about an extra 500 nucleotides on each side of these fragments would cover 2.3 Mb, a region that could be resynthesized by about 225,144 oligos. Each oligo is a 35-mer and the library is designed by shifting a 35-mer window in increments of about 10 nucleotides across a genomic region of interest. In addition, profiling studies indicate that anywhere from a few hundred to a few thousand genes are under p53 regulation. Targets of p53 generally have at least two of these response elements (RE) within a few thousand nucleotides of the transcription start site. Adding about 2 kb flanks to the transcriptional start site of about 230 of these downstream targets adds another about 91,195 oligos. Finally, the limited number of genes at the center of the p53 network will be covered in their entirety with about 80,000 oligos. These regions of interest will be covered by about 396,339 oligos, each about a 35-mer, which translates to three fold coverage of about 3.9 megabases. This library will be synthesized on two Agilent custom oligo arrays. The remaining about 91,661 spots will be filled with a few individual well-characterized p53 response elements reported in the literature, but mostly with randomly selected genomic 35-mers that serve as a negative binding control and also generate background statistics for the array hybridizations.
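The tiling scheme described above (a 35-mer window shifted in increments of about 10 nucleotides) can be sketched in Python as follows. The function names and the homopolymer test region are hypothetical, and the closing lines only re-add the oligo and spot counts quoted in the text.

```python
def tile_region(sequence, oligo_len=35, step=10):
    """Return (offset, oligo) pairs from a window slid across a region."""
    return [
        (start, sequence[start:start + oligo_len])
        for start in range(0, len(sequence) - oligo_len + 1, step)
    ]

def oligo_count(region_len, oligo_len=35, step=10):
    """Number of full-length windows that fit in a region of region_len nt."""
    if region_len < oligo_len:
        return 0
    return (region_len - oligo_len) // step + 1

# A 625 bp cluster padded by 500 nt on each side spans 1,625 nt.
padded_cluster_len = 625 + 2 * 500
oligos_per_cluster = oligo_count(padded_cluster_len)

# Counts quoted in the text: ChIP-PET clusters, promoter flanks, core genes.
designed_oligos = 225_144 + 91_195 + 80_000
# Designed oligos plus the remaining control spots across the two arrays.
total_spots = designed_oligos + 91_661
```

Under these assumptions each padded cluster yields 160 oligos, and the three quoted sources sum to the 396,339 designed oligos stated above.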
For an Oct4 library, genomic aptamers are commercially synthesized (Atactic Inc) in a solid phase format, cleaved and shipped as a mixed pool. The quality of the resulting oligos is high but the total complexity is limited to about 4000 oligos/order. To increase the scale of production, several alternatives may be considered. One option that can be used is to design and order a custom oligo array, grind it into small pieces and PCR amplify the resulting particles (Oleinikov, Zhao and Gray 2005). The surface of an Agilent custom oligo array can be scraped with a razor and amplified as a pool.
Using an expired Zebrafish genomic array, the slide is scraped, the debris is collected and primers are designed corresponding to three of the probe sequences on the array. Amplifying the slide scrapings for thirty cycles results in a single primer pair yielding the correctly sized product and the other two primer pairs failing to amplify anything. The primer pair used in the successful amplification is TAACATATGCCTGCAGTGTAC (SEQ ID NO: 7) and AGATCATGCAATTGAAGAC (SEQ ID NO: 8). This pair provides appropriate flanks for a p53 probe because it contains no p53 binding sites. Sequencing will confirm that the probe had been correctly amplified. Agilent custom oligo slides can currently be printed at a density of about 244,000 spots per 1" X 3" microscope slide. For the 35-mer oligo design this would expand the coverage to about 4.8 Mb of sequence, placing this method on scale with ChIP studies. Subsequent testing revealed that 90% of the designed oligos amplified as predicted. If this method fails to yield a quality pool, intermediate pool sizes can be obtained at higher cost through commercial sources (e.g., Biosearch Technologies, Novato CA or Codon Devices, Cambridge MA).
Determination of the binding specificity of in vitro translated p53 protein preparation for binding assay.
Bacterially expressed protein does not contain the necessary modifications to faithfully reproduce DNA binding; however, translation in rabbit reticulocyte lysate results in a competent protein prep. This protein prep will be made by the combined in vitro transcription/translation reaction in the reticulocyte lysate from the pRsetb-p53 plasmid constructs kindly provided by the Prives lab. Baculovirus-expressed His-tagged p53 proteins are also commercially available and will be purchased if problems with synthesis are encountered (ProteinOne, Bethesda MD).
This p53 lysate will be incubated with a radiolabeled probe containing a perfect p53 response element to establish the mobility of the p53 shifted band in a native PAGE gel. The identity of the p53 complex will be verified by blocking/supershifting experiments with DO1 monoclonal antibody. For reasons of continuity, DO1 is the preferred antibody. The p53 ChIP enriched regions that constitute the majority of the oligo pool will be immunoprecipitated with this antibody. However, if DO1 does not supershift or has poor specificity, other antibodies such as 1801, which have been demonstrated to shift p53, can be used. pAb421 will be avoided because it affects the binding affinity of p53 (Olivier, Eeles, Hollstein, Khan, Harris and Hainaut 2002; Soussi and Beroud 2003). If cross-reactivity is observed with multiple antibodies, translation will be performed in wheat germ extract.
Molecular selection of bound oligos
After appropriate EMSA conditions have been determined, the oligo pool will be amplified by the universal primer pair with labeled dATP. The shifted fraction will be extracted from the gel following each round of EMSA. It is well established that p53 binds as two dimers forming a tetramer complex on DNA (McLure and Lee 1999). p53 complexes could occur with other stoichiometries during the course of our experiments, and if these additional complexes are distinguishable by EMSA, the bound oligos could be extracted from the gel. This qualitative information from electrophoretic separation is lost in alternate protocols. The filter binding assay and immunoprecipitation with DO1 and protein A/G magnetic beads both represent more rapid and cleaner alternatives to the EMSA method; however, no distinction is made between singly and multiply bound oligos. Applying p53 binding reactions to a nitrocellulose membrane via a vacuum dot blot apparatus will result in p53 (and hence p53 bound oligos) being retained on the filter. An advantage of immunoprecipitation and filter binding separations is that DNA oligos can be eluted from the attached p53 with progressively increasing salt concentration - the salinity of the eluting fraction then correlates with the affinity of that ligand for p53.
Microarray enrichment and calculating p53 Kd for about 488,000 oligos
By placing a DEAE membrane under the nitrocellulose filter, the total bound and unbound fraction will be estimated for the entire pool. This measurement of total pool partitioned between the bound and unbound fraction is important for two reasons. Arrays and incorporating dyes are expensive and will only be performed when enrichment (i.e., an increase in the bound fraction) can be demonstrated between the selected fraction and the starting pool. The second application of the partitioning measurements is to normalize the red and green channels of the array. The eluted oligos are PCR amplified and labeled with Cy3 (bound) and Cy5 (starting pool) and then hybridized to the array. After this normalization the ratio of red to green then represents the bound fraction to unbound fraction for each oligo in the pool. Binding curves will be generated from a concentration series of protein. The accuracy of the binding curve will be checked and calibrated by control oligos that are introduced into the starting pool.
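One way a binding curve from a protein concentration series could be reduced to a dissociation constant is sketched below in Python. The text does not specify a fitting model, so this sketch assumes a simple one-site equilibrium in which free protein approximately equals total protein. Under that assumption the normalized bound:unbound ratio equals [P]/Kd. The function names, the concentration units and the sample values are hypothetical.

```python
def fraction_bound(bound_to_unbound_ratio):
    """Convert a normalized bound:unbound ratio into the fraction bound."""
    r = bound_to_unbound_ratio
    return r / (1.0 + r)

def estimate_kd(protein_concs, ratios):
    """Least-squares Kd under an assumed one-site model.

    For theta = P / (Kd + P), the bound:unbound ratio is P / Kd, so
    1/ratio = Kd * (1/P). Kd is the slope of a line through the origin
    fitted to the points (1/P, 1/ratio).
    """
    xs = [1.0 / p for p in protein_concs]
    ys = [1.0 / r for r in ratios]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Illustrative concentration series (hypothetical values, arbitrary units)
# for a single oligo whose true Kd in this toy example is 50.
concs = [10.0, 50.0, 250.0]
ratios = [0.2, 1.0, 5.0]
kd = estimate_kd(concs, ratios)
```

The reciprocal transform weights low-concentration points heavily, so a direct nonlinear fit may be preferable for noisy array data; the linear form is used here only to keep the sketch dependency free.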
In vitro binding assays do not always faithfully recapitulate in vivo binding events. For example, chromatin plays a dominant role in controlling gene expression and this role is not modeled by in vitro binding assays. All the sequences in the oligo library are derived from regions that were recognized by p53 in vivo in ChIP-PET and expression profiling studies. Nevertheless, a substantial proportion of this oligo pool may be chromatinized at its endogenous location. Overlap analysis of p53 ChIP-PET data with NHGRI DNase HS data detects about 330 regions (spanning an average of about 625 nucleotides per region) that overlap with DNase hypersensitive regions and, thus, are presumably euchromatic prior to p53 induction (Crawford Collins Genome Res 2006). The maps of protein DNA complexes discovered on these accessible regions will be compared to maps discovered on the inaccessible regions to search for complexes that are required to open chromatin. As described herein, a high throughput mini-gene system that tests elements as enhancers or silencers with respect to transcription will be introduced. In order to assess the notion that the in vitro binding assay will better model TFBS in accessible regions than in inaccessible chromatin, the binding enrichment scores will be correlated with in vivo activity for the set that coincides with DNase hypersensitive regions and for its complement. Perhaps a more immediate concern is the modification state of p53 in the binding assays. p53 is modified extensively following induction in living cells, and the extent to which this program of modification acts on baculovirus expressed or in vitro translated proteins is unclear. p53 has been reported to be acetylated, ubiquitinated, sumoylated, phosphorylated and methylated in response to a diverse array of cellular insults. Expression profiling studies indicate that the manner of induction affects the p53 dependent transcriptional profiles. The role of post translational modifications on p53 has not been decoupled from the presence of "insult specific" co-regulators. Attempts will be made to uncouple these influences and better characterize the effect of each on p53 activity.
In vitro translated protein is replaced with a panel of p53 preparations that have been affinity purified from whole HCT116 cell extracts with DO1 antibody attached to magnetic beads (protein A/G Dynabeads). Each preparation of extract will be induced via different mechanisms including but not necessarily limited to: UV irradiation, γ-radiation, hypoxia, heat shock, cisplatin, 5-Fluorouracil and other genotoxic agents.
The transcriptional profile of p53 is cell line specific and interactions with co-regulators may influence binding in a positive or negative manner (Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000). These may reflect characterized interactions such as p300 or negative interactions such as competition with another factor for the same binding site. To better characterize the behavior of p53 in the presence of other factors, immunoprecipitation is performed as described herein after incubation of the target with ligand, and the resulting spectrum of oligos is contrasted with the specificity worked out for p53 binding in the presence of co-factors by allowing complexes to form before the IP.
Role of co-regulators and competing factors in p53 binding
The whole cell extract will be made from HCT116 cells growing in log phase. Under these conditions p53 will be low and the promoters represented in the oligo pool would be inactive; however, these cells will be treated with 5-fluorouracil for about 6 hours and also mock induced. These conditions are sufficient to upregulate p53 and its downstream targets (Kho, Wang, Zhuang, Li, Chew, Ng, Liu and Yu 2004). The treated and mock treated extract will serve to represent the induced versus the uninduced state in the binding assays performed with the oligo pool. Other means of inducing p53 are described herein.
Co-regulators
Using magnetic beads, p53 will be affinity-purified from whole cell extract and the binding assay will be performed on the bead. p53 exists in solution as a dimer, and so the spectrum of ligands that remain fixed on the column will probably reflect the inherent affinity of p53 for the response element without accounting for the role of positive co-regulators or negative competitors. To account for the tetrameric binding which may be inhibited by the immobilized dimers, p53 will be eluted from the beads, desalted and rebound in aqueous phase. These complexes will then be isolated by a second immunoprecipitation and analyzed by two color microarray. In order to address what contribution factors present in the extract may have on p53 binding specificity, the p53/DNA complexes will be immunoprecipitated directly from the complex extract. Known interacting partners such as p300/CREB and hsp70 (p53-HSP70 complexes in oral dysplasia), and other factors, may alter p53 binding specificity. The results of these assays will be compared to the binding reactions performed with p53 alone. This co-regulator hypothesis predicts that purified p53 will not bind the composite response elements alone but would form complexes only in the presence of extract. Distinct complexes enabled by candidate interaction partners or genetically interacting partners will be queried separately with RNAi depletion experiments.
Competitors
In addition to these co-regulators, the extract may contain factors that compete with p53 for DNA binding. The paralogs, p63 and p73, have overlapping binding specificity with p53 and may compete for some response elements. The results of these assays will be compared to the binding reactions performed with p53 alone. This competitor hypothesis would predict that purified p53 would bind the response element alone but not in the presence of extract.
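The two predictions stated above can be written out as a small decision rule. The following Python sketch assumes that each oligo has an enrichment score from the assay with purified p53 and another from the assay performed in extract. The score scale, the threshold and the function name are hypothetical choices made for illustration.

```python
def classify_oligo(score_purified, score_extract, threshold=1.0):
    """Label an oligo by which hypothesis its binding pattern is consistent with.

    score_purified: enrichment with purified p53 alone (hypothetical scale)
    score_extract:  enrichment when complexes form in extract
    threshold:      assumed cutoff above which an oligo is called bound
    """
    bound_alone = score_purified >= threshold
    bound_in_extract = score_extract >= threshold
    if bound_alone and bound_in_extract:
        return "bound in both conditions"
    if bound_alone:
        # Binds purified p53 but not in extract: competitor hypothesis.
        return "consistent with competitor hypothesis"
    if bound_in_extract:
        # Binds only when extract is present: co-regulator hypothesis.
        return "consistent with co-regulator hypothesis"
    return "not bound in either condition"

# Illustrative scores only (hypothetical).
labels = [
    classify_oligo(2.5, 0.3),
    classify_oligo(0.2, 3.1),
    classify_oligo(2.0, 2.0),
    classify_oligo(0.1, 0.1),
]
```

A pattern consistent with a hypothesis is not proof of it; as the text notes, candidate partners would still be queried separately, for example by RNAi depletion.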
In addition to co-regulators there are many other variables that influence p53 binding. p53 is heavily phosphorylated and modified by other post translational processes. While all of these modifications lie outside the DBD, some of these phosphorylation events alter binding and could be an important component of p53 binding specificity.
Assaying the effects of post translational modifications on p53 binding
The effect of post translational modification on the DNA binding activity of p53 will be determined. Preliminary results will demonstrate how phosphorylation of Oct4 increases binding by treating the extract with 1 unit of calf intestinal phosphatase. Prior reports suggest that modification status may modulate DNA binding. p53 is induced by a wide range of cellular insults with different consequences both to its modification status and transcriptional targets. The phosphorylation of residue 392 by RNA dependent protein kinase, PKR, affects DNA binding and is strongly induced by UV radiation (Cuddihy, Wong, Tam, Li and Koromilas 1999). The spectrum of p53 dependent transcriptional changes differs with different types of induction (Kho, Wang, Zhuang, Li, Chew, Ng, Liu and Yu 2004; Zhao, Gish, Murphy, Yin, Notterman, Hoffman, Tom, Mack and Levine 2000).
Drug perturbations that may alter global phosphorylation status
To determine if this and other modifications alter the DNA binding specificity of p53 in such a way that explains the shift in downstream transcriptional targets, binding assays described herein will be repeated with and without drug treatments that induce global shifts towards the phosphorylated isoforms in the proteome. For this purpose, pervanadate (about 0.1 to about 3 mM), a tyrosine phosphatase inhibitor that should block the dephosphorylation of p53 at residue 55 that occurs in all inductions and otherwise mimics induction by shifting the distribution of p53 to the T88 phosphoform, will be administered. This site is close to the DBD and alterations in specificity will be assayed as described herein.
To survey the effects of serine threonine phosphorylation, induced and mock induced HCT116 cells will be treated with okadaic acid (0.17 nM), a powerful inhibitor of serine threonine phosphatases, prior to the preparation of extract. The accumulated hyperphosphorylated p53 will be immunoprecipitated with DO1 and utilized in the pool shifts described herein.
Determining the DNA binding specificity of particular p53 phosphoisoforms
The efficacy of okadaic acid treatment will be analyzed with the large selection of commercially available p53 specific antibodies. To determine the specific effect of individual phosphorylation events, HCT116 cells will be treated as described herein and also will be induced with the genotoxic drugs and conditions described in Figure 2. If global phosphorylation induced by okadaic acid alters binding specificity, immunoprecipitations will be performed on untreated extract with a panel of IgG antibodies that are specific for the following phosphoepitopes (ser 6, 9, 15, 20, 37, 46, 315, 378, 392 and thr 18, 55, 155, and 377) from the following companies (VWR, PhosphoSolutions, and Santa Cruz). In this fashion, specific, natural phosphoisoforms will be selected from extract and assayed for binding.
Particular phosphorylation events that change binding specificity will be further analyzed by two methods. The first approach to probing the role of individual phosphorylation events in binding will be siRNA depletion of the specific kinases described in Figure 2. These depletion experiments will not narrow the possibilities to single phosphorylation sites. To test how binding is altered by a single phosphorylated site, a serine to alanine mutation mimicking a stably dephosphorylated state, or a serine to aspartic acid mutation mimicking a stably phosphorylated isoform, will be constructed and cloned into a mammalian expression vector. The mutants will then be expressed in a p53-null line such as Saos2, and extract from these cells will be used to re-analyze binding as described herein.
Mutants and paralogs
A high throughput binding assay will also be used to test the Piano hypothesis, an idea advanced by Michael Resnick that many different DBD missense mutants lose affinity for DNA in a sequence-specific fashion that is idiosyncratic to the mutation. This hypothesis predicts that these mutants may bind DNA with an altered specificity, an outcome that would have important clinical implications. To test mutant constructs in binding assays, extracts will be made from p53-deficient Saos2 cells transfected with mutant p53 expression vectors and assayed as described herein.
The paralogs p63 and p73 perform parallel, overlapping roles with p53 in driving transcription, and the consensus binding site of p63 differs from the p53 binding site at only two positions (Perez, Ott, Mays and Pietenpol 2007). Binding will be assayed as described herein using antibodies specific to p63 and p73.
An unbiased map of 3.9 megabases of p53 responsive genomic space
As a detailed map of the locations of p53 binding sites emerges, the question of p53 function demands an understanding of the context in which each response element operates. All the factors in these p53 responsive regions will be mapped in order to understand how p53 function is determined. Initially, MEGAshift will be employed. Gel shift assays will be performed on extracts prepared from parallel treatments of the HCT116 cell lines, with the goal of contrasting promoter occupancy in the induced versus the uninduced state.
To enrich the fraction of oligos that are protein bound, large portions of the gel that correspond to complexes migrating slower than the unbound fraction will be excised. Bands or regions will be excised from the EMSA and reshifted for at least three rounds of EMSA selection. At each round of EMSA selection, regions corresponding to non-specific bands can be avoided during gel excision. Mobility is also a useful means of separating bound ligands into functionally distinct fractions, which is necessary for efficient motif detection (Section C1.6). In our experience, several rounds of selection could be performed efficiently with filter binding or IP and then, in the final round, the enriched pool can be analyzed by EMSA and fractionated according to mobility.
A diverse array of complexes forming on the ligand pool is anticipated, and it is anticipated that p53 will comprise a significant fraction of the signal. This assumption will be tested by cross-referencing data both in the induced state and also with several modifications.
Identification of complexes informatically
The gel shift assay will be used to identify functional complexes based on mobility and to enrich the sequences thus bound without initially identifying the factors involved in the complex (Figure 16). Sequences that are capable of being enriched will be analyzed via array. Having identified the motifs bound in particular complexes, the genome can be annotated for complexes. Bound fractions will be grouped by complex mobility, and each of the 488,000 oligos will be associated with an enrichment score for each group. For each group, the results will be ranked according to enrichment and the top 3 % used as input for the Gibbs motif sampler as described herein to discover binding specificity. When this method was developed with Oct4, the Gibbs sampler converged on the correct motif 100 % of the time. This binding specificity will be compared with the motifs described in the Transfac database (Wingender, Dietze, Karas and Knuppel 1996). A standard method to compare weight matrices is through the use of Kullback-Leibler distances (Aerts, Van Loo, Thijs, Moreau and De Moor 2003). If possible, a web-based implementation of this algorithm will be used to identify the best candidate for binding partner (Roepcke, Grossmann, Rahmann and Vingron 2005). Alternatively, an implementation that performs pairwise comparisons between the output of the Gibbs sampler for each group and the Transfac database can be written. The closest match to the discovered motif will be regarded as the best candidate trans-acting factor for that complex. In this manner, reasonable predictions about the identity of the DNA binding factor in each of the different complexes can be made.
Validation of the identity of complexes experimentally
The method described herein to predict the transcription factor identity of the complexes that form on p53 responsive regions compares motifs discovered within oligos that are enriched in a complex to the database of curated TF motifs in Transfac. The predictions will be validated by RNAi depletion of the candidate transcription factor in HCT116 cell lines. In addition to using the TF-depleted cell line in a gel shift to confirm the disappearance of the factor, the line will be analyzed by ChIP for factor binding at the site in question. In addition, the genes that are located proximal to the predicted factor binding site will be examined for alteration in transcriptional response to various types of DNA damage.
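The comparison of a discovered motif against a database of curated matrices by Kullback-Leibler distance, as described above, can be illustrated with a minimal sketch. The sketch assumes that both matrices have the same width and are already aligned, that each column lists base frequencies in the order A, C, G, T, and that a small pseudocount is an acceptable way to avoid zero frequencies. The function names, the pseudocount value, and the toy matrices are hypothetical and are not taken from Transfac or from any published implementation.

```python
import math

def smooth(column, pseudocount=0.01):
    """Add a pseudocount to each base frequency and renormalise to sum to 1."""
    total = sum(column) + pseudocount * len(column)
    return [(value + pseudocount) / total for value in column]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, between two distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

def matrix_distance(pwm_a, pwm_b):
    """Mean symmetrised KL divergence over the aligned columns of two matrices.

    Assumption: equal widths, no sliding alignment between the matrices.
    """
    if len(pwm_a) != len(pwm_b):
        raise ValueError("matrices must have the same number of columns")
    total = 0.0
    for col_a, col_b in zip(pwm_a, pwm_b):
        p, q = smooth(col_a), smooth(col_b)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(pwm_a)

def best_candidate(query, database):
    """Return the name of the database matrix closest to the query matrix."""
    return min(database, key=lambda name: matrix_distance(query, database[name]))

# Toy two-column matrices (hypothetical values, for illustration only).
query = [[0.90, 0.05, 0.03, 0.02], [0.10, 0.10, 0.10, 0.70]]
database = {
    "factor_x": [[0.85, 0.05, 0.05, 0.05], [0.10, 0.10, 0.20, 0.60]],
    "factor_y": [[0.10, 0.10, 0.70, 0.10], [0.70, 0.10, 0.10, 0.10]],
}
print(best_candidate(query, database))
```

A fuller implementation would also slide the shorter matrix along the longer one and score both strands; those steps are omitted here for brevity.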
If antibodies against the candidate are available, the interactions will be tested with a UV cross-linking method. Here, the extract will be allowed to bind a body-labeled oligo probe, followed by immunoprecipitation with the appropriate antibody. If this method fails to establish the identity of the interacting factor, the oligo will be used as a probe to affinity purify the complex. Briefly, passing extract from larger scale (20 L) tissue culture preps over a column of packed Sepharose beads covalently linked to the oligo probe will retain proteins that assemble on the probe. These factors will be analyzed by silver stain, excised and identified by mass spectrometry.
An exhaustive series of experiments querying p53 and other transcription factor binding events on a selected subset of the genome is described. After recording enrichment for each oligo in each bound fraction, these data will be visualized as grayscale shading on genomic annotation, which will be viewed as custom tracks in the UCSC genome browser. Figure 3 illustrates how these data can be integrated into models that will be graphically rendered and made publicly available along with downloadable UCSC annotation tracks and raw data files.
A map of regulatory cis-elements on p53-regulated genomic regions can employ the following.
Design of oligo library
Proximal promoter element reporter
In order to assess the transcriptional activity of each sequence in the pool, the candidate enhancer will be ligated upstream of a transcriptional reporter. To avoid the complexity limitations associated with creating libraries, a PCR-based methodology for constructing the in vivo transcriptional reporter library can be utilized.
Description of a Pol II ChIP in vivo reporter system
A panel of CMV-GFP constructs truncated at the 5' end will be ligated for the purpose of creating a minimal promoter transcription reporter. In order to synthesize a reporter capable of detecting both silencers and enhancers, the ideal construct would partially express GFP, allowing room for the detection of both above-background and below-background expression levels. All experiments will be performed in HCT116 to maintain consistency with earlier portions of this grant.
The oligo library described herein will be amplified, digested at the unique BamHI site in the universal primer and ligated immediately upstream of the minimal promoter transcription reporter. The ligated product will be subjected to several final rounds of amplification with outside primers, purified and transfected into cells as a PCR product. In Figure 4, transient expression of a PCR fragment containing this CMV promoter upstream of GFP in 293 cells is demonstrated. If a particular oligo in the library is capable of functioning as an enhancer or a proximal promoter element, that reporter should express more GFP than the average reporter in the pool.
However, unlike most reporter systems, our method requires a means of physically selecting the active fraction, so that the representation of each oligo in the starting pool can be measured in that active fraction. After considering FACS approaches, a chromatin immunoprecipitation approach to select transcriptionally active DNA was chosen.
This linear PCR product will be introduced via Lipofectamine transfection into the HCT116 cell line. A standard ChIP protocol will be utilized to isolate the transcriptionally bound fraction of linear PCR products. RNA Polymerase II will be crosslinked to active promoters through exposure to formaldehyde (1 % solution).
Following crosslinking, the cells will be lysed and the chromatin sheared by brief bursts of sonication. Pol II will be immunoprecipitated as described herein and the cross-linkages will be reversed by one hour of incubation at 65°C.
Constructs that associate with Pol II in this manner will then be reamplified with external primers.
Sonication shears naked DNA to 300-500 base pairs but behaves unpredictably on chromatinized DNA. If the sonication shears the plasmid to lower molecular weight, the oligo insert will be recovered as a smaller product by using a different primer pair. This product will either be 1) analyzed for enrichment on the array or 2) re-ligated upstream of the minimal promoter GFP reporter fragment for another round of enrichment. To explore the alterations in cis-element function in the induced versus uninduced states, transfections will be performed on logarithmic phase HCT116 cells and on cells induced by a variety of insults (5-FU, UV radiation and γ-radiation).
Antibodies for Pol II Pulldown
It has been noted that the correlation between transcription and Pol II binding is not always complete (Guenther, Levine, Boyer, Jaenisch and Young 2007). Several transcriptionally quiescent genes have Pol II enrichment in the promoter region. On average, however, transcriptionally active genes are at least four-fold more enriched for Pol II than transcriptionally inactive genes. As docked Pol II is in the initiation form (hypophosphorylated), performing ChIP with antibodies that recognize CTD phosphoepitopes at ser 2 or ser 5 (i.e., antibody H14 or H5) should increase the selection for elongating Pol II complexes and hence for transcriptionally active genes. It has been observed that these antibodies do immunoprecipitate fragments as readily as the initiation-specific antibody (8WG16) or the antibody 4H8, which is efficient and widely used.
Development of an in vivo Pol II IP reporter system
Several reporter systems and IP antibodies against Pol II will be explored to find the combination that returns the highest correlation between transcriptional readout and enrichment within the immunoprecipitation. Each of these measures will be performed relative to a control. In this case, a standard quantity of the expression vector mCherry (which drives the red fluorescent protein) will be spiked into the minigene reporter library. Enrichment will be detected by analyzing images captured from a stereo fluorescent dissecting scope with morphometric analysis software (velocity) capable of quantifying fluorescence in the red and green channels. Typically, hundreds of copies of transfected plasmids enter cells during a transfection. Thus a single cell should provide a reasonable sample size for a reliable estimate of the pool's average level of transcription. This assumption can be easily verified by comparing the variance of the red/green ratio between cells transfected with the pool and cells transfected with a single GFP reporter construct. One hundred fluorescent cells will be sampled post-transfection, and the ratio of red to green will be used to score enrichment from round to round. If enrichment of transcriptionally active GFP minigenes is successful, an increasing fraction of the pool should express an increasing amount of green fluorescence, seen as a greater green-to-red ratio with each round of enrichment. The majority of oligos can be inert with respect to transcriptional function.
A fraction of the oligos will be returned as constitutive enhancers or constitutive silencers, while others will be regulatory elements that are induced by certain treatments. Results obtained by different methods of induction will be crosschecked against the results of the binding assays performed on similarly treated extracts. This may subdivide the active fraction of the oligo pool further (e.g., an enhancer that forms a low mobility complex that is dependent on phosphorylation). Many oligos map to fewer promoter regions. After separating the oligos in this fashion, an understanding of how these elements work together to drive the regulation observed in the promoter will be sought. Specifically, different combinations of regulatory elements that are over-represented (or under-represented) in promoters that are p53 activated will be searched for. The same analysis will be performed in promoters that are p53 repressed. The dataset can be divided by several other functional criteria (chromatin accessible/inaccessible) or by treatment. This approach will be used to find suggested synergies important in the different treatments. Transcript level responses to particular treatments will be downloaded from publicly available GEO datasets. Significance will be determined by performing the analysis on a thousand shuffles of the oligo/enrichment data and recording the number of permutations with combinations equal to or more extreme than the observed data.
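The shuffle-based significance estimate described above can be sketched as follows. This is a minimal illustration, assuming that each oligo has a single enrichment score and a yes/no membership in the promoter class of interest, and assuming a one-sided test on the difference in mean enrichment. The test statistic, the function names and the toy data are hypothetical choices made for the example; the analysis actually applied to combinations of regulatory elements may use a different statistic.

```python
import random

def mean_difference(scores, in_group):
    """Mean enrichment of oligos inside the group minus the mean outside it."""
    inside = [s for s, g in zip(scores, in_group) if g]
    outside = [s for s, g in zip(scores, in_group) if not g]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

def permutation_test(scores, in_group, statistic=mean_difference,
                     n_shuffles=1000, seed=0):
    """Shuffle the oligo/enrichment pairing and count shuffles that are
    equal to or more extreme than the observed statistic (one-sided)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = statistic(scores, in_group)
    shuffled = list(scores)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # break the link between oligo and score
        if statistic(shuffled, in_group) >= observed:
            extreme += 1
    return observed, extreme, extreme / n_shuffles

# Toy data: five oligos in p53-activated promoters, five elsewhere.
scores = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
in_group = [True] * 5 + [False] * 5
observed, extreme, p_value = permutation_test(scores, in_group)
print(observed, extreme, p_value)
```

With a thousand shuffles the smallest nonzero p-value that can be reported is 0.001, which bounds the resolution of the estimate.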
Screening of 100,000 validated promoter SNPs for allelic differences in factor binding and function
To understand genetic components of cancer susceptibility, a screen of human genomic regions (35-mers) centered over population variations (SNPs) can be performed. The screen identifies oligos that exhibit allelic differences in binding or allelic differences in enhancer/silencer function. As there is such a close connection between the processes that sense DNA damage, the cell cycle, and apoptosis, the test-of-function assay will be performed under all these conditions. The binding assays will be performed as described herein, but on a pool of genomic variations. Similarly, the function assays will utilize this second library in the in vivo transcription reporter described herein to test the 35-mer region around each site of variation for differential enhancer/silencer function under the conditions described below.
Oligo pool
The process of SNP selection, and the format in which the SNPs will be tested, will be described. Both alleles of each SNP are tested in their 35-mer windows of genomic context (e.g., about 17 nucleotides flanking the variable position on either side). For the purpose of amplification, these regions are flanked by universal primer binding sites (ATGCCTGCAGTGTAC (SEQ ID NO: 9) and GTCTTCAATTGCATG (SEQ ID NO: 10)) and amplified. The SNP regions are selected based on their location and also based on their validation status.
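The construction of the two allele-specific oligos for one SNP can be sketched as follows. This is a minimal illustration that assumes the variable position sits at the center of the 35-mer with 17 flanking nucleotides on each side, and that the two universal primer binding sites are appended in the orientation in which they are written above. The function name, the orientation of the second site and the toy genomic context are hypothetical.

```python
LEFT_SITE = "ATGCCTGCAGTGTAC"    # SEQ ID NO: 9
RIGHT_SITE = "GTCTTCAATTGCATG"   # SEQ ID NO: 10 (orientation is an assumption)

def allele_oligos(context, snp_index, alleles, flank=17):
    """Build one primer-flanked oligo per allele for a single SNP.

    context   -- genomic sequence surrounding the SNP
    snp_index -- 0-based position of the variable base within context
    alleles   -- iterable of single-base alleles, e.g. ("A", "G")
    """
    if snp_index < flank or snp_index + flank >= len(context):
        raise ValueError("not enough flanking sequence around the SNP")
    upstream = context[snp_index - flank:snp_index]
    downstream = context[snp_index + 1:snp_index + 1 + flank]
    return {
        allele: LEFT_SITE + upstream + allele + downstream + RIGHT_SITE
        for allele in alleles
    }

# Toy 40-nt context with the SNP at position 20 (hypothetical sequence).
context = "ACGT" * 10
oligos = allele_oligos(context, 20, ("A", "G"))
for allele, oligo in oligos.items():
    print(allele, len(oligo), oligo)
```

Each resulting oligo is 65 nucleotides long (15 + 35 + 15), and the two alleles differ at a single position.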
The more than 10 million SNPs in dbSNP, build 125, are used as the source of human variation. SNPs are screened and deemed valid if the entry 1) is found more than once, 2) includes frequency information, or 3) is validated by the HapMap project or the submitter. From the remaining 5 million SNPs, a smaller set of promoter SNPs is created. The about 119,426 SNPs that fall within about 2 kb upstream or about 1 kb downstream of a RefSeq transcription start site are classified as promoter SNPs and designed into the oligo library.
Binding assay / reporter assay
To test the effect that each allele of each variable region has on gene expression, the pool will be ligated via a MunI site into the GFP reporter fragment. HCT116 cells will be transfected with the pool, crosslinked and assayed for function. Additional cycles of amplification will be performed as necessary. In addition to the original transfection, various perturbations will be applied to the line. Downstream analysis will focus on variations that demonstrate allelic differences in function or binding when measured under p53-inducing stress conditions.
To research connections between variation and human health, genes in close proximity to SNPs with allelic differences in regulatory potential will be retrieved from the OMIM database. Variations will be reassociated with their haplotype SNPs in silico. This family of variations will be used to search published reports of association studies with human disease. Similarly, SNPs will be searched for in haplotypes that show evidence of recent positive selection in the human population. Alleles that are functionally distinct in p53-dependent responses will be genotyped in Coriell lymphoblastoid cell lines and in patient populations. Each year more than one million Americans are diagnosed with cancer.
Insight that will help cancer patients in their battle against this disease can be gained through a further understanding of the cellular biology of the p53 gene, a gene which is mutated in more than half of all human tumors. A method of monitoring 244,000 binding events with one measurement will be established. Drugs that modulate the phosphorylation status of p53 will be used to discover how these modifications alter protein binding in colorectal carcinoma cells and in the cells of patients who are currently taking phosphatase inhibitors as therapy. The ability to genotype patients exists, and drugs that target specific proteins are being developed. The hereditary component of cancer strongly suggests that variation that exists within the human population can predispose individuals to cancer. Indeed, such variations have already been reported. All variations in promoter regions of human genes will be considered, and the variations that alter the ability of proteins to recognize these sequences will be sought. A series of follow-up experiments will further refine these variations, particularly by determining whether they fall within binding sites of p53 or other proteins in the DNA damage pathway. A fundamental understanding of cancer that can be put into practice in the near term for the important task of cancer therapy and detection can be achieved.
REFERENCES
Aerts, S., et al., Bioinformatics 19: ii5-ii14 (2003)
Boyer, L.A., et al., Cell 122: 947-956 (2005)
Buck, M.J., et al., Genomics 83: 349-360 (2005)
Chan, W.M., et al., Mol Cell Biol 24: 3536-3551 (2004)
Cuddihy, A.R., et al., Oncogene 18: 2690-2702 (1999)
Gidekel, S., et al., Cancer Cell 4: 361-370 (2003)
Guenther, M.G., et al., Cell 130: 77-88 (2007)
Johnson, D.S., et al., Science 316: 1497-1502 (2007)
Kho, P.S., et al., J Biol Chem 279: 21183-21192 (2004)
Loh, Y.H., et al., Nat Genet 38: 431-440 (2006)
Mclure, K.G., et al., EMBO J 18: 763-770 (1999)
Menendez, D., et al., Oncogene 26: 2191-2201 (2007)
Menendez, D., et al., Cell Biol 26: 2297-2308 (2006)
Niwa, H., et al., Nat Genet 24: 372-376 (2000)
Oleinikov, A.V., et al., Nucleic Acids Res 33: E92 (2005)
Olivier, M., et al., Hum Mutat 19: 607-614 (2002)
Orlando, V., et al., Cell 75: 1187-1198 (1993)
Pan, G.J., et al., Cell Res 12: 321-329 (2002)
Perez, C.A., et al., Oncogene 26: 7363-7370 (2007)
Remenyi, A., et al., Genes Dev 17: 2048-2059 (2003)
Remenyi, A., et al., Mol Cell 8: 569-580 (2001)
Robertson, G., et al., Nat Methods 4: 651-657 (2007)
Roepcke, S., et al., Nucleic Acids Res 33: W438-441 (2005)
Soussi, T., et al., Hum Mutat 21: 192-200 (2003)
Wasserman, W.W., et al., Nat Rev Genet 5: 276-287 (2004)
Wei, C.L., et al., Cell 124: 207-219 (2006)
Wingender, E., et al., Nucleic Acids Res 24: 238-241 (1996)
Zhao, R., et al., Genes Dev 14: 981-993 (2000)
The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
EXEMPLIFICATION
EXAMPLE 1: A SIMULTANEOUS EMSA ANALYSIS OF THOUSANDS OF OLIGOS TO CHARACTERIZE IN VIVO OCT4 BINDING REGIONS
Gene expression is initiated by the interaction of transcription factors with accessible locations in chromatin. Modulation of this accessibility and changes in the activity and composition of transcription factors are a major source of gene regulation during development. The transcription factor Oct4 (also known as Oct3 and Pou5f1) has been implicated in maintaining embryonic stem (ES) cell pluripotency and also in reprogramming somatic cells to an ES cell fate (for review see Pan, Chang et al. 2002). Oct4 was isolated from ES cells on the basis of its ability to bind an octamer sequence, ATGCAAAT (SEQ ID NO: 1) (Scholer, Balling et al. 1989). It was later shown to be a principal factor in maintaining a stem cell state, a property that generated great interest in the target genes for this transcription factor (Niwa, Miyazaki et al. 2000). Oct4 expression may also mark adult germline compartments and certain classes of tumors (Gidekel, Pizov et al. 2003; Yamaguchi, Yamazaki et al. 2005; Atlasi, Mowla et al. 2007; Kehler, Tolkunova et al. 2004). The in vitro binding specificity of Oct4 has been determined by SELEX, a method of identifying high affinity binding sites from random sequence through iterative steps of binding selection and enrichment (Nishimoto, Miyagi et al. 2003). Weight matrices are calculated from an alignment of the selected sequences and used to score the "closeness" of real sequences to the high affinity sites. However, these methods often leave questions about the in vivo relevance of the output sequences, as natural selection may not always favor the highest affinity binding sites. In addition, it has long been observed that other factors such as chromatin accessibility greatly limit the usefulness of in vitro binding specificities in predicting sites in vivo (Wasserman and Sandelin 2004).
More recently, new methods have been developed to measure interactions between transcription factors and chromatin. These methods, termed ChIP-chip and ChIP-PET, locate binding sites in vivo by immunoprecipitating the factor of interest after it has been crosslinked to chromosomal DNA (Orlando and Paro 1997). Binding regions are identified either by array (ChIP-chip) or by sequencing (ChIP-PET). Both these techniques have been applied to the identification of Oct4-bound regions in human and murine ES cells (Boyer, Lee et al. 2005; Loh, Wu et al. 2006). However, high throughput localization studies such as ChIP-chip and ChIP-PET have several important limitations. Crosslinking is not quantitative, and sometimes neither is immunoprecipitation; array densities and sequencing depths impose further limits (Buck and Lieb 2004). Most of the published location studies return regions that span about 1 kb of genomic space. How many binding sites lie within these regions, or even whether there is direct interaction between the factor and the DNA, cannot immediately be determined. Recent advances in sequencing technology and the practice of size selecting the recovered DNA fragments have improved this situation, allowing for the inference of binding sites to a resolution of 50 base pairs (Johnson et al. 2007). However, this approach is costly, and the inference becomes problematic when recognition elements cluster in closely spaced groups along the DNA. A more direct determination of Oct4 binding sites within these regions will help identify variants that disrupt binding and also shed light on the mechanism of Oct4 function. It has been shown that Oct4 binding does not always enhance transcription. In some cases, Oct4 binding is correlated with a repressed transcriptional state (Boyer LA Supplementary). This duality is not uncommon for transcription factors and is probably explained by the local context of each site and the identity of neighboring factors on the DNA. High resolution maps of transcription factor binding sites (TFBS) in promoters are required to understand these nuances of transcription factor function. To date, functionally defined target gene regulation by Oct4 binding has been described for only a handful of genes at base pair resolution. These include Oct4 itself (Chew et al. 2005 MCB 25:6031), FGF-2 (Ambrosetti et al. 1997 MCB 17:6321), Nanog (Rodda et al. 2005 JBC 280:24731), Sox2 (Chew et al. 2005 MCB), and osteopontin (Botquin et al. 1998 G&D 12:2073).
In order to pinpoint the specific sites and quantify the strength of binding of a factor to its targets in vivo, the large volumes of output sequence returned by high-throughput ChIP methods will need to be interrogated by traditional means such as the electrophoretic mobility shift assay (EMSA). As these genomic location studies become increasingly utilized, the field will require high throughput technology to identify binding sites within these genomic regions. One potential method is to use protein binding microarrays (PBMs) to measure binding affinities of a labeled protein to double stranded substrates arrayed on a glass slide. This technique has been used to identify binding specificities for transcription factors and can measure the degree to which a protein binds to a particular sequence (Mukherjee, Berger et al. 2004). While PBM is high-throughput, it lacks some of the flexibility and qualitative features of EMSA. For example, EMSA can distinguish single from multiple molecules of bound protein. EMSA is a highly quantitative, well established method that can be performed in whole cell extracts, allows for the physical isolation of the bound product and does not require an array to analyze the result.
The feasibility of high throughput EMSA will be demonstrated with an analysis of Oct4 binding capacity throughout the Oct4 ChIP-enriched regions (Boyer et al.). The MEGAshift method utilizes solid-phase DNA synthesis to remake regions of interest as large pools of short oligos. The reverse strand of each of these oligos is also synthesized as a probe on a custom oligo array. The oligos are incubated with recombinant Oct4, and the bound and unbound fractions are analyzed by array. From this, a binding affinity can be directly measured for each oligo and so, by proxy, for each window along the Oct4-enriched chromosomal region.
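The step from array intensities to a per-oligo binding measure can be sketched as follows. This is a minimal illustration that assumes each oligo has one or more replicate probe intensities in the bound channel and in the unbound channel, and that a log2 ratio with a small pseudocount is an acceptable enrichment score. The function names, the pseudocount and the toy intensities are hypothetical, and real array data would normally be background-corrected and normalized first.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty list of intensities."""
    return sum(values) / len(values)

def enrichment_scores(bound, unbound, pseudocount=1.0):
    """Return log2(bound / unbound) per oligo, averaging replicate probes.

    bound, unbound -- dicts mapping oligo name to a list of probe intensities
    pseudocount    -- added to both channels to avoid division by zero
    """
    scores = {}
    for oligo in bound:
        b = mean(bound[oligo]) + pseudocount
        u = mean(unbound[oligo]) + pseudocount
        scores[oligo] = math.log2(b / u)
    return scores

def rank_by_enrichment(scores):
    """List oligo names from most to least enriched in the bound fraction."""
    return sorted(scores, key=scores.get, reverse=True)

# Toy intensities for two oligos with duplicate probes (hypothetical values).
bound = {"oligo_1": [6.0, 8.0], "oligo_2": [1.0, 1.0]}
unbound = {"oligo_1": [1.0, 1.0], "oligo_2": [7.0, 7.0]}
scores = enrichment_scores(bound, unbound)
print(scores, rank_by_enrichment(scores))
```

Because consecutive oligos tile the genomic region, the same scores can be read as a binding profile along each window of the region.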
These results demonstrate that the majority of nucleotide sequences are not able to bind recombinant Oct4. The sequences that were able to bind can also interact with the Oct4 in ES cell extracts. Numerous sequences with multiple paired and overlapping non-consensus Oct4-bound sequences have been identified, underscoring the difficulty of identifying biologically relevant sites using ChIP, SELEX or in silico approaches by themselves.
METHODS: Library design, oligonucleotide synthesis, cloning and sequencing
The tool liftOver (ref) was used to map the mouse coordinates onto the human genome, and a Perl script was used to identify overlapping regions. The human sequence was extended to completely cover the mouse sequences. A complex pool of 2,468 oligonucleotides was synthesized in picoarray microfluidic μParaflo™ devices. The pool was amplified en masse and end-labeled using γ-32P-ATP. Each oligonucleotide was designed as a tiled genomic 35-mer flanked by the common sequences CCAGTAGATCTGCCA (SEQ ID NO: 11) and ATGGAGTCCAGGTTG (SEQ ID NO: 12) that were used as the universal primer binding pair. Selected oligonucleotides were cloned into TOPO vectors (Invitrogen) under conditions favoring multiple inserts and sequenced. For expression analysis, TOPO vectors that contained single inserts were digested with XhoI and HindIII (New England Biolabs) and ligated into the pGL3basic luciferase reporter vector (Promega) using the same sites. The control wild-type and mutant sequences were cloned into pGL3basic as phosphorylated duplex oligonucleotides using a unique SmaI site.
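The library design described above can be sketched as follows. This is a minimal illustration that tiles a region into 35-mers, flanks each tile with the two common sequences, and derives the reverse complement used as the array probe. The 19-nucleotide step is the tiling increment reported in the Results; the function names, the decision to drop trailing bases that do not fill a complete window, and the toy region are hypothetical.

```python
LEFT_COMMON = "CCAGTAGATCTGCCA"    # SEQ ID NO: 11
RIGHT_COMMON = "ATGGAGTCCAGGTTG"   # SEQ ID NO: 12
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def tile_region(region, window=35, step=19):
    """Return (start, tile) pairs for full-length windows across a region.

    Assumption: trailing bases that do not fill a whole window are dropped.
    """
    last_start = len(region) - window
    return [(start, region[start:start + window])
            for start in range(0, last_start + 1, step)]

def build_library(region):
    """Pair each primer-flanked pool oligo with its array probe sequence."""
    library = []
    for start, tile in tile_region(region):
        library.append({
            "start": start,
            "oligo": LEFT_COMMON + tile + RIGHT_COMMON,
            "array_probe": reverse_complement(tile),
        })
    return library

# Toy 80-nt region (hypothetical sequence, for illustration only).
region = "ACGTTGCA" * 10
library = build_library(region)
for entry in library:
    print(entry["start"], entry["oligo"])
```

With a 35-nucleotide window and a 19-nucleotide step, neighboring tiles overlap by 16 nucleotides, which gives close to two-fold coverage of the region.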
Recombinant human Oct4 and preparation of whole cell extracts
GST-human Oct4 bacterial expression vectors and protocols were supplied by Drs. Yehudit Bergman (Hebrew University, Israel) and Jungho Kim (Sogang University, South Korea). Briefly, an overnight culture of BL21-DE3 (Codon-plus, Stratagene) E. coli was diluted 1:20 in LB, grown to OD660 0.5, and induced for 4 hours with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) at 30°C. Cells were lysed using SoluLyse (Genlantis) in 50 mM Tris-Cl pH 8.0, 1 % NP-40, 2 mM ethylene-diamine-tetraacetic acid (EDTA), 150 mM NaCl, 0.5 mM dithiothreitol (DTT), plus a protease inhibitor cocktail (Roche). The lysate was clarified by centrifugation and incubated with glutathione-sepharose (GE Healthcare) for 30 minutes at 4°C. Sepharose beads were collected by centrifugation and washed 3 times with the lysis buffer. Washed beads were eluted with 20 mM glutathione in lysis buffer. The purified GST-Oct4 in the eluate was dialyzed into buffer D (20 mM Hepes, pH 7.9, 100 mM KCl, 0.1 mM EDTA, 20 % glycerol, 1 mM DTT, and 0.5 mM phenylmethylsulphonyl fluoride (PMSF)).
Whole cell extracts were obtained from J1 ES (male) undifferentiated and retinoic acid (RA) differentiated cells. Cells were pelleted, resuspended, and incubated in extraction buffer (200 mM KCl, 100 mM Tris pH 8.0, 0.2 mM EDTA, 0.1 % Igepal, 10 % glycerol, and 1 mM PMSF) for about 50 minutes on ice. Cell debris was pelleted, and extracts were frozen using liquid N2 and stored at -80°C.
Electrophoretic Mobility Shift Assays
Oligonucleotides were prepared for EMSA by end labeling PCR products with γ-32P-ATP. Samples were prepared in 20 μL (0.6X Buffer D, 50 ng/μL poly(dI·dC), 1 μg/μL BSA, 1 mM DTT, and 20 ng of probe). Samples were incubated at room temperature for 30 minutes. Native 4 % polyacrylamide gels (29:1 acrylamide:bisacrylamide, 1 % glycerol, 0.5X TBE) were pre-run for 1 hour at 80 V, samples were loaded, and the gels were run for 1.75 hours at 80 V.
Hybridization and Arrays
Custom oligonucleotide microarrays (8 x 15K) were produced by Agilent Technologies Inc. Microarrays were hybridized following a modified version of the Agilent Two-Color Microarray-Based Gene Expression Analysis Protocol. Microarrays were hybridized for 3 hours at 50°C and then washed for 1 minute with 2X SSC with 0.2 % SDS, followed by two 1-minute washes with 1X SSC, and finally one 10-second wash with EtOH (e.g., 95 %). Microarrays were then centrifuged dry and scanned using a GenePix 4000B scanner from Molecular Devices. RNA probes were produced and labeled with Cy3 and Cy5 using the MEGAshortscript™ High-Yield Transcription kit (Ambion) after appending a T7 promoter to the oligonucleotides.
Cell Culture and transient transfection
Differentiation of J1 ES cells occurred in the presence of retinoic acid (RA) (e.g., about 10⁻⁷ M, i.e., 100 nM) over a 16-day period. ES cells were cultured in DMEM + HEPES supplemented with, for example, 1 mM glutamine, sodium pyruvate, and 1 mM MEM non-essential amino acids (Invitrogen) plus 15 % ES cell-qualified heat-inactivated fetal bovine serum (HyClone), 50 μM 2-mercaptoethanol (Sigma) and leukemia inhibitory factor (LIF/ESGRO, Chemicon). Cultured ES cells were transfected by electroporation (Amaxa Biosystems) using protocols supplied by the vendor. Each transfection contained 1.3 μg carrier plasmid, 200 ng pRL-TK (Promega), and 500 ng of either pGL3basic empty vector or cloned derivatives. Photinus and Renilla luciferase activities were quantified using a dual luciferase assay system (Promega).
Web resources
Genome browser snapshots of all the gene loci with Oct4 binding data can be viewed at http://fairbrother.biomed.brown.edu/data/Oct4. The raw data from the array experiments are stored as text files on the server, as is a legend diagramming each experiment. Custom tracks will be submitted to the UCSC Genome Browser and are also available for download.
Transcription factor Oct4 is a key regulator of embryonic stem cell pluripotency and a known oncoprotein. Genome-wide location studies have been performed in vivo with mouse and human embryonic cells to identify genomic regions that crosslink to and immunoprecipitate with Oct4. A novel high-throughput binding assay called MEGAshift (microarray evaluation of genomic aptamers by shift) is described herein, which is used to pinpoint the exact location, strength and nature of the DNA-protein complexes within these immunoprecipitated fragments. All genomic regions identified as ChIP-enriched in both human and mouse are considered, and this DNA is resynthesized as tiled contigs of about 35-mers. This oligo pool is then assayed for binding to recombinant Oct4 by gel shift. The degree of binding for each oligo is accurately measured on a specially designed array. The relationship between experimentally determined and computationally predicted binding strengths is explored, many novel functional combinations of Oct4 half sites are found, and a 100 % de novo motif discovery rate is demonstrated when MEGAshift is used in conjunction with motif discovery programs. In addition to further refining location studies for transcription factors, this method holds promise for the high throughput screening of promoter SNPs or conserved regions.
RESULTS:
Determining Transcription Factor Binding Specificity With Real Ligands

Pooled oligonucleotide design and experimental scheme

Genomic regions that were found to be occupied by Oct4 in vivo using ChIP-chip or ChIP-PET in both human and mouse ES cells were used as starting material (Boyer, Lee, Cole, Johnstone, Levine, Zucker, Guenther, Kumar, Murray, Jenner et al. 2005; Loh, Wu, Chew, Vega, Zhang, Chen, Bourque, George, Leong, Liu et al. 2006). It was reasoned that the intersection of these results would contain the highest-confidence set of Oct4 binding sites that are conserved across these two species. While the original ChIP-PET study reported 88 Oct4 targets shared by the human and the mouse results, only twenty genomic regions fit this criterion of overlapping binding regions (20 in each species = 40 gene regions total). While Oct4 is generally known as an activator in the stem cell state and is turned off during differentiation, only about 39 % of Oct4 targets are transcriptionally active in human ES cells (Pan, Chang, Scholer and Pei 2002). Amongst the subset of these targets that overlap with mouse ChIP regions, about 73 % were in close proximity to genes that were expressed in ES cells. In this regard, the subset chosen for study is significantly different (p-value < 0.002, χ2 = 10, d.f. = 1) from the overall dataset from which it was derived, either because the overlapping conserved set has a lower false positive rate or because Oct4 sites that function as enhancers are subject to stronger purifying selection than their silencer or otherwise non-enhancer counterparts. The ChIP-enriched regions were analyzed in both human and mouse. The binding regions in human and mouse did not always align perfectly, and so, to extend the region of comparison, the union of this overlap was synthesized in human (Figure 5, step 1).
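As an arithmetic check on the statistic quoted above (χ2 = 10, d.f. = 1, p-value < 0.002), the upper-tail probability can be computed directly. The following sketch is illustrative only and is not part of the original analysis. It relies on the identity that, for one degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)); the function name is an assumption introduced here.

```python
import math


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For d.f. = 1 the chi-square variable is the square of a standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Statistic reported in the text.
p_value = chi2_upper_tail_df1(10.0)
print(f"p = {p_value:.5f}")
```

The computed value is roughly 0.0016, which is consistent with the reported bound of p-value < 0.002.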
Each of these 40 regions was then used to generate a contig of 35-mers tiled in 19-nucleotide increments across the genomic region enriched in the ChIP assay (Figure 5, step 2). In parallel, the reverse complement of each oligonucleotide sequence was used as a probe sequence for an oligonucleotide array. Universal primers flanking the 35-mer were designed to enable PCR amplification of the library en masse. This library then represented an approximate two-fold coverage of the Oct4 regions in human and the orthologous regions in mouse and is capable of reporting binding sites with 19-nucleotide resolution. The library was then tested for Oct4 binding by EMSA (Figure 5, step 3) using the complex pool as a probe. The shifted fraction was isolated and analyzed either by sequencing or by hybridizing the sample to probes on a custom oligonucleotide array (Figure 5, step 4). Six array probes were designed for each oligonucleotide present in the pool: duplicates of sense (2), antisense (2) and their singly mismatched probes (1+1). By differentially labeling the EMSA-selected and non-selected initial pool, enrichment can be derived from the red/green ratios of hybridization intensities on the array.

Sequential enrichment of Oct4-binding activity from a complex pool
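The tiling scheme described in the preceding subsection (35-mers in 19-nucleotide increments, with the reverse complement of each tile used as an array probe) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, the input region is a placeholder sequence rather than genomic data, and the universal flanking primers are omitted.

```python
def tile_region(sequence, window=35, step=19):
    """Return (start, tile) pairs for windows of `window` nt taken every `step` nt.

    Adjacent tiles overlap by window - step = 16 nt. Bases beyond the last
    full window are not covered in this simple sketch.
    """
    tiles = []
    for start in range(0, len(sequence) - window + 1, step):
        tiles.append((start, sequence[start:start + window]))
    return tiles


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Placeholder 92-nt region (hypothetical; stands in for a ChIP-enriched region).
region = "ACGT" * 23
tiles = tile_region(region)

# Array probe for each tile: the reverse complement of the tile sequence.
probes = [(start, reverse_complement(tile)) for start, tile in tiles]

for start, tile in tiles:
    print(start, tile)
```

With a 19-nucleotide step and a 35-nucleotide window, most positions fall within two adjacent tiles, which corresponds to the approximate two-fold coverage noted in the text.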
In order to isolate the fraction of the library that binds Oct4, an EMSA was performed in the presence of purified recombinant human full-length Oct4. To establish the appropriately discriminating concentration of Oct4 for this experiment and to control for possible binding contributions from the flanking primers, a canonical octamer site, derived from a human immunoglobulin heavy chain promoter, was amplified with a mutant control. At the optimized Oct4 concentration, the wild type control was saturated and the mutant probe showed trace amounts of binding (Figure 6A, lanes 1-4). The radiolabeled oligonucleotide library represents 2468 different genomic 35-mer windows flanked by the universal primer pair but migrated as a single band in the polyacrylamide gel (Figure 6A, lane 6) with no appreciable shift when incubated with recombinant Oct4 (Figure 6A, lane 7). The region of the gel where the Oct4 shift would be expected to migrate (as determined by the positive control) was excised, re-amplified and used to reprobe Oct4 in round 2 of the selection (Figure 6A, lanes 8-9). In round 2 an appreciable signal, consistent with an Oct4-bound probe, was detected. To determine if this fraction also bound Oct4 in whole cell extract, the EMSA was repeated in whole cell lysate derived from ES cells using the enriched fraction from round 2 as a probe. Because of the tendency of Oct4 to form various hetero- and homo-complexes, probes containing octamer sequences have been observed to display complex shifting patterns in extract (Remenyi, Lins, Nissen, Reinbold, Scholer and Wilmanns 2003; Remenyi, Tomilin, Pohl, Lins, Philippsen, Reinbold, Scholer and Wilmanns 2001). In this experiment the wild-type probe displayed at least three shifted products that could represent Oct4-containing complexes.
While pre-incubating the extract with antibodies against the closely related Oct1 had no effect on the formation of these complexes (Figure 6B, lane 4), Oct4-specific antibodies eliminated two of these shifted products and greatly increased the intensity of the third (arrow in Figure 6B, lane 3). These three complexes can be efficiently competed with unlabelled wt probe and were also lost when EMSA was performed with differentiated ES cells, which lack Oct4 (data not shown). With both the wt and selected pool, the upper band was most responsive to the pre-incubation with Oct4 antibodies (Figure 6B, lane 9 versus 8). A band of similar mobility was detected in the EMSA performed with the unselected pool but was not responsive to Oct4 antibody (Figure 6B, lanes 6-7). This demonstrates that the enrichment protocol performed with the recombinant bacterially expressed Oct4 selected for a population of probes that, as a group, had increased affinity for endogenous Oct4 derived from ES cells.
The selection with recombinant Oct4 was repeated for an additional round, resulting in more enrichment. In this final round 3, a faint band corresponding to probe bound by multiple Oct4 molecules was detected. The corresponding region was excised from the round 2 gel. While no band was visible in this region of the gel, a product was amplified and this pool resulted in enrichment in the additionally shifted molecules (see Figure 6C, lanes 5-6). This mixed population cannot be saturated within the concentration range of the protein required to saturate the positive control (Figure 6D).

Recording the enrichment of about 2468 genomic sites in the Oct4 bound fraction of the pool with a microarray
To determine the binding affinities of each of the 2468 oligonucleotides, a two-color labeling strategy was used in conjunction with an Agilent custom oligonucleotide array to compare the representation of each oligonucleotide in the shifted band to the representation of that oligonucleotide in the starting pool. The hybridization intensity of the selected targets was normalized to the hybridization of the pool by multiplying each probe intensity in the selected channel by a constant such that the log of the ratio of selected/pool intensities (e.g., color spot ratios) summed to zero across all the probes on the array. Initially all probes indicated a similar level of enrichment, but as the sequential enrichment progressed the tight distribution of enrichment scores gradually spread into progressively enriched and depleted subclasses of molecules (Figure 7A). Sequencing of the initial pool verified that the oligonucleotide synthesis protocol had proceeded with high fidelity and without any detectable enrichment bias. We cloned about 50 oligonucleotides from the Oct4-enriched sets, and sequencing revealed that these consist of about 41 unique clones, where eight of these sequences were cloned multiple times. In addition to these eight clones, there were several other cases where a sequence was represented multiple times in overlapping clones (Figure 8A). Presumably, regions that were cloned multiple times from the selected pool were more highly enriched in the selected pool because they contain strong Oct4 binding sites.

Distribution of Oct4 binding sites in mammalian promoters
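The array normalization described in the preceding subsection (scaling the selected channel by a single constant so that the log ratios of selected/pool intensities sum to zero) can be sketched as follows. This is an illustrative reading of that description, not the original analysis code: the function names are hypothetical and the intensity values are invented for the example. The constant that satisfies the condition is the reciprocal of the geometric mean of the raw selected/pool ratios.

```python
import math


def normalize_selected(selected, pool):
    """Scale the selected channel so that sum(log(selected_i / pool_i)) == 0.

    Returns the scaling constant and the rescaled selected intensities.
    """
    log_ratios = [math.log(s / p) for s, p in zip(selected, pool)]
    scale = math.exp(-sum(log_ratios) / len(log_ratios))
    return scale, [s * scale for s in selected]


def log2_enrichment(selected, pool):
    """Per-probe log2(selected / pool) after normalization."""
    _, normalized = normalize_selected(selected, pool)
    return [math.log2(s / p) for s, p in zip(normalized, pool)]


# Hypothetical probe intensities (selected channel versus starting pool).
selected_channel = [200.0, 800.0, 400.0, 100.0]
pool_channel = [100.0, 100.0, 100.0, 100.0]

scores = log2_enrichment(selected_channel, pool_channel)
print(scores)
```

After this scaling, positive scores indicate probes enriched relative to the array-wide average and negative scores indicate depleted probes.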
Custom genome tracks were written to facilitate the visual comparison of these cloning and enrichment results at each round of the SELEX experiment using the popular UCSC genome browser (Figure 8). The region upstream of the REST gene represents an example of close agreement between the mouse ChIP-PET result and human ChIP-chip result. Oligonucleotides corresponding to the mouse and human were cloned multiple times from this region near the REST gene, also known as NRSF, and correspond well to the human site of maximal Oct4 binding. This region is highly conserved (Figure 8A) and also contains high quality matches to Oct1 weight matrices, motifs that are indistinguishable from the Oct4 consensus sequences.
The region upstream of GADD45G represents an example of poor agreement between the mouse ChIP-PET results and human ChIP-chip result (Figure 8B). Though the ChIPped regions contain little overlap, MEGAshift finds multiple sites that appear to be functionally conserved, i.e., enriched in both species throughout the SELEX experiment. The clustered distribution of binding sites that is common within the data appears to complicate the calling of enriched peaks in ChIP-chip analysis. For example, several closely spaced regions upstream of GADD45G are enriched in the ChIP-chip or ChIP-PET data. These regions overlap incompletely, yet according to MEGAshift, the most significant Oct4 binding is occurring in a non-overlapping region. This oligonucleotide is located in a highly conserved block, comparable to exonic coding sequence, and contains a predicted Oct4 binding site. For these reasons it is likely that this site is a bona fide Oct4 binding site that was missed in the peak calling procedure.

Training binding models with enrichment data
To study the role of the Oct4 consensus sequence in Oct4 binding, all oligonucleotides were scored based on similarity with the SELEX-determined Oct4 weight matrix. The highest scoring window for each oligonucleotide was then compared to its enrichment value from the array. As expected, there is no initial bias for Oct4 sites in the unselected pool's self comparison; however, throughout the course of SELEX the higher scoring Oct4 sites (lower values on the y-axis, Figure 9A) experience greater enrichment (a rightward migration on the x-axis, Figure 9A) than the oligonucleotides without discernible Oct4 sites (Figure 9A, upper portion). To determine whether incorporating the added information derived from the EMSA experiment increased the accuracy of motif prediction, the oligonucleotides were ranked according to enrichment in the singly shifted fraction of the pool (round 1) and multiply shifted fraction (multiple). These ranked lists were then used to generate input sets for Gibbs Sampler trials, which searched for motifs about 8 to about 20 nucleotides long. Using the top 20 most enriched probe signals (0.4 % of the data), the Gibbs sampler converged on a motif that contained the consensus Oct4 binding site (ATGCAAAT (SEQ ID NO: 1)) in about 40 % of the trials (Figure 9B). This value peaked using about 3 % of the data and decreased as a greater fraction of progressively less enriched data was added to the search space. Using this optimal amount of input data, the effect of motif length was systematically explored. While the Oct4 consensus is eight nucleotides long, the sampler converges to the consensus in only about 50 % of the runs when the motif length is set to eight, nine or ten. In about 97 % of the cases where the sampler does not converge, the output motif is a slightly truncated version of the consensus that extends to at least six positions of the octamer.
Thus, these sequences do not represent independent motifs. Indeed, both the singly shifted and the multiply shifted enrichment values include a wide range of motif lengths that return Oct4 consensus sequences about 100 % of the time (Figure 9C). This demonstrates that MEGAshift can be used to infer the binding specificity of a factor de novo. As MEGAshift can determine the identity of sequences present in either the singly or multiply bound state, it should be possible to learn sequence features that predispose a particular element to bind multiple molecules of Oct4.

Sequence determinants of dimerization motifs
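The weight-matrix scoring used in the preceding subsection (scoring every window of an oligonucleotide and keeping the highest scoring window) can be sketched as follows. This is a minimal illustration, not the scoring actually used: the matrix below is a toy log-odds matrix built from the ATGCAAAT consensus with assumed probabilities, not the SELEX-determined Oct4 weight matrix, only the forward strand is scanned, and the example oligonucleotides are invented.

```python
import math

CONSENSUS = "ATGCAAAT"  # octamer consensus quoted in the text


def toy_log_odds_matrix(consensus, match_prob=0.85, background=0.25):
    """Build an illustrative log2-odds matrix from a consensus sequence.

    The consensus base at each position receives `match_prob`; the remaining
    probability is split evenly over the other three bases (assumed values).
    """
    mismatch_prob = (1.0 - match_prob) / 3.0
    matrix = []
    for consensus_base in consensus:
        column = {}
        for base in "ACGT":
            prob = match_prob if base == consensus_base else mismatch_prob
            column[base] = math.log2(prob / background)
        matrix.append(column)
    return matrix


def best_window(sequence, matrix):
    """Return (start, score) of the highest scoring window on the forward strand."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


MATRIX = toy_log_odds_matrix(CONSENSUS)

# Hypothetical 35-mers: one carries the consensus octamer, one does not.
with_site = "GGCCTTGACC" + CONSENSUS + "CGGTCCAGGTCACTGGA"
without_site = "GGCCTTGACC" + "GGCGCGCC" + "CGGTCCAGGTCACTGGA"

print(best_window(with_site, MATRIX))
print(best_window(without_site, MATRIX))
```

In practice the best-window score of each oligonucleotide would be paired with its array enrichment value, as in the comparison shown in Figure 9A.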
Plotting enrichment of the singly versus multiply bound fraction indicates that a sequence enriched in the singly bound fraction is more likely to also be enriched in the multiply bound fraction (Figure 10A). For example, distinct sequence features favor dimer formation. The Gibbs sampler converges on the core Oct4 binding sequence in the multiply shifted pool, but the behavior of the length parameter suggests slightly different motif characteristics. In other POU domain proteins, palindromic combinations of half sites have been known to support homo- and heterodimer formation (Remenyi, Tomilin et al. 2001). Although these designed examples are informative, MEGAshift allows for the discovery of real sequences that predispose Oct4 to bind as a multimer.
Changing the sampling parameters to include more than one binding model and restricting the oligonucleotide input for Gibbs sampling in such a way that retains the requirement for binding but emphasizes oligos that are bound as multimers (Figure 10A) returns three motifs that appear as chimeric combinations of Oct4 half sites. The full range of half site combinations found to be enriched in the multiply bound fraction (all points below the red line) relative to the singly bound fraction (all points above the green line) is noticeably more diverse than the combinations identified to date (Figure 11). In general, oligonucleotides that contained three or more half sites were enriched in the multiply shifted fraction (Figure 10C). Oligonucleotides that contained the ATGC half site also tended to be enriched in the multiply bound fraction. MOREs and POREs, the well-studied palindromic half site combinations, represent only a minority of the total possible combinations observed in vivo in the selection of data presented here (Figure 11).
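The half-site observations above (three or more half sites, or presence of the ATGC half site, associated with the multiply shifted fraction) suggest a simple per-oligonucleotide count. The sketch below is a hedged illustration rather than the analysis performed: it assumes that the two half sites are the two halves of the ATGCAAAT consensus (ATGC and AAAT), counts exact matches on both strands, and uses invented example sequences; the function names and the threshold helper are hypothetical.

```python
HALF_SITES = ("ATGC", "AAAT")  # assumed halves of the ATGCAAAT consensus


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(sequence, pattern):
    """Count exact, possibly overlapping, occurrences of pattern in sequence."""
    width = len(pattern)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == pattern
    )


def count_half_sites(oligo):
    """Count each half site on both strands of the oligonucleotide."""
    counts = {}
    for site in HALF_SITES:
        # A set avoids double counting if a half site were its own reverse complement.
        patterns = {site, reverse_complement(site)}
        counts[site] = sum(count_occurrences(oligo, p) for p in patterns)
    return counts


def has_three_or_more_half_sites(oligo):
    """True when the oligonucleotide carries at least three half sites in total."""
    return sum(count_half_sites(oligo).values()) >= 3


# Hypothetical examples: a bare octamer and a palindromic arrangement of ATGC.
print(count_half_sites("ATGCAAAT"))
print(count_half_sites("CCATGCATATGCATGG"))
```

Such counts could then be compared against enrichment in the multiply bound versus singly bound fractions.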
DISCUSSION:
This manuscript presents a series of experiments that serve as a bridge between low resolution in vivo ChIP-chip results and a high resolution molecular characterization of protein-nucleic acid interactions. Because MEGAshift is coupled to a readout of in vivo binding activity, this technique represents a means of obtaining the best possible estimate of Oct4 binding in ES cells. One intriguing feature of this work is the role of non-canonical binding sites that are comprised of various combinations of half sites. While this class of elements was not detected in SELEX experiments or in the original ChIP data, synthetic versions of these elements have been shown to facilitate Oct4 binding as a dimer (Remenyi, Lins et al. 2003; Remenyi, Tomilin et al. 2001). This type of binding arrangement is becoming a common theme amongst transcription factor binding sites (REST, p53), and MEGAshift offers a powerful tool to discover such sites.
Another application is to discover or to refine binding motifs based on direct evidence and real sequence. Choosing the appropriate parameters and using a MEGAshift ranked input leads to a complete convergence of motif finders on the octamer sequence. Oct4 has a known binding motif but this result demonstrates that binding motifs could be identified from unknown complexes that form on oligonucleotide pools in whole cell extracts. If upstream promoters of coordinately regulated genes were used for the design of oligonucleotide pools, these complexes could be mapped relative to each other allowing for the definition of regulatory modules.
Another benefit of being able to test specific sequences is the ability to design mutations into the oligo pool. Several of the binding elements in promoters may harbor polymorphisms and be important functional variants that account for some biologically relevant phenotype. MEGAshift could be used to assay both alleles of such a polymorphism in order to discover functional SNPs.
MEGAshift represents a hybrid biochemical, genomic and computational approach to address a question of binding specificity in gene expression. However, any application that involves a pool of nucleic acid and a method of molecular selection could be amenable to this protocol. MEGAshift is inexpensive ($700 oligo synthesis - Atactic - $450 custom oligo array). The Agilent arrays can be stripped and re-used and the results could easily be analyzed by sequencing if array facilities are unavailable.
Figure imgf000048_0001
EXAMPLE 2: REMOVAL AND RECOVERY OF OLIGONUCLEOTIDES FROM A MICROARRAY
The oligonucleotides used for this experiment were 60-mers harvested from a custom oligonucleotide array; such arrays are available in a variety of spot (feature) densities. Each feature consists of a homogenous population of about 60-mers that had been synthesized onto the microscope slide using Agilent SurePrint technology. In order to demonstrate that a reasonable fraction of the array features could be removed from the slide and resuspended as a complex pool of oligonucleotides, a series of pilot experiments was designed to amplify individual features from a discarded zebrafish exon microarray (G2519F). Three features were chosen at random from the approximately 45,220 spots present on the zebrafish exon microarray. Primers, fifteen nucleotides in length and corresponding to the beginning and end of the published feature sequence, were designed to amplify these three features in three separate PCR reactions. The oligonucleotides were separated from the solid phase attachment by mechanical abrasion (Agilent Corp. Patent No. 896572 filed on 2001-06-29). Scouring the printed face of the microscope slide with a 20 gauge needle efficiently removed both oligonucleotides and slide coating (Figures 12A and B). The resulting particles were resuspended in water and any adherent clumps were disrupted by sonication. Of the three features, only a single feature was successfully amplified. In order to explore whether the failure to amplify was caused by incomplete scouring or some variable intrinsic to the feature/primer sequence, the entire process was repeated on a second discarded zebrafish array and, again, only one of the three primer pairs amplified its template (data not shown).
Subsequent custom oligonucleotide arrays were designed with these flanking primers in the approximately 244,000-oligonucleotide format. The design of each oligonucleotide was a unique sequence flanked by these 15-nucleotide primer sequences that had been demonstrated to amplify in the pilot experiment. To verify that the complexity of an oligonucleotide pool could be maintained in this cleavage and amplification protocol, the resuspended oligonucleotide pool was tested for the presence of ten randomly chosen oligonucleotides. Amplifying this pool with primers specific to the internal unique sequences of ten features resulted in nine successful amplifications. This high rate of recovery indicates that the vast majority of features were successfully transferred and amplified from the solid phase to the aqueous phase.
To our surprise, several attempts to amplify other features on the original zebrafish arrays failed. This indicated that while certain flanking sequences functioned well, the sequence of the flanks was imposing a surprisingly high bias on the outcome of this procedure. In other words, some aspect of the flank was affecting either the success of the original synthesis, or liberation, or amplification of that oligonucleotide from the array. As it may be desirable to amplify oligonucleotide pools with other flanking primer sequences a new array was designed to explore the sequence requirements for successful amplification of an oligonucleotide from an array.
All 30 possible combinations of six commonly used sequencing primers were tested as flanking primers on an Agilent (1 × 244K) Custom Oligonucleotide Array. Each of these thirty combinations was tested in triplicate, and the replicates were distinguished from each other by the presence and location of a small panel of six-cutter restriction sites. Of these 90 amplifications, 62 failed and 28 amplified. The success appeared to be independent of the position of the feature on the slide, arguing against inconsistent scouring as the cause of failed amplification (Figure 13A). Examining the 28 successes clearly illustrates the sequence bias. Half of all successful amplifications with the fifteen primer pairs were achieved by three pairs, and the primer complementary to the attached end of the oligo appears more sensitive to sequence bias than its unattached counterpart (variance of success clustered by attached primer = 6.8, by unattached primer = 4.3).
One primer, Exon2AmpFor, failed in all its amplifications, though it has been used successfully (with Exon2AmpRev) to amplify CHO genomic and plasmid DNA. Removing this primer and comparing the role of sequence at the attached 5' end versus the free 3' end suggests that primer sequence is a more dominant parameter in amplification success at the 5' end than at the 3' end (variance of row "Total 5" = 6.8; variance of column "Total 3" = 4.3).

Table 2: Primers
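The counts in this paragraph can be reconciled with a short enumeration. The sketch below rests on an interpretation that is not stated explicitly in the text: that the 30 combinations are ordered (attached end, free end) pairings of six distinct primers, and that the fifteen primer pairs are the corresponding unordered pairs. The primer labels are placeholders, not the names listed in Table 2.

```python
from itertools import combinations, permutations

# Placeholder labels for the six sequencing primers (hypothetical names).
primers = ["P1", "P2", "P3", "P4", "P5", "P6"]

# Ordered pairings: which primer sits at the attached end and which at the free end.
ordered_pairs = list(permutations(primers, 2))

# Unordered pairs: the same two primers regardless of orientation.
unordered_pairs = list(combinations(primers, 2))

replicates = 3
total_amplifications = len(ordered_pairs) * replicates

# Outcome counts reported in the text.
failed, amplified = 62, 28
success_rate = amplified / total_amplifications

print(len(ordered_pairs), len(unordered_pairs), total_amplifications)
print(f"success rate = {success_rate:.1%}")
```

Under this reading, 30 ordered pairings in triplicate give the 90 amplifications reported, of which 28 (about 31 %) succeeded.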
Figure imgf000051_0001
EXAMPLE 3: IMMUNOPRECIPITATION FOLLOWED BY MAGNETIC BEADS ALLOWS FOR GREATER ENRICHMENT OF OCT4 LIGANDS THAN EMSA
While the EMSA provides qualitative information that allows for the separation of complexes by mobility, performing successive enrichments by gel excision results in fairly modest enrichments. In order to improve the selection efficiency an immunoprecipitation strategy was utilized (Figure 14), which enriched the pool considerably. After two rounds of selection the binding in the pool was comparable to the wild type control.
This binding appears to be greatly enhanced by Oct4 phosphorylation, as treatment of the extract with phosphatase decreased the Oct4 supershifted band (Figure 15, lanes 5, 13 vs. lanes 3, 11). It is likely that Oct4 or p53 achieves some of its regulatory function in the context of synergistic modules. For Oct4 there are known synergistic partners such as Sox2 and Nanog that, with Oct4, bind DNA in a coordinated fashion. Both Oct4 and p53, once bound to DNA, can act as either activators or repressors. While the mechanism of such diverse action is unknown, it is likely due to interaction with nearby factors. To understand the mechanism of Oct4 and p53 action, the identity and genomic distribution of complexes in the vicinity of Oct4 (or p53) binding sites was sought. The radiolabeled probe was incubated with ES cell extract. From the complex spectrum of bands, a region above the non-specific complex is excised and extracted, and this oligo is re-amplified in order to repeat the method. Individual low mobility complexes were enriched and excised from the gel for later array analysis. The Oct4 bound fraction of the pool was not preferentially enriched with this method (Figure 16, lanes 6, 8, and 10).
EQUIVALENTS

While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims

What is claimed is:
1. A method of identifying a nucleotide ligand that associates with a target molecule, comprising the steps of: a) amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; b) contacting the library of non-random oligonucleotides with a target molecule to form an association between the non-random oligonucleotides and the target molecule; and c) separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
2. The method of Claim 1, wherein the target molecule is affixed to a solid support matrix.
3. The method of Claim 1, wherein the non-random oligonucleotides are deoxyribo-oligonucleotides.
4. The method of Claim 3, wherein the deoxyribo-oligonucleotides are single-stranded deoxyribo-oligonucleotides.
5. The method of Claim 3, wherein the deoxyribo-oligonucleotides are double-stranded deoxyribo-oligonucleotides.
6. The method of Claim 3, wherein the deoxyribo-oligonucleotides include at least one genomic nucleotide sequence.
7. The method of Claim 6, wherein the genomic nucleotide sequence is at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence.
8. The method of Claim 1, wherein the non-random oligonucleotides are ribo-oligonucleotides.
9. The method of Claim 1, wherein the non-random oligonucleotides are synthetic non-random oligonucleotides.
10. The method of Claim 1, wherein each of the non-random oligonucleotides has an identical number of nucleotides.
11. The method of Claim 10, wherein the number of nucleotides is less than about 100 nucleotides.
12. The method of Claim 11, wherein the number of nucleotides is about 50 nucleotides.
13. The method of Claim 1, wherein each of the non-random oligonucleotides has between about 50 nucleotides to about 100 nucleotides.
14. The method of Claim 1, wherein at least a portion of at least one non-random oligonucleotide overlaps with at least a portion of another non-random oligonucleotide.
15. The method of Claim 14, wherein the portion of the non-random oligonucleotide and the portion of another non-random oligonucleotide overlap in a range of between about 19 nucleotides to about 35 nucleotides.
16. The method of Claim 1, wherein at least one non-random oligonucleotide includes at least one primer binding site.
17. The method of Claim 16, wherein the primer binding site includes at least one universal primer binding site.
18. The method of Claim 1, wherein the association of at least one non-random oligonucleotide with the target molecule is detected by at least one member selected from the group consisting of a mobility shift assay, a hybridization array and an immunoprecipitation assay.
19. The method of Claim 18, wherein the mobility shift assay is performed iteratively with at least one non-random oligonucleotide that associates with the target molecule.
20. The method of Claim 1, wherein at least one of the non-random oligonucleotides includes a detectable label.
21. The method of Claim 1, further including assessing a binding affinity of the nucleotide ligand for the target molecule.
22. The method of Claim 1, further including adding an agent at one or more time points selected from the group consisting of before, concomitantly and after contacting the library of non-random oligonucleotides with the target molecule.
23. The method of Claim 22, wherein the agent disrupts the association of at least one non-random oligonucleotide and the target molecule.
24. The method of Claim 22, wherein the agent inhibits the association of at least one non-random oligonucleotide and the target molecule.
25. The method of Claim 22, wherein the agent promotes the association of at least one non-random oligonucleotide and the target molecule.
26. The method of Claim 22, wherein the agent includes a phosphatase inhibitor.
27. The method of Claim 22, wherein the agent includes at least one member selected from the group consisting of a drug, an enzyme and a nucleic acid.
28. The method of Claim 27, wherein the enzyme includes a phosphatase.
29. The method of Claim 27, wherein the nucleic acid includes a small interfering ribonucleic acid.
30. The method of Claim 1, wherein the target molecule is a component of an extract of a cell.
31. The method of Claim 30, further including exposing at least one member selected from the group consisting of the cell and the extract to at least one member selected from the group consisting of an agent, a stress condition and an ultraviolet radiation before the extract of the cell containing the target molecule is prepared.
32. The method of Claim 1, wherein the target molecule is a protein.
33. The method of Claim 1, wherein the target molecule is a transcription factor.
34. The method of Claim 33, wherein the transcription factor activates a nucleotide sequence that is between about 400 nucleotides to about 2000 nucleotides within a location of where the nucleotide ligand binds a genomic nucleotide sequence.
35. The method of Claim 1, wherein the target molecule is a splicing factor.
36. The method of Claim 1, wherein each of the non-random oligonucleotides in the library includes a detectable label.
37. The method of Claim 1, wherein the amplifying, contacting and separating steps are performed at least twice prior to identifying the nucleotide ligand.
38. The method of Claim 37, wherein at least one promoter sequence is added to at least one non-random oligonucleotide.
39. The method of Claim 38, wherein the promoter sequence is a T7 promoter sequence.
40. The method of Claim 1, further including the step of repeating the steps of amplifying the pool of non-random oligonucleotides to form the library of non-random oligonucleotides, contacting of the library of non-random oligonucleotides with the target molecule to form the association between the non-random oligonucleotides and the target molecule, and separating the non-random oligonucleotides that associate with the target molecule from the non-random oligonucleotides that do not associate with the target molecule to thereby identify the nucleotide ligand that associates with the target molecule.
41. The method of Claim 40, wherein the steps are repeated at least twice.
42. The method of Claim 40, wherein each amplifying step is performed in the presence of a distinct detectable label.
43. A method of identifying a variant of an allele that binds a target molecule, comprising the steps of: a) contacting at least one target molecule with i) a first pool of non-random oligonucleotides to form a first library of non-random oligonucleotides; and ii) a second pool of non-random oligonucleotides to form a second library of non-random oligonucleotides, wherein each non-random oligonucleotide of the second library of non-random oligonucleotides is an allelic variant of the first library of non-random oligonucleotides; and wherein the first library of non-random oligonucleotides is optionally combined with the second library of non-random oligonucleotides prior to contact with the target molecule; and b) comparing binding of the target molecule to the first library of non-random oligonucleotides and the second library of non-random oligonucleotides, wherein binding of the target molecule identifies the variant of the allele that binds the target molecule.
44. The method of Claim 43, further including the step of contacting at least one common nucleotide ligand with the first library and second library.
45. The method of Claim 44, further including comparing the binding of the common nucleotide ligand between the first library and the second library.
46. The method of Claim 43, wherein the target molecule is a component of an extract of a cell.
47. The method of Claim 43, wherein the first library of non-random oligonucleotides and the second library of non-random oligonucleotides are deoxyribo-oligonucleotides.
48. The method of Claim 47, wherein the deoxyribo-oligonucleotides include genomic nucleotide sequences.
49. The method of Claim 48, wherein the genomic nucleotide sequences are at least one member selected from the group consisting of a promoter nucleotide sequence and an enhancer nucleotide sequence.
50. The method of Claim 43, wherein at least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides includes at least one universal primer binding site.
51. The method of Claim 43, wherein binding of the target molecule to at least one non-random oligonucleotide of at least one member selected from the group consisting of the first library of non-random oligonucleotides and the second library of non-random oligonucleotides is detected by at least one member selected from the group consisting of a mobility shift assay, a hybridization array and an immunoprecipitation assay.
52. The method of Claim 43, wherein the non-random oligonucleotides of the first library and the non-random oligonucleotides of the second library are differentially labeled.
53. A method of determining a difference in a binding affinity of a first nucleotide ligand compared to a second nucleotide ligand for a target molecule, wherein the first nucleotide ligand is an allelic variant of the second nucleotide ligand, comprising the steps of: a) contacting a first non-random oligonucleotide library of the first nucleotide ligand and a second library of the second nucleotide ligand with the target molecule; and b) comparing a proportion of the first non-random oligonucleotide library bound to the target molecule with the proportion of the second non-random oligonucleotide library bound to the target molecule, wherein a difference in the proportion of the first non-random oligonucleotide library bound to the target molecule compared to the proportion of the second non-random oligonucleotide library bound to the target molecule indicates a difference in the binding affinity for the first nucleotide ligand compared to the second nucleotide ligand for the target molecule.
54. The method of Claim 53, further including the step of contacting at least one common nucleotide ligand with the first library and second library.
55. The method of Claim 54, further including comparing the binding of the common nucleotide ligand between the first library and the second library.
56. The method of Claim 53, wherein the target molecule is a component of an extract of a cell.
57. A method of determining a binding affinity of a target molecule for a nucleotide ligand, comprising the steps of: (a) amplifying a pool of non-random oligonucleotides to form a library of non-random oligonucleotides; (b) contacting the library of non-random oligonucleotides with a target molecule;
(c) detecting a nucleic acid sequence in the library of non-random oligonucleotides that binds the target molecule with a first nucleic acid probe, wherein the first nucleic acid probe binds the target molecule and includes a first detectable label;
(d) contacting the first nucleic acid probe with a second nucleic acid probe, wherein the second nucleic acid probe binds the target molecule with an affinity different than the first nucleic acid probe and includes a second detectable label that is distinct from the first detectable label, thereby forming a mixture of the first nucleic acid probe and the second nucleic acid probe;
(e) hybridizing the mixture to a collection of nucleic acid sequences that are complementary to the first nucleic acid probe and the second nucleic acid probe; (f) detecting the first nucleic acid probe and the second nucleic acid probe that hybridize to the collection; and (g) determining a ratio of the first detectable label to the second detectable label, to thereby determine the binding affinity of the target molecule for a nucleotide ligand.
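The comparisons recited in Claims 53 to 55 and in step (g) of Claim 57 reduce to simple ratios, and a minimal sketch may make them concrete. The claims do not prescribe any particular calculation, so everything below is an illustrative assumption: the function names, the use of raw signal intensities as inputs, the optional normalization by a common nucleotide ligand, and the log2 scale are all hypothetical choices, not the method of the application.

```python
import math


def bound_fraction(bound_signal, input_signal):
    """Proportion of a library recovered in the target-bound pool (assumed inputs)."""
    if input_signal <= 0:
        raise ValueError("input_signal must be positive")
    return bound_signal / input_signal


def allelic_binding_ratio(bound_a, input_a, bound_b, input_b,
                          common_a=1.0, common_b=1.0):
    """Compare the bound proportions of two allelic libraries.

    common_a and common_b are hypothetical normalization factors: the bound
    proportion of a shared control ligand measured alongside each library.
    A result far from 1.0 suggests a difference in binding between alleles.
    """
    frac_a = bound_fraction(bound_a, input_a) / common_a
    frac_b = bound_fraction(bound_b, input_b) / common_b
    return frac_a / frac_b


def log2_label_ratio(first_label, second_label):
    """Log2 ratio of two distinct label intensities at one array feature."""
    if first_label <= 0 or second_label <= 0:
        raise ValueError("label intensities must be positive")
    return math.log2(first_label / second_label)


# Example with made-up intensities: allele A binds twice as well as allele B.
ratio = allelic_binding_ratio(bound_a=200, input_a=1000, bound_b=100, input_b=1000)
enrichment = log2_label_ratio(first_label=800, second_label=200)
```

Working on a log scale is a common convention for two-color ratio data because equal fold changes in either direction become symmetric around zero; it is shown here only as one reasonable option.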
PCT/US2008/013605 2007-12-17 2008-12-11 Methods for identifying nucleotide ligands WO2009078939A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US786307P 2007-12-17 2007-12-17
US61/007,863 2007-12-17

Publications (1)

Publication Number Publication Date
WO2009078939A1 (en) 2009-06-25

Family

ID=40380182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/013605 WO2009078939A1 (en) 2007-12-17 2008-12-11 Methods for identifying nucleotide ligands

Country Status (1)

Country Link
WO (1) WO2009078939A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110785814A (en) * 2018-01-05 2020-02-11 因美纳有限公司 Predicting quality of sequencing results using deep neural networks

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1991019813A1 (en) * 1990-06-11 1991-12-26 The University Of Colorado Foundation, Inc. Nucleic acid ligands
WO2007109067A2 (en) * 2006-03-21 2007-09-27 The Arizona Board Of Regents, A Body Corporate Acting On Behalf Of Arizona State University Non-random aptamer libraries and methods for making

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DJORDJEVIC ET AL: "SELEX experiments: New prospects, applications and data analysis in inferring regulatory pathways", BIOMOLECULAR ENGINEERING, ELSEVIER, NEW YORK, NY, US, vol. 24, no. 2, 1 June 2007 (2007-06-01), pages 179 - 189, XP022081378, ISSN: 1389-0344 *
OGASAWARA HIROSHI ET AL: "Genomic SELEX search for target promoters under the control of the PhoQP-RstBA signal relay cascade", JOURNAL OF BACTERIOLOGY, vol. 189, no. 13, July 2007 (2007-07-01), pages 4791 - 4799, XP002518131, ISSN: 0021-9193 *
SHIMADA TOMOHIRO ET AL: "Systematic search for the Cra-binding promoters using genomic SELEX system", GENES TO CELLS, vol. 10, no. 9, September 2005 (2005-09-01), pages 907 - 918, XP002518130, ISSN: 1356-9597 *
TANTIN DEAN ET AL: "High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes", GENOME RESEARCH, vol. 18, no. 4, April 2008 (2008-04-01), pages 631 - 639, XP002518132, ISSN: 1088-9051 *
TOLEDANO M B ET AL: "REDOX-DEPENDENT SHIFT OF OXYR-DNA CONTACTS ALONG AN EXTENDED DNA-BINDING SITE: A MECHANISM FOR DIFFERENTIAL PROMOTER SELECTION", CELL, CELL PRESS, CAMBRIDGE, NA, US, vol. 78, no. 5, 9 September 1994 (1994-09-09), pages 897 - 909, XP008040092, ISSN: 0092-8674 *

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 08862501; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 08862501; Country of ref document: EP; Kind code of ref document: A1