WO2002097061A2 - Iterative promoter affinity chromatography to identify trans-regulatory networks of gene expression - Google Patents


Info

Publication number
WO2002097061A2
Authority
WO
WIPO (PCT)
Prior art keywords
gene
network
trans
expression
promoter
Application number
PCT/US2002/019221
Other languages
French (fr)
Other versions
WO2002097061A3 (en)
Inventor
Lawrence S. Zisman
Original Assignee
Myomatrix Molecular Technologies
Application filed by Myomatrix Molecular Technologies
Priority to AU2002344238A (AU2002344238A1)
Publication of WO2002097061A2
Publication of WO2002097061A3


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Definitions

  • SJH splice junction handler
  • the mRNA abundance of this set of genes should be decreased when protein Tfi is not present (i.e., when transcription of the Tfi gene is suppressed or the Tfi gene is knocked out).
  • the TF complex is an inhibitor of transcription
  • the mRNA abundance of the set of genes predicted to be regulated by it should be increased when protein Tfi is not present.
  • nucleic acids can be purified by precipitation, chromatography (including preparative solid phase chromatography, oligonucleotide hybridization, and triple helix chromatography), ultracentrifugation, and other means.
  • Polypeptides and proteins can be purified by various methods including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, precipitation and salting-out chromatography, extraction, and countercurrent distribution.
  • the protein is digested separately with trypsin (or Endo Lys-C), which generates peptides with a basic amino acid at the C-terminus, and with Endo Glu-C, which generates peptides ending in glutamic acid (except the original C-terminus of the protein). Two very different, overlapping sets of proteolytic peptides are thus generated. LC-ESI-MS/MS is performed on each set of proteolytic peptides and CID spectra are obtained.
  • trypsin or Endo Lys-C

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

The present invention is directed to novel proteomic and genomic technologies based on iterative promoter affinity chromatography (IPAC) and a reverse translation algorithm, permutation analysis of genomic sequences (PANGSEQ). These technologies permit the development of trans-regulatory networks for the predictive modeling of complex biological systems and are relevant to drug discovery as well as to diagnostic and research applications in the fields of organogenesis and pathogenesis.

Description

ITERATIVE PROMOTER AFFINITY CHROMATOGRAPHY TO IDENTIFY TRANS- REGULATORY NETWORKS OF GENE EXPRESSION
This application claims priority under 35 U.S.C. §119(e) from U.S. Provisional Patent Application Serial No. 60/282,589 filed April 9, 2001, which is incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
The invention relates to novel proteomic and genomic technologies for the predictive modeling of complex biological systems, which are based on iterative promoter affinity chromatography (IPAC) and a reverse translation algorithm, permutation analysis of genomic sequences (PANGSEQ).
BACKGROUND OF THE INVENTION
Information regarding the structure, function, and expression of previously undescribed genes may be obtained by homology to genes of known function. Imposing order on the genome by this method constitutes the major effort of bioinformatics. Homology-based bioinformatics depends on traditional biochemistry and molecular biology to provide reference data on known entities for extrapolation to the unknown. However, the structure, function and temporal order of expression of a large number of genes in relevant biological systems cannot currently be discerned by bioinformatics, because the reference data derived from these more traditional methods have lagged behind. The field of proteomics has recently emerged to bridge the gap between the relatively slow yield of standard biochemistry and the high-throughput bioinformatic analysis of the human genome.
The term "proteome" was introduced in 1994 to describe the entire set of proteins encoded by a genome (Wilkins et al., Biotechnol. Genet. Eng. Rev., 1995, 13: 19-50). Recent advances in technology have permitted investigators to identify large sets of proteins in biological systems that may be used to decipher the syntax of the genome. An example of applied proteomics is the characterization of the multi-protein spliceosome complex by mass spectroscopy followed by EST-database searching. Neubauer et al. (Nature Genetics, 1998, 20: 46-50) isolated human splicing complexes from HeLa nuclear extracts with pre-mRNA substrate affinity chromatography. The complexes were separated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), and 69 individual protein spots were excised, digested in situ with trypsin and analysed with nanoelectrospray and/or MALDI mass spectroscopy (MS). Forty-one of the 69 spots corresponded to previously identified proteins. The remainder were identified with an algorithm adapted to allow direct searching of EST databases: peptide sequences were translated into degenerate oligonucleotide sequences, which were then used to search the EST database. This approach to characterization of the spliceosome accomplished in a three-year period what had been sought after for over twenty years. However, it was not able to reveal mechanisms for specific patterns of regulation of protein or gene expression. Furthermore, information from the human genome project is not sufficient to predict the temporal expression of proteins, nor specific protein-protein or protein-gene interactions.
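The reverse-translation step used in the spliceosome study can be illustrated with a minimal sketch. This is an assumed illustration, not the algorithm Neubauer et al. used: it reverse-translates a peptide with the standard genetic code and merges each codon position into an IUPAC ambiguity code. Because the three positions are merged independently, the oligo can also match codons for other residues (for example, YTN for Leu also covers the Phe codons TTT and TTC).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_oligo(peptide: str) -> str:
    """Reverse-translate a peptide into one degenerate (IUPAC) oligonucleotide."""
    out = []
    for aa in peptide.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        # Collapse each codon position separately into one ambiguity symbol.
        for pos in range(3):
            out.append(IUPAC[frozenset(codon[pos] for codon in codons)])
    return "".join(out)


print(degenerate_oligo("MKWL"))  # ATGAARTGGYTN
```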
Several regulatory systems control which genes in a cell are expressed at a given time. Expression of a gene requires that its DNA be transcribed into mRNA, that the mRNA be processed to link its coding regions together, and that the processed mRNA be translated into protein. Some regions of DNA are responsible for regulation of transcription but do not contain coding information for protein expression. One such region, called the "promoter", is located upstream (5') from the start codon of a gene. For transcription to occur, a set of proteins called transcription factors (TFs) must first bind to the promoter. The TFs, in turn, interact with RNA polymerase, which transcribes RNA from the DNA template. The promoter is called a "cis-acting element" because it is contiguous to the coding region of the DNA to be transcribed. The TFs, however, are encoded by DNA located in other regions of the genome and are thus called "trans-acting elements".
Promoters typically contain motifs that are known to bind specific TFs; however, these motifs do not provide sufficient information to define the entire transcription factor complexes (TFCs) that bind to the promoter. One reason for this limitation is that TFCs consist of multiple proteins which interact directly or indirectly with the promoter motif. Another is that motif combinations may vary within a given promoter and bind currently unknown transcription factor complexes. Therefore, to understand which TFCs are required for the expression of a particular gene of interest, one must have direct knowledge of the proteins that form the TFC.
It has been proposed that temporal analysis of gene expression will identify genes regulated by the same or highly homologous TFs. Genes whose patterns of expression over time are similar may be placed in "clusters", or groups whose cis-regulatory elements are bound by the same proteins in vivo. Based on this hypothesis, gene network models can be constructed by inference from temporal analysis of gene expression. However, direct study of protein/promoter interactions (i.e., trans-regulatory elements) would permit construction of a gene network without the need for inference.
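The clustering hypothesis can be sketched in a few lines. This is a hypothetical illustration rather than a method from the source: each gene joins the first cluster whose seed gene's time course it tracks, measured by Pearson correlation, with an arbitrary cutoff of 0.9. Practical analyses usually apply hierarchical or k-means clustering to many more time points.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no temporal pattern
    return sxy / math.sqrt(sxx * syy)


def cluster_profiles(profiles, threshold=0.9):
    """Greedy clustering; profiles maps gene name -> values at shared time points."""
    clusters = []  # each cluster is a list of gene names; element 0 is its seed
    for gene, values in profiles.items():
        for cluster in clusters:
            if pearson(values, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no existing cluster matched
    return clusters


# Hypothetical expression time courses at four time points.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
print(cluster_profiles(profiles))  # [['g1', 'g2'], ['g3']]
```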
Current approaches to modeling gene networks include: 1) Boolean models, in which each member of the network is a gene that is either "on" (transcribed) or "off" (not transcribed); 2) dynamic models, in which kinetic analysis and binding equilibria are used in differential equations; 3) circuit models, in which gene networks are constructed by analogy to electronic circuits; and 4) neural networks, in which a given gene interaction is assigned a weight matrix with linear or sigmoidal transfer functions (Vohradsky J., FASEB J. 2001;15:846-854). An intermediate model between the Boolean network model and the differential equation model has also recently been proposed (Akutsu et al., Bioinformatics. 2000;16:727-734). However, all of these modeling approaches are primarily theoretical and are believed to be hard to implement. To understand the complexity of biological systems, new approaches to experimental design, data analysis, and model construction are needed. The present invention addresses this and other needs by disclosing novel methodologies that generate the data sets required for complex modeling of biological systems, which may be amenable to modeling with alternative mathematical techniques.
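To make the Boolean model concrete, the sketch below simulates a hypothetical three-gene network (not one described in the source) with synchronous updates: every gene is switched on or off from the previous state, and the simulation runs until a state repeats, revealing the attractor the network settles into.

```python
# Hypothetical three-gene network: A is repressed by C, B is activated by A,
# and C is activated by B. Each rule maps the current state to a gene's next value.
GENES = ("A", "B", "C")
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state):
    """Synchronous update: every gene is recomputed from the same prior state."""
    current = dict(zip(GENES, state))
    return tuple(int(bool(RULES[g](current))) for g in GENES)


def find_attractor(state):
    """Iterate until a state repeats and return the cycle the system settles into."""
    first_seen = {}
    trajectory = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]


cycle = find_attractor((1, 0, 0))
print(len(cycle))  # 6: this negative feedback loop oscillates instead of settling
```

A fixed-point attractor (a cycle of length 1) would correspond to a stable expression pattern; the odd number of repressive links here produces an oscillation instead.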
SUMMARY OF THE INVENTION
The primary object of the present invention is to provide an Iterative Promoter Affinity Chromatography (IPAC) method for identifying a trans-regulatory pathway for the expression of a gene, comprising the steps of:
(a) selecting a promoter of a first gene (e.g., alpha-myosin heavy chain (αMHC) promoter);
(b) performing affinity chromatography to isolate transcription factors which interact with the promoter under given physiological conditions;
(c) separating constituent proteins of said isolated transcription factors (e.g., using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and/or multidimensional chromatography);
(d) identifying at least a partial amino acid sequence for a plurality of the separated proteins (e.g., using mass spectroscopy);
(e) generating DNA sequences encoding the amino acid sequences;
(f) determining one or more second genes that encode one or more sequenced proteins (e.g., by performing a homology search against genomic databases);
(g) specifying promoters for one or more second genes (e.g., so that only one promoter of a set of promoters with greater than 80% homology is used in subsequent steps), and
(h) repeating steps (a)-(g) for at least one specified promoter to identify the regulatory pathway (preferably, until feedback loops are identified for all promoters).
In a preferred embodiment, the affinity chromatography is performed by: (i) immobilizing the promoter (preferably, using biotin-streptavidin); (ii) incubating the immobilized promoter with the nuclear extract to allow binding of all transcription factor complexes which interact with the promoter under given physiological conditions; (iii) eluting the unbound components of the nuclear extract; and (iv) eluting the bound transcription factor complexes.
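The iteration in steps (a) to (h) behaves like a breadth-first graph expansion. In the sketch below, `pull_down` is a hypothetical stand-in for one wet-lab round (steps (b) to (f)) that returns the genes encoding the TF proteins recovered on a promoter; the promoter-homology filter of step (g) is omitted, and the mock data are invented for illustration.

```python
def ipac(start_gene, pull_down, max_rounds=10):
    """Breadth-first expansion of a trans-regulatory network.

    pull_down(gene) is a hypothetical callback returning the genes that encode
    TFs bound to that gene's promoter. Returns {target gene: set of regulators}.
    """
    network = {}
    seen = {start_gene}
    frontier = [start_gene]
    for _ in range(max_rounds):
        if not frontier:
            break  # every discovered promoter has been examined
        next_frontier = []
        for gene in frontier:
            regulators = set(pull_down(gene))
            network[gene] = regulators
            for tf in sorted(regulators):
                if tf not in seen:  # build a new affinity column only once
                    seen.add(tf)
                    next_frontier.append(tf)
        frontier = next_frontier
    return network


def in_feedback_loop(network, gene):
    """True if walking upstream through regulators leads back to the gene."""
    stack = list(network.get(gene, ()))
    visited = set()
    while stack:
        current = stack.pop()
        if current == gene:
            return True
        if current not in visited:
            visited.add(current)
            stack.extend(network.get(current, ()))
    return False


# Hypothetical pull-down results: promoter's gene -> genes of bound TFs.
MOCK_PULL_DOWN = {"G0": ["TF1", "TF2"], "TF1": ["TF2"], "TF2": ["TF1"]}
net = ipac("G0", lambda g: MOCK_PULL_DOWN.get(g, []))
print(sorted(net))                   # ['G0', 'TF1', 'TF2']
print(in_feedback_loop(net, "TF1"))  # True
print(in_feedback_loop(net, "G0"))   # False
```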
In a specific embodiment of the IPAC method of the invention, the transcription factor complexes which interact with the promoter are subjected to cross-linking prior to the second elution step, and, between steps (c) and (d), the separated proteins (e.g., separated using liquid phase isoelectric focusing or glycerol gradient separation) are subjected to partial proteolysis (e.g., tryptic digest) followed by multidimensional chromatography to isolate proteolytic fragments.
In a preferred embodiment of the IPAC method of the invention, the DNA sequences encoding the amino acid sequences are generated using a permutation algorithm (e.g., PANGSEQ) comprising the steps of:
(i) inputting the amino acid sequence;
(ii) identifying within the amino acid sequence a primary region, wherein the primary region includes a string of amino acids having the highest contiguous recurrence in the sequence;
(iii) generating all possible codon permutations that could encode the primary region;
(iv) examining a gene database to identify DNA sequences that match the codon permutations that could translate into the primary region;
(v) selecting a subset of the gene database, wherein the subset contains genes that could translate into the primary region;
(vi) identifying within the amino acid sequence a secondary amino acid apart from the primary region;
(vii) determining a positional relationship of the secondary amino acid with respect to the primary region; and
(viii) searching the subset of the gene database for DNA sequences that include codon permutations that could encode both the primary region and the secondary amino acid, at the determined positional relationship.
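The steps above can be sketched as follows, under simplifying assumptions: the primary region is passed in by the caller (the "highest contiguous recurrence" selection criterion is not reproduced), only the forward strand is searched, matches must be exact, and introns are ignored (the source's splice junction handler is not modeled). The function name `permutation_search` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
CODONS = {}
for codon, aa in CODON_TABLE.items():
    CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    # Codons containing non-ACGT characters translate to "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_all(seq, sub):
    i = seq.find(sub)
    while i != -1:
        yield i
        i = seq.find(sub, i + 1)


def permutation_search(genome, peptide, primary_start, primary_len, secondary_index):
    """Return genome offsets at which the whole peptide could be encoded."""
    # (iii) every codon permutation that could encode the primary region
    primary = peptide[primary_start:primary_start + primary_len]
    permutations = ("".join(p) for p in product(*(CODONS[aa] for aa in primary)))
    # (iv)-(v) the subset of positions matching any permutation
    hits = {i for dna in permutations for i in find_all(genome, dna)}
    # (vi)-(viii) require the secondary residue at its fixed distance
    secondary = peptide[secondary_index]
    shift = 3 * (secondary_index - primary_start)
    starts = []
    for i in sorted(hits):
        j = i + shift
        if 0 <= j <= len(genome) - 3 and CODON_TABLE.get(genome[j:j + 3]) == secondary:
            start = i - 3 * primary_start
            # Final scan: does the candidate translate into the full input?
            if start >= 0 and translate(genome[start:start + 3 * len(peptide)]) == peptide:
                starts.append(start)
    return starts


print(permutation_search("CCATGAAATGGTTAGG", "MKWL", 1, 2, 3))  # [2]
```

Searching with the short primary region first keeps the number of permutations small; the secondary residue then prunes the candidate positions before any full translation is attempted.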
In a separate embodiment, the method of the invention further comprises determining when the synthesis of the constituent proteins of the isolated transcription factors is turned on (e.g., by preparing nuclear extracts after pulse-chase labeling of proteins with stable isotopically labeled phenylalanine (Phe) and leucine (Leu)).
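Labeled Phe and Leu are detected as mass shifts of the peptides that contain them. The sketch below assumes specific label forms, 13C9,15N1-Phe and 13C6-Leu, which the source does not specify, and computes the m/z offset between unlabeled and fully labeled peptide ions plus the labeled fraction from the two peak intensities. Isoleucine is isobaric with leucine but is not labeled in this scheme, so only "L" residues count.

```python
# Assumed label forms (not specified in the source):
# 13C9,15N1-phenylalanine and 13C6-leucine.
C13_DELTA = 1.003355  # Da, 13C minus 12C
N15_DELTA = 0.997035  # Da, 15N minus 14N
LABEL_SHIFT = {"F": 9 * C13_DELTA + N15_DELTA, "L": 6 * C13_DELTA}


def full_label_shift(peptide, charge=1):
    """m/z offset between unlabeled and fully labeled forms of a peptide ion."""
    mass_shift = sum(LABEL_SHIFT.get(aa, 0.0) for aa in peptide)
    return mass_shift / charge


def labeled_fraction(light_intensity, heavy_intensity):
    """Share of a peptide's signal that carries the label at one time point."""
    total = light_intensity + heavy_intensity
    return heavy_intensity / total if total else 0.0


print(round(full_label_shift("PEPTFLK", charge=2), 4))  # 8.0237
print(labeled_fraction(300.0, 100.0))                   # 0.25
```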
In another embodiment, the method of the invention further comprises the analysis of the compartmental distribution of the isolated transcription factors.
In yet another embodiment, the method of the invention further comprises the identification of the phosphorylation state of the constituent proteins of the isolated transcription factor complexes.
The invention further provides a method for identifying a component of a trans-regulatory pathway or network for the expression of a gene under specific physiological or pathophysiological conditions, comprising (i) identifying a first trans-regulatory pathway or network for the expression of the gene in the absence of said specific physiological or pathophysiological conditions using the method of claim 1; (ii) identifying a second trans-regulatory pathway or network for the expression of the gene in the presence of said specific physiological or pathophysiological conditions using the method of claim 1; (iii) comparing the two pathways to identify the components that are present in the second pathway but not in the first; and (iv) comparing the two networks to determine which combinations of components differ between them.
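The comparison in steps (iii) and (iv) reduces to set operations if each network is stored as a set of (regulator, target) pairs. This representation and the example networks below are hypothetical choices for illustration.

```python
def components(edges):
    """All genes/proteins appearing in a set of (regulator, target) edges."""
    return {node for edge in edges for node in edge}


def compare_networks(baseline, condition):
    """Diff two trans-regulatory networks given as sets of (regulator, target)."""
    return {
        "new_components": components(condition) - components(baseline),
        "gained_interactions": condition - baseline,
        "lost_interactions": baseline - condition,
    }


# Hypothetical networks without and with a pathophysiological stimulus.
control = {("TF1", "G0"), ("TF2", "G0")}
stimulated = {("TF1", "G0"), ("TF3", "G0"), ("TF3", "TF1")}
diff = compare_networks(control, stimulated)
print(diff["new_components"])     # {'TF3'}
print(diff["lost_interactions"])  # {('TF2', 'G0')}
```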
The present invention also provides a database of trans-regulatory gene expression networks comprising data obtained using the IPAC method disclosed above (e.g., wherein the data are obtained for a number of different genes and/or for a number of different physiological or pathophysiological conditions).
In conjunction with the database, the invention provides use of the database (i) to predict the effect of a drug on a cell under specific physiological or pathophysiological conditions and (ii) to identify changes in gene expression and/or patterns of gene and protein expression occurring in a disease, during tissue differentiation, or during embryonic development. Further provided herein is a method for validating a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the expression of at least one gene in the network.
In a specific embodiment, the present invention provides a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by suppressing expression of a gene when the gene is not in the network. Also provided are: (i) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by increasing expression of a gene when the gene is not in the network; (ii) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by suppressing expression of a gene in the network using gene knock-out technology;
(iii) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by suppression of a gene, when the gene is not in the network, using gene knock-out technology;
(iv) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by increasing expression of a gene in the network using antisense technology;
(v) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by increasing expression of a gene, when the gene is not in the network, using antisense technology;
(vi) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising perturbing the biological system under study by treating the system with a biologically active compound (e.g., wherein the biological system is cardiac myocytes and the biologically active compound is selected from the group consisting of angiotensin II, norepinephrine, phenylephrine, and endothelin);
(vii) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, comprising analysis of the biological system under study by constructing trans-regulatory networks at different time points during the development of the biological system;
(viii) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, wherein an operator inputs a perturbation to the network, and the network is changed by a transition function to a different network such that the different network is the same as that which occurred in the real-world experiment; and
(ix) a method for constructing a trans-regulatory gene expression network identified using the IPAC method of the invention, wherein an operator inputs a perturbation to the network, and the network is changed by a transition function to a different network such that the different network is similar, within parameters set by a probability function, to that which would be found in the biological system if subjected to the perturbation in a real-world experiment.
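A perturbation test of this kind can be sketched as follows, under stated assumptions. Each regulator-target edge is marked +1 when the TF complex activates transcription and -1 when it inhibits it; removing an activator should therefore lower its targets' mRNA and removing an inhibitor should raise it. A simple agreement threshold stands in for the probability function mentioned above, only direct targets are considered, and all names and data are hypothetical.

```python
def predict_knockout(edges, tf):
    """Predicted mRNA change (+1 up, -1 down) of direct targets when tf is removed.

    edges maps (regulator, target) -> +1 for an activating TF complex,
    -1 for an inhibitory one.
    """
    return {target: -sign for (regulator, target), sign in edges.items() if regulator == tf}


def agreement(predicted, observed):
    """Fraction of genes present in both dicts whose direction of change matches."""
    shared = [gene for gene in predicted if gene in observed]
    if not shared:
        return 0.0
    return sum(predicted[gene] == observed[gene] for gene in shared) / len(shared)


def accept_model(edges, tf, observed, threshold=0.8):
    """Stand-in for the probability criterion: accept if agreement >= threshold."""
    return agreement(predict_knockout(edges, tf), observed) >= threshold


# Hypothetical network and knockout measurements.
edges = {("TF1", "G1"): +1, ("TF1", "G2"): -1, ("TF2", "G3"): +1}
print(predict_knockout(edges, "TF1"))                    # {'G1': -1, 'G2': 1}
print(accept_model(edges, "TF1", {"G1": -1, "G2": 1}))   # True
print(accept_model(edges, "TF1", {"G1": -1, "G2": -1}))  # False
```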
These and other aspects and advantages of the invention will be better understood with reference to the Drawings, Detailed Description, and Examples.
DESCRIPTION OF THE DRAWINGS
Figure 1. Schematic representation of the platform technology of the invention. IPAC (Iterative Promoter Affinity Chromatography) forms the core of the technology, and PANGSEQ (Permutation Analysis of Genomic Sequence) is used to locate coding sequences for proteins identified with IPAC. Pulse-chase experiments with stable isotopically labeled phenylalanine and leucine are used to determine when synthesis of particular transcription factors is turned on. The relocation of transcription factors (TFs) from the cytosol to the nucleus and changes in the phosphorylation states of TFs are examined to permit construction of more predictive state transition functions for the trans-regulatory gene expression networks.
Figure 2. Schematic representation of IPAC (Iterative Promoter Affinity Chromatography) technology. An initial promoter is used to trap transcription factors (TFs) that interact with it. The TFs are eluted from the promoter and their constituent proteins are separated by 2D-PAGE. Separated proteins are partially sequenced using mass spectroscopy (MS), and the resulting sequences are used to identify the corresponding genes. Promoters of identified genes are cloned and used for repeat promoter chromatography. The cycle is repeated, e.g., until feedback loops are identified for all promoter-TF interactions.
Figure 3. Gel electrophoresis analysis comparing the amount of αMHC promoter PCR product before and after streptavidin bead pull-down. Lane 1, DNA marker; Lane 2, biotinylated αMHC promoter; Lane 3, residual promoter in solution after streptavidin bead capture.
Figures 4 A and 4B. 2D-PAGE analysis of the cardiomyocyte nuclear extracts before (A) and after (B) αMHC promoter affinity chromatography.
Figure 5. MEF2 Western blot analysis before and after αMHC promoter affinity chromatography demonstrating efficacy and specificity of the method. Lane 1, nuclear extracts; Lanes 2 and 3, flow-through from the affinity beads; Lane 4, affinity-purified MEF2.
Figure 6. Schematic representation outlining the methodology of PANGSEQ (Permutation Analysis of Genomic Sequence), which is used to locate coding sequences for proteins identified with IPAC. A splice junction handler (SJH) is used to identify intron/exon junctions during both the initial homology search and the start codon search. For more details on this technology see commonly owned PCT Publication No. WO 01/96980.
Figure 7. Schematic representation of the methodology combining IPAC with analysis of the time course of TF protein synthesis. The time course of TF protein synthesis is examined by pulse-chase experiments with stable isotopically labeled Phe and Leu. Cells are incubated with these isotopes and nuclear extracts are harvested at serial time points. The time course of protein synthesis of particular TFs is detected by mass shifts in MS spectra due to the incorporation of stable isotopically labeled Phe and Leu.
Figure 8. Schematic representation of the use of cross-linking to define specific protein-protein interactions within a TF complex (TFC). A. Promoter affinity chromatography resulting in binding of two distinct TFCs to the same promoter. B. Elution and 2D-PAGE of TFCs results in loss of information regarding distinct protein-protein interactions within the TFCs. C. Cross-linking of TFCs prior to elution from the promoter. D. Separation of the high molecular weight species resulting from cross-linked TFCs cannot be accomplished by traditional 2D-PAGE, but can be achieved using liquid phase isoelectric focusing or glycerol gradient separation. E. After separation of the cross-linked TFCs, a tryptic digest is performed. F. Multi-dimensional chromatography (MDC) can be used to separate tryptic fragments prior to injection in ESI MS/MS.
Figure 9. Schematic representation of multi-dimensional chromatography (MDC) prior to ES MS/MS.
DETAILED DESCRIPTION
The present invention advantageously provides a novel proteomics- and genomics-based technology built on Iterative Promoter Affinity Chromatography (IPAC) and, optionally, a reverse translation algorithm, Permutation Analysis of Genomic Sequences (PANGSEQ). This technology permits the development of trans-regulatory networks for the predictive modeling of complex biological systems. The resulting model is reverse engineered from data generated by IPAC, tested through perturbation of specific nodes in the simulated system, and validated by perturbation of these same targets in the biological system itself.
IPAC forms the core of the disclosed platform technology for generating trans-regulatory networks. The overall scheme for this platform is represented in Figure 1. In a preferred embodiment, this process includes the following steps (see Figure 2): 1) promoter affinity chromatography is used to identify transcription factor complexes (TFCs) which regulate expression of a gene of interest; 2) the proteins which comprise the TFC are separated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and/or multidimensional chromatography and partially sequenced with mass spectroscopy (MS); 3) a permutation algorithm (e.g., PANGSEQ) generates all possible DNA sequences which could code for the partial protein sequences; 4) a homology search against genomic databases identifies the genes coding for the partially sequenced proteins; 5) the promoters from these genes are identified and cloned; 6) promoters with greater than 80% homology are identified and filtered so that only one of a set of highly homologous promoters is used in subsequent steps; 7) new promoter affinity columns are manufactured for the filtered promoters of each protein in the original TFC; 8) promoter affinity chromatography is performed in parallel with all of the newly generated columns; and 9) the cycle is repeated until feedback loops are identified for all promoters. The present disclosure systematically reviews each component and its integration into the overall approach for high-throughput data acquisition.
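Step 6, the redundancy filter, can be sketched as a greedy pass that keeps a promoter only if it is not more than 80% similar to any promoter already kept. As an assumption for illustration, similarity here is Python's `difflib.SequenceMatcher.ratio()`, which is a rough string-matching score and not an alignment-based percent identity; a real pipeline would use a proper aligner. The promoter fragments are invented.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough sequence similarity in [0, 1]; not an alignment-based identity score."""
    return SequenceMatcher(None, a, b).ratio()


def filter_redundant_promoters(promoters, cutoff=0.8):
    """Keep one representative of each group of promoters more similar than cutoff.

    promoters: dict mapping gene name -> promoter sequence (order = priority).
    """
    kept = []
    for name, seq in promoters.items():
        if all(similarity(seq, promoters[other]) <= cutoff for other in kept):
            kept.append(name)
    return kept


# Hypothetical promoter fragments; p2 differs from p1 by one base.
promoters = {"p1": "ACGTACGTAC", "p2": "ACGTACGTAA", "p3": "TTTTGGGGCC"}
print(filter_redundant_promoters(promoters))  # ['p1', 'p3']
```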
Additional methodologies can be used in conjunction with IPAC technology of the present invention.
PANGSEQ is the first reverse translation genomic search engine designed specifically for a proteomics strategy. This method is described in detail in the co-owned PCT Publication No. WO 01/96860, which is incorporated herein by reference in its entirety and attached as Appendix A. Other reverse translation approaches can be used and are within the scope of the present invention. Potential advantages of PANGSEQ are summarized in Figure 6. The difficulty in using a reverse translation search engine arises because several codons may translate into the same amino acid. Thus, as the number of amino acids in an input protein sequence increases, the number of cDNA permutations that could be translated into the input amino acid string grows very large. The PANGSEQ algorithm overcomes this problem by performing permutations on a subset of the amino acids in any given input partial protein sequence and searching the genome with these permutated subsets, which are at a known distance from each other. Because the protein sequence is known, the search can be set to require 100% homology at the required intervals (though the percent homology requirement can be relaxed). The set of matching sequences is then scanned to determine whether they translate into the entire input amino acid sequence. Because the program localizes coding sequences in genomic DNA, it also identifies the upstream 5' region, which contains the gene's promoter. Additional analysis of these regions can be performed with either public domain motif search engines or commercially available programs such as Genomatix.
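The combinatorial growth described above is easy to quantify: the number of DNA sequences that encode a peptide is the product of the number of synonymous codons for each residue. The window-selection helper below is a hypothetical heuristic (pick the k-residue window with the fewest encodings) shown only to illustrate why searching a subset helps; it is not PANGSEQ's stated selection criterion.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon in TCAG order; counting letters
# gives the number of synonymous codons for each amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_COUNT = Counter(AMINO_ACIDS)


def n_encodings(peptide):
    """Number of distinct DNA sequences that translate into the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(peptide, k):
    """Start index of the k-residue window with the fewest encodings (heuristic)."""
    return min(range(len(peptide) - k + 1), key=lambda i: n_encodings(peptide[i:i + k]))


print(n_encodings("LSR"))                    # 216
print(n_encodings("LSRLSRLSRL"))             # 60466176 (6**10)
print(least_degenerate_window("LSRMWK", 2))  # 3, i.e. "MW" with 1 encoding
```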
Cross-linking to define specific protein-protein interactions in a TFC. If more than one TFC binds to a given promoter, 2D-PAGE will erase the distinction between the complexes: all the proteins in the different complexes will be distributed by size and pI, without separation based on protein-protein interactions. It will therefore not be possible to identify which proteins belong to which complex. There are two possible approaches to solving this problem: (i) prediction of protein-protein interactions, or (ii) cross-linking experiments. Because tools to accurately predict protein-protein interactions from protein sequence are not currently available, the present inventors have chosen the second option. Because cross-linking will result in high molecular weight species, we will use multi-dimensional chromatography as an alternative to 2D-PAGE prior to ES-MS/MS. The conceptual framework for this problem and its proposed solution by cross-linking is shown in Figure 8. To demonstrate the relationship of each component in a TFC and to probe protein surface topology, cross-linking reagents are used. Chemical cross-linking of protein components following exposure to a bifunctional reagent implies that only the polypeptides involved in the interaction were in close proximity during the course of the reaction (Lomant AJ, Fairbanks G, J Mol Biol. 1976;104:243-61). To optimize the cross-linking reactions, the following cross-linkers can be used: succinimidyl-4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), disuccinimidyl suberate (DSS), bismaleimidohexane (BMH), bis[β-(4-azidosalicylamido)ethyl] disulfide (BASED), bis(sulfosuccinimidyl) suberate (BS3), and dimethyl adipimidate (DMA). SDS-PAGE and/or high-resolution two-dimensional gels can be used on the transcription factor mixture with and without cross-linking (Shevchenko et al., Anal Chem. 1996;68:850-8).
Alternative methods, including multi-dimensional chromatography coupled with mass spectrometry (Washburn et al., Nat Biotechnol. 2001;19:242-7) and isotope-coded affinity tags (ICAT) (Gygi et al., Nat Biotechnol. 1999;17:994-9), can also be used.
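Once cross-linked protein pairs have been identified, assigning proteins to complexes amounts to finding groups connected by cross-links. The Python sketch below does this with union-find. It assumes the pairs are already known from mass spectrometry and treats each connected group as one putative complex; both the function name and that interpretation are illustrative assumptions. One caveat: a single cross-link bridging two complexes that touch on the promoter would merge them into one group.

```python
def group_complexes(proteins, crosslinked_pairs):
    """Partition proteins into putative complexes using union-find.

    Two proteins fall in the same group if a chain of observed cross-links
    connects them; proteins with no cross-link remain on their own.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Walk to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in crosslinked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    # Sorted tuples give a stable, comparable result.
    return sorted(tuple(sorted(g)) for g in groups.values())
```

For example, pairs F1-F2, F2-F3, and F4-F5 yield the two complexes `("F1", "F2", "F3")` and `("F4", "F5")`.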
Pulse-chase experiments to determine the onset of TF synthesis. Pulse-chase experiments constitute a method for temporal analysis of protein expression (TAPE). Typically, the biological system under study is pulsed with a labeled amino acid at a selected time point and then "chased" with an excess of unlabeled amino acid. The incorporation of the labeled amino acid into a particular protein rises and falls in relation to that protein's rate of synthesis. The pulse-chase technique generates a sigmoid-shaped curve for label incorporation. The inflection point in the sigmoidal incorporation curve of each protein marks the point of maximal rate of label uptake. The length of the delay in the apparent chase time reflects the position of a particular protein in the synthetic pathway (Ferguson P, et al., J Mol Biol. 2000;297:99-117).
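The inflection-point reasoning can be sketched in Python. This is a minimal illustration, assuming each protein's label incorporation has been sampled at known times; it estimates the inflection point as the midpoint of the interval with the steepest rise, rather than fitting a sigmoid model, and the function names are hypothetical.

```python
import math


def inflection_time(times, label):
    """Estimate the inflection point of a sigmoid incorporation curve as the
    midpoint of the sampling interval with the steepest rise in label."""
    best_slope, best_time = -math.inf, None
    for i in range(1, len(times)):
        slope = (label[i] - label[i - 1]) / (times[i] - times[i - 1])
        if slope > best_slope:
            best_slope, best_time = slope, (times[i] + times[i - 1]) / 2
    return best_time


def order_by_onset(curves):
    """Sort proteins by estimated inflection time (earliest synthesis first).

    `curves` maps a protein name to a (times, label) pair.
    """
    return sorted(curves, key=lambda name: inflection_time(*curves[name]))
```

With finite sampling the estimate is only as fine as the sampling interval; a curve fit would give a sub-interval estimate at the cost of a model assumption.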
Determination of the phosphorylation state of TFs. A major class of proteins has been described that reside in the cytoplasm of cells but translocate to the nucleus when phosphorylated. These proteins, called "signal transducers and activators of transcription" (STATs), can be phosphorylated by kinases that are activated by several different types of cell surface receptors. A tyrosine phosphorylation event causes dimerization of the STAT protein and, thereby, translocation to the nucleus, where the STAT protein binds distinctive response elements to activate transcription. Additional serine phosphorylation may further modify induction of transcription (Horvath C, et al., Current Opinion in Cell Biol. 1997;9:233-239). Detection of the phosphorylation state of IPAC-purified TF complexes can be performed with ESI-MS/MS according to the method of Annan et al. (Annan R, et al., Anal Chem. 2001;73:393-404).
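A phosphorylation event adds HPO3 (monoisotopic mass about 79.966 Da) to a peptide, so a first-pass check compares an observed peptide mass with its unmodified mass. The Python sketch below is a simplified illustration of that mass-shift logic, not the cited Annan et al. method. The function name, the 10 ppm tolerance, and the limit of four sites are assumptions.

```python
# Monoisotopic mass added by one phosphorylation (HPO3), in daltons.
PHOSPHO_SHIFT = 79.96633


def count_phosphates(observed_mass, unmodified_mass, max_sites=4, tol_ppm=10.0):
    """Return the number of phosphate groups (0..max_sites) that explains the
    observed neutral peptide mass within tol_ppm, or None if no count fits."""
    for n in range(max_sites + 1):
        expected = unmodified_mass + n * PHOSPHO_SHIFT
        if abs(observed_mass - expected) / expected * 1e6 <= tol_ppm:
            return n
    return None
```

A returned count only shows that the mass is consistent with phosphorylation. Assigning sites (tyrosine versus serine, as in the STAT example) still requires the MS/MS fragment data.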
Compartmental analysis of TF distribution. Several important classes of transcription factors reside in the cytoplasm and translocate into the nucleus at a later time. Translocation typically requires a signal transduction event to occur. Thus there is not necessarily a direct temporal correlation between synthesis of trans-regulatory elements and gene expression. To generate a timeline for TF synthesis, it is necessary to perform pulse-chase experiments combined with compartmental analysis (e.g., to determine when the cytoplasmic TF reserves are synthesized).
The technology proposed herein examines DNA-protein and protein-protein interactions of transcription factor complexes. Most efforts to model gene expression networks propose to use data solely from changes in mRNA abundance over time or between conditions. In contrast, our approach interrogates the time course of protein synthesis (temporal analysis of protein expression, TAPE) and/or differences in protein populations between conditions. In our technique, the discovered proteins drive the growth of a DNA promoter "tree." The DNA elements are manufactured as tools to identify additional trans-regulatory protein complexes. Once fully established and with its initial parameters set, the technology could operate independently. The required manufacturing steps can be automated and driven by the bioinformatic discovery steps. Different models could be generated depending on the starting promoter chosen. Intersections of the different models could be identified to build more complex systems, and these systems could be used to simulate complex biological processes. The usefulness of such systems will depend on their ability to predict the response to perturbations. Such predictive power is essential to the utility of simulated biological systems: do they predict changes that actually occur? The simulated system must therefore be engineered so that parameters can be defined to test it. The disclosed technology is testable at least at two levels: at the level of gene expression and at the level of protein expression (phenotype). Consider a protein Tfi in a TF complex. If this protein is eliminated, the rate of transcription of genes dependent on this TF complex should be affected. In the case where the TF complex is an activator, the mRNA abundance of this set of genes should decrease when protein Tfi is not present (i.e., when transcription of the Tfi gene is suppressed or the Tfi gene is knocked out).
In the case where the TF complex is an inhibitor of transcription, the mRNA abundance of the set of genes predicted to be regulated by it should increase when protein Tfi is not present. Thus, a validation test of IPAC is to determine whether the predicted changes in expression of genes j-k actually occur upon perturbation of a particular Tfi. Through these validation experiments, the system is also able to predict the phenotype associated with Tfi suppression and the consequent effects of altered expression of genes j-k. Other perturbations of Tfi could be performed, including overexpression, mutation, or altered phosphorylation. Overexpression of Tfi could be achieved by infection of cardiac myocytes with an adenoviral construct containing the gene for Tfi; the adenoviral construct could also contain a Tfi gene modified by site-directed mutagenesis. Decreased Tfi expression could be achieved by "knock-out" technologies. These perturbations can be tested by quantitative measurement of the changes in mRNA abundance of the genes predicted to be affected by this TF complex before and after modification of Tfi concentration or function.
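The validation logic above (removing an activator lowers target mRNA, removing an inhibitor raises it) can be scored mechanically. The Python sketch below assumes that IPAC output is available as signed TF-to-gene edges and that measured expression changes are given as log fold changes. The data layout, the function names, and the use of simple sign agreement as the score are illustrative assumptions.

```python
def predict_knockout_direction(regulation, tf):
    """Predict the sign of the mRNA change for each target gene when `tf` is removed.

    `regulation` maps (tf, gene) -> +1 (complex activates) or -1 (complex represses).
    Removing an activator should lower expression (-1); removing a repressor
    should raise it (+1).
    """
    return {gene: -sign for (t, gene), sign in regulation.items() if t == tf and sign != 0}


def validation_score(predicted, observed_log_fold_change):
    """Fraction of predicted genes whose observed log fold change has the predicted sign."""
    if not predicted:
        return 0.0
    agree = sum(
        1 for gene, direction in predicted.items()
        if observed_log_fold_change.get(gene, 0.0) * direction > 0
    )
    return agree / len(predicted)
```

For an overexpression experiment, the predicted directions would simply be the edge signs themselves rather than their negation.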
To date, the major limitation of modeling complex biological systems has been the lack of data that can actually be modeled. The technology disclosed in the present invention allows the required data to be generated. Because of their potential complexity, the data may require the development of novel modeling strategies. A model of six iterations of IPAC is outlined in Table 1. A state transition function can be defined that generates the network at timepoint 2 based solely on input from timepoint 1 (i.e., the function generates state j+1 with data input solely from state j). Each variable may be considered a dimension in an n-dimensional space. Each state transition function may be described as an n-dimensional vector between two contiguous states. The dimensions the present technology measures are as follows: 1) time; 2) promoter (DNA); 3) transcription factor (TF, protein); 4) phosphorylation state of the TF; 5) location of the TF (compartment); and 6) gene expression level (mRNA abundance). Each state is associated with a phenotype (the output, which could be the level of a particular protein "marker" or a biological function or response). Thus the present model exists in a six-dimensional "space" with one or more outputs. For representational simplicity, Table 1 shows an example of a three-dimensional matrix:
$\text{Time} \in \{T_1, T_2\}$; $\text{Promoter} \in \{P_x, P_1, P_2, P_3, P_4, P_5, P_6\}$; $\text{Transcription Factor} \in \{F_1, F_2, F_3, F_4, F_5, F_6\}$.
The vector $S_j \rightarrow S_{j+1}$ is defined by the state transition function $f(T_j, P_j, F_j)$.
Table 1. Hypothetical three-dimensional matrix defining two states of a trans-regulatory network. See text for details. A "1" indicates activation of transcription, a "0" indicates no interaction, and a "-1" indicates suppression of transcription.
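A state transition of the kind described for Table 1 can be sketched in Python using the table's 1 / 0 / -1 convention. This is a toy model under stated assumptions: each promoter drives the gene of exactly one TF (the hypothetical `encodes` mapping), a promoter counts as active when the summed input from the TFs present in state j is positive, and only the time, promoter, and TF dimensions are modeled.

```python
def promoter_activity(matrix, present_tfs):
    """Net regulatory input to each promoter from the TFs present in the current state.

    `matrix[tf][promoter]` uses the Table 1 convention: 1 = activation,
    0 = no interaction, -1 = suppression.
    """
    promoters = {p for row in matrix.values() for p in row}
    return {p: sum(matrix.get(tf, {}).get(p, 0) for tf in present_tfs) for p in promoters}


def transition(matrix, present_tfs, encodes):
    """State transition S_j -> S_j+1: a TF is present in the next state if the
    promoter of its gene receives a positive net input in the current state.

    `encodes` maps each promoter to the TF whose gene it drives (hypothetical).
    """
    activity = promoter_activity(matrix, present_tfs)
    return {encodes[p] for p, net in activity.items() if net > 0 and p in encodes}
```

Applying `transition` repeatedly generates successive states from the previous state alone, which is the property the state transition function is required to have. Phosphorylation, compartment, and mRNA level would enter as additional inputs in a fuller model.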
Definitions
Within the meaning of the present invention, the term "network" is defined as a set of objects that exist in specific states and interact with each other to change those states. An interaction can be described as an input/output function and can be represented as a "vector" connecting two objects. An interaction can be further defined as inhibitory or stimulatory. The number of interactions required to change the state of an object is defined by rules that are given ordinal assignments. For example, a rule may be that for object A to change state (i.e., be increased), the net input/output sum must be greater than or equal to 2. As disclosed herein, this definition of a network is applied to genes that regulate each other's expression through trans-acting proteins.
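The example rule (an object is increased when its net input sum is at least 2) can be written as a synchronous update over signed interaction vectors. In this Python sketch, encoding stimulatory and inhibitory interactions as +1 and -1 and letting only currently active objects contribute input are assumptions; the function names are hypothetical.

```python
def changes_state(inputs, threshold=2):
    """Example rule: an object changes state (is increased) when the net sum of its
    incoming interactions (+1 stimulatory, -1 inhibitory) reaches `threshold`."""
    return sum(inputs) >= threshold


def update_network(edges, active, threshold=2):
    """One synchronous update of a signed network.

    `edges` is a list of (source, target, sign) interaction vectors; only active
    sources contribute input. Returns the objects whose state is increased.
    """
    incoming = {}
    for source, target, sign in edges:
        if source in active:
            incoming.setdefault(target, []).append(sign)
    return {target for target, signs in incoming.items() if changes_state(signs, threshold)}
```

With edges A→C (+1), B→C (+1), and D→C (-1), object C is increased when A and B are active, but not when D is also active, because the net input falls to 1.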
As used herein, the term "trans" in "trans-regulatory pathway" or "trans-regulatory network" for the expression of a gene refers to proteins that bind DNA regulatory elements (e.g., promoters) or bind other proteins that directly or indirectly bind DNA regulatory elements.
Within the meaning of the present invention, the term "perturbing" generally means changing. When used in connection with gene expression, this term encompasses both suppressing and increasing the expression.
As used herein, the term "isolated" means that the referenced material is removed from the environment in which it is normally found. Thus, an isolated biological material can be free of cellular components, i.e., components of the cells in which the material is found or produced. In the case of nucleic acid molecules, an isolated nucleic acid includes a PCR product, an isolated mRNA, a cDNA, or a restriction fragment. In another embodiment, an isolated nucleic acid is preferably excised from the chromosome in which it may be found, and more preferably is no longer joined to non-regulatory, non-coding regions, or to other genes, located upstream or downstream of the gene contained by the isolated nucleic acid molecule when found in the chromosome. In yet another embodiment, the isolated nucleic acid lacks one or more introns. Isolated nucleic acid molecules include sequences inserted into plasmids, cosmids, artificial chromosomes, and the like. Thus, in a specific embodiment, a recombinant nucleic acid is an isolated nucleic acid. An isolated protein may be associated with other proteins or nucleic acids, or both, with which it associates in the cell, or with cellular membranes if it is a membrane-associated protein. An isolated organelle, cell, or tissue is removed from the anatomical site in which it is found in an organism. An isolated material may be, but need not be, purified.
The term "purified" as used herein refers to material that has been isolated under conditions that reduce or eliminate the presence of unrelated materials, i.e., contaminants, including native materials from which the material is obtained. For example, a purified protein is preferably substantially free of other proteins or nucleic acids with which it is associated in a cell; a purified nucleic acid molecule is preferably substantially free of proteins or other unrelated nucleic acid molecules with which it can be found within a cell. As used herein, the term "substantially free" is used operationally, in the context of analytical testing of the material. Preferably, purified material substantially free of contaminants is at least 50% pure; more preferably, at least 90% pure; and more preferably still, at least 99% pure. Purity can be evaluated by chromatography, gel electrophoresis, immunoassay, composition analysis, biological assay, and other methods known in the art.
Methods for purification are well-known in the art. For example, nucleic acids can be purified by precipitation, chromatography (including preparative solid phase chromatography, oligonucleotide hybridization, and triple helix chromatography), ultracentrifugation, and other means. Polypeptides and proteins can be purified by various methods including, without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, precipitation and salting-out chromatography, extraction, and countercurrent distribution. For some purposes, it is preferable to produce the polypeptide in a recombinant system in which the protein contains an additional sequence tag that facilitates purification, such as, but not limited to, a polyhistidine sequence, or a sequence that specifically binds to an antibody, such as FLAG and GST. The polypeptide can then be purified from a crude lysate of the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, antibodies produced against the protein or against peptides derived therefrom can be used as purification reagents. Cells can be purified by various techniques, including centrifugation, matrix separation (e.g., nylon wool separation), panning and other immunoselection techniques, depletion (e.g., complement depletion of contaminating cells), and cell sorting (e.g., fluorescence activated cell sorting [FACS]). Other purification methods are possible. A purified material may contain less than about 50%, preferably less than about 75%, and most preferably less than about 90%, of the cellular components with which it was originally associated. The term "substantially pure" indicates the highest degree of purity that can be achieved using conventional purification techniques known in the art.
In a specific embodiment, the term "about" or "approximately" means within 20%, preferably within 10%, and more preferably within 5% of a given value or range. Alternatively, especially in biological systems, the term "about" means within about a log (i.e., an order of magnitude), preferably within a factor of two, of a given value, depending on how quantitative the measurement is.
A "sample" as used herein refers to a biological material that can be tested for the presence of a protein or a nucleic acid. Such a sample can be obtained from any source, including tissue, bone marrow, blood and blood cells, pleural effusions, cerebrospinal fluid (CSF), ascites fluid, and cell culture.
Non-human animals include, without limitation, laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, etc.; domestic animals such as dogs and cats; and, farm animals such as sheep, goats, pigs, horses, and cows.
As used herein, the term "homologous" in all its grammatical forms and spelling variations refers to the relationship between proteins that possess a "common evolutionary origin," including proteins from superfamilies (e.g., the immunoglobulin superfamily) and homologous proteins from different species (e.g., myosin light chain, etc.) (Reeck et al, Cell 50:667, 1987). Such proteins (and their encoding genes) have sequence homology, as reflected by their sequence similarity, whether in terms of percent similarity or the presence of specific residues or motifs at conserved positions.
Accordingly, the term "sequence similarity" in all its grammatical forms refers to the degree of identity or correspondence between nucleic acid or amino acid sequences of proteins that may or may not share a common evolutionary origin (see Reeck et al., supra). However, in common usage and in the instant application, the term "homologous," when modified with an adverb such as "highly," may refer to sequence similarity and may or may not relate to a common evolutionary origin. Sequence comparison algorithms used to determine sequence similarity include, but are not limited to, BLAST (BLASTP, BLASTN, BLASTX), FASTA, DNA Strider, the GCG (Genetics Computer Group, Program Manual for the GCG Package, Version 7, Madison, Wisconsin) pileup program, etc., using the default parameters provided with these algorithms. Sequences that are substantially homologous can be identified by comparing the sequences using standard software available in sequence data banks, or in a Southern hybridization experiment under, for example, stringent conditions as defined for that particular system.
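Percent identity, the simplest of the similarity measures mentioned above, can be computed once two sequences have been aligned. The Python sketch below does not perform alignment as BLAST, FASTA, or GCG pileup do. It assumes the inputs are already aligned to equal length with '-' for gaps, and it counts identity only (no substitution matrix, so it is not "similarity" in the BLAST sense). The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two sequences already aligned to equal length.

    A gap ('-') opposite a residue counts as a mismatch; columns where both
    sequences have a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("AC-T", "ACGT")` gives 75.0, because three of the four aligned columns are identical.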
In accordance with the present invention, there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (herein "Sambrook et al., 1989"); DNA Cloning: A Practical Approach, Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization [B.D. Hames & S.J. Higgins eds. (1985)]; Transcription And Translation [B.D. Hames & S.J. Higgins, eds. (1984)]; Animal Cell Culture [R.I. Freshney, ed. (1986)]; Immobilized Cells And Enzymes [IRL Press, (1986)]; B.E. Perbal, A Practical Guide To Molecular Cloning (1984); F.M. Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1994).
A "nucleic acid molecule" refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; "RNA molecules"); or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; "DNA molecules"); or any phosphoester analogs thereof, such as phosphorothioates and thioesters, in either single-stranded form or a double-stranded helix; or "protein nucleic acids" (PNA) formed by conjugating bases to an amino acid backbone; or nucleic acids containing modified bases, for example thiouracil, thio-guanine and fluoro-uracil. Double-stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid molecule, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear (e.g., restriction fragments) or circular DNA molecules, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA). A "recombinant DNA molecule" is a DNA molecule that has undergone a molecular biological manipulation.
A "polynucleotide" or "nucleotide sequence" is a series of nucleotide bases (also called "nucleotides") in DNA and RNA, and means any chain of two or more nucleotides. A nucleotide sequence typically carries genetic information, including the information used by cellular machinery to make proteins and enzymes. These terms include double- or single-stranded genomic DNA and cDNA, RNA, any synthetic and genetically manipulated polynucleotide, and both sense and antisense polynucleotides (although only sense strands are represented herein). This includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA hybrids.
The polynucleotides herein may be flanked by natural regulatory (expression control) sequences, or may be associated with heterologous sequences, including promoters, internal ribosome entry sites (IRES) and other ribosome binding site sequences, enhancers, response elements, suppressors, signal sequences, polyadenylation sequences, introns, 5'- and 3'-non-coding regions, and the like. The nucleic acids may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoroamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Polynucleotides may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. The polynucleotides may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the polynucleotides herein may be modified with a label capable of providing a detectable signal, either directly or indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like. A "coding sequence" or a sequence "encoding" an expression product, such as an RNA, polypeptide, protein, or enzyme, is a minimum nucleotide sequence that, when expressed, results in the production of that RNA, polypeptide, protein, or enzyme, i.e., the nucleotide sequence encodes an amino acid sequence for that polypeptide, protein or enzyme.
A coding sequence for a protein may include a start codon (usually ATG, though as shown herein, alternative start codons can be used) and a stop codon.
The term "gene," also called a "structural gene," means a DNA sequence that codes for a particular sequence of amino acids, which comprise all or part of one or more proteins or enzymes, and may include regulatory (non-transcribed) DNA sequences, such as promoter sequences, which determine, for example, the conditions under which the gene is expressed. The transcribed region of the gene may include untranslated regions, including introns, the 5'-untranslated region (UTR), and the 3'-UTR, as well as the coding sequence.
The term "promoter sequence" or "cis-acting element" is used herein to refer to a DNA regulatory region, which is contiguous to the coding region of the DNA and is capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding sequence. Within the promoter sequence will be found a transcription initiation site (conveniently defined, for example, by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase and various transcription factors (TFs). These TFs are also termed "trans-acting elements".
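Consensus binding sequences within a promoter are commonly written with IUPAC ambiguity codes, and a basic scan for them can be sketched in Python with regular expressions. The example consensus TATAWAW is a commonly cited TATA-box consensus, used here only for illustration. The sketch is an assumption-laden simplification: it matches exact consensus strings on one strand, whereas dedicated motif programs typically score position weight matrices on both strands.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_consensus_sites(sequence, consensus):
    """Return 0-based start positions of (possibly overlapping) matches of an
    IUPAC consensus on the given strand of `sequence`."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # The zero-width lookahead lets successive matches overlap.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]
```

For example, `find_consensus_sites("GGTATAAAAGG", "TATAWAW")` returns `[2]`. Scanning the reverse complement as well would cover sites on the other strand.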
The terms "express" and "expression" mean allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by activating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to form an "expression product" such as mRNA or a protein. The expression product itself, e.g. the resulting mRNA or protein, may also be said to be "expressed" by the cell.
The terms "vector", "cloning vector" and "expression vector" mean the vehicle by which a DNA or RNA sequence (e.g., a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g., transcription and translation) of the introduced sequence. Vectors include plasmids, phages, viruses, etc.; they are discussed in greater detail below. As used herein, the term "oligonucleotide" refers to a nucleic acid, generally of at least 10, preferably at least 15, and more preferably at least 20 nucleotides, preferably no more than 100 nucleotides, that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA molecule encoding a gene, mRNA, cDNA, or other nucleic acid of interest. Oligonucleotides can be labeled, e.g., with 32P-nucleotides or nucleotides to which a label, such as biotin, has been covalently conjugated. Generally, oligonucleotides are prepared synthetically, preferably on a nucleic acid synthesizer. Accordingly, oligonucleotides can be prepared with non-naturally occurring phosphoester analog bonds, such as thioester bonds, etc.
The present invention provides antisense nucleic acids (including ribozymes), which may be used to inhibit expression of genes and proteins of the invention. An "antisense nucleic acid" is a single stranded nucleic acid molecule which, on hybridizing under cytoplasmic conditions with complementary bases in an RNA or DNA molecule, inhibits the latter's role. If the RNA is a messenger RNA transcript, the antisense nucleic acid is a countertranscript or mRNA-interfering complementary nucleic acid. As presently used, "antisense" broadly includes RNA-RNA interactions, RNA-DNA interactions, triple helix interactions, ribozymes and RNase-H mediated arrest. Antisense nucleic acid molecules can be encoded by a recombinant gene for expression in a cell (e.g., U.S. Patent No. 5,814,500; U.S. Patent No. 5,811,234), or alternatively they can be prepared synthetically (e.g., U.S. Patent No. 5,780,607).
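Because an antisense nucleic acid hybridizes through complementary bases, its sequence is the reverse complement of the target region written 5' to 3'. The short Python sketch below shows only that sequence relationship. Choosing a target region and any chemical modifications is outside its scope, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def antisense_oligo(target, rna=False):
    """Return the sequence complementary to `target`, written 5'->3'.

    `target` is the sense strand 5'->3' (DNA or mRNA). With rna=True the
    result uses U in place of T.
    """
    antisense = target.translate(COMPLEMENT)[::-1]
    return antisense.replace("T", "U").replace("t", "u") if rna else antisense
```

For example, the antisense DNA for the sense sequence ATGGCC is GGCCAT.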
A "knock-in" mammal is a mammal in which an endogenous gene is substituted with a heterologous gene (Roemer et al., New Biol., 3:331, 1991). Preferably, the heterologous gene is "knocked-in" to a locus of interest, either the subject of evaluation of expression or function of a homologous gene (in which case the gene may be a reporter gene; see Elefanty et al., Proc. Natl. Acad. Sci. USA, 95:11897, 1998), thereby linking the heterologous gene expression to transcription from the appropriate promoter. This can be achieved by homologous recombination, transposition (Westphal and Leder, Curr. Biol., 7:530, 1997), mutant recombination sites (Araki et al., Nucleic Acids Res. 25:868, 1997), or PCR (Zhang and Henderson, Biotechniques, 25:784, 1998). See also Coffman, Semin. Nephrol., 17:404, 1997; Esther et al., Lab. Invest., 74:953, 1996; Murakami et al., Blood Press. Suppl., 2:36, 1996. A "knockout mammal" is a mammal (e.g., mouse) that contains within its genome a specific gene that has been inactivated by the method of gene targeting (see, e.g., US Patents No. 5,777,195 and No. 5,616,491). A knockout mammal includes both a heterozygous knockout (i.e., one defective allele and one wild-type allele) and a homozygous knockout (i.e., two defective alleles). Preparation of a knockout mammal requires first introducing a nucleic acid construct that will be used to suppress expression of a particular gene into an undifferentiated cell type termed an embryonic stem (ES) cell. This cell is then injected into a mammalian embryo. A mammalian embryo with an integrated cell is then implanted into a foster mother for the duration of gestation. Zhou et al. (Genes and Development, 9:2623-34, 1995) describe PPCA knockout mice.
The term "knockout" refers to partial or complete suppression of the expression of at least a portion of a protein encoded by an endogenous DNA sequence in a cell. The term "knockout construct" refers to a nucleic acid sequence that is designed to decrease or suppress expression of a protein encoded by endogenous DNA sequences in a cell. The nucleic acid sequence used as the knockout construct is typically comprised of (1) DNA from some portion of the gene (exon sequence, intron sequence, and/or promoter sequence) to be suppressed and (2) a marker sequence used to detect the presence of the knockout construct in the cell. The knockout construct is inserted into a cell, and integrates with the genomic DNA of the cell in such a position so as to prevent or interrupt transcription of the native DNA sequence. Such insertion usually occurs by homologous recombination (i.e., regions of the knockout construct that are homologous to endogenous DNA sequences hybridize to each other when the knockout construct is inserted into the cell and recombine so that the knockout construct is incorporated into the corresponding position of the endogenous DNA). The knockout construct nucleic acid sequence may comprise (1) a full or partial sequence of one or more exons and/or introns of the gene to be suppressed, (2) a full or partial promoter sequence of the gene to be suppressed, or (3) combinations thereof. Typically, the knockout construct is inserted into an embryonic stem cell (ES cell) and is integrated into the ES cell genomic DNA, usually by the process of homologous recombination. This ES cell is then injected into, and integrates with, the developing embryo.
Included within the scope of this invention is a mammal in which two or more genes have been knocked out or knocked in, or both. Such mammals can be generated by repeating the procedures set forth herein for generating each knockout construct, or by breeding two mammals, each with a single gene knocked out, to each other, and screening for those with the double knockout genotype.
Regulated knockout animals can be prepared using various systems, such as the tet- repressor system (see US Patent No. 5,654,168) or the Cre-Lox system (see US Patents No. 4,959,317 and No. 5,801,030).
EXAMPLES
The present invention will be better understood by reference to the following Examples, which are provided by way of illustration of the invention and are not intended to limit it.
EXAMPLE 1: Validation of IPAC Technology for α-Myosin Heavy Chain Promoter
Materials and Methods. Isolation of Human Heart Cardiomyocytes. Human heart tissue is available through collaboration with the Albany Medical Center Human heart tissue repository, Albany, NY. At the time of cardiac transplantation, the explanted heart is obtained immediately in the OR and a direct intracoronary infusion of cardioplegia solution is administered. Ventricular myocytes are isolated as previously described by Dipla et al. (Dipla K, et al., Circ Res. 1999;84:435-444). Briefly, IDC explanted hearts are used for the isolation of ventricular cardiomyocytes. Explanted hearts undergo ex vivo perfusion with cold cardioplegic solution through the aortic root and are transported to the laboratory in cold Krebs-Henseleit (K-H) solution containing (in mM): glucose 12.5, KCl 5.4, lactic acid 1, MgSO4 1.2, NaCl 130, NaH2PO4 1.2, NaHCO3 25, and sodium pyruvate 2 (pH 7.4). The heart is perfused through a small catheter placed into the lumen of an artery (a branch off the LAD or LCCA for left ventricular myocytes, the RCA for right ventricular myocytes) with K-H solution containing 10 mM taurine (30 min, non-recirculating, 37 °C). The heart is cut away from the perfused segment prior to perfusion with collagenase. This segment is perfused for an additional 30 minutes with recirculating K-H solution containing 180 units/ml collagenase, 20 mM BDM, 20 mM taurine, and 50 µM CaCl2. After collagenase perfusion, the tissue is rinsed for 10 minutes with K-H solution containing 10 mM taurine, 20 mM BDM, and 200 µM CaCl2. The tissue is then minced, and only the mid-myocardial cells are used for further analysis. The cells are collected by centrifugation and resuspended in K-H solution containing 1% BSA, 10 mM taurine, and 250 µM CaCl2.
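Converting the millimolar Krebs-Henseleit recipe above into weighed amounts is straightforward arithmetic (grams = mM / 1000 × molecular weight × litres). The Python sketch below assumes anhydrous salts and pure free lactic acid. Hydrated salts (e.g., MgSO4·7H2O) need their own molecular weight, and lactic acid is usually supplied as a concentrated solution, so its entry would need converting to a volume. The dictionary and function names are hypothetical.

```python
# Molecular weights (g/mol) of anhydrous salts and free lactic acid (assumed forms).
MOLECULAR_WEIGHT = {
    "glucose": 180.16,
    "KCl": 74.55,
    "lactic acid": 90.08,
    "MgSO4": 120.37,
    "NaCl": 58.44,
    "NaH2PO4": 119.98,
    "NaHCO3": 84.01,
    "sodium pyruvate": 110.04,
}

# Krebs-Henseleit composition from the text, in mM.
KH_BUFFER_MM = {
    "glucose": 12.5, "KCl": 5.4, "lactic acid": 1, "MgSO4": 1.2,
    "NaCl": 130, "NaH2PO4": 1.2, "NaHCO3": 25, "sodium pyruvate": 2,
}


def grams_needed(recipe_mm, volume_l):
    """Grams of each component for `volume_l` litres: mM / 1000 * MW * volume."""
    return {name: mm / 1000 * MOLECULAR_WEIGHT[name] * volume_l for name, mm in recipe_mm.items()}
```

For 1 litre, NaCl works out to about 7.60 g and NaHCO3 to about 2.10 g.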
Preparation of rat neonatal cardiomyocyte nuclear extracts. All steps are performed at 4 °C. Neonatal rat hearts are excised and the atria removed by dissection. The remaining ventricles are rinsed, minced, and homogenized in 10 volumes of 0.25 M sucrose buffer. The crude mixture is centrifuged at 1000 × g for 10 min. The resulting pellet is resuspended in 0.25 M sucrose buffer containing 0.5% Triton X-100. The mixture is pelleted again at 1000 × g for 10 min, which yields crude heart nuclei. The pellet is resuspended in 20 ml of 2.2 M sucrose buffer, and the nuclear suspension is layered on a discontinuous sucrose density gradient consisting of a 7.5 ml cushion of 2.28 M sucrose layered over a 9 ml cushion of 2.7 M sucrose. Centrifugation is performed at 25,000 rpm in a Beckman SW 28 rotor at 4 °C for 1 hr. The cardiac myocyte nuclei form a cloudy white band at the 2.28 M/2.7 M interface and are collected with a wide-mouth pipette. The nuclei are diluted in five volumes of 0.25 M sucrose buffer supplemented with protease inhibitor and centrifuged at 5000 × g for 10 min. Finally, the nuclear pellet is resuspended in 5 ml of 0.25 M sucrose buffer containing protease inhibitor.
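The protocol specifies most spins in × g but the gradient spin in rpm; the two are related by the standard formula RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in centimetres. The Python sketch below implements that conversion. The radius is deliberately left as a parameter: the value for the SW 28 rotor (maximum or average radius) should be taken from the rotor documentation and is not given in the text.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) = 1.118e-5 * r(cm) * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Inverse conversion: rotor speed needed to reach a given x g at radius r."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5
```

For example, 1000 rpm at a 10 cm radius corresponds to about 111.8 × g.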
Magnetic Promoter Affinity Bead Preparation. The promoter region of interest is amplified by PCR using a biotinylated forward primer. The DNA is coupled to magnetic beads by incubating Dynabeads M-280 Streptavidin (10 mg) with the amplified biotinylated DNA promoter (50 mg). Finally, the beads are washed 3 times by resuspension and magnetic separation in 5 ml portions of TEN buffer (10 mM Tris-HCl, 1 mM EDTA, and 100 mM NaCl, pH 7.9).
Affinity Purification of Transcription Factors. Cardiomyocyte nuclear extracts are desalted using a PD-10 column (Pharmacia) equilibrated with the incubation buffer (25 mM HEPES, pH 7.9, 100 mM KCl, 0.1 mM EDTA, 10% glycerol, 1 mM DTT, and 1 mM MgCl2). After elution, the nuclear extracts are incubated with the custom-made magnetic promoter affinity beads overnight at 4 °C. The beads are washed 5 times with the incubation buffer. The bound protein mixture (transcription factor mixture) is eluted by raising the KCl concentration to 1 M.
Cross-linking reaction. To demonstrate the relationship of each component in a TF complex and to probe protein surface topology, cross-linking reagents are used. Chemical cross-linking of protein components following exposure to a bifunctional reagent implies that the cross-linked polypeptides were contiguous (in close proximity) during the course of the reaction (Lomant AJ, et al., J Mol Biol. 1976;104:243-61.). To optimize the cross-linking reactions, the following cross-linkers are used: succinimidyl-4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC), disuccinimidyl suberate (DSS), bismaleimidohexane (BMH), bis[β-(4-azidosalicylamido)ethyl] disulfide (BASED), bis(sulfosuccinimidyl) suberate (BS3), and dimethyl adipimidate (DMA). SDS-PAGE and/or high-resolution two-dimensional gels are used to separate the transcription factor mixture with and without cross-linking (Shevchenko A, et al., Anal Chem. 1996;68:850-8). However, 1D- and 2D-PAGE are poorly suited to the analysis of large protein conjugates (i.e., MW > 200 kDa) because these large species do not migrate well into the polyacrylamide gel. Therefore, alternative methods are used, including multi-dimensional chromatography coupled with mass spectrometry (Washburn MP, et al., Nat Biotechnol. 2001;19:242-7.) and isotope-coded affinity tags (ICAT) (Gygi SP, et al., Nat Biotechnol. 1999;17:994-9.). For the ICAT approach, after the nuclear extracts are separated by promoter affinity chromatography, the bound material is eluted, labeled with the isotope-coded affinity tag, and digested with trypsin. The tagged tryptic peptides are separated and identified by affinity chromatography coupled with tandem mass spectrometry. For multi-dimensional chromatography, the transcription factor mixture from the promoter affinity bead pull-down is digested with trypsin, and the peptides are loaded directly onto a strong cation exchange (SCX) capillary column coupled with a reversed-phase (RP) capillary column.
After separation of the peptides on the SCX+RP column, tandem mass spectra are generated. Partial amino acid sequences are deduced from the spectra using ProteinLynx™ or by de novo sequencing.
1D or 2D PAGE. For sequence analysis, SDS-PAGE and/or high-resolution two-dimensional gel electrophoresis (2D-PAGE) are conducted in experiments that do not employ cross-linking. The following protocol is employed for 2D-PAGE: the sample is concentrated, desalted, and reconstituted in 300 µl of sample buffer V (9.0 M urea, 4% CHAPS, 0.5% Triton X-100, 0.8% carrier ampholytes, and 65 mM DTT) with a small amount of 0.2% bromophenol blue dye. Two-dimensional electrophoresis is performed with a PROTEAN® IEF cell (Bio-Rad, CA, USA), using pre-cast pH 3-10 linear immobilized pH gradient (IPG) strips (180 × 3 × 0.5 mm) for the first dimension (isoelectric focusing, IEF), and 8% sodium dodecyl sulfate (SDS) polyacrylamide gels (200 × 200 × 1.0 mm) for the second dimension (SDS-PAGE) using a Bio-Rad PROTEAN® II xi 2D system. After 2D-PAGE, the protein spots are visualized with a modified silver-staining protocol (http://www.protana.com/PDF/ASMS/). All spots are excised and subjected to in-gel tryptic digestion (Lomant AJ, et al., J Mol Biol. 1976;104:243-61).
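Whether a protein focuses on a pH 3-10 IPG strip depends on its isoelectric point. The sketch below estimates a theoretical pI by finding, via bisection, the pH at which the Henderson-Hasselbalch net charge is zero. The pKa values are one commonly quoted set and are an assumption here; other pKa sets shift the result by several tenths of a pH unit.

```python
# Assumed pKa set (one of several in common use).
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge of a peptide or protein at a given pH."""
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in PKA_BASIC.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def focuses_on_strip(seq: str, low: float = 3.0, high: float = 10.0) -> bool:
    """True if the estimated pI lies within the IPG strip range."""
    return low <= isoelectric_point(seq) <= high
```

A very basic sequence such as a poly-lysine stretch falls outside the 3-10 range, which illustrates why proteins with extreme pI values are rarely seen on such gels.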
Multidimensional protein identification technology (MudPIT) (Link A, et al., Nature Biotechnology. 1999;17:676-681.). The protein mixture from promoter affinity chromatography is digested with endoproteases (trypsin, Lys-C, Glu-C, and Asp-N). The complex peptide mixtures are loaded onto a biphasic microcapillary column packed with strong cation exchange (SCX) and reversed-phase (RP) materials. Because a high-voltage (kV) supply is interfaced directly with the microcapillary column, peptides are eluted directly into the mass spectrometer. Peptides are first displaced from the SCX to the RP phase by a salt gradient and then eluted off the RP phase into the MS/MS. In an iterative process, the microcolumn is re-equilibrated and an additional salt step of higher concentration displaces more tightly bound peptides from the SCX to the RP phase. These peptides are again eluted by an RP gradient into the MS/MS, and the process is repeated. The tandem mass spectra generated are correlated with theoretical mass spectra generated from protein or DNA databases by the ProteinLynx algorithm.
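Generating theoretical spectra starts from an in-silico digest of each database protein. A minimal sketch of fully cleaved digests for the four endoproteases named above, using simplified textbook cleavage rules (an assumption: real digests show missed cleavages and enzyme- and buffer-dependent exceptions, e.g. Glu-C can also cut after Asp in phosphate buffer).

```python
import re

# Simplified cleavage rules expressed as zero-width regular expressions.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # C-terminal to Lys/Arg, not before Pro
    "Lys-C": r"(?<=K)",            # C-terminal to Lys
    "Glu-C": r"(?<=E)",            # C-terminal to Glu
    "Asp-N": r"(?=D)",             # N-terminal to Asp
}


def digest(protein: str, enzyme: str, min_length: int = 1) -> list[str]:
    """Fully cleaved in-silico digest; peptides shorter than min_length are dropped."""
    peptides = re.split(CLEAVAGE_RULES[enzyme], protein)
    return [p for p in peptides if len(p) >= min_length]


print(digest("AKPRGKDE", "trypsin"))
```

Zero-width patterns let `re.split` cut between residues without consuming them, so every residue ends up in exactly one peptide.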
In 2D-PAGE, proteins are separated in one dimension by isoelectric point (pI) and in the other by molecular weight (MW). Proteins with extreme pI or molecular weight, as well as membrane-associated or membrane-bound proteins, are rarely seen in a 2D-PAGE study. In such cases, an alternative approach is employed (i.e., multi-dimensional chromatography coupled with mass spectrometry). Because that system is largely unbiased, proteins from all subcellular fractions with extremes in pI, MW, abundance, and hydrophobicity can be identified. As disclosed herein, proteins in nuclear extracts are preferably studied, and the main reason for using multi-dimensional chromatography is to allow analysis of cross-linked protein complexes.
Microbore HPLC-Tandem Mass Spectrometry. To obtain structural information for each component of the transcription factor mixture by mass spectrometry, the molecule to be studied must undergo fragmentation of one or more bonds such that ions are formed whose mass-to-charge (m/z) ratios can be related to structural features. This can be accomplished in a tandem mass spectrometer. When a peptide is ionized by electrospray ("soft" ionization), it forms an (M+H)+ ion that is then mass analyzed to determine its molecular weight. To obtain structural information about the peptide, the (M+H)+ ion exiting the first mass analyzer (MS1) is passed into a region (collision cell) containing a neutral gas (He, Ar, or Xe) at ~10^-3 torr. Upon low-energy collision with the gas atoms, the peptide (M+H)+ ion ("precursor ion") fragments primarily at amide bonds along the backbone, generating a ladder of sequence ions. These ions are then mass analyzed in the second mass analyzer (MS2) of the tandem instrument. If the charge is retained on the N-terminal portion of the fragment after cleavage of the amide bond, b-type ions are formed; if the charge is retained on the C-terminal portion, y-type ions are formed.
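The b- and y-ion ladders described above can be computed from monoisotopic residue masses. A sketch for singly charged ions only, with unmodified residues (an assumption: modifications such as carbamidomethyl-Cys or phosphorylation would shift the masses).

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)
RESIDUE_MASS = {    # monoisotopic residue masses (Da), unmodified residues
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ladders(peptide: str):
    """Return ([M+H]+, b-ions b1..b(n-1), y-ions y1..y(n-1)) for singly charged ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    precursor = sum(masses) + WATER + PROTON
    # b-ions keep the N-terminal residues; y-ions keep the C-terminal residues plus water.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return precursor, b_ions, y_ions


precursor, b_ions, y_ions = fragment_ladders("PEPTIDEK")
print(round(precursor, 4), [round(m, 4) for m in b_ions])
```

A useful consistency check is that complementary ions satisfy b_i + y_(n-i) = [M+H]+ + 1.007276, since each pair together contains the whole peptide plus two protons.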
A complete series of either or both ion types allows the amino acid sequence to be determined by subtracting the masses of adjacent sequence ions. Because it takes 1-2 min or less to record a spectrum, MS1 can then be set for the next peptide ion to obtain its CID spectrum, and so on. However, because the sample is ionized continuously, the (M+H)+ ions of components not being analyzed at that moment are lost. HPLC is therefore used to fractionate the proteolytic peptides before MS/MS. The tryptic peptides are analyzed and sequenced on an LC-ESI-MS/MS system (Micromass Qtof 2, Micromass, UK) as described elsewhere (Gatlin CL, et al., Anal Biochem. 1998;263:93-101.). For the HPLC peptide separation, solvent A is 0.04% HAc + 0.005% heptafluorobutyric acid + 5% acetonitrile, and solvent B is 0.04% HAc + 0.005% heptafluorobutyric acid + 80% acetonitrile. A custom-packed Vydac C18 column (75 µm i.d. × 360 µm o.d. × 8 cm resin length) is used. The HPLC flow of 200 µl/min is split to a 200 nl/min microspray flow rate. After the sample is loaded, the column is washed with 100% solvent A for 10 min. Peptides are eluted with a linear gradient of 0-80% solvent B over 60 min. The outlet of the column is inserted directly into an electrospray needle (New Objective, Inc., Cambridge, MA). Electrospray ionization is performed with the needle voltage set at 1.8 kV. Mass spectra are acquired as peptides elute from the LC by scanning a mass range of 400-2000 m/z. Tandem mass spectra are compared directly with amino acid sequence databases using the ProteinLynx™ algorithm.
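Reading a sequence "by subtraction of the masses of adjacent sequence ions" can be sketched as follows for a b-ion ladder. This is a simplified illustration, assuming a complete, singly charged ladder starting at b1 and a fixed mass tolerance; a gap or ambiguous difference is reported as "?", and Leu/Ile cannot be distinguished by mass.

```python
PROTON = 1.007276
RESIDUE_MASS = {  # monoisotopic residue masses (Da); Leu and Ile are isobaric
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L/I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_b_ladder(b_ions, tol: float = 0.01) -> str:
    """Infer residues from consecutive b-ion mass differences."""
    ladder = [PROTON] + sorted(b_ions)  # b1 minus a proton is the first residue
    residues = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        residues.append(hits[0] if len(hits) == 1 else "?")
    return "".join(residues)


print(read_b_ladder([58.028736, 129.065846, 216.097876]))
```

The tolerance must stay below the 0.036 Da Gln/Lys difference for those two residues to be resolved.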
The Non-Redundant Protein (NRP; 309,353 entries, updated daily) and OWL (University of Leeds, Leeds, England) databases are obtained as ASCII text files in FASTA format from the National Center for Biotechnology Information (NCBI) by anonymous FTP. Several criteria are used to evaluate the database search output and judge the confidence of protein identification by tryptic peptide mapping: 1) the number of matching peptides and their deviation from the calculated mass; 2) the sequence coverage of the protein by the matching peptides; 3) the difference in the number of matched peptides between candidate proteins; and 4) the agreement of the experimental and theoretical pI and MW values. A protein identification is considered unambiguous when a minimum of 4-6 peptides are detected in a map and the majority of the measured peptide masses match the calculated masses within 50 ppm. In addition, more than 15-20% of the protein sequence should be covered by the matching peptides. When more than one candidate protein is retrieved by the search, a "gap" of at least two matching peptides should separate the identified protein from the next-best candidate.
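The acceptance rules above translate directly into a filter on ranked search hits. A sketch, assuming candidates are ranked by number of matched peptides and taking the lower ends of the quoted ranges (4 peptides, 15% coverage) as default thresholds; the `Candidate` structure is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mass_errors_ppm: list  # signed mass error of each matched peptide, in ppm
    coverage: float        # fraction of the protein sequence covered (0-1)

    @property
    def matched(self) -> int:
        return len(self.mass_errors_ppm)


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


def is_unambiguous(candidates, min_peptides=4, max_ppm=50.0, min_coverage=0.15, min_gap=2):
    """Check the best-scoring candidate against the thresholds quoted in the text."""
    ranked = sorted(candidates, key=lambda c: c.matched, reverse=True)
    if not ranked:
        return False
    best = ranked[0]
    within = sum(abs(err) <= max_ppm for err in best.mass_errors_ppm)
    if best.matched < min_peptides or within * 2 <= best.matched:
        return False  # too few peptides, or not a majority within the mass tolerance
    if best.coverage <= min_coverage:
        return False  # coverage must exceed the threshold
    if len(ranked) > 1 and best.matched - ranked[1].matched < min_gap:
        return False  # the next-best candidate is too close
    return True
```

Criterion 4 (pI/MW agreement) is left out here because it needs gel coordinates that the search output alone does not provide.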
For sequencing unknown proteins, the following strategy is used. The protein is digested separately with trypsin (or Endo Lys-C), which generates peptides with a basic amino acid at the C-terminus, and with Endo Glu-C, which generates peptides ending in glutamic acid (except for the original C-terminus of the protein). Two very different, overlapping sets of proteolytic peptides are thus generated. LC-ESI-MS/MS is performed on each set of proteolytic peptides and CID spectra are obtained. Sequences derived from the CID spectra are then assembled, making use of the overlapping peptides from the two digests, as well as the molecular weights of large peptides for which no sequences are available but that encompass two or more sequenced peptides. If some ambiguities still remain, such as two or more ways to arrange the order of the peptides, a third enzyme that has the specificity that would distinguish these remaining
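Assembling sequences from two complementary digests can be illustrated with a greedy overlap merge. This is a swapped-in, simplified technique (not the author's procedure): it repeatedly joins the two fragments with the longest suffix/prefix overlap and stops when no overlap of at least `min_len` residues remains, which is exactly the situation where a third enzyme would be needed. The fragment sequences are hypothetical.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Longest suffix of `left` equal to a prefix of `right`; 0 if below min_len."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def remove_contained(fragments):
    """Drop duplicates and fragments fully contained in another fragment."""
    unique = list(dict.fromkeys(fragments))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def assemble(fragments, min_len: int = 3):
    """Greedy merge of overlapping peptide sequences from complementary digests."""
    contigs = remove_contained(fragments)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(left, right, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the order of the remaining contigs is ambiguous
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = remove_contained(rest + [merged])
    return contigs


print(assemble(["ACDEFGHK", "FGHKLMNPE", "LMNPEQRSTV"]))
```

Greedy merging can mis-join repeated motifs, so short overlaps (small `min_len`) should be treated with caution.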
Analysis of transcription factor phosphorylation. Phosphopeptide mapping of TF complexes purified by IPAC is performed according to the method of Annan et al. (Annan R, et al., Anal Chem. 2001;73:393-404.). This approach uses two orthogonal MS scanning techniques to detect phosphopeptide-specific marker ions at m/z 63 and/or 79 in negative ion mode. The specificity of the technique is tested by pre-treating samples with specific phosphatases: the sample of interest is treated with or without a specific phosphatase, followed by endoprotease digestion, and the two sets of peptides are analyzed by mass spectrometry. Phosphopeptide maps of phosphatase-treated and untreated TF complexes are compared. The 80-Da difference due to removal of a phosphate moiety, together with the difference in retention time between the enzyme-treated and untreated samples, is used to identify the phosphopeptides. Because phosphorylation on a phosphoprotein often occurs at low stoichiometry, and because the signal is suppressed by the large number of non-phosphorylated peptides, an additional enrichment step for phosphopeptides may be necessary. Immobilized metal affinity chromatography (IMAC) is useful here, since immobilized Fe3+ ions selectively retain and pre-concentrate phosphorylated peptides. After the TF complexes are digested with an endoprotease, the phosphopeptides are enriched on the IMAC column in 0.1 M acetic acid. The phosphopeptides can then be eluted with 0.1% ammonium acetate, pH 8.0, containing 50 mM Na2HPO4. Mass spectrometry is performed on the eluted phosphopeptides.
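Comparing the phosphatase-treated and untreated maps amounts to pairing masses that differ by multiples of the HPO3 moiety (monoisotopic 79.96633 Da, the "80-Da difference"). A sketch, assuming both lists hold neutral (or consistently singly protonated) peptide masses; the tolerance and maximum number of sites are illustrative parameters.

```python
HPO3 = 79.96633  # monoisotopic mass removed per dephosphorylated site (Da)


def phosphatase_shifts(untreated, treated, max_sites: int = 3, tol: float = 0.02):
    """Return (untreated mass, treated mass, n sites) pairs shifted by n * HPO3."""
    pairs = []
    for m_untreated in untreated:
        for m_treated in treated:
            for n in range(1, max_sites + 1):
                if abs((m_untreated - m_treated) - n * HPO3) <= tol:
                    pairs.append((m_untreated, m_treated, n))
    return pairs


print(phosphatase_shifts([1079.96633, 1500.0], [1000.0, 1500.0]))
```

Peaks that pair up here would then be confirmed by their retention-time shift, as described above, since a mass coincidence alone is not proof of phosphorylation.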
Results and Discussion
In order to demonstrate the feasibility of the IPAC technology, a well-characterized system related to cardiac myocytes was chosen. Depicted in Figure 3 is the strategy employed to identify the α-myosin heavy chain (αMHC) TFC. Cardiomyocyte nuclear extracts were prepared from neonatal cardiomyocytes using 2.2-2.8 M sucrose gradients as described elsewhere (Awais D, et al., J Mol Cell Cardiol. 2000;32:1969-80). αMHC promoter affinity beads were prepared with streptavidin Dynal beads. The selected region of the αMHC promoter (33-657) contained motifs for MEF2, P300, CdxA, and GATA4 (Gulick J, et al, J Biol Chem. 1991;266:9180-5.). The αMHC promoter template was amplified by PCR with the following two primers: forward biotinylated primer, GTGCACCTGCAAAGTGGATG; reverse primer, AGTTTCGGGTGGGGGCTCTTC. The amplified biotinylated DNA product bound directly to the streptavidin beads. Binding of the biotinylated αMHC promoter to the streptavidin beads was demonstrated by PCR of the promoter solution before and after incubation/capture by the streptavidin-coated Dynal beads (Figure 3). The TF complex was affinity purified with αMHC affinity Dynal beads, as shown in Figure 5. The bound material was specifically eluted from the beads using 1 M NaCl in 25 mM HEPES, pH 7.9. Figure 4 shows 2D-PAGE of nuclear extracts before and after affinity chromatography, demonstrating, as expected, a significant reduction in the number of proteins displayed. Three antibodies, against MEF2, P300, and GATA-4 (Santa Cruz, CA), were used to validate the transcription factor mixture. Western blot analysis demonstrated that the eluted mixture was cross-reactive with all three antibodies. Depicted in Figure 5 is an example of MEF2 western blot analysis of the cardiomyocyte nuclear extracts before and after promoter affinity pull-down.
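A quick sanity check of the two αMHC primers can be sketched with GC content and the Wallace rule (Tm ≈ 2 °C per A/T + 4 °C per G/C). The Wallace rule is only a rough estimate for short oligonucleotides and is an assumption here; nearest-neighbor methods are more accurate. The reverse complement of the reverse primer gives the sense-strand sequence at the 3' end of the amplicon.

```python
FORWARD = "GTGCACCTGCAAAGTGGATG"   # biotinylated forward primer (from the text)
REVERSE = "AGTTTCGGGTGGGGGCTCTTC"  # reverse primer (from the text)


def gc_content(primer: str) -> float:
    """Fraction of G and C bases."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(primer), f"{gc_content(primer):.0%}", wallace_tm(primer))
```

The estimated Tm values differ by a few degrees, which is worth keeping in mind when choosing an annealing temperature for the pair.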
Note that the nuclear extracts were overloaded relative to the capacity of the promoter affinity beads; therefore, MEF2 was also found in the flow-through.
The proteins separated by 2D-PAGE after αMHC promoter affinity chromatography are excised, digested, and subjected to ESI tandem MS. Partial protein sequences are reverse-translated to DNA, and the corresponding genes and their locations within the genome are determined using PANGSEQ. The promoters for six of these genes are cloned by PCR and used to construct new promoter affinity columns. Nuclear extracts are subjected to IPAC with these new columns, and the process is repeated.
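Reverse translation of a partial sequence can be sketched with the standard genetic code. Enumerating every codon combination grows quickly with peptide length, so the sketch also builds a regular expression that scans a DNA sequence for any encoding in all three forward reading frames. This is a simple illustration, not the PANGSEQ tool or the permutation algorithm of the claims; only the forward strand is scanned, and because Leu and Ile are isobaric in MS, an "L" from a spectrum may need to be tried as "I" as well.

```python
import itertools
import re

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_translate(peptide: str) -> list[str]:
    """All DNA sequences that could encode the peptide."""
    return ["".join(c) for c in itertools.product(*(SYNONYMS[aa] for aa in peptide))]


def find_coding_sites(peptide: str, dna: str) -> list[int]:
    """Start positions (any frame, forward strand) where the DNA could encode the peptide."""
    body = "".join("(?:" + "|".join(SYNONYMS[aa]) + ")" for aa in peptide)
    # The lookahead makes the matches zero-width, so overlapping sites are all reported.
    return [m.start() for m in re.finditer("(?=" + body + ")", dna)]


print(len(reverse_translate("KR")), find_coding_sites("MK", "CCATGAAGTT"))
```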
The above experiment is also performed with nuclear extracts from rat neonatal and adult human cardiac myocytes after treatment with angiotensin II (Ang II) to stimulate a hypertrophic phenotype. Ang II is used because of its known ability to induce hypertrophy in both rat and human cardiac myocytes.
If more than one TF complex binds to a given promoter, 2D-PAGE will result in a loss of distinction between the complexes: all the proteins in the different complexes will be distributed based on size and pI, without separation based on protein-protein interactions. It will therefore not be possible to identify which proteins belong to which complex, as they will all be mixed together. There are two possible approaches to solving this problem: 1) prediction of protein-protein interactions, or 2) cross-linking experiments. Because tools to accurately predict protein-protein interactions from protein sequence are not available, the second option was chosen. Because cross-linking results in high-molecular-weight species, multi-dimensional chromatography is used as an alternative to 2D-PAGE prior to ESI-MS/MS. Pulse-chase experiments following the format of experiments 1 and 2 can be performed to determine the time course of TF synthesis at baseline and whether synthesis is stimulated by Ang II.
The phosphorylation states of the TFs identified in experiments 1 and 2 are compared.
Compartmental analysis of protein distribution and protein associations is performed in the following fractions of rat neonatal and adult human cardiac myocytes: membrane, cytosol, and nuclei. The analysis is repeated after treatment with Ang II. Cross-linking of the protein complexes in each fraction is performed. The cross-linked complexes are partially digested and subjected to multi-dimensional chromatography followed by ESI MS/MS. Compartmental analysis is combined with pulse-chase experiments to identify the time of synthesis of TFs which reside in the cytoplasm.
For IPAC to work, access to the entire genome of the organism under study is required. Rat neonatal cardiac myocytes were chosen for the initial experiments because of the ease with which these cells can be prepared. However, limited information is available regarding the rat genome. It is also feasible to use nuclear extracts from mouse hearts; however, only 11% of the mouse genome is currently available in draft sequence, and only 1.4% of the sequence is finalized. Once the mouse genome is fully sequenced, the technology disclosed herein will provide a powerful tool for the simulation of trans-regulatory networks in this species and for interrogating the effect of specific knockouts and transgenic constructs on these networks. In another specific embodiment, disclosed herein is the identification of TFs in rat neonatal cardiac myocytes and the search for homologous genes in the human genome. Thus, human promoters are preferably used to affinity purify rat cardiac TF complexes. This requires an intermediate step of finding the human homologs of the rat genes. A different approach can also be taken to generate useful trans-regulatory networks. Nuclei are extracted from cardiac myocytes isolated from end-stage failing human hearts (obtained from heart tissue at the time of cardiac transplantation) for use in IPAC. In this way, human cardiac TF complexes are obtained that can be used in conjunction with the human genome. These human heart myocytes can be grown for several days in culture and subjected to signal transduction events, such as treatment with Ang II. The cultures can also be treated with stable isotopically labeled amino acids for the pulse-chase experiments.
EXAMPLE 2: Example of a Perturbation of a TF Network Generated by IPAC
An example of a perturbation of a TF network generated by IPAC, in which the perturbation occurs within the network, is shown in Tables 2-3. The example consists of a three-dimensional matrix: $\text{Time} \in \{T_1, T_2\}$; $\text{Promoter} \in \{P_x, P_1, P_2, P_3, P_4, P_5, P_6\}$; $\text{Transcription Factor} \in \{F_1, F_2, F_3, F_4, F_5, F_6\}$.
The state transition $S_j \to S_{j+1}$ is defined by the state transition function $f(T_j, P_j, F_j)$.
Table 2. A three-dimensional matrix defining two states of a trans-regulatory network (STATE 1 is determined at time point 1; STATE 2 is determined at time point 2). A "1" indicates activation of transcription, a "0" indicates no interaction, and a "-1" indicates suppression of transcription.
Table 3. A three-dimensional matrix showing a perturbation of STATE 1 (from Table 1): STATE 1', in which gene expression of TF 2 is suppressed by an external manipulation (such as knockout or antisense technology). The suppression of TF 2 in STATE 1' leads to a different second state (STATE 2'), and a new transition function is required to predict STATE 2'. A "1" indicates activation of transcription, a "0" indicates no interaction, and a "-1" indicates suppression of transcription. The state transition $S'_j \to S'_{j+1}$ is defined by the state transition function $f(T_j, P_j, F_j)$.
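The document does not specify the form of the state transition function f, and the table values are not reproduced here. As a purely hypothetical illustration, the sketch below uses a three-TF network with a Boolean threshold rule (a swapped-in modeling assumption): a TF is active at the next time point if the net input to its gene's promoter is positive, and an externally suppressed TF (knockout or antisense) is forced off. Suppressing TF 2 changes the predicted next state, mirroring the STATE 1' to STATE 2' perturbation.

```python
# Hypothetical network (not the data of Tables 2-3). Row i = promoter of the gene
# encoding TF i; column k = effect of TF k on that promoter (1 activate, -1 suppress, 0 none).
INTERACTIONS = [
    [0, 1, 0],   # promoter of F1: activated by F2
    [1, 0, -1],  # promoter of F2: activated by F1, suppressed by F3
    [0, 1, 0],   # promoter of F3: activated by F2
]


def transition(state, matrix, suppressed=frozenset()):
    """Assumed threshold rule for f: TF i is active at T(j+1) if the net input to its
    promoter at T(j) is positive and TF i is not externally suppressed."""
    next_state = []
    for tf, row in enumerate(matrix):
        drive = sum(weight * active for weight, active in zip(row, state))
        next_state.append(0 if tf in suppressed else int(drive > 0))
    return next_state


def perturb(state, suppressed):
    """Apply an external suppression (e.g. knockout) to the current state."""
    return [0 if tf in suppressed else active for tf, active in enumerate(state)]


state_1 = [1, 1, 0]
state_2 = transition(state_1, INTERACTIONS)
state_1p = perturb(state_1, {1})  # TF 2 (index 1) suppressed
state_2p = transition(state_1p, INTERACTIONS, {1})
print(state_2, state_2p)
```

A real model would fit f to the measured states; the threshold rule is only one of many possible choices.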
Example of a perturbation in the biological system in which the perturbation occurs outside the network: a gene that is not a transcription factor is suppressed by antisense or knockout technology (e.g., gene expression of a G-protein-coupled receptor or an integrin gene is suppressed in cardiac myocytes). IPAC is used to generate a trans-regulatory network in this system, and this network is compared with that of the normal system, in which the targeted gene is expressed at normal levels.
Example of a perturbation in the biological system in which the perturbation consists of treating the system with a bioactive compound: e.g., IPAC is used to generate a trans-regulatory network in cardiac myocytes before and after treatment with the bioactive peptide angiotensin II, or before and after treatment with a kinase inhibitor, etc.
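Comparing a perturbed network with the baseline network can be sketched as a comparison of signed edge sets. The representation below, (transcription factor, promoter, sign) triples with sign 1 for activation and -1 for suppression, is a hypothetical choice, as are the example edges; it reports edges gained, edges lost, and edges whose sign changed.

```python
def compare_networks(baseline, perturbed):
    """Compare two networks given as iterables of (tf, promoter, sign) triples."""
    base = {(tf, promoter): sign for tf, promoter, sign in baseline}
    pert = {(tf, promoter): sign for tf, promoter, sign in perturbed}
    gained = {edge: pert[edge] for edge in pert.keys() - base.keys()}
    lost = {edge: base[edge] for edge in base.keys() - pert.keys()}
    changed = {
        edge: (base[edge], pert[edge])
        for edge in base.keys() & pert.keys()
        if base[edge] != pert[edge]
    }
    return gained, lost, changed


# Hypothetical edges before and after treatment.
before = {("F1", "P2", 1), ("F2", "P1", 1)}
after = {("F1", "P2", -1), ("F3", "P1", 1)}
print(compare_networks(before, after))
```

Keying edges by (TF, promoter) rather than by the full triple keeps a sign flip from being reported as one lost and one gained edge.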
* * *
The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and the accompanying figures. Such modifications are intended to fall within the scope of the appended claims.
It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, are approximate, and are provided for description.
All patents, patent applications, publications, and other materials cited herein are hereby incorporated herein by reference in their entireties.

Claims

WHAT IS CLAIMED:
1. A method for identifying a trans-regulatory pathway for the expression of a gene, comprising the steps of:
(a) selecting a promoter of a first gene;
(b) performing affinity chromatography to isolate transcription factors which interact with the promoter under given physiological conditions;
(c) separating constituent proteins of said isolated transcription factors;
(d) identifying at least a partial amino acid sequence for a plurality of the separated proteins;
(e) generating DNA sequences encoding the amino acid sequences;
(f) determining one or more second genes that encode one or more sequenced proteins;
(g) specifying promoters for one or more second genes, and
(h) repeating steps (a)-(g) for at least one specified promoter to identify the regulatory pathway.
2. The method of claim 1, wherein the affinity chromatography is performed by:
(i) immobilizing the promoter;
(ii) incubating the immobilized promoter with the nuclear extract to allow binding of all transcription factor complexes which interact with the promoter under given physiological conditions;
(iii) eluting the unbound components of the nuclear extract, and
(iv) eluting the bound transcription factor complexes.
3. The method of claim 2 wherein the promoter is biotinylated.
4. The method of claim 2, wherein, prior to the second elution step, the transcription factor complexes which interact with the promoter are subjected to cross-linking, and, between steps (c) and (d), the separated proteins are subjected to a partial proteolysis followed by multidimensional chromatography to isolate proteolytic fragments.
5. The method of claim 4, wherein the partial proteolysis is a tryptic digest.
6. The method of claim 4, wherein the separation of the constituent proteins of the isolated transcription factors in step (c) is performed using liquid phase isoelectric focusing.
7. The method of claim 4, wherein the separation of the constituent proteins of the isolated transcription factors in step (c) is performed using glycerol gradient separation.
8. The method of claim 1, wherein the separation of the constituent proteins of the isolated transcription factors in step (c) is performed using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and/or multidimensional chromatography.
9. The method of claim 1, wherein identification of the amino acid sequence for the separated proteins in step (d) is performed using mass spectrometry.
10. The method of claim 1, wherein identification of the amino acid sequence in step (d) is performed for each of the separated proteins.
11. The method of claim 1, wherein in step (e) all possible DNA sequences encoding the amino acid sequences are generated.
12. The method of claim 1, wherein in step (e) the DNA sequences encoding the amino acid sequences are generated using a permutation algorithm.
13. The method of claim 12, wherein the permutation algorithm comprises the steps of:
(i) inputting the amino acid sequence;
(ii) identifying within the amino acid sequence a primary region, wherein the primary region includes a string of amino acids having the highest contiguous recurrence in the sequence;
(iii) generating all possible codon permutations that could encode the primary region;
(iv) examining a gene database to identify DNA sequences that match the codon permutations that could translate into the primary region;
(v) selecting a subset of the gene database, wherein the subset contains genes that could translate into the primary region;
(vi) identifying within the amino acid sequence a secondary amino acid apart from the primary region;
(vii) determining a positional relationship of the secondary amino acid with respect to the primary region, and
(viii) searching the subset of the gene database for DNA sequences that include codon permutations that could encode both the primary region and the secondary amino acid, at the determined positional relationship.
14. The method of claim 1, wherein in step (f) the second genes that encode sequenced proteins are determined by performing a homology search against genomic databases.
15. The method of claim 1, further comprising performing a genome-wide promoter homology analysis, so that only one promoter of a set of promoters with greater than 80% homology is used in subsequent steps.
16. The method of claim 1, wherein steps (a)-(g) are repeated until feedback loops are identified for all promoters.
17. The method of claim 1, further comprising the determination of when the synthesis of the constituent proteins of the isolated transcription factors is turned on.
18. The method of claim 17, wherein the nuclear extracts are prepared after pulse-chase labeling of proteins with stable isotopically labeled phenylalanine (Phe) and leucine (Leu).
19. The method of claim 1, further comprising the analysis of the compartmental distribution of the isolated transcription factors.
20. The method of claim 1, further comprising the identification of the phosphorylation state of the constituent proteins of the isolated transcription factor complexes.
21. The method of claim 1, wherein the first promoter in step (a) is the alpha-myosin heavy chain (αMHC) promoter.
22. A method for identifying a component of a trans-regulatory pathway or network for the expression of a gene under specific physiological or pathophysiological conditions, comprising (i) identifying a first trans-regulatory pathway or network for the expression of the gene in the absence of said specific physiological or pathophysiological conditions using the method of claim 1; (ii) identifying a second trans-regulatory pathway or network for the expression of the gene in the presence of said specific physiological or pathophysiological conditions using the method of claim 1; (iii) comparing the two pathways to identify the components that are present in the second pathway but not in the first, and (iv) comparing the two networks to determine which combinations of components are different between the two networks.
23. A database of trans-regulatory gene expression networks comprising data obtained using the method of claim 1.
24. The database of claim 23, wherein the data is obtained for a number of different genes and/or for a number of different physiological or pathophysiological conditions.
25. The use of the database of claim 23 to predict the effect of a drug on a cell under specific physiological or pathophysiological conditions.
26. The use of the database of claim 23 to identify changes in gene expression and/or patterns of gene and protein expression occurring in a disease.
27. The use of the database of claim 23 to identify changes in gene expression and/or patterns of gene and protein expression occurring during a tissue differentiation.
28. The use of the database of claim 23 to identify changes in gene expression and/or patterns of gene and protein expression occurring during embryonic development.
29. A method for validating a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the expression of at least one gene in the network.
30. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by suppressing expression of a gene when the gene is not in the network.
31. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by increasing expression of a gene when the gene is not in the network.
32. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by suppressing expression of a gene in the network using gene knock-out technology.
33. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by suppression of a gene, when the gene is not in the network, using gene knock-out technology.
34. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by increasing expression of a gene in the network using antisense technology.
35. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by increasing expression of a gene, when the gene is not in the network, using antisense technology.
36. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising perturbing the biological system under study by treating the system with a biologically active compound.
37. The method of claim 36, wherein the biological system is cardiac myocytes and the biologically active compound is selected from the group consisting of angiotensin II, norepinephrine, phenylephrine, and endothelin.
38. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1, comprising analysis of the biological system under study by constructing trans-regulatory networks at different time points during the development of the biological system.
39. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1 wherein an operator inputs a perturbation to the network, and the network is changed by a transition function to a different network such that the different network is the same as that which occurred in the real world experiment.
40. A method for constructing a trans-regulatory gene expression network identified using the method of claim 1 wherein an operator inputs a perturbation to the network, and the network is changed by a transition function to a different network such that the different network is similar within parameters set by a probability function to that which would be found in the biological system if subjected to the perturbation in a real world experiment.
PCT/US2002/019221 2001-04-09 2002-04-09 Iterative promoter affinity chromatography to identify trans-regulatory networks of gene expression WO2002097061A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002344238A AU2002344238A1 (en) 2001-04-09 2002-04-09 Iterative promoter affinity chromatography to identify trans-regulatory networks of gene expression

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US28258901P 2001-04-09 2001-04-09
US60/282,589 2001-04-09

Publications (2)

Publication Number Publication Date
WO2002097061A2 true WO2002097061A2 (en) 2002-12-05
WO2002097061A3 WO2002097061A3 (en) 2003-10-23

Family

ID=23082177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/019221 WO2002097061A2 (en) 2001-04-09 2002-04-09 Iterative promoter affinity chromatography to identify trans-regulatory networks of gene expression

Country Status (2)

Country Link
AU (1) AU2002344238A1 (en)
WO (1) WO2002097061A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2430500A1 (en) * 2009-05-14 2012-03-21 Pioneer Hi-Bred International, Inc. Inverse modeling for characteristic prediction from multi-spectral and hyper-spectral remote sensed datasets
CN109900814A (en) * 2017-12-08 2019-06-18 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Analysis method and application of fragmentable chemical cross-linking agent based on glycosidic bond mass spectrum
US11812746B2 (en) 2016-08-28 2023-11-14 The State Of Israel, Ministry Of Agriculture & Rural Development, Agricultural Research Organization (Aro) (Volcani Center) Method of controlling fungal infections in plants

Non-Patent Citations (7)

Title
BING ET AL.: 'Purification and characterization of the serum amyloid A3 enhancer factor' JOURNAL OF BIOLOGICAL CHEMISTRY vol. 274, no. 35, 27 August 1999, pages 24649 - 24656, XP002142011 *
MIKHEEV ET AL.: 'CArG binding factor A (CBF-A) is involved in transcriptional regulation of the rat Ha-ras promoter' NUCLEIC ACIDS RESEARCH vol. 28, no. 19, 2000, pages 3762 - 3770, XP002964802 *
NEUBAUER ET AL.: 'Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex' NATURE GENETICS vol. 20, 20 September 1998, pages 46 - 50, XP002106943 *
NILSSON ET AL.: 'Identification of proteins in a human pleural exudate using two-dimensional preparative liquid-phase electrophoresis and matrix-assisted laser desorption/ionization mass spectrometry' ELECTROPHORESIS vol. 20, 1999, pages 860 - 865, XP002964805 *
OJAMAA ET AL.: 'Thyroid hormone regulation of alpha-myosin heavy chain promoter activity assessed by in vivo DNA transfer in rat heart' BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS vol. 179, no. 3, 30 September 1991, pages 1269 - 1275, XP002964804 *
PARK ET AL.: 'Hepcidin, a urinary antimicrobial peptide synthesized in the liver' JOURNAL OF BIOLOGICAL CHEMISTRY vol. 276, no. 11, 16 March 2001, pages 7806 - 7810, XP002211393 *
WESER ET AL.: 'Transcription efficiency of human polymerase III genes in vitro does not depend on the RNP-forming autoantigen La' NUCLEIC ACIDS RESEARCH vol. 28, no. 20, 2000, pages 3935 - 3942, XP002964803 *

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
EP2430500A1 (en) * 2009-05-14 2012-03-21 Pioneer Hi-Bred International, Inc. Inverse modeling for characteristic prediction from multi-spectral and hyper-spectral remote sensed datasets
EP2430500B1 (en) * 2009-05-14 2021-07-14 Pioneer Hi-Bred International, Inc. Inverse modeling for characteristic prediction from multi-spectral and hyper-spectral remote sensed datasets
US11812746B2 (en) 2016-08-28 2023-11-14 The State Of Israel, Ministry Of Agriculture & Rural Development, Agricultural Research Organization (Aro) (Volcani Center) Method of controlling fungal infections in plants
CN109900814A (en) * 2017-12-08 2019-06-18 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Analysis method and application of fragmentable chemical cross-linking agent based on glycosidic bond mass spectrum
CN109900814B (en) * 2017-12-08 2021-06-08 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Analysis method and application of fragmentable chemical cross-linking agent based on glycosidic bond mass spectrum

Also Published As

Publication number Publication date
AU2002344238A1 (en) 2002-12-09
WO2002097061A3 (en) 2003-10-23

Similar Documents

Publication Publication Date Title
Galperin The molecular biology database collection: 2005 update
Marquez et al. Unmasking alternative splicing inside protein-coding exons defines exitrons and their role in proteome plasticity
Xia et al. RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis
Twyman Principles of proteomics
Mochida et al. Genomics and bioinformatics resources for crop improvement
Reddy et al. Analysis of the myosins encoded in the recently completed Arabidopsis thaliana genome sequence
Chesarino et al. Chemoproteomics reveals Toll-like receptor fatty acylation
Foley et al. A global view of RNA-protein interactions identifies post-transcriptional regulators of root hair cell fate
CA2712079A1 (en) Compositions and methods of detecting post-stop peptides
Grant et al. The Xenopus ORFeome: a resource that enables functional genomics
Li et al. Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence
Matsubara et al. Intra-genomic GC heterogeneity in sauropsids: evolutionary insights from cDNA mapping and GC 3 profiling in snake
CN104911261B (en) A kind of method of Study On Rice and pathogen cooperating type
Xu et al. Genome and population sequencing of a chromosome-level genome assembly of the Chinese tapertail anchovy (Coilia nasus) provides novel insights into migratory adaptation
Lagarrigue et al. LncRNAs in domesticated animals: from dog to livestock species
Neiro et al. Identification of putative enhancer-like elements predicts regulatory networks active in planarian adult stem cells
Fixsen et al. SALL1 enforces microglia-specific DNA binding and function of SMADs to establish microglia identity
Tyagi et al. Comparative signatures of selection analyses identify loci under positive selection in the Murrah Buffalo of India
Dumrongprechachan et al. Dynamic proteomic and phosphoproteomic atlas of corticostriatal axons in neurodevelopment
Kalluri et al. Shotgun proteome profile of Populus developing xylem
Yang et al. Establishing the architecture of plant gene regulatory networks
WO2002097061A2 (en) Iterative promoter affinity chromatography to identify trans-regulatory networks of gene expression
Luo et al. Utilization of a zebra finch BAC library to determine the structure of an avian androgen receptor genomic region
Brown Understanding a genome sequence
Zhang et al. Phylogenomic detection and functional prediction of genes potentially important for plant meiosis

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM EC EE ES FI GB GD GE HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP