MXPA97008401A - Chromatin regulatory genes

Chromatin regulatory genes

Publication number
MXPA97008401A
Authority
MX
Mexico
Application number
MXPA/A/1997/008401A
Other languages
Spanish (es)
Other versions
MX9708401A (en)
Inventor
Jenuwein Thomas
Laible Gotz
Original Assignee
Boehringer Ingelheim International GmbH, 55218 Ingelheim, DE
Priority claimed from DE19516776A
Application filed by Boehringer Ingelheim International GmbH, 55218 Ingelheim, DE
Publication of MXPA97008401A
Publication of MX9708401A

Abstract

The invention relates to the deregulation of chromatin regulatory genes that have a SET domain; such deregulation is of importance in certain cancer conditions. These genes, and in particular the SET domain as such, can be used in the diagnosis and therapy of such conditions.

Description

Chromatin regulatory genes

DESCRIPTION OF THE INVENTION

The present invention relates to genes that play a role in the structural and functional regulation of chromatin, and to their use for therapy and diagnosis. The functional organization of eukaryotic chromosomes into centromeres, telomeres, and euchromatic and heterochromatic regions is a decisive mechanism for guaranteeing the exact replication and distribution of genetic information at each cell division. In contrast, tumor cells are frequently characterized by chromosomal rearrangements, translocations and aneuploidy (Solomon et al., 1991; Pardue, 1991). Although the mechanisms that lead to increased chromosome instability in tumor cells have not yet been explained, a series of experimental systems has recently made it possible to identify some chromosomal proteins that participate causally in deregulated proliferation. These systems range from telomere position effects in yeast (Renauld et al., 1993; Buck and Shore, 1995; Allshire et al., 1994), through position effect variegation (PEV) in Drosophila (Reuter and Spierer, 1992), to the analysis of translocation breakpoints in human leukemias (Solomon et al., 1991; Cleary, 1991). First, it was found that overexpression of a shortened version of the SIR4 protein leads to a prolonged life span in yeast (Kennedy et al., 1995). Since the SIR proteins contribute to the formation of multimeric complexes at the silent mating type loci and at the telomere, the overexpressed SIR4 might interfere with these heterochromatin-like complexes, which would finally lead to uncontrolled proliferation. This assumption is consistent with the frequent occurrence of deregulated telomere length in most types of human cancer (Counter et al., 1992).
Second, genetic analysis of PEV in Drosophila identified a series of gene products that modify chromatin structure at heterochromatic positions and within the homeotic gene cluster (Reuter and Spierer, 1992). Mutations of some of these genes, such as modulo (Garzino et al., 1992) and polyhomeotic (Smouse and Perrimon, 1990), can cause deregulated cell proliferation or cell death in Drosophila. Third, mammalian homologs have been described of both activators (e.g. the trithorax or trx group) and repressors (e.g. the Polycomb or Pc group) of the chromatin structure of the Drosophila homeotic selector genes. Among these, human HRX/ALL-1 (trx group) has been shown to participate in translocation-induced leukemogenesis (Tkachuk et al., 1992; Gu et al., 1992), and overexpression of mouse bmi (Pc group) leads to the genesis of lymphomas (Haupt et al., 1991; Brunk et al., 1991; Alkema et al., 1995). A model for the function of chromosomal proteins suggests that they form multimeric complexes which, depending on the balance between activators and repressors in the complex, determine the degree of condensation of the surrounding chromatin region (Locke et al., 1988). A shift in this balance through overexpression of one of the components of the complex resulted in a redistribution of euchromatic and heterochromatic regions (Buck and Shore, 1995; Reuter and Spierer, 1992; Eisenberg et al., 1992). This dosage effect can destabilize the chromatin structure at predetermined loci, which ultimately leads to a transition from the normal state to the transformed state. Despite the characterization of HRX/ALL-1 and bmi as proto-oncogenes that can modify chromatin structure, knowledge of mammalian gene products that interact with chromatin is still very limited.
In contrast, approximately 120 alleles for chromatin regulators have been described by genetic analysis of PEV in Drosophila (Reuter and Spierer, 1992). A carboxy-terminal region with sequence similarity was recently identified that is common to a positive regulator of Drosophila chromatin (trx, trx group) and a negative regulator of Drosophila chromatin (E(z), Pc group) (Jones and Gelbart, 1993). This carboxy terminus is also conserved in Su(var)3-9, a dominant suppressor of chromatin distribution in Drosophila (Tschiersch et al., 1994). The present invention started from the consideration that this protein domain, designated "SET" (Tschiersch et al., 1994), defines a new gene family of developmentally important mammalian chromatin regulators, because of its evolutionary conservation and its presence in the products of antagonistic genes. In addition, the characterization of further members of the SET domain gene group, alongside HRX/ALL-1, helps to explain the mechanisms responsible for certain structural changes in chromatin that can lead to malignant transformation. The object of the present invention is therefore to identify chromatin regulatory genes, to explain their function and to employ them for diagnosis and therapy. To achieve this object, the sequence information of the SET domain was first used to obtain, from human cDNA libraries, the human cDNAs homologous to the Drosophila SET domain genes. Two cDNAs were obtained, which constitute human homologs of E(z) and Su(var)3-9 respectively; the corresponding human genes were designated EZH2 and SUV39H (see below). In addition, a variant form of EZH2 was identified, which was designated EZH1.
The present invention therefore relates to DNA molecules that contain a sequence encoding a chromatin regulatory protein having a SET domain, or a partial sequence thereof, and that are characterized in that they have the nucleotide sequence shown in Figure 6 or Figure 7. The DNA molecules according to the invention are also referred to below as "genes according to the invention". The genes according to the invention bear the designations EZH2 and SUV39H; they were originally designated "HEZ-2" and "H3-9". In another aspect, the invention relates to the cDNAs derived from these genes, including their degenerate variants, to mutants that encode functional chromatin regulators, and to variants attributable to gene duplication; an example of the latter is EZH1, whose partial sequence is shown in Figure 8 in comparison with EZH2. To achieve the object of the present invention, the following procedure was carried out in particular: based on the sequence information of the conserved SET domains, a human B cell-specific cDNA library was screened at reduced stringency with a mixed Drosophila DNA probe encoding the SET domains of E(z) and Su(var)3-9. From 500,000 plaques, 40 primary phages were selected. After two further rounds of screening, it was found that 31 phages encode authentic E(z) sequences and that five phages constitute variants of E(z). In contrast, only two phages hybridized with the probe that contained only the SET domain of Su(var)3-9. The phage inserts were amplified by polymerase chain reaction (PCR) and analyzed by restriction mapping and partial sequencing. Representative cDNA inserts were subcloned and sequenced along their entire length. To isolate the 5' ends, positive phages were screened again with 5' DNA probes; after subcloning, full-length cDNAs were obtained.
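The screening figures quoted above can be tallied in a few lines. The sketch below only restates the counts given in the text (500,000 plaques, 40 primary phages, 31 + 5 + 2 classified phages); the variable and function names are illustrative assumptions, and the text does not say what became of the phages that are not classified.

```python
# Counts taken from the description; names are illustrative only.
PLAQUES_SCREENED = 500_000
PRIMARY_POSITIVES = 40
CLASSIFIED = {
    "authentic E(z)": 31,
    "E(z) variants": 5,
    "Su(var)3-9 SET probe": 2,
}


def screening_summary(plaques: int, primaries: int, classified: dict) -> dict:
    """Summarize a plaque screen: hit rate and how many primaries were classified."""
    total_classified = sum(classified.values())
    return {
        "hit_rate": primaries / plaques,              # fraction of plaques picked
        "classified": total_classified,               # phages assigned to a class
        "unaccounted": primaries - total_classified,  # primaries not described further
    }


summary = screening_summary(PLAQUES_SCREENED, PRIMARY_POSITIVES, CLASSIFIED)
print(summary)
```

On these numbers, 38 of the 40 primary phages are assigned to a class, and roughly one plaque in 12,500 was picked in the primary screen.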
The complete cDNA encoding the human homolog of E(z) was designated EZH2, and the cDNA encoding the human homolog of Su(var)3-9 was designated SUV39H. Overall, the amino acid identity between the Drosophila and human proteins is 61% for EZH2 and 43% for SUV39H, the C-terminal SET domain being conserved to a very high degree (88% for EZH2 and 53% for SUV39H). The sequence comparison revealed other clear regions of homology, e.g. a cysteine-rich domain in EZH2 and a Chromo box in SUV39H. (In Polycomb it was shown that the Chromo box is the essential domain for the interaction between DNA and chromatin; Messmer et al., 1992.) In contrast, the 207 amino acids containing the GTP-binding motif at the amino terminus of the Drosophila protein are missing in the human homolog SUV39H. The comparison between the amino acid sequences of the Drosophila genes and of the human genes is shown in Figures 1 and 2. Figure 1 shows the amino acid sequence comparison between EZH2 and Drosophila Enhancer of zeste (E(z)). It shows the identical amino acids of the conserved SET domain located at the carboxy terminus (shaded box) and of the Cys-rich region (the Cys residues are highlighted). Putative nuclear localization signals are underlined. Figure 2 shows the comparison between the amino acid sequences of the human homolog SUV39H and Drosophila Su(var)3-9. The identical amino acids of the conserved SET domain located at the carboxy terminus (shaded box) and of the Chromo domain (darker shaded box) are shown. Putative nuclear localization signals are underlined. The upper part of the figure gives a schematic overview of both protein structures, which shows that 207 amino acids are missing at the N-terminus of the human homolog.
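Identity percentages such as those quoted above are computed from a pairwise alignment. The following is a minimal sketch of one common definition (identical residues divided by aligned, non-gap positions); the text does not state which denominator was used for the 61%, 43%, 88% and 53% values, so the function, its name and the toy sequences are assumptions for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over positions where neither aligned sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Other definitions divide by the alignment length or the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not taken from EZH2 or SUV39H).
toy_identity = percent_identity("MKT-LLE", "MRTALLD")
print(round(toy_identity, 1))
```

In the toy example, six columns are compared and four of them match, so the function reports about 66.7%.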
Since the translational consensus sequences in the vicinity of the start ATG of the human SUV39H cDNA are also present at the corresponding internal position in Su(var)3-9, the Drosophila protein should contain additional exons, which became dispensable for function at a later stage of evolution. (The accuracy of this hypothesis can be confirmed by expressing in Drosophila the human SUV39H cDNA and the full-length or 5'-shortened cDNAs of Su(var)3-9. In addition, another cDNA, designated MG-44 (see below), has been described, which also lacks the 5' end of the Drosophila gene.) In addition to the human SUV39H cDNA, the homologous locus in the mouse was also isolated (Suv39h, see below); its sequence analysis and promoter structure unambiguously confirm the shortening at the amino terminus of the mammalian homologs compared with Drosophila Su(var)3-9. DNA blot analyses carried out within the framework of the present invention clearly indicate that the mammalian homologs of Su(var)3-9 are represented by single loci in mice and in humans, whereas the mammalian homologs of E(z) are encoded by two separate loci in mice and in humans. The second human locus (designated EZH1) was also confirmed by characterization of a small number of cDNA variants, which differ in their 3' flanking sequences from most of the clones isolated from the human cDNA library. The differences between EZH2 and EZH1 in the sequenced region are shown in Figure 8: the SET domain of EZH1 shows mutations relative to EZH2; in addition, the EZH1 variant isolated by the authors of the present invention (most likely an aberrantly spliced cDNA) carries an in-frame stop codon, which shortens the protein by 47 amino acids at the C-terminus.
Figure 8 depicts the nucleotide sequence of the EZH2 cDNA from position 1,844 to 2,330 in the upper line in each case, with the 5' splice site and the potential stop codon underlined. To align a partial cDNA sequence of the EZH1 variant with the EZH2 sequence, the gap program of the Wisconsin GCG network service was used. The premature stop codon in EZH1 (position 353) is underlined. The sequences that encode the conserved SET domain are highlighted. In addition, the 3' end (position 151 in EZH1) of the aberrant transcript B52 (see below) is shown. Over the available sequence, B52 was shown to be 97% identical to EZH1 and 72% identical to EZH2. The comparison of the EZH1 sequence with that of EZH2, together with the finding that two separate E(z) homolog loci occur in humans and in mice, leads to the conclusion that a gene duplication has occurred in mammals. In a comparison with cDNA sequences in the GenBank database, it was surprisingly found that certain partial cDNA sequences registered in this database, which are derived from aberrant transcripts in tumor tissues, constitute mutated versions of the cDNAs according to the invention. On the one hand, in the search for BRCA1, a gene that predisposes to breast and ovarian cancer, a partial cDNA sequence of 271 nucleotides, designated B52, was isolated; it encodes a mutated variant of the SET domain and had been mapped to human chromosome 17q21 (Friedman et al., 1994). In the context of the present invention, it was surprisingly found that B52 has 97% identity with the EZH1 cDNA variant according to the invention (see above); EZH1 possibly constitutes a gene whose reactivation plays a role in deregulated proliferation.
On the other hand, a cDNA (2,800 nucleotides; MG-44) had been isolated from human chromosome Xp11 (Geraghty et al., 1993), a region that predisposes to degenerative disorders of the retina and to synovial sarcomas. It was surprisingly found that this cDNA has 98% identity with the SUV39H cDNA according to the invention. The new genes provided within the framework of the present invention therefore make it possible to deduce a connection between certain cancer diseases and mutations in chromatin regulators. In the case of the MG-44 cDNA, which has several point mutations and frameshift mutations that interrupt the Chromo and SET domains, a connection between Su(var)3-9 and MG-44 could be explained for the first time with the help of the SUV39H cDNA according to the invention. In addition to the sequences already mentioned, further human members of the SET protein family could be found in the GenBank sequence database: the well-documented human homolog of Drosophila trx, HRX/ALL-1 (Tkachuk et al., 1992; Gu et al., 1992); a gene of unknown function designated G9a, which lies in the human Major Histocompatibility Complex (MHC) (Milner and Campbell, 1993); and, thirdly, an unpublished cDNA (KG-1), which had been isolated from immature myeloid tumor cells (Nomura et al., 1994). While G9a is currently the only human gene with a SET domain for which no mutated version is known, KG-1 carries an insertion of 342 amino acids, which splits the SET domain into an amino-terminal half and a carboxy-terminal half. This KG-1 cDNA is probably an aberrantly spliced variant, since 5' and 3' consensus splice sites are found at both ends of the insertion.
In total, four of the five currently known human members of the SET protein family are subject to modifications, all of which mutate the SET domain (HRX/ALL-1, EZH1/B52, SUV39H/MG-44 and KG-1). In addition, in three cases the corresponding human gene loci were mapped in the vicinity of translocation breakpoints or unstable chromosomal regions (HRX/ALL-1, EZH1/B52 and SUV39H/MG-44). Aberrant transcripts of human SET domain genes are depicted in Figure 3. On the left, the figure shows the location of the five currently known SET domain genes on the respective chromosome. Among others, the three genes (HRX/ALL-1, EZH1/B52 and SUV39H/MG-44) are shown for which aberrant cDNAs have been mapped to translocation breakpoints or unstable chromatin regions. Four of the five SET domain genes shown carry mutations, all of which interrupt the SET domain located at the carboxy terminus, which is shown in the figure as a dark box. A translocation joins the amino-terminal half of HRX to an unrelated sequence, which is shown as a dotted box with the designation ENL. Mutations and a premature stop codon modify the SET domain of EZH1/B52. Point and frameshift mutations interrupt the Chromo and SET domains in MG-44. A large insertion splits the SET domain of KG-1 into two halves. No aberrant transcripts of G9a are known to date. The cysteine-rich cluster in B52 is shown as a dotted box; in HRX/ALL-1 the region with homology to methyltransferases is shown as a shaded box and the A/T hooks as vertical lines. The names of the respective authentic genes are indicated on the right in the figure.
The fact that a mammalian gene of the SET protein family, HRX/ALL-1, had been connected with translocation-induced leukemogenesis (Tkachuk et al., 1992; Gu et al., 1992) is a strong indication that proteins with the SET domain are not only important developmental regulators, which jointly determine chromatin-dependent changes in gene expression, but also, after mutation, disrupt normal cell proliferation. Since all the mutations described so far interrupt the primary structure of the SET domain, it is natural to assume that the SET domain as such plays a decisive role in the transition from the normal state to the transformed state. An important function of the SET domain can also be assumed by virtue of its evolutionary conservation in gene products from yeast to humans. Figure 4 shows the evolutionary conservation of proteins that have the SET domain. Using the tfasta program of the Wisconsin GCG network service, proteins and open reading frames with homology to the SET domain were identified. The figure shows a representative selection ranging from yeast to humans. The numbers indicate amino acids. The following are shown: the SET domain located at the carboxy terminus as a black box, the Cys-rich regions as a dark dotted box, the GTP-binding motif in Su(var)3-9 as a light dotted box, and the Chromo domain of Su(var)3-9 and H3-9 as an open box with light dots. A region with homology to methyltransferases (trx and HRX) is shown as a shaded box. The A/T hooks are shown as vertical lines. A further Ser-rich region (S in C26E6.10), a Glu-rich region (E in G9a) and ankyrin repeats (ANK in G9a) are also highlighted.
YHR119 (GenBank accession number U00059) and C26E6.10 (GenBank accession number U13875) are open reading frames from cosmids without any functional characterization, which were recently registered in the database. The percentages indicate the overall amino acid identity between the human and Drosophila proteins. Figure 5 shows the amino acid agreement within the SET domain. The SET domains of the genes shown in Figure 4 were aligned using the Pileup program of the Wisconsin GCG network service. For the comparison of the SET domain of KG-1, the large amino acid insertion that splits the SET domain into two halves was removed before the Pileup. Amino acid positions with 8 matches out of 10 are highlighted. On the basis of the criteria established within the framework of the present invention, it can be assumed that genes having a SET domain participate in the chromatin-dependent development of deregulated proliferation; these genes or the cDNAs derived from them, as well as partial and mutated sequences, can therefore be used in the therapy and diagnosis of diseases attributable to such proliferation. Differences in the transcription level of SET domain RNAs between normal cells and transformed cells can be used as diagnostic parameters for diseases in which the expression of SET domain genes is deregulated. Thus, oligonucleotides that encode the SET domain as such, or partial segments thereof, can be used as diagnostic markers to diagnose different types of cancer in which the SET domain is mutated. For the detailed analysis of the function of the cDNAs according to the invention or segments thereof, with respect to the diagnostic use of SET domain gene sequences, the homologous mouse cDNAs of EZH1 (Ezh1) and SUV39H (Suv39h) were isolated within the framework of the present invention.
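The highlighting rule mentioned for Figure 5 (positions with 8 matches out of 10) can be expressed as a per-column count over a multiple alignment. The sketch below is an illustration under assumptions: the function name, the gap character and the ten toy sequences are hypothetical and are not the SET domain sequences of Figure 5.

```python
from collections import Counter


def conserved_positions(alignment, min_matches=8, gap="-"):
    """Return (column index, residue) for columns where one residue
    occurs in at least `min_matches` of the aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            continue  # column consists only of gaps
        residue, count = counts.most_common(1)[0]
        if count >= min_matches:
            conserved.append((index, residue))
    return conserved


# Ten toy aligned sequences of four residues each (hypothetical data).
TOY_ALIGNMENT = [
    "GWGA", "GWGV", "GWGL", "GWGI", "GWAV",
    "GWGT", "GYGS", "GWGK", "GFGE", "GWGD",
]
print(conserved_positions(TOY_ALIGNMENT))
```

In the toy alignment, the first three columns meet the 8-of-10 threshold (G in 10, W in 8 and G in 9 sequences), while the fourth column does not.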
When a mouse-specific DNA probe encoding the SET domain was used in RNase protection analyses to investigate the activity of the Ezh1 gene during normal mouse development, a rather broad expression profile was found, which is similar to that of bmi (Haupt et al., 1991). The analyses carried out with the mouse sequences are extended with human sequences, in order to compare the amounts of RNA between immature precursor cells, tumor cells and differentiated cells in different human cell culture systems. To check whether the SET domain is suitable as a diagnostic tumor marker for particular cancer diseases or as a general diagnostic parameter, standard methods for determining RNA concentration can be used, as described in the relevant laboratory manuals (Sambrook, J., Fritsch, E.F. and Maniatis, T., 1989, Cold Spring Harbor Laboratory Press), such as Northern blot analysis, nuclease protection analysis or RNase protection analysis. To investigate the frequency with which the SET domain undergoes specific mutations, SET-specific DNA probes can be used for the analysis of single-strand conformation polymorphisms (SSCP; Gibbons et al., 1995). The types of cancer in which SET-specific DNA probes can be used as diagnostic markers are breast cancer (EZH1; Friedman et al., 1994), synovial sarcoma (SUV39H; Geraghty et al., 1993) and leukemia. By virtue of the knowledge of the nucleotide sequence of the SET domain genes, the corresponding proteins derived from the cDNA sequences, which are also an object of the present invention, can be prepared in recombinant form by introducing the cDNAs that encode them into appropriate vectors and expressing them in host organisms.
The techniques used for the production of recombinant proteins are familiar to a person skilled in the art and can be taken from the relevant manuals (Sambrook, J., Fritsch, E.F. and Maniatis, T., 1989, Cold Spring Harbor Laboratory Press). Accordingly, another aspect of the present invention is recombinant DNA molecules containing the DNAs encoding EZH2, SUV39H or EZH1, together with expression control sequences functionally linked to them, as well as host organisms that have been transformed with them.
The recombinant proteins according to the invention can be used to analyze the interaction of SET domain proteins with chromatin or with other members of heterochromatin complexes. Based on the findings obtained about the mode of action of these complexes, the specific possibilities for deliberate intervention in the mechanisms involved are defined, and these can be used for therapeutic applications. Investigations that serve for the further analysis of the function of the SET domain are carried out, e.g., by expressing, in vitro as well as in tissue cultures, the cDNAs that encode human EZH2 or SUV39H respectively and that are provided with an epitope against which antibodies are available. After immunoprecipitation with the respective epitope-specific antibodies, it can be ascertained whether EZH2 and SUV39H can interact with each other in vitro and whether complex formation between EZH2 and/or SUV39H and other chromatin regulators takes place in vivo. It has already been assumed by other authors (DeCamillis et al., 1992; Rastelli et al., 1993; Orlando and Paro, 1993) that complex formation between different members of the heterochromatin proteins is essential for their function. By virtue of the availability of the SET domain genes according to the invention, it can be checked whether the SET region constitutes a domain that functions by virtue of interactions, or whether it contributes to the formation of multimeric heterochromatic complexes. Likewise, it can be checked whether the SET domain has an inhibitory function, similar to that of the BTB domain located at the amino terminus of different chromatin regulators, including the GAGA factor (Adams et al., 1992). Overall, interaction analyses with epitope-tagged EZH2 and SUV39H proteins allow a further characterization of the function of the SET domain.
This opens up possibilities for counteracting the deregulated activity, e.g. by introducing dominant-negative variants of the SET domain cDNA sequences into the cell by gene therapy methods. Such variants are obtained, e.g., by first defining the functional domains of the SET proteins, e.g. the sequence segments that are responsible for the interaction between DNA and chromatin or for protein-protein interaction, and then expressing in the cell concerned the DNA sequences shortened by the respective domain(s) or by segments thereof, in order to compete with a proliferation disorder caused by the intact functional protein. The availability of the cDNAs according to the invention also allows the production of transgenic animals, e.g. mice, in which SET domain genes can be overexpressed ("gain-of-function") or in which these genes can be knocked out ("loss-of-function"); for the latter analyses, the corresponding animal sequences, especially mouse sequences, of the genes according to the invention are used. These mice are also an object of the present invention. In particular, the gain-of-function analysis, in which alleles of the genes according to the invention are introduced into a mouse, ultimately provides a conclusion about the causal participation of EZH2 and SUV39H in the chromatin-dependent activation of tumor formation. For the gain-of-function analysis, the complete cDNA sequences of human EZH2 and SUV39H, as well as their mutated versions such as EZH1/B52 and MG-44, can be driven by vectors that allow high levels of expression, e.g. plasmids with the human β-actin promoter, or with the immunoglobulin heavy chain enhancer (Eμ) and Moloney virus enhancers (Mo-LTR).
It was recently shown that Eμ/Mo-LTR-dependent overexpression of the bmi gene, which like EZH2 belongs to the Pc group of negative chromatin regulators, is sufficient to generate lymphomas in transgenic mice (Alkema et al., 1995).
In the loss-of-function analysis, by disrupting the endogenous mouse loci for Ezh1 and Suv39h by homologous recombination in embryonic stem cells, it can be determined whether the loss of the in vivo function of the genes leads to abnormal development of the mouse. By virtue of these in vivo systems, the effect of EZH2 and SUV39H can be confirmed; these systems also serve as a basis for animal models in relation to human gene therapy. The use in gene therapy of the DNA sequences according to the invention, or of sequences derived from them (e.g. complementary antisense oligonucleotides), depends on whether the disease to be treated is attributable to a chromatin disorder resulting from the absence of the functional gene sequence or from overexpression of the corresponding genes. Accordingly, it is carried out by introducing the functional gene sequence, by inhibiting gene expression, e.g. with the aid of antisense oligonucleotides, or by introducing a sequence encoding a dominant-negative mutant. The respective DNA sequences can be introduced into the cell with the aid of standard methods for the transfection of higher eukaryotic cells, which include gene transfer by means of viral vectors (retroviruses, adenoviruses or adeno-associated viruses) or by means of non-viral systems based on receptor-mediated endocytosis; reviews of the usual methods are given, e.g., by Mitani and Caskey, 1993; Jolly, 1994; Vile and Russell, 1994; Tepper and Mulé, 1994; Zatloukal et al., 1993; PCT patent document WO 93/07283.
For inhibiting the expression of the genes according to the invention, low molecular weight substances that intervene in the transcription machinery also come into consideration; after an analysis of the 5' regulatory region of the genes, it is possible to screen for substances that totally or partially block the interaction of the respective transcription factors with this region, e.g. with the aid of the method described in WO 92/13092. Deregulated proliferation can also be inhibited at the level of the gene product, by the therapeutic use of corresponding antibodies against the EZH2 or SUV39H protein, preferably human or humanized antibodies. Such antibodies are produced according to known methods, such as those described, e.g., by Malavasi and Albertini, 1992, or by Rhein, 1993. Antibodies against EZH2 or SUV39H that can be used in therapy or diagnosis are also an object of the present invention.
Summary of the figures

Figure 1: Comparison between the amino acid sequences of EZH2 and E(z)
Figure 2: Comparison between the amino acid sequences of SUV39H and Su(var)3-9
Figure 3: Aberrant transcripts of human SET domain genes
Figure 4: Evolutionary conservation of SET domain proteins
Figure 5: Agreement of amino acids in the SET domain
Figure 6: DNA and amino acid sequences of EZH2
Figure 7: DNA and amino acid sequences of SUV39H
Figure 8: Comparison of partial sequences of the human EZH2 and EZH1 cDNAs

Example

a) Production of a cDNA library

A cDNA library specific for human B cells was produced as described by Bardwell and Treisman, 1994, by isolating poly(A)+ RNA from human BJA-B cells, reverse transcribing it by priming with poly(dT)15 and converting it into double-stranded cDNA. After the addition of an EcoRI adapter with the sequence 5' AATTCTCGAGCTCGTCGACA, the cDNA was ligated into the EcoRI site of bacteriophage λgt10. The propagation and amplification of the library were carried out in E. coli C600.

b) Production of DNA probes

Drosophila DNA probes that encode the conserved SET domains of E(z) and Su(var)3-9 were produced by polymerase chain reaction (PCR) on the basis of the published Drosophila sequences (Jones and Gelbart, 1993; Tschiersch et al., 1994): 1 μg of Drosophila melanogaster DNA (Clontech) was subjected to PCR amplification with the two primers E(z) 1.910 (5' ACTGAATTCGGCTGGGGCATCTTTCTTAAGG) and E(z) 2.280 (5' ACTCTAGACAATTTCCATTTCACGCTCTATG) (35 cycles of 30 seconds at 94 °C, 30 seconds at 55 °C and 30 seconds at 72 °C). The corresponding SET domain probe for Su(var)3-9 was amplified from 10 ng of plasmid DNA (Tschiersch et al., 1994, clone M4) with the primer pair suvar.up (5' ATATAGTACTTCAAGTCCATTCAAAAGAGG) and suvar.dn (5' CCAGGTACCGTTGGTGCTGTTTAAGACCG), using the same cycle conditions.
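The four primers and the cycling programme given above lend themselves to a quick sanity check. The sketch below computes the GC fraction of each primer, looks for a few well-known palindromic restriction sites, and adds up the programmed step time. The primer sequences and cycle parameters come from the text; the helper names and the choice of enzymes are assumptions, and the text does not state that these sites were designed in as cloning sites.

```python
# Primer sequences as printed in the example (5' to 3').
PRIMERS = {
    "E(z) 1.910": "ACTGAATTCGGCTGGGGCATCTTTCTTAAGG",
    "E(z) 2.280": "ACTCTAGACAATTTCCATTTCACGCTCTATG",
    "suvar.up": "ATATAGTACTTCAAGTCCATTCAAAAGAGG",
    "suvar.dn": "CCAGGTACCGTTGGTGCTGTTTAAGACCG",
}

# Standard recognition sequences of four common restriction enzymes.
# Which enzymes to look for is an assumption made for this illustration.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA", "ScaI": "AGTACT", "KpnI": "GGTACC"}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def sites_in(seq: str) -> list:
    """Names of the enzymes in SITES whose recognition sequence occurs in seq."""
    return [name for name, site in SITES.items() if site in seq]


def total_step_seconds(cycles: int, step_seconds: tuple) -> int:
    """Programmed hold time only; ramping between temperatures is ignored."""
    return cycles * sum(step_seconds)


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 2), sites_in(seq))

# 35 cycles of 30 s at 94 °C, 30 s at 55 °C and 30 s at 72 °C.
print(total_step_seconds(35, (30, 30, 30)) / 60, "minutes")
```

Each primer contains exactly one of the listed sites near its 5' end, and the programmed hold time for 35 cycles of three 30-second steps comes to 52.5 minutes.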
The amplified SET domain DNA fragments were gel purified and partially sequenced in order to confirm that the amplified sequences were correct.

c) Screening of the cDNA library

[...] x 10^5 plaque-forming units (pfu) were incubated with 5 ml of a culture of the bacterial host strain E. coli C600 (suspended at an optical density OD600 of 0.5 in 10 mM MgSO4) at 37 °C for [...] and then poured over a large preheated LB plate (200 mm x 200 mm). After overnight growth at 37 °C the phages were adsorbed onto a nylon membrane (GeneScreen). The membrane, with the phage-bearing face up, was left to float for 30 seconds in a denaturation solution (1.5 M NaCl, 0.5 M NaOH), then submerged for 60 seconds in the denaturation solution and finally neutralized for 5 minutes in 3 M NaCl, 0.5 M Tris pH 8. Then the membrane was briefly rinsed in 3xSSC and the phage DNA was fixed on the nylon filter by UV crosslinking. The filter was prehybridized for 30 minutes at 50 °C in 30 ml of Church's buffer (1% BSA, 1 mM EDTA and 0.5 M NaHPO4, pH 7.2), then 2 x 10^6 cpm of the mixture of radioactively labeled DNA probes (SET of E(z) and SET of Su(var)3-9) were added; the DNA probes were produced by random priming using the RediPrime kit (Amersham). Hybridization was carried out overnight at 50 °C. After the hybridization solution was removed, the filter was washed for 10 seconds in 2xSSC, 1% SDS at room temperature, then washed at 50 °C for 10 seconds. The filter was wrapped in Saran wrap and subjected to autoradiography using an intensifying screen. Positive phage plaques were identified on the original plate by alignment with the autoradiogram, and the corresponding agar pieces were removed with the larger end of a Pasteur pipette. The phages were eluted overnight at 4 °C in 1 ml of SM buffer (5.8 g of NaCl, 2 g of MgSO4·H2O, 50 ml of Tris pH 7.5 and 5 ml of 2% gelatin per 1 l of H2O), which contained a few drops of CHCl3.
The phage lysate was plated out again for second and third rounds of screening in order to obtain individual, well-isolated positive plaques (from 20 to 100 plaques per plate in the third round).

d) Sequence analysis

The cDNA inserts of the recombinant phages were subcloned into the polylinker of pBluescript KS (Stratagene) and sequenced on an automatic sequencer (Applied Biosystems) using the dideoxy method. The complete sequence of at least two independent isolates for each gene obtained was determined by primer walking. The sequences were analyzed using the GCG software package (University of Wisconsin), and homology searches were carried out with the "Blast and fasta" or "tfasta" programs of the network service. The complete sequences of EZH2 and SUV39H are depicted in Figures 6 and 7.
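The feature coordinates stated in the sequence protocol that follows allow a simple consistency check, sketched here in Python. The sketch takes the CDS and 3' UTR positions exactly as printed for SEQ ID NO: 1 and SEQ ID NO: 3, verifies that each coding region is a whole number of codons, and compares the codon count with the length stated for the corresponding protein sequence (SEQ ID NO: 2 and SEQ ID NO: 4). The function and variable names are hypothetical, and the reading that the stated protein lengths include the terminal stop position is an assumption.

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in a CDS given as inclusive, 1-based positions."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


def utr3_follows_cds(cds: tuple[int, int], utr3: tuple[int, int], total: int) -> bool:
    """True if the 3' UTR starts right after the CDS and ends at the last base."""
    return utr3[0] == cds[1] + 1 and utr3[1] == total


# Coordinates as printed in the sequence protocol. "stated_length" is the
# length given for the matching protein entry; treating it as a count that
# includes the stop position is an assumption of this sketch.
ENTRIES = {
    "SEQ ID NO: 1": {"total": 2600, "cds": (90, 2330), "utr3": (2331, 2600), "stated_length": 747},
    "SEQ ID NO: 3": {"total": 2732, "cds": (45, 1283), "utr3": (1284, 2732), "stated_length": 413},
}

if __name__ == "__main__":
    for name, e in ENTRIES.items():
        codons = codon_count(*e["cds"])
        print(
            name,
            "codons:", codons,
            "matches stated length:", codons == e["stated_length"],
            "3' UTR contiguous:", utr3_follows_cds(e["cds"], e["utr3"], e["total"]),
        )
```

A check of this kind only confirms that the printed coordinates are internally consistent; it says nothing about the correctness of the sequences themselves.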
SEQUENCE PROTOCOL (1) GENERAL DATA (i) APPLICANT: (A) NAME: Boehringer Ingelheim International GmbH (B) STREET: Binger Strasse 173 (C) LOCALITY: Ingelheim am Rheim (E) COUNTRY: Germany (F) POSTAL CODE NUMBER: 55216 (G) TELEPHONE: 06132/77282 (H) TELEFAX: 06132/774377 (ii) TITLE OF THE INVENTION: Chromatin regulatory genes (iii) NUMBER OF SEQUENCES: 11 (iv) LEGIBLE VERSION BY COMPUTER: (A) DATA SUPPORT: floppy disk (B) COMPUTER: IBM PC compatible (C) OPERATING SYSTEM: PC-DOS / MS-DOS (D) LOGICAL PROGRAM: Patentln Relay n ° 1.0, Version No. 1.30 (EPA) (v) DATES OF THE CURRENT APPLICATION: APPLICATION NUMBER: PCT / EP 96/01818 (2) DATA ABOUT SEQ ID NO: 1: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 2,600 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: cDNA for mRNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (vi) ORIGINAL PROVENANCE: (A) ORGANISM: Homo sapiens (G) TYPE OF CELLS: B cells (ix) FEATURE: (A) NAME / KEY: 5 'UTR (B) POSITION: 11..89 (ix) FEATURE: (A) NAME / KEY: CDS (B) POSITION: 90..2330 (ix) FEATURE: (A) NAME / KEY: 3 'UTR (B) POSITION: 2331..2600 (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 1 AGGCAGTGGA GCCsGGGCGG CGGCGGCGGC GGCGCGCGGG GGCGAOGCGC GOGAACAAOS 60 CGAGTCQGsG CGCGGGAOGA AGAA3-AATC ATO GGC CAG ACT GGG AAG AAA TCT 113 Met Gly Gln Thr Gly Lys Lys Ser 1 5 GAG AAG GGA CCA CTT TCT TOG CGG AAG CCT CTA AAA TCA GAG TAC ATO 161 Glu Lys Gly Pro Val Cys Trp Arg Lys Arg Val Lys Ser Glu Tyr Met 10 15 20 CGA CIG AGA CAG CTC AAG AGG TTC AGA CGA GCT GAT GAA CTA AAG AGT 209 Arg Leu Arg Gln Leu Lys Arg Phe Arg Arg Wing Asp Glu Val Lys S < ar 25 30 35 40 ATO TTT AGT TCC AAT CGT CAG AAA ATT TTG GAA AGA ACG GAA ATC TTA 257 Met Phe Ser Be Asn Arg Gln Lys He Leu Glu Arg Thr Glu He Leu 45 50 55 AAC CA GAA TGG AAA CAG CGA AGG ATA CAG CCT GIG CAC ATC CTG ACT 305 Asn Gln Glu Trp Lys Gln Arg Arg He Gln Pro Val His He Leu Thr 60 65 70 TCT GTG AGC TCA TTG CGC GGG ACT AGG GAG TCT TCG GIG ACC ACT GAC 353 Be 
Val Be Ser Leu Arg Gly Thr Arg Glu Cys S «er Val Thr Ser Asp 75 80 85 TTG GAT TTT CCA AC CAA GTC ATC CCA TTA AAG -ACT CTG AAT GCA GTT 401 Leu Asp Phe Pro Thr Gln Val He Pro Leu Lys Thr Leu Asn Wing Val 90 95 100 GCT TCA GTA CCC ATA ATG TAT TCT TGG TCT CCC CTA CAG CAG AAT TTT 449 Ala Ser Val Pro He Met Tyr Ser Trp Ser Pro Leu Gln Gln Asn Phe 105 110 115 120 ATG GTG GAA GAT GAA ACT GTT TTA CAT AAC ATT CCT TAT AIG GGA GAT 497 pt-? / Sl m. ? Aon m. . Thr-? I T on N "i? Csn Tl f = Pm TVr Mf- * 1- (TI v?.« = N-.
GAA GTT TTA GAT CAG GAT GGT ACT TTC AT GAA GAA CTA ATA AAA AAT 545 Glu Val Leu Asp Gln Asp Gly Thr Phe He Glu Glu Leu He Lys Asn 140 145 150 TAT GAT GGG AAA CTA CAC GGG GAT AGA GAA TGT GGG TTT ATA AAT GAT 593 Tyr Asp Gly Lys Val His Gly Asp Arg Glu Cys Gly Phe He Asn Asp 155 160 165 GAA ATT TTT GTG GAG TTG GTG AAT GCC CTT GCT CA TAT AAT GAT GAT 641 Glu He Phe Val Glu Leu Val Asn Ala Leu Gly Gln Tyr Asn Asp Asp 170 175 180 GAC GAT GAT GAT GAT GAC GAT CCT GAA GAA AGA GAA GAA AAG CAG 689 Asp Asp Asp Asp Asp Gly Asp Asp Pro Glu Glu Glu Arlu Glu Glu Lys Gln 185 '190 195 200 AAA GAT CTG GAG GAT CAC CGA GAT GAT AAA GAA AGC CGC CCA CCT CGG 737 Lys Asp Leu Glu Asp His Arg Asp Asp Lys Glu Be Arg Pro Pro Arg 205"210 215 AAA T --- T CCT TCT GAT AAA ATT TTT GAA GCC ATT TCC TCA AIG TTT CCA 785 Lys Phe Pro Ser Asp Lys He Phe Glu Wing Be Ser Met Phe Pro 220"225 230 GAT AAG GGC AC GCA GAA GAA CI? AAG GAA AAA TAT AAA GAA CTC ACC 833 Asp Lys Gly Thr Wing Glu Glu Leu Lys Glu Lys Tyr Lys Glu Leu Thr 235 240 245 GAA CAG CAG CTC CCA GGC GCA CTT CCT GAT TCT ACC CCC AAC ATA 881 Glu Gln Gln Leu Pro Gly Ala Leu Pro Pro Glu Cys Thr PT? Asn He 250 255 260 GAT GGA CCA AAT GCT AAA TCT GTT CAG AGA GAG CAA AGC TTA CAC TCC 929 Aso Gly Pro Asn Ala Lys Ser Val Gln Arg Glu Gln S «sr Leu His Ser 265 270 275 280 TTT CAT ACG CTT TTC TCT AGG CGA TCT TTT AAA TAT GAC TGC TTC CI? 
977 Phe His Thr Leu Phe Cys Arg Arg Cys Phe Lys Tyr Aso Cys Phe Leu 285 290"295 CAT CCT TTT CAT GCA ACA CCC AAC ACT TAT AAG CGG AAG AAC ACA GAA 1025 His Pro Phe His Wing Thr Pro Asn Thr Tyr Lys Arg Lys Asn Thr Glu 300 305 310 ACA GCT CTA GAC AAC AAA CCT TCT GGA CCA CAG TCT TAC CAG CAT TTG 1073 Thr Ala Leu Aso Asn Lys Pro Cys Gly -Pro Gln Cys Tyr Gln His Leu 315"320" 325 GAG GGA GCA AAG GAG TTT GCT GCT GCT CTC ACC GCT GAG CGG ATA AAG 1121 Glu Gly Ala Lys Glu Phe Ala Ala Ala Leu Thr Ala Glu Arg He Lys 330 335 340 ACC CCA CCA AAA OCT CCA GGA GGC CGC AJ --- A AGA GGA CGG CTT CCC AAT 1169 Thr Pro Pro Lys Ag Pro Gly Gly Arg Arg Arg Gly Arg Leu Pro Asn 345 350 355 360 AAC ACT AGC AGG CCCAGCA - rCCTA- ATTAATGTCCTGGA ^ 1217 -Asn Be Ser Arg Pro Be Thr Pro Thr He Asn Val Leu Glu Ser Lys 365 370 375 GAT AC GAC ACT GAT AGG GAA GCA GGG ACT GAA ACO GGG GGA GAC AAC 1265 Asp Thr Asp S «er Asp Arg Glu Ala Gly Thr Glu Thr Gly Glu Glu Asn 380 385 390 AAT GAT AAA GAA GAA GAA GAG AAG AAA GAT GAA ACT TCG AGC TCC TCT 1313 Asn Asp Lys Glu Glu Glu Glu Lys Lys Asp Glu Thr S <; sr Ser S «sr S« sr 395 400 405 GAA GCA AAT TCT CGG TCT ACA CAA AC CCA ATA AAG AG AAG OCA AAT ATT 1361 Glu Wing Asn Ser Arg Cys Gln Thr Pro He Lys Met Lys Pro Asn He 410 415 420 GAA CCT CCT GAG AAT GIG GAG TGG ACT GCT GCT GAA GCC TCA ATG TTT 1409 Glu Pro Pro Glu -Asn Val Glu Trp S-er Gly Wing Glu Wing S-sr Met Phe 425 430 435 440 AGA GTC CTC ATT GGC ACT TAC TAT GAC AAT TTC TCT GCC ATT GCT AG 1457 Arg Val Leu He Gly Thr Tyr Tyr Asp Asn Phe Cys Wing He Wing Arg 445 450 455 TTA ATT GGG ACC AAA ACA TCT AGA CAG GTG TAT GAG ITT AGA GTC AAA 1505 Leu He Gly Thr Lys Thr Cys Arg Gln Val Tyr Glu Phe Arg Val Lys 460 465 470 GAA TCT AGC ATC ATA GCTCX-? 
GCTCCCGCTGAGGATGTGGAT ACT CCT 1553 Glu Ser Ser He Wing Pro Ala Wing Pro Wing Glu Asp Val Asp Thr Pro 475 480 485 CCA AGG AAA AAG AGG AAA AGG CAC CGG TTG TGG GCT GCA CAC TGC AGA 1601 Pro Arg Lys Lys Lys Arg Lys His Arg Leu Trp Wing Wing His Cys Arg 490 495 500 AAG ATA CAG CTG AAA AAG GAC GGC TCC TCT AAC CAT GTT TAC AAC TAT 1649 Lys He Gln Leu Lys Lys Asp Gly Ser Ser Asn His Val Tyr Asn Tyr 505 510 515 520 CAA CCC TGT GAT CAT CCA CQG CAG CCT TCT GAC ACT TCG TGC CCT TGT 1697 Gln Pro Cys Asp His Pro Arg Gln Pro Cys Asp Ser Ser Cys Pro Cys 525 530 535 CTG ATA GCA CAA AAT TTT TCT GAA AAG TTT TCT CAA TCT ACT TCA GAG 1745 Val He Wing Gln Asn Phe Cys Glu Lys Phe Cys Gln Cys Ser Ser Glu 540 545 550 TCT CAA AAC CGC TTT CCG GGA TGC CGC TGC AAA GCA CAG TGC AAC ACC 1793 Cys Gln Asn Arg Phe Pro Gly Cys Arg Cys Lys Wing Gln Cys Asn Tbr 560 565 AAG CAG TGC CCG TGC TAC CTG GCT GTC CGA GAG TGT GAC CCT GAC CTC 1841 Lys Gln Cys Pro Cys Tyr Leu Wing Val Arg Glu Cys Asp Pro Asp Leu 570 575 580 TGT CTT ACT TCT GGA GCC GCT CAT TGG GAC ACT AAA AAT GIG TCC 1889 Cys Leu Thr Cys Gly Ala Wing Asp His Trp Asp Ser Lys Asn Val Ser 585 590 595 600 TGC AAG AACTGCACTATTCAGCGGsGCTCC AAA AAG CAT CTA TTG CIG 1937 Cys Lys Asn Cys Ser He Gln Arg Gly Ser Lys Lys His Leu Leu Leu 605 610 615 GCA CCA TCT GAC CTG GCA GGC TGG GGG ATT TTT ATC AAA GAT CCT CTG 1985 Wing P ro Be Asp Val Wing Gly Trp Gly He Phe He Lys Asp Pro Val 620 625 630 CAG AAA AAT GAA TTC ATC TCA GAA TAC TGT GGA GAT ATT ATT TCT CAA 2033 Gln Lys Asn Glu Phe He Ser Glu Tyr Cys Gly Glu He He Ser Gln 635 640 645 GAT GAA GAC GAC AGA GGG AAA GTG TAT GAT AAA TAC ATG TGC AGC 2081 Asp Glu Wing Asp Arg Arg Gly Lys Val Tyr Asp Lys Tyr Met Cys Ser 650 655 660 TTT CTG TTC AAC TTG AAC AAT GAT TTT CTG CTG GAT GCA ACC CGC AAG 2129 Phe Leu Phe Asn Leu Asn Asn Asp Phe Val Val Asp Wing Thr Arg Lys 665 670 675 680 GCT AAC AAA ATT CCT ITT GCA AAT CAT TCG GTA AAT CCA AAC TGC TAT 2177 Gly Asn Lys He Arg Phe Wing Asn His Ser Val Asn Ero Asn Cys Tyr 685 690 695 GCA 
AAA GTT AIG ATG GTT AAC GCT GAT CAC AGG ATA GCT ATT TTT GCC 2225 Wing Lys Val Met Met Val Asn Gly Aso His Arg He Gly He Phe Wing 700 705 710 AAG AGA GCC ATC CAG ACT GGC GAA GAG CTG TTT TTT GAT TAC AC - A TAC 2273 Lys A g Ala He Gln Thr Gly Glu Glu eu Phe Phe Asp Tyr Arg Tyr 715 720 725 AGC CAG GCT GAT GCC CTG AAG TAT GTC GGC ATC GAA AGA GAA ATG GAA 2321 Ser Gln Ala Asp Ala Leu Lys Tyr Val Gly He Glu Arg Glu I * fet Glu 730 735 740 ATC CCT TGA CATCTGCTAC CICCTCCCCC TCCTCTGAAA CAGC-TGCCTT 2370 He Pro * 745 AGCTTCAGGA ACCTCGAGIA CIGTGGGCAA TTTAGAAAAA GAACAIGCAG TT3-GAAA --- TC 2430 TGAA --- TIGCA AAGIACICTA AGAATAA --- TT AI-A - I - AA-IGA GTTTAAAAAT CAACTTTTTA 2490 TTGCCTTCTC ACCAGCTGCA A lUp lG TACCAGIGAA TITIGGCAAT -AA --- X - CA --- TCAT 2550 GGTACATITT TCAA ITI TO AIAAAGAAIA CT --- GAACITG TCAAAAAAAA 2600 (2) DATA ABOUT SEQ ID NO: 2 (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 747 amino acids (B) TYPE: amino acid (D) TOPOLOGY: l ineal (ii) TYPE OF THE MOLECULE: Protein (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 2: Met Gly Gln Thr Gly Lys Lys Ser Glu Lys Gly Pro Val Cys Trp Arg 1 5? 
or 15 Lys Arg Val Lys Ser Glu Tyr Met Arg Leu Arg Gln Leu Lys Arg Phe 20 25 3rd Arg Arg Wing Asp Glu Val Lys Ser Met Phe Ser Being Asn Ars Gln Lvs 35 40 45 He Leu Glu Arg Thr Glu He Leu Asn Gln Glu Trp Lys Gln Arg Arg 50 55 60 lie Gln Pro Val His He Leu Thr Be Val Be Ser Leu Arg Gly Thr 65 70 75 8th Pro Leu Lys Thr Leu Asn Ala Val Ala Ser Val Pro He Met Tyr Ser 100 105 110 Trp Ser Pro Leu Gln Gln Asn Phe Met Val Glu Asp Glu Thr Val Leu 115 120 125 His Asn He Pro Tyr Met Gly Asp Glu Val Leu Asp Gln Asp Gly Thr 130 135 140 Phe He Glu Glu Leu He Lys Asn Tyr Asp Gly Lys Val His Gly Asp 145 150 155 160 Arg Glu Cys Gly Phe He Asn Asp Glu He Phe Val Glu Leu Val Asn 165 170 175 Wing Leu Gly Gln Tyr Asn Asp Asp Asp Asp Asp Asp Asp Asp Gly Asp Asp 180 185 190 Pro Glu Glu Arg Glu Glu Lys Gln Lys Asp Leu Glu Asp His Arg Asp 195 200 205 Asp Lys Glu Ser Arg Pro Pro Arg Lys Phe Pro Ser Asp Lys He Phe 210 215 220 Glu Wing He Ser Ser Met Phe Pro Asp Lys Gly Thr Ala Glu Glu Leu 225 230 235 240 Lys Glu Lys Tyr Lys Glu Leu Thr Glu Gln Gln Leu Pro Gly Ala Leu 245 250 255 Pro Pro Glu Cys Thr Pro Asn He Asp Gly Pro Asn Wing Lys Ser Val 260 265 270 Gln Arg Glu Gln Ser Leu His Ser Phe His Thr Leu Phe Cys Arg Arg 275 280 285 Cys Phe Lys Tyr Asp Cys Phe Leu His Pro Phe His Ala Thr Pro Asn 290 295 300 Thr Tyr Lys Arg Lys Asn Thr Glu Thr Ala Leu Asp Asn Lys Pro Cys 305 310 315 320 Gly Pro Gln Cys Tyr Gln His Leu Glu Gly Ala Lys Glu Phe Ala Ala 325 330 335 Ala Leu Thr Ala Glu Arg He Lys Thr Pro Pro Lys Arg Pro Gly Gly 340 345 350 Arg Arg Arg Gly Arg Leu Pro Asn Asn Ser Ser Arg Pro Ser Thr Pro 355 360 365 Thr He Asn Val Leu Glu Ser Lys Asp Thr Asp Ser Asp Arg Glu Wing 370 375 380 Gly Thr Glu Thr Gly Gly Glu Asn Asn Asp Lys Glu Glu Glu Glu Lys 385 390 395 400 Lys Asp Glu Thr Being Being Ser Glu Wing Asn Being Arg Cys Gln T-hr 405 410 415 Pro He Lys Met Lys Pro Asn He Glu Pro Pro Glu Asn Val Glu Trp 420 425 430 Ser Gly Wing Glu Wing Being Met Phe -Arg Val Leu He Gly Thr Tyr Tyr 435 440 445 Asp Asn 
Phe Cys Wing He Wing Arg Leu He Gly Thr Lys Thr Cys Arg 450 455 460 Gln Val Tyr Glu Phe Arg Val Lys Glu Ser Ser He He Ala Pro Wing 465 470 475 480 Pro Ala Glu Asp Val Asp Thr Pro Pro Arg Lys Lys Lys Arg Lys His 485 490 495 Arg Leu Ttp Wing Wing His Cys Arg Lys He Gln Leu Lys Lys Asp Gly 500 505 510 Ser Ser Asn His Val Tyr Asn Tyr Gln Pro Cys Asp His Pro Arg Gln 515 520 525 Pro Cys Asp Ser Ser Cys Pro Cys Val He Wing G -ln Asn Phe Cys Glu 530 535 540 Lys Phe Cys Gln Cys Ser Ser Glu Cys Gln Asn Arg Phe Pro Gly Cys 545 550 555 560 Arg Cys Lys Wing Gln Cys Asn Thr Lys Gln Cys Pro Cys Tyr Leu -Ala 565 * 570 575 Val Arg Glu Cys Asp Pro Asp Leu Cys Leu Thr Cys Gly Wing Wing Asp 580 585 590 Hs Tro Asp Ser Lys Asn Val Ser Cys Lys Asn Cys Ser He Gln Ag 595 600 605 Gly Ser Lys Lys His Leu Leu Leu Ala Pro Ser Asp Val Wing Gly Trp 610 * 615 620 Gly He Phe He Lys Asp Pro Val Gln Lys Asn Glu Phe He Ser Glu 625 630 635 640 Tyr Cys Gly Glu He He Ser Gln Asp Glu Wing Asp Arg Arg Gly Lys 645 650 655 Val Tyr Asp Lys Tyr Met Cys Ser Phe Leu Phe Asn Leu Asn Asn Asp 660 665 670 Phe Val Val Asp Wing Thr Arg Lys Gly Asn Lys He Arg P-he Wing Asn 675 680 685 His Ser Val Asn Pro Asn Cys Tyr Ala Lys Val Met Met Val Asn Gly 690 695 700 Asp His Arg He Gly He Phe Wing Lys Arg Wing He Gln Thr Gly Glu 705 710 715 720Glu Leu Phe Phe Asp Tyr Arg Tyr Ser Gln Wing Asp Wing Leu Lys Tyr 725 730 735 Val Gly He Glu Arg Glu and Glu He Pro * 740 745 DATA ABOUT SEQ ID NO: 3: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 2732 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: cDNA for mRNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (vi) ORIGINAL ORIGINAL (OR) ORGANISM: Homo sapiens (G) TYPE OF CELLS: B cells (ix) CHARACTERISTICS: (A) NAME / KEY: 5 'UTR (B) POSITION: 1. 44 (ix) FEATURE: (A) NAME / KEY: CDS (B) POSITION: 45. 1283 (ix) FEATURE: ( A) NAME / KEY: 3 'UTR (B) POSITION: 1284. . 
2732 (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 3 TCGCGAGGCC GGCTAGGCCC GAAIGTCCTT AGCCCTGGGG AAAG ATG GCG GAA AAT 56 Met Wing Glu Asn 750 TTA AAA GGC TGC AGC GTG TCT TGC-AAG TCT TCT TGG AAT CAG CTG CAG 104 Leu Lys Gly Cys Ser Val Cys Cys Lys Ser Ser T p Asn Gln Leu Gln 755 760 765 GAC CTG TGC CGC CIG GCC AAG CTC TCC TGC CCT GCC CTC GGT ATC TCT 152 Asp Leu Cys Arg Leu Wing Lys Leu Ser Cys Pro Wing Leu Gly He Ser 770 775 780 AAG AGG AAC CTC TAT GAC TTT GAA GTC GAG T-AC CTG TGC GAT TAC AAG 200 Lys Arg Asn Leu Tyr Asp Phe Glu Val Glu Tyr Leu Cys Asp Tyr Lys 785 790 795 AAG ATC CGC GAA CAG GAA TAT TAC CIG CTG AAA TGG CCT GGA TAT CC-A 248 Lys He Arg Glu Gln Glu Tyr Tyr Leu Val Lys Trp Arg Gly Tyr Pro 800 805 810 815 GAC TCA GAG AGC ACC TGG GAG CCA CGG CAG AAT CTC AAG TCT GTG CCT 296 Asp Ser Glu Ser Thr Trp Glu Pro Arg Gln Asn Leu Lys Cys Val Arg 820 825 830 ATC CTC AAG CAG TTC CAC AAG GAC TTA GAA AGG GAG CIG CTC CGG CGG 344 He Leu Lys Gln Phe His Lys Asp Leu Glu Arg Glu Leu Leu Arg Arg 835 840 845 CAC CAC CGG TCA AAG ACC COC CGG CAC CTG GAC OCA AGC TTG GCC AAC 392 His His Arg Ser Lys Thr Pro Arg His Leu Asp Pro Ser Leu Wing Asn 850 855 860 TAC CTG GTG CAG AAG GCC AAG CAG AGG CGG GCG CTC CCT CGC TGG GAG 440 Tyr Leu Val Gln Lys Ala Lys Gln Arg Arg Ala Leu Arg Arg Trp Glu 865 870 875 CAG GAG CTC AAT GCC AAG CGC AGC CAT CIG GGA CGC ATC ACT CTA GAG 488 Gln Glu Leu Asn Wing Lys Arg Ser His Leu Gly Arg He Thr Val Glu 880 885 890 895 AAT GAG GTG GAC CTG GAC GGC CCT CCG CGG GCC TTC GTG TAC ATC AAT 536 Asn Glu Val Asp Leu Asp Gly Pro Pro Arg Wing Phe Val Tyr He Asn 900 905 910 GAG TAC CCT CTT GCT GAG GGC ATC ACC CTC AAC CAG GTG GCT GTG GGC 584 Glu Tyr Arg Val Gly Glu Gly He Thr Leu Asn Gln Val Wing Val Giy 915 920 925 TGC GAG TGC CAG GAC TCT CTG TGG GCA CCC ACT GGA GGC TGC TGC CCG 632 Cys Glu Cys Gln sp Cys Leu Trp Wing Pro Tnr Giy Gly Cys Cys Pro 930 935 940 GGG GCG TCA CTG CAC AAG ITT GCC TAC AAT GAC CAG GGC CAG GTG CGG 680 Gly Ala Ser Leu His Lys Phe Wing Tyr Asn Aso Gln Gly 
Gln Val Arg 945 950"955 CTT OGA GCC GGG CTG CCC ATC TAC GAG TGC AAC TCC CGC TGC CGC TGC 728 Leu Arg Ala Gly Leu Pro He Tyr Glu Cys Asn Ser Arg Cys Arg Cys 960 965 970 975 GGC TAT GAC TGC OCA AAT CCT GTG CTA CAG AAG GCT ATC CGA TAT GAC 776 Gly Tyr Asp Cys Pro Asn Arg Val Val Gln Lys Gly He Arg Tyr Asp 980 985"990 CTC TGC ATC TTC CGG ACG GAT GAT GGG CCT GGC TGG GGC GTC CGC ACC 824 Leu Cys He Phe Arg Thr Asp Asp Gly Arg Gly Tro Gly Val Arg Thr 995 1000 '1005 CIG GAG AAG ATT CGC AAG AAC AGC TTC GTC ATG GAG TAC GTG GGA GAG 872 Leu Glu Lys He Arg Lys Asn Ser Phe Val Met Glu Tyr Val Gly Glu 1010 1015 1020 ATC ATT ACC TCA GAG GAG GAG GAG CGG CGG GGC CAG ATC TAC GAC CGT 920 He He Thr Ser Glu Glu Wing Glu Arg Arg Gly Gln He Tyr Asp Arg 1025 1030 1035 CAG GGC GCC ACC TAC CTC TTT GAC CTG GAC TAC GTC GAG GAC GTG TAC 968 Gln Gly Wing Thr Tyr Leu Phe Asp Leu Asp Tyr Val Glu Asp Val Tyr 1040 1045 1050 1055 A-TC CTG GAT GCC sCC T- ^ TAT GGC -AAC ATC TCC CAC T ^ 1016 Thr Val Asp Wing Wing Tyr Tyr Gly Asn He Ser His Phe Val Asn His 1060 1065 1070 ACT TCT GAC CCC AAC CTG CAG GTG TAC AAC GTC TTC ATA GAC AAC CTT 1064 S «3r Cys Asp Pro Asn Leu Gln Val Tyr, Asn Val Phe He Asp Asn Leu 1075 1080 1085 GAC GAG CX ^ CTG CCC CGC ATC GCT TTC TTT GCC ACA AGA ACC ATC CGG 1112 Asp Glu Arg Leu Pro Arg He Wing Phe Phe Wing Thr Arg Thr He Arg 1090 1095 1100 GCA GGC GAG GTC CTC ACC TTT GAT TAC AAC ATG CA GTG GAC CCC GTG 1160 Wing Gly Glu Glu Leu Thr Phe Asp Tyr Asn Met Gln Val Asp Pro Val 1105 1110 1115 GAC ATG GAG AGC AOC CGC AIG GAC TCC AAC TTT GGC CIG GCT GGG CTC 1208 Asp Met Glu Ser Thr Arg Met Asp Ser Asn Phe Gly Leu Wing Gly Leu 1120 1125 1130 1135 CCT GGC TCC CCT AAG AAG CGG GTC CGT ATT GAA TGC AAG TCT GGG ACT 1256 Pro Gly Pro Pro Lys Arg Val Arg He Glu Cys Lvs Cys Gly Thr 1140 1145 * 1150 GAG TCC TGC CGC AAA TAC CTC TTC TAC CCCTTAGAAG TCIGAGGCCA 1303 Glu S-er Cys Arg Lys Tyr Leu Phe * 1155 1160 GACIGACIGA GGGGGCCTGA A --- CTACA-IGC ACC-TCOCCCA CT --- CTGC-CCT CCIGTCGAGA 1363 AIGACIGCCA 
GGGCCTOGCC TGCCTCCACC TCOCCCCACC TGCTCCIACC TGCTCTACGT 1423 TGAGGGCTGT GGCCGTGCTG AGGACCGACT CCAGGAGTCC CCT - TCCCTG TCCCAGCCCC 1483 ATCTGTGGCT TGC -? CTTACA AACCCCCAOC C-ACCTTCAGA AA ---- AGITTTT CAACATCAAG 1543 ACTCTCTGTC GTIGGGATTC AIGGCCTATT AAGGAGGTCC AAGGGGTGAG TO --- CAACOCA 1603 GCTC --- AGAAT -AT-AITTCTTT TIGCACCTCC TTCIGCCTGG AGAITGAQGG GTCIGCTGCA 1663 GGCCTCCTCC CIGCTGCCCC AAAGCTAIGG GGAAGCAACC CCAGAGCAGG CAGACATCAG 1723 A-GGCCAGACT GCC --- A --- OCCG A-CATGAAGCT GGTTCCCCAA CC-ACAGAAAC TTTGTAC --- AG 1783 TGAAAGAAAG GGGTCCCTGG CXTIAOGGGCT GAGGCTGCTT TUTGCTO3IG CTIACAGIGC 1843 TG --- TG-AGTCT TGGCCCTAAG A --- CTAGGG TCTCTTCTTC AGGGCT- GCAT ATCIGAGAAG 1903 TGGAIGCCCA CAIGCCACIG GAAGGGAACT GGGTGTCCAT GGGC-CACTGA GCAGIGAG-AG 1963 GAAOGCAGÍIG CAGAGCTGGC (--- AGCCCTGGA GOTAGGCTGG GACCAAGCTC TGCCTTCAC-? 2023 GTGCAG-IG-AA GG-IACCTAOG GClL'l'iGGGA GCTCTGCGCT TGCTAGG33C CCTGADCTGG 2083 QGTCTCAIGA CCGCIGACAC CACTCAGAGC TGGAftGCAAG AICTAGAIAG IX- JIAS -A 2143 GCAC TAGGA (CAAS ^ A-IGTG C? TG G GTGGTGA-D3V GGTGCC GsC ACIA ---- GTA ---- A 2203 GCACCTGGTC CAOGIGGAIT CTCTCAGOGA AGCCTTGAAA ACCACGGAQG TGGAIGCCAG 2263 GAAAGGGCCC AIGTGGCAGA AGGCAAACTA CAGGCCAAGA ATTGGGGGTG GGGGAGATGG 2323 CTTCCCCACT ATGGGATGAC GAGGCGAGAG GGAAGCCCTT GCIGCCTCCC AITCCCAGAC 2383 CrCAGCCCTT TGTGCTCACC CIGCTTCCAC TGGTCTCAAA ACTCACCIGC C - ACAAATCT 2443 ACAAAAGGOG AAGGTTCTGA TGGCTGCCTT GCTCCTTGCT CCCCCACCCC CTCTGAGGAC 2503 TTCTCTAGGA -AGTOCTTCCT GACIACCICT GCCCAGAGIG C-0 - CT-A ---- A-IG AGACTCTAIG 2563 CCCIGCTATC -ACffiTGCCAGA TUTAIGTGTC TGTCTCTCTG TCCATCCCGC CGGCCCCCCA 2623 GACIAACCTC CAGGCAIGGA (CTGAATCTGG TTCTCCTCIT GTACACCCCT CAACCC-IA1G 2683 CAGCCTQGAG TGGGCATCAA TAAAAIGAAC TGTCGACIGA AAAAAAAAA 2732 (2) DATA ABOUT SEQ ID NO: 4: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 413 amino acids (B) TYPE: Amino Acid (D) TOPOLOGY: Linear (ii) TYPE OF THE MOLECULE: Protein (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 4.
Met Wing Glu Asn Leu Lys Gly Cys Ser Val Cys Cys Lys Ser Ser Trp 1 5 10 15 Asn Gln Leu Gln Asp Leu Cys Arg Leu Wing Lys Leu Ser Cys Pro Wing 20 25 30 Leu Gly He Ser Lys Arg Asn Leu Tyr Asp Phe Glu Val Glu Tyr Leu 35 40 45 Cys Asp Tyr Lys Lys He Arg Glu Gln Glu Tyr Tyr Leu Val Lvs Tro 50 55 60 Arg Gly Tyr Pro Asp Ser Glu Ser Thr Trp Glu Pro Arg Gln Asn Leu 65 70 75 80 Lys Cys Val Arg He Leu Lys Gln Phe His Lys Asp Leu Glu Arg Glu 85 90 95 Leu Leu Arg Arg His His Arg Ser Lys Thr Pro Arg His Leu Asp Pro 100 105 110 Ser Leu Ala Asn Tyr Leu Val Gln Lys Ala Lys Gln Arg Arg Ala Leu 115 120 125 Arg Arg Trp Glu Gln Glu Leu Asn Wing Lys Arg Ser His Leu Gly Arg 130 135 140 He Thr Val Glu Asn Glu Val Asp Leu Asp Gly Pro Pro Arg Ala Phe 145 150 155 160 Val Tyr He Asn Glu Tyr Arg Val Gly Glu Gly He Tr Leu Asn Gln 165 170 175 Val Wing Val Gly Cys Glu Cys Gln Asp Cys Leu Trp Wing Pro Thr Gly 180 185 190 Gly Cys Cys Pro Gly. Ala Ser Leu His Lys Phß Ala Tyr Asn Asp Gln 195 200 205 Gly Gln Val Arg Leu Arg Wing Gly Leu Pro He Tyr Glu Cys Asn Ser 210 215 220 Arg Cys Arg Cys Gly Tyr Asp Cys Pro Asn Arg Val Val Gln Lys Gly 225 230 235 240 He Arg Tyr Asp Leu Cys He Phe Arg Thr Asp Asp Gly Arg Gly Trp 245 250 Gly Val Arg Thr Leu Glu Lys He Arg Lys Asn S-sr Phe Val Met Glu 260 265 270 Tyr Val Gly Glu He He Thr Ser Glu Glu Ala Glu Arg Arg Gly Gln 275 280 285 He Tyr Asp Arg Gln Gly Wing Thr Tyr Leu Phe Asp Leu Asp Tyr Val 290 295 300 Glu Asp Val Tyr Thr Val Asp Wing Wing Tyr Tyr Gly Asn He Ser His 305 310 315 320 Phe Val Asn His Ser Cys Asp Pro Asn Leu Gln Vial Tyr Asn Val Phe 325 330 335 He Asp Asn Leu Asp Glu Arg Leu Pro Arg He Wing Phe Phe Wing Thr 340 345 350 Ag Thr He Arg Wing Gly Glu Glu Leu Thr Phe Asp Tyr Asn Met Gln 355 360 365 Val Asp Pro Val Asp Met Glu Ser Thr Arg Met Asp Be Asn Phe Gly 370 375 380 Leu Wing Gly Leu Pro Gly Ser Pro Lys Lys Arg Val Arg He Glu Cys 385 390 395 400 Lys Cys Gly Thr Glu Ser Cys Arg Lys Tyr Leu Phe * 405 410 \ 2) DATA ABOUT SEQ ID NO: 5: (i) 
CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 489 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: cDNA for mRNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (vi) ORIGINAL PROVENANCE: (A) ORGANISM: Homo sapiens (G) TYPE OF CELLS: B cells (ix) CHARACTERISTICS: (A) NAME / KEY: CDS (B) POSITION: 1. ^ 341 (C) OTHER FEATURE: partial sequence homology with SEQ ID NO. 1 (ix) CHARACTERISTICS: (A) NAME / KEY: hypothetical non-coding region (B) POSITION: 342..489 (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 5: 46 Leu Thr Cys Gly Wing Ser Glu His Trp Asp Cys Lys Val Val Ser 5 10 15 TCTAAAAAC TCCACCATC CAG aCT GGA CITAAGAAG 94 Cys Lys Asn Cys Ser He Gln Arg Gly Leu Lys Lys Ss S S S 0 25 3o Ala Prn rz TZ? ^ ^ ^ ^ A ^ ^ ATA AAG GAG TCT GTG Wing Pro Being Asp Val Wing Gly Trp G Gllyy TThhrr PPhhee HHee LLyyss GGlluu Ser Val 142 35 40 45 190 GAT GAG GCT SCT O-? CsC GGAAAGGTC T-AT CíACAAATACAroT ^ 238 Asp Glu Wing Asp Arg Arg Gly Lys Val Tyr Asp Lys Tyr Met Ser Ser 65 70 75 TTC CTC TTC AAC CTC AAT AAT GAT TTT CTA GTC GAT GCT ACT OGG AAA 286 Phe Leu Phe Asn Leu Asn Asn Asp Phe Val Val A = p Wing Thr Arg Lys 80 85 90 95 GGA AAC AAA ATT CGA TTT GCA AAT CAT TCA CTG AAT CCC AC TGT TAT 334 Gly Asn Lys He Arg Phe Wing Asn His Ser Val Asn Pro Asn Cys Tyr 100 105 110 GCC AAA G GTGAGTCCCA GIBAaCIGGG AGGIGGGGTG G3G3AIGGAT GU'I 'IT-L-AC 391 Ala Lys TGTGA --- TTCC AIT? GT? CTT --AACATITTC CTIAGCTGAG Cl-AIILTI IG 441 TO - AAAGA ---- A ATCA1GA? 
TA AIATCIGCTA TC-A-1TT --- AGG C --- a - TC-TC 489 (2) DATA ABOUT SEQ ID NO: 6: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 113 amino acids (B) TYPE: amino acids (D) TOPOLOGY: linear (E) OTHER CHARACTERISTICS: Partial sequence, homology for SEQ ID NO: 2 (ii) TYPE OF THE MOLECULE: Protein (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 6: Leu Thr Cys Gly Wing Ser Glu His Trp Asp Cys Lys Val Val Ser Cys 1 5 10 15 Lys Asn Cys Ser He Gln Arg Gly Leu Lys Lys His Leu Leu Leu Wing 20 25 30 Pro Ser Asp Val Wing Gly Tro Gly Thr Phe He Lys Glu Ser Val Gln 35"40 45 Lys Asn Glu Phe He Ser Glu Tyr Cys Gly Glu Leu He Ser Gln Asp 50 55 60 Glu Wing Asp Arg Arg Gly Lys Val Tyr Asp Lys Tyr Met Ser Ser Phe 65 70 75 80 Leu Phe Asn Leu Asn Asn Aso Phe Val Val Asp Wing Thr Arg Lys Gly 85 90 95 Asn Lys He Arg Phe Wing Asn His Ser Val Asn Pro Asn Cys Tyr Wing 100? R = = - (2) DATA ABOUT SEQ ID NO : 7: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 20 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: cDNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (ix) CHARACTERISTICS: synthetic adapter molecule (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 7: AATTCTCGAG CTCGTOGACA 20 (2) DATA ABOUT SEQ ID NO: 8: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 31 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: DNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (ix) CHARACTERISTICS: synthetic primer molecule (xi) DESCRIPTION OF THE SEQUENCE: SED ID: NO: 8: ACTGAATTCG GC - TGGGGCAT CITTCITAAG G 31 (2) DATA ABOUT SEQ ID NO: 9: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 31 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: DNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (ix) CHARACTERISTICS: synthetic primer molecule (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 9: ACICTAGACA A --- TTCCATTT CACGCTCTAT G 31 (2) DATA ABOUT SEQ ID NO: 
10: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 30 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-chain (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: DNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (ix) CHARACTERISTICS: synthetic primer molecule (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 10: ATATAGTACT TCAAGTCCAT TCAAAAGAGG 30 (2) DATA ABOUT SEQ ID NO: 11: (i) CHARACTERISTICS OF THE SEQUENCE: (A) LENGTH: 29 base pairs (B) TYPE: nucleotide (C) CHAIN FORM: single-stranded (D) TOPOLOGY: linear (ii) TYPE OF THE MOLECULE: DNA (iii) HYPOTHESIS: NONE (iv) ANTICIPATION: NO (ix) CHARACTERISTICS: synthetic primer molecule (xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 11: CCAGGTACCG TTGGTGCTGT TTAAGACCG 29

Bibliography

Adams et al., 1992, Genes & Dev. 6, 1589-1607.
Alkema et al., 1995, Nature 374, 724-727.
Allshire et al., 1994, Cell 76, 157-169.
Bardwell and Treisman, 1994, Genes & Dev. 8, 1644-1677.
Brunk et al., 1991, Nature 353, 351-355.
Buck and Shore, 1995, Genes & Dev. 9, 370-384.
Cleary, 1991, Cell 66, 619-622.
Counter et al., 1992, Embo J. 11, 1921-1928.
DeCamillis et al., 1992, Genes & Dev. 6, 223-232.
Eissenberg et al., 1992, Genetics 131, 345-352.
Friedman et al., 1994, Cancer Research 54, 6374-6382.
Garzino et al., 1992, Embo J. 11, 4471-4479.
Geraghty et al., 1993, Genomics 16, 440-446.
Gibbons et al., 1995, Cell 80, 837-845.
Gu et al., 1992, Cell 71, 701-708.
Haupt et al., 1991, Cell 65, 753-763.
Jolly, D., 1994, Cancer Gene Therapy 1, 51.
Jones and Gelbart, 1993, MCB 13 (10), 6357-6366.
Kennedy et al., 1995, Cell 80, 485-496.
Locke et al., 1988, Genetics 120, 181-198.
Malavsi, F. and Albertini, A., 1992, TIBTECH 10, 267-269.
Messmer et al., 1992, Genes & Dev. 6, 1241-1254.
Milner and Campbell, 1993, Biochem. J. 290, 811-818.
Mitani, K. and Caskey, C.T., 1993, Trends in Biotechnology 11, 162-166.
Nomura et al., 1994, unpublished, GenBank accession number D31891.
Orlando and Paro, 1993, Cell 75, 1187-1198.
Pardue, 1991, Cell 66, 427-431.
Rastelli et al., 1993, Embo J. 12, 1513-1522.
Renauld et al., 1993, Genes & Dev. 7, 1133-1145.
Reuter and Spierer, 1992, BioEssays 14, 605-612.
Rhein, R., 1993, The Journal of NIH Res. 5, 40-46.
Smouse and Perrimon, 1990, Dev. Biol. 139, 169-185.
Solomon et al., 1991, Science 254, 1153-1160.
Tepper, R.I. and Mulé, J.J., 1994, Human Gene Therapy 5, 153.
Tkachuk et al., 1992, Cell 71, 691-700.
Tschiersch et al., 1994, Embo J. 13 (16), 3822-3831.
Vile, R. and Russel S., 1994, Gene Therapy 1, 88.
Zatloukal, K., Schmidt, W., Cotten, M., Wagner, E., Stingl, G. and Birnstiel, M.L., 1993, Gene 135, 199.

Claims (22)

  1. CLAIMS 1.- DNA molecules, which contain a sequence that encodes a chromatin regulatory protein, which has a SET domain, or a partial sequence thereof, characterized in that they contain the nucleotide sequence represented in Figure 6, Figure 7 or Figure 8, which encodes EZH2, SUV39H or EZH1, including its degenerate variants as well as mutants thereof.
  2. 2. DNA molecule according to claim 1, characterized in that it is a cDNA.
  3. 3. Molecule of DNA according to claim 1 or 2, characterized in that it is of human origin.
  4. 4. - cDNA according to claim 3 having the designation EZH2.
  5. 5. - cDNA according to claim 3 which has the designation SUV39H.
  6. DNA molecule according to claim 4, characterized in that it contains the region encoding the SET domain of EZH2 or a segment thereof.
  7. DNA molecule according to claim 5, characterized in that it contains the region encoding the SET domain of SUV39H or a segment thereof.
  8. DNA molecule according to claim 4, characterized in that it encodes a dominant-negative mutant of EZH2.
  9. DNA molecule according to claim 4, characterized in that it encodes a dominant-negative mutant of SUV39H.
  10. Recombinant DNA molecule containing a cDNA as defined in claim 2, functionally linked to expression control sequences, for expression in prokaryotic or eukaryotic host organisms.
  11. Prokaryotic or eukaryotic host organisms transformed with a recombinant DNA according to claim 10.
  12. Recombinant human chromatin regulator protein EZH2, or a segment thereof, obtainable by expression of a cDNA defined in claim 4.
  13. Recombinant human chromatin regulator protein SUV39H, or a segment thereof, obtainable by expression of a cDNA defined in claim 5.
  14. Antibodies against EZH2.
  15. Antibodies against SUV39H.
  16. Antisense (deoxy)ribonucleotides complementary to a partial sequence of a DNA defined in claim 1.
  17. DNA molecule encoding the SET domain of a chromatin regulator gene, or a segment thereof, for the treatment and diagnosis of human diseases attributable to a deregulation of chromatin regulator genes that have a SET domain.
  18. DNA molecule according to claim 6 or 7 for the treatment and diagnosis of human diseases attributable to a deregulation of chromatin regulator genes that have a SET domain.
  19. Antibodies according to claim 14 or 15 for the treatment and diagnosis of human diseases attributable to a deregulation of chromatin regulator genes that have a SET domain.
  20. Transgenic mouse containing a transgene for the expression of a chromatin regulator gene that has a SET domain, or of a mutated version of such a protein.
  21. Knock-out mouse, obtainable from embryonic stem cells in which the endogenous mouse loci for Ezh1 and Suv39h have been interrupted by homologous recombination.
  22. Process for identifying mammalian chromatin regulator genes having a SET domain, or mutated versions thereof, characterized in that mammalian cDNA libraries or genomic DNA libraries are hybridized under low-stringency conditions with a DNA molecule that encodes the SET domain or a segment thereof.
MX9708401A 1995-05-10 1996-05-02 Chromatin-regulator genes. MX9708401A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE19516776.7 1995-05-10
DE19516776A DE19516776A1 (en) 1995-05-10 1995-05-10 Chromatin regulatory genes
PCT/EP1996/001818 WO1996035784A2 (en) 1995-05-10 1996-05-02 Chromatin-regulator genes

Publications (2)

Publication Number Publication Date
MXPA97008401A MXPA97008401A (en) 1998-02-01
MX9708401A MX9708401A (en) 1998-02-28

Family

ID=7761318

Family Applications (1)

Application Number Title Priority Date Filing Date
MX9708401A MX9708401A (en) 1995-05-10 1996-05-02 Chromatin-regulator genes.

Country Status (11)

Country Link
US (2) US6689583B1 (en)
EP (1) EP0827537B1 (en)
JP (1) JP4295824B2 (en)
AT (1) ATE404667T1 (en)
CA (1) CA2220442C (en)
DE (2) DE19516776A1 (en)
DK (1) DK0827537T3 (en)
ES (1) ES2313726T3 (en)
MX (1) MX9708401A (en)
PT (1) PT827537E (en)
WO (1) WO1996035784A2 (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19516776A1 (en) 1995-05-10 1996-11-14 Boehringer Ingelheim Int Chromatin regulatory genes
US6239327B1 (en) * 1998-04-16 2001-05-29 Cold Spring Harbor Laboratory Seed specific polycomb group gene and methods of use for same
NZ530573A (en) 1999-03-11 2005-09-30 Genesis Res & Dev Corp Ltd Compositions and methods for the modification of gene transcription
US20020039776A1 (en) 2000-06-09 2002-04-04 Thomas Jenuwein Mammalian SUV39H2 proteins and isolated DNA molecules encoding them
US6555329B2 (en) * 2000-06-09 2003-04-29 Boehringer Ingelheim International Gmbh Method for identifying compounds altering higher-order chromatin-dependent chromosome stability
US20020155460A1 (en) * 2000-10-10 2002-10-24 Genencor International Inc. Information rich libraries
CA2451654A1 (en) 2001-06-22 2003-01-03 Ceres, Inc. Chimeric histone acetyltransferase polypeptides
US20040053848A1 (en) * 2001-08-23 2004-03-18 Allis C. David Antibodies specific for methylated lysines in histones
AU2002359916A1 (en) * 2001-12-27 2003-07-15 Takeda Chemical Industries, Ltd. Preventives/remedies for cancer
GB0228900D0 (en) * 2002-12-11 2003-01-15 Ml Lab Plc Cancer Immunotherapy
AU2003292737A1 (en) * 2002-12-24 2004-07-22 Takeda Pharmaceutical Company Limited Preventives/remedies for cancer
CN1914320A (en) * 2003-12-03 2007-02-14 中外制药株式会社 Expression system with the use of mammalian beta -actin promoter
JP2005170799A (en) * 2003-12-08 2005-06-30 Univ Kurume Hla (human leukocyte antigen)-a24 binding peptide of enhancer of zeste homolog 2
US7563589B2 (en) * 2004-06-01 2009-07-21 The University Of North Carolina At Chapel Hill Reconstituted histone methyltransferase complex and methods of identifying modulators thereof
EP1987148A4 (en) * 2006-02-27 2009-08-05 Imgen Co Ltd De-differentiation of astrocytes into neural stem cell using bmi-1
US8574832B2 (en) * 2010-02-03 2013-11-05 Massachusetts Institute Of Technology Methods for preparing sequencing libraries
WO2011160206A1 (en) 2010-06-23 2011-12-29 Morin Ryan D Biomarkers for non-hodgkin lymphomas and uses thereof
US20130195843A1 (en) * 2010-06-23 2013-08-01 British Columbia Cancer Agency Branch Biomarkers for Non-Hodgkin Lymphomas and Uses Thereof
HUE028977T2 (en) * 2010-09-10 2017-02-28 Epizyme Inc Method for determining the suitability of inhibitors of human ezh2 in treatment
US9175331B2 (en) * 2010-09-10 2015-11-03 Epizyme, Inc. Inhibitors of human EZH2, and methods of use thereof
EP2621942A4 (en) * 2010-09-29 2015-01-21 Gen Hospital Corp Agents providing controls and standards for immuno-precipitation assays
JO3438B1 (en) 2011-04-13 2019-10-20 Epizyme Inc Aryl- or heteroaryl-substituted benzene compounds
TWI598336B (en) 2011-04-13 2017-09-11 雅酶股份有限公司 Substituted benzene compounds
EP2836491B1 (en) 2012-04-13 2016-12-07 Epizyme, Inc. Salt form of a human histone methyltransferase ezh2 inhibitor
US9006242B2 (en) 2012-10-15 2015-04-14 Epizyme, Inc. Substituted benzene compounds
CA2905070A1 (en) 2013-03-14 2014-09-25 Genentech, Inc. Methods of treating cancer and preventing cancer drug resistance
CN103215352B (en) * 2013-03-29 2015-03-11 中国医学科学院血液病医院(血液学研究所) Assay kit for detecting stimulated emission depletion 2 (STED 2) gene mutation
DK3057962T3 (en) 2013-10-16 2023-11-06 Epizyme Inc HYDROCHLORIDE SALT FORM FOR EZH2 INHIBITION

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5486623A (en) 1993-12-08 1996-01-23 Prototek, Inc. Cysteine protease inhibitors containing heterocyclic leaving groups
DE19516776A1 (en) 1995-05-10 1996-11-14 Boehringer Ingelheim Int Chromatin regulatory genes
EP0956032A4 (en) 1996-12-20 2002-09-04 Univ Texas Proteins and compositions for modulating mitosis
US6004933A (en) 1997-04-25 1999-12-21 Cortech Inc. Cysteine protease inhibitors
US5972608A (en) * 1997-08-27 1999-10-26 University Of Massachusetts Assays and reagents for chromatin remodeling enzymes and their modulators
EP1029547A1 (en) 1999-02-15 2000-08-23 BOEHRINGER INGELHEIM INTERNATIONAL GmbH Pharmaceutically active compounds and method for identifying same
US20020039776A1 (en) * 2000-06-09 2002-04-04 Thomas Jenuwein Mammalian SUV39H2 proteins and isolated DNA molecules encoding them
US6555329B2 (en) * 2000-06-09 2003-04-29 Boehringer Ingelheim International Gmbh Method for identifying compounds altering higher-order chromatin-dependent chromosome stability
EP1227160A1 (en) 2001-01-19 2002-07-31 BOEHRINGER INGELHEIM INTERNATIONAL GmbH Compounds modulating sister chromatid separation and method for identifying same
US20020164620A1 (en) * 2001-01-19 2002-11-07 Boehringer Ingelheim International Gmbh Method for identifying compounds modulating sister chromatid separation
EP1527164A4 (en) * 2001-05-04 2006-02-01 Univ Florida Cloning and sequencing of pyruvate decarboxylase (pdc) genes from bacteria and uses therefor

Similar Documents

Publication Publication Date Title
MXPA97008401A (en) Regulatory genes of chromatin
CA2220442C (en) Chromatin regulator genes
US7560532B2 (en) Smad6 and uses thereof
Mariani et al. Two murine and human homologs of mab-21, a cell fate determination gene involved in Caenorhabditis elegans neural development
US20040180335A1 (en) Novel chromosome 21 gene marker, compositions and methods using same
AU747576B2 (en) Smad7 and uses thereof
US5965427A (en) Human RAD50 gene and methods of use thereof
WO1998004590A1 (en) Conservin compositions and therapeutic and diagnostic uses therefor
Ostendorff et al. Functional characterization of the gene encoding RLIM, the corepressor of LIM homeodomain factors
Rodrigues et al. Characterization of Ngef, a novel member of the Dbl family of genes expressed predominantly in the caudate nucleus
Bartoli et al. Cloning of human striatin cDNA (STRN), gene mapping to 2p22–p21, and preferential expression in brain
Chiang et al. Identification and Analysis of the Human and Murine Putative Chromatin Structure Regulator SUPT6H and Supt6h
WO1994013802A9 (en) D4 gene and methods of use thereof
AU5748694A (en) D4 gene and methods of use thereof
WO1997038004A1 (en) Gene family associated with neurosensory defects
US6503502B1 (en) Nucleotide sequences, proteins, drugs and diagnostic agents of use in treating cancer
US6268216B1 (en) Reagents and methods for the screening of compounds useful in the treatment of neurological diseases
Yaraghi The neuronal apoptosis inhibitory protein (NAIP) analysis of human and murine genetics.
EP0963436A1 (en) Nucleic acid encoding congenital heart disease protein and products related thereto
Dixon Isolation and Analysis of the Murine Orthologue of the Treacher Collins Syndrome Gene
JP2000152790A (en) Protein relevant to proliferation of neurocyte and gene coding for the protein
McKie Molecular characterisation of murine Nfe2l1
WO2001079467A2 (en) Identification of a renal nadph oxidase