AU743457B2 - Compositions and methods relating to DNA mismatch repair genes - Google Patents

Publication number
AU743457B2
Authority
AU
Australia
Prior art keywords
seq
leu
sequence
ser
ile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU17284/99A
Other versions
AU1728499A (en
Inventor
Sean M. Baker
Roni J Bollag
C. Eric Bronner
Richard D. Kolodner
Robert M. Liskay
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dana Farber Cancer Institute Inc
Oregon Health Science University
Original Assignee
Dana Farber Cancer Institute Inc
Oregon Health Science University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU14424/95A external-priority patent/AU1442495A/en
Application filed by Dana Farber Cancer Institute Inc, Oregon Health Science University filed Critical Dana Farber Cancer Institute Inc
Priority to AU17284/99A priority Critical patent/AU743457B2/en
Publication of AU1728499A publication Critical patent/AU1728499A/en
Application granted granted Critical
Publication of AU743457B2 publication Critical patent/AU743457B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

AUSTRALIA
Patents Act 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT
Name of Applicants: OREGON HEALTH SCIENCES UNIVERSITY; DANA-FARBER CANCER INSTITUTE
Actual Inventors: SEAN M. BAKER, RONI J. BOLLAG, RICHARD D. KOLODNER, C. ERIC BRONNER, ROBERT M. LISKAY
Address for Service: CULLEN & CO., Patent & Trade Mark Attorneys, 240 Queen Street, Brisbane, Qld. 4000, Australia.
Invention Title: COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES
The following statement is a full description of this invention, including the best method of performing it known to us.
COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES
This invention was made with government support under Agreement No. GM 32741 and Agreement No. HG00395/GM50006 awarded by the National Institute of Health in the General Sciences Division. The government has certain rights in the invention.
Field of the Invention
The present invention involves DNA mismatch repair genes. In particular, the invention relates to identification of mutations and polymorphisms in DNA mismatch repair genes, to identification and characterization of DNA mismatch-repair-defective tumors, and to detection of genetic susceptibility to cancer.
Background
In recent years, with the development of powerful cloning and amplification techniques such as the polymerase chain reaction (PCR), in combination with a rapidly accumulating body of information concerning the structure and location of numerous human genes and markers, it has become practical and advisable to collect and analyze samples of DNA or RNA from individuals who are members of families which are identified as exhibiting a high frequency of certain genetically transmitted disorders. For example, screening procedures are routinely used to screen for genes involved in sickle cell anemia, cystic fibrosis, fragile X chromosome syndrome and multiple sclerosis. For some types of disorders, early diagnosis can greatly improve the person's long-term prognosis by, for example, adopting an aggressive diagnostic routine, and/or by making life style changes if appropriate to either prevent or prepare for an anticipated problem.
Once a particular human gene mutation is identified and linked to a disease, development of screening procedures to identify high-risk individuals can be relatively straightforward. For example, after the structure and abnormal phenotypic role of the mutant gene are understood, it is possible to design primers for use in PCR to obtain amplified quantities of the gene from individuals for testing. However, initial discovery of a mutant gene, its structure, location and linkage with a known inherited health problem, requires substantial experimental effort and creative research strategies.
One approach to discovering the role of a mutant gene in causing a disease begins with clinical studies on individuals who are in families which exhibit a high frequency of the disease. In these studies, the approximate location of the disease-causing locus is determined indirectly by searching for a chromosome marker which tends to segregate with the locus. A principal limitation of this approach is that, although the approximate genomic location of the gene can be determined, it does not generally allow actual isolation or sequencing of the gene. For example, Lindblom et al.
3 reported results of linkage analysis studies performed with SSLP (simple sequence length polymorphism) markers on individuals from a family known to exhibit a high incidence of hereditary non-polyposis colon cancer (HNPCC). Lindblom et al. found a "tight linkage" between a polymorphic marker on the short arm of human chromosome 3 (3p21-23) and a disease locus apparently responsible for increasing an individual's risk of developing colon cancer. Even though 3p21-23 is a fairly specific location relative to the entire genome, it represents a huge DNA region relative to the probable size of the mutant gene. The mutant gene could be separated from the markers identifying the locus by millions of bases. At best, such linkage studies have only limited utility for screening purposes because in order to predict one person's risk, genetic analysis must be performed with tightly linked genetic markers on a number of related individuals in the family. It is often impossible to obtain such information, particularly if affected family members are deceased. Also, informative markers may not exist in the family under analysis. Without knowing the gene's structure, it is not possible to sample, amplify, sequence and determine directly whether an individual carries the mutant gene.
Another approach to discovering a disease-causing mutant gene begins with design and trial of PCR primers, based on known information about the disease, for example, theories for disease state mechanisms, related protein structures and function, possible analogous genes in humans or other species, etc.
The objective is to isolate and sequence candidate normal genes which are believed to sometimes occur in mutant forms rendering an individual disease prone. This approach is highly dependent on how much is known about the disease at the molecular level, and on the investigator's ability to construct strategies and methods for finding candidate genes. Association of a mutation in a candidate gene with a disease must ultimately be demonstrated by performing tests on members of a family which exhibits a high incidence of the disease. The most direct and definitive way to confirm such linkage in family studies is to use PCR primers which are designed to amplify portions of the candidate gene in samples collected from the family members. The amplified gene products are then sequenced and compared to the normal gene structure for the purpose of finding and characterizing mutations. A given mutation is ultimately implicated by showing that affected individuals have it while unaffected individuals do not, and that the mutation causes a change in protein function which is not simply a polymorphism.
Another way to show a high probability of linkage between a candidate gene mutation and disease is by determining the chromosome location of the gene, then comparing the gene's map location to known regions of disease-linked loci such as the one identified by Lindblom et al. Coincident map location of a candidate gene in the region of a previously identified disease-linked locus may strongly implicate an association between a mutation in the candidate gene and the disease.
There are other ways to show that mutations in a gene candidate may be linked to the disease. For example, artificially produced mutant forms of the gene can be introduced into animals. Incidence of the disease in animals carrying the mutant gene can then be compared to animals with the normal genotype. Significantly elevated incidence of disease in animals with the mutant genotype, relative to animals with the wild-type gene, may support the theory that mutations in the candidate gene are sometimes responsible for occurrence of the disease.
One type of disease which has recently received much attention because of the discovery of disease-linked gene mutations is Hereditary Nonpolyposis Colon Cancer (HNPCC).
12 Members of HNPCC families also display increased susceptibility to other cancers including endometrial, ovarian, gastric and breast. Approximately 10% of colorectal cancers are believed to be HNPCC. Tumors from HNPCC patients display an unusual genetic defect in which short, repeated DNA sequences, such as the dinucleotide repeat sequences found in human chromosomal DNA ("microsatellite DNA"), appear to be unstable. This genomic instability of short, repeated DNA sequences, sometimes called the "RER+" phenotype, is also observed in a significant proportion of a wide variety of sporadic tumors, suggesting that many sporadic tumors may have acquired mutations that are similar (or identical) to mutations that are inherited in HNPCC.
Genetic linkage studies have identified two HNPCC loci thought to account for as much as 90% of HNPCC. The loci map to human chromosome 2p15-16 (2p21) and 3p21-23. Subsequent studies have identified human DNA mismatch repair gene hMSH2 as being the gene on chromosome 2p21, in which mutations account for a significant fraction of HNPCC cancers.
1 2, 12 hMSH2 is one of several genes whose normal function is to identify and correct DNA mispairs including those that follow each round of chromosome replication.
The best defined mismatch repair pathway is the E. coli MutHLS pathway that promotes a long-patch (approximately 3 kb) excision repair reaction which is dependent on the mutH, mutL, mutS and mutU (uvrD) gene products.
The MutHLS pathway appears to be the most active mismatch repair pathway in E. coli and is known both to increase the fidelity of DNA replication and to act on recombination intermediates containing mispaired bases. The system has been reconstituted in vitro, and requires the mutH, mutL, mutS and uvrD (helicase II) proteins along with DNA polymerase III holoenzyme, DNA ligase, single-stranded DNA binding protein (SSB) and one of the single-stranded DNA exonucleases, Exo I, Exo VII or RecJ. hMSH2 is homologous to the bacterial mutS gene. A similar pathway in yeast includes the yeast MSH2 gene and two mutL-like genes referred to as PMS1 and MLH1.
With the knowledge that mutations in a human mutS type gene (hMSH2) sometimes cause cancer, and the discovery that HNPCC tumors exhibit microsatellite DNA instability, interest in other DNA mismatch repair genes and gene products, and their possible roles in HNPCC and/or other cancers, has intensified. It is estimated that as many as 1 in 200 individuals carry a mutation in either the hMSH2 gene or other related genes which encode for other proteins in the same DNA mismatch repair pathway.
An important objective of our work has been to identify human genes which are useful for screening and identifying individuals who are at elevated risk of developing cancer. Other objects are: to determine the sequences of exons and flanking intron structures in such genes; to use the structural information to design testing procedures for the purpose of finding and characterizing mutations which result in an absence of or defect in a gene product which confers cancer susceptibility; and to distinguish such mutations from "harmless" polymorphic variations. Another object is to use the structural information relating to exon and flanking intron sequences of a cancer-linked gene, to diagnose tumor types and prescribe appropriate therapy. Another object is to use the structural information relating to a cancer-linked gene to identify other related candidate human genes for study.
Summary of the Invention
Based on our knowledge of DNA mismatch repair mechanisms in bacteria and yeast, including conservation of mismatch repair genes, we reasoned that human DNA mismatch repair homologs should exist, and that mutations in such homologs affecting protein function would be likely to cause genetic instability, possibly leading to an increased risk of developing certain forms of human cancer.
We have isolated and sequenced two human genes, hPMS1 and hMLH1, each of which encodes a protein involved in DNA mismatch repair.
hPMS1 and hMLH1 are homologous to mutL genes found in E. coli. Our studies strongly support an association between mutations in DNA mismatch repair genes and susceptibility to HNPCC. Thus, DNA mismatch repair gene sequence information of the present invention, namely, cDNA and genomic structures relating to hMLH1 and hPMS1, makes possible a number of useful methods relating to cancer risk determination and diagnosis. The invention also encompasses a large number of nucleotide and protein structures which are useful in such methods.
We mapped the location of hMLH1 to human chromosome 3p21-23.
This is a region of the human genome that, based upon family studies, harbors a locus that predisposes individuals to HNPCC. Additionally, we have found a mutation in a conserved region of the hMLH1 cDNA in HNPCC-affected individuals from a Swedish family. The mutation is not found in unaffected individuals from the same family, nor is it a simple polymorphism. We have also found that a homologous mutation in yeast results in a defective DNA mismatch repair protein. We have also found a frameshift mutation in hMLH1 of affected individuals from an English family. Our discovery of cancer-linked mutations in hMLH1, combined with the gene's map position which is coincident with a previously identified HNPCC-linked locus, plus the likely role of the hMLH1 gene in mutation avoidance, makes the hMLH1 gene a prime candidate for underlying one form of common inherited human cancer, and a prime candidate to screen and identify individuals who have an elevated risk of developing cancer.
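The figure descriptions later in this specification characterize the Swedish-family mutation as a C to T transition that replaces serine with phenylalanine at position 44 of the hMLH1 protein. As a minimal sketch of why such a change alters the encoded amino acid, the following uses a partial copy of the standard genetic code (only the TCN serine codons and their C-to-T products) to list which serine codons a single C to T transition converts to phenylalanine. The function names are hypothetical, and the sketch makes no statement about which codon actually occurs in hMLH1.

```python
# Partial standard genetic code: the four serine codons of the TCN family
# and the codons they become after a C-to-T change at the second position.
# (The AGT/AGC serine codons are omitted; no single C-to-T change turns
# them into phenylalanine.)
CODON_TABLE = {
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
    "TTT": "Phe", "TTC": "Phe", "TTA": "Leu", "TTG": "Leu",
}


def c_to_t_at(codon: str, index: int) -> str:
    """Return the codon after a C-to-T transition at the given 0-based index."""
    if codon[index] != "C":
        raise ValueError(f"position {index} of {codon} is not C")
    return codon[:index] + "T" + codon[index + 1:]


def ser_to_phe_codons() -> list:
    """List serine codons that a single C-to-T transition turns into Phe."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "Ser":
            continue
        for i, base in enumerate(codon):
            if base == "C":
                mutant = c_to_t_at(codon, i)
                # Only keep changes whose product codon encodes phenylalanine.
                if CODON_TABLE.get(mutant) == "Phe":
                    hits.append(f"{codon}->{mutant}")
    return hits
```

Under the standard code, only a transition at the second codon position of TCT or TCC gives phenylalanine; the same change in TCA or TCG gives leucine instead.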
hMLH1 has 19 exons and 18 introns. We have determined the location of each of the 18 introns relative to hMLH1 cDNA. We have also determined the structure of all intron/exon boundary regions of hMLH1.
Knowledge of the intron/exon boundary structures makes possible efficient screening regimes to locate mutations which negatively affect the structure and function of gene products. Further, we have designed complete sets of oligonucleotide primer pairs which can be used in PCR to amplify individual complete exons together with surrounding intron boundary structures.
We mapped the location of hPMS1 to human chromosome 7.
Subsequent studies by others 39 have confirmed our prediction that mutations in this gene are linked to HNPCC.
The most immediate use of the present invention will be in screening tests on human individuals who are members of families which exhibit an unusually high frequency of early onset cancer, for example HNPCC.
Accordingly, one aspect of the invention comprises a method of diagnosing cancer susceptibility in a subject by detecting a mutation in a mismatch repair gene or gene product in a tissue from the subject, wherein the mutation is indicative of the subject's susceptibility to cancer. In a preferred embodiment of the invention, the step of detecting comprises detecting a mutation in a human mutL homolog gene, for example, hMLH1 or hPMS1.
The method of diagnosing preferably comprises the steps of: 1) amplifying a segment of the mismatch repair gene or gene product from an isolated nucleic acid; 2) comparing the amplified segment with an analogous segment of a wild-type allele of the mismatch repair gene or gene product; and 3) detecting a difference between the amplified segment and the analogous segment, the difference being indicative of a mutation in the mismatch repair gene or gene product which confers cancer susceptibility.
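Steps 2 and 3 above amount to a position-by-position comparison of two sequences. A minimal sketch, assuming the amplified segment has already been sequenced and aligned to a wild-type segment of equal length; the function name and the toy sequences are hypothetical and are not taken from any SEQ ID in this specification:

```python
def find_differences(wild_type: str, sample: str) -> list:
    """Compare two equal-length, pre-aligned segments.

    Returns a list of (1-based position, wild-type base, sample base)
    tuples, one for each position where the sample differs.
    """
    wild_type, sample = wild_type.upper(), sample.upper()
    if len(wild_type) != len(sample):
        raise ValueError("segments differ in length; align them first")
    return [
        (i + 1, wt_base, sample_base)
        for i, (wt_base, sample_base) in enumerate(zip(wild_type, sample))
        if wt_base != sample_base
    ]


# Hypothetical toy segments differing by a single C-to-T substitution.
WILD_TYPE = "ATGTCCGGA"
SAMPLE = "ATGTTCGGA"
differences = find_differences(WILD_TYPE, SAMPLE)  # [(5, 'C', 'T')]
```

A detected difference is only a candidate: as the following paragraph notes, a separate determination is needed to establish whether it affects DNA mismatch repair or is a harmless polymorphism.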
Another aspect of the invention provides methods of determining whether the difference between the amplified segment and the analogous wild-type segment causes an affected phenotype, i.e., whether the sequence alteration affects the individual's ability to repair DNA mispairs.
The method of diagnosing may include the steps of: 1) reverse transcribing all or a portion of an RNA copy of a DNA mismatch repair gene; and 2) amplifying a segment of the DNA produced by reverse transcription. An amplifying step in the present invention may comprise: selecting a pair of oligonucleotide primers capable of hybridizing to opposite strands of the mismatch repair gene, in an opposite orientation; and performing a polymerase chain reaction utilizing the oligonucleotide primers such that nucleic acid of the mismatch repair chain intervening between the primers is amplified to become the amplified segment.
In preferred embodiments of the methods summarized above, the DNA mismatch repair gene is hMLH1 or hPMS1. The segment of DNA corresponds to a unique portion of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24. "First stage" oligonucleotide primers selected from the group consisting of SEQ ID NOS: 44-82 are used in PCR to amplify the DNA segment. The invention also provides a method of using "second stage" nested primers (SEQ ID NOS: 83-122), for use with the first stage primers to allow more specific amplification and conservation of template DNA.
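The amplifying step and the two-stage primer scheme described above can be illustrated with a simple in-silico model. This is a sketch under stated assumptions, not a laboratory protocol: primers must match exactly, each primer is written 5' to 3', and the template and primer sequences are hypothetical toy strings rather than any of SEQ ID NOS: 44-122. The helper names `amplify` and `nested_pcr` are likewise illustrative.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplify(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the top-strand segment bounded by a primer pair, or None.

    `forward` matches the top strand directly. `reverse` is complementary
    to the top strand, so its reverse complement is what appears in the
    top-strand text downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def nested_pcr(template: str,
               first_stage: Tuple[str, str],
               second_stage: Tuple[str, str]) -> Optional[str]:
    """Amplify with first-stage primers, then re-amplify that product
    with second-stage (nested) primers lying inside it."""
    first_product = amplify(template, *first_stage)
    if first_product is None:
        return None
    return amplify(first_product, *second_stage)


# Hypothetical toy template: flank + outer site + inner site + target
# + inner site + outer site + flank.
TEMPLATE = ("CCGCG" + "TTGACCA" + "ATGAAAC" + "CGTCGT"
            + "GTTTTAG" + "CAGGTCA" + "ATATC")
FIRST_STAGE = ("TTGACCA", "TGACCTG")   # reverse primer = revcomp of CAGGTCA
SECOND_STAGE = ("ATGAAAC", "CTAAAAC")  # reverse primer = revcomp of GTTTTAG
```

Because the second-stage primers bind only inside the first-stage product, a product is returned only when both primer pairs find their sites, which mirrors the added specificity attributed to nested primers in the text.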
Another aspect of the present invention provides a method of identifying and classifying a DNA mismatch repair defective tumor comprising detecting in a tumor a mutation in a mismatch repair gene or gene product, preferably a mutL homolog (hMLH1 or hPMS1), the mutation being indicative of a defect in a mismatch repair system of the tumor.
The present invention also provides useful nucleotide and protein compositions. One such composition is an isolated nucleotide or protein structure including a segment sequentially corresponding to a unique portion of a human mutL homolog gene or gene product, preferably derived from either hMLH1 or hPMS1.
Other composition aspects of the invention comprise oligonucleotide primers capable of being used together in a polymerase chain reaction to amplify specifically a unique segment of a human mutL homolog gene, preferably hMLH1 or hPMS1.
Another aspect of the present invention provides a probe including a nucleotide sequence capable of binding specifically by Watson/Crick pairing to complementary bases in a portion of a human mutL homolog gene; and a label-moiety attached to the sequence, wherein the label-moiety has a property selected from the group consisting of fluorescent, radioactive and chemiluminescent.
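A minimal data-structure sketch of such a probe follows. It assumes perfect, antiparallel Watson/Crick pairing and models the label-moiety only as one of the three named properties; the class name, field names and example sequences are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass

# The three label properties named in the text.
LABEL_PROPERTIES = {"fluorescent", "radioactive", "chemiluminescent"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class Probe:
    sequence: str  # probe bases, written 5'->3'
    label: str     # property of the attached label-moiety

    def __post_init__(self):
        if self.label not in LABEL_PROPERTIES:
            raise ValueError(f"unsupported label property: {self.label}")

    def binding_sites(self, target: str) -> list:
        """Return 0-based start positions on `target` (written 5'->3')
        where every probe base pairs with its Watson/Crick complement."""
        # The probe pairs antiparallel to the target, so the target text
        # must contain the probe's reverse complement.
        site = self.sequence.upper().translate(_COMPLEMENT)[::-1]
        target = target.upper()
        return [
            i for i in range(len(target) - len(site) + 1)
            if target.startswith(site, i)
        ]


# Hypothetical example: the probe pairs with GTTTTAG in the target.
probe = Probe(sequence="CTAAAAC", label="fluorescent")
sites = probe.binding_sites("ATGAAACCGTCGTGTTTTAG")  # [13]
```

Real hybridization tolerates some mismatches depending on stringency; exact matching is used here only to keep the sketch short.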
We have also isolated and sequenced mouse MLH1 (mMLH1) and PMS1 (mPMS1) genes. We have used our knowledge of mouse mismatch repair genes to construct animal models for studying cancer. The models will be useful to identify additional oncogenes and to study environmental effects on mutagenesis.
We have produced polyclonal antibodies directed to a portion of the protein encoded by mPMS1 cDNA. The antibodies also react with hPMS1 protein and are useful for detecting the presence of the protein encoded by a normal hPMS1 gene. We are also producing monoclonal antibodies directed to hMLH1 and hPMS1.
In addition to diagnostic and therapeutic uses for the genes, our knowledge of hMLH1 and hPMS1 can be used to search for other genes of related function which are candidates for playing a role in certain forms of human cancer.
Having broadly described the invention in the preceding paragraphs, the various embodiments of the invention can thus be defined as set forth hereafter.
According to a first embodiment of the invention, there is provided an isolated nucleic acid molecule including a segment having a sequence selected from the group consisting of SEQ ID NOS: 6-24.
According to a second embodiment of the invention, there is provided an isolated nucleic acid molecule including a segment having a sequence selected from the group consisting of SEQ ID NOS: 25-43.
According to a third embodiment of the invention, there is provided an isolated polynucleotide including a unique segment of a human mutL homolog gene, wherein said segment is selected from at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 6-24 or 26-43.
According to a fourth embodiment of the invention, there is provided an isolated human nucleic acid molecule comprising a segment of at least 13 nucleotides the same as any
According to a fifth embodiment of the invention, there is provided an isolated polynucleotide comprising a segment of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 7-24.
According to a sixth embodiment of the invention, there is provided an isolated polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43.
According to a seventh embodiment of the invention, there is provided a composition comprising isolated nucleic acid molecules containing a human sequence encoding hMLH1, wherein said hMLH1 sequence is selected from the group consisting of: SEQ ID NOS: 6-24, nucleic acid sequences complementary to and nucleic acid sequences containing a segment of at least 13 nucleotides the same as any 13 nucleotide sequence in (a) or
According to an eighth embodiment of the invention, there is provided an isolated nucleotide sequence encoding an hMLH1 protein having an amino acid sequence as set forth in SEQ ID NO:
According to a ninth embodiment of the invention, there is provided an isolated polynucleotide comprising a segment having the same nucleotide sequence as shown in SEQ ID NO: 4 between nucleotide numbers 135-312.
According to a tenth embodiment of the invention, there is provided copies of a DNA sequence amplified from DNA or RNA, each copy comprising a segment of at least 13 nucleotides of hPMS1 as shown in SEQ ID NO: 132.
According to an eleventh embodiment of the invention, there is provided an isolated human gene comprising exons having a combined nucleotide sequence as shown in SEQ ID NO: 4, and introns intervening the exons.
According to a twelfth embodiment of the invention, there is provided isolated DNA comprising a first strand containing a first segment of at least 13 nucleotides of hPMS1 cDNA as shown in SEQ ID NO: 132, and a second strand containing a second segment which is complementary to at least 13 nucleotides of hPMS1 cDNA as shown in SEQ ID NO: 132, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hPMS1.
According to a thirteenth embodiment of the invention, there is provided an isolated human polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NO: 132, wherein the polynucleotide contains at least one mutation that causes a failure to produce a functionally normal hPMS1 protein.
According to a fourteenth embodiment of the invention, there is provided an isolated human polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NOS: 26-43, wherein the polynucleotide contains at least one mutation that causes a failure to produce a functionally normal hMLH1 protein.
According to a fifteenth embodiment of the invention, there is provided an isolated segment of a human gene, comprising an intron portion and an exon portion, the exon portion corresponding sequentially to a fragment of hMLH1 cDNA, wherein said fragment comprises at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43.
According to a sixteenth embodiment of the invention, there is provided a method of diagnosing cancer susceptibility in a subject comprising detecting a mutation in a mutL homolog gene or gene product in a tissue of the subject, and diagnosing cancer susceptibility in the subject based on the detected mutation.
According to a seventeenth embodiment of the invention, there is provided a method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of any one of SEQ ID NOS: 6-24 and 132.
According to an eighteenth embodiment of the invention, there is provided a method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of SEQ ID NOS: 26 and 27.
According to a nineteenth embodiment of the invention, there is provided a method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of the nucleotide sequence shown in SEQ ID NOS: 26-43.
According to a twentieth embodiment of the invention, there is provided a method of determining whether there is an alteration in a mammalian DNA mismatch repair pathway comprising isolating a biological specimen from a mammal, testing the specimen for an alteration in a mutL homolog nucleotide sequence or its expression product, and comparing the results obtained in step with the results obtained from a wild-type control.
According to a twenty-first embodiment of the invention, there is provided a pair of oligonucleotide primers selected from the group consisting of SEQ ID NOS: 44-122.
According to a twenty-second embodiment of the invention, there is provided a method of diagnosing a DNA mismatch repair abnormality in a human subject, comprising the steps of collecting a sample from a human subject, and detecting whether there is an abnormal deficiency of a mutL homolog protein in the sample.
According to a twenty-third embodiment of the invention, there is provided a method of diagnosing a tumor associated with defective DNA mismatch repair in a human comprising isolating a tissue suspected of being a tumor from said human, and detecting an alteration in a mutL homolog gene or its expression product, wherein said alteration is indicative of a tumor associated with defective DNA mismatch repair.
According to a twenty-fourth embodiment of the invention, there is provided a method of diagnosing cancer in an individual, comprising comparing a polynucleotide sequence of a mutL homolog gene from a cancer cell from an individual with a polynucleotide sequence of the mutL homolog gene from a non-cancer cell from the individual, and determining whether there is a difference between the polynucleotide sequence from the cancer cell in comparison to the polynucleotide sequence from the non-cancer cell.
According to a twenty-fifth embodiment of the invention, there is provided a kit for performing an assay to determine whether an individual has a mutation in a DNA mismatch repair gene comprising a set of oligonucleotide primers which can be used to amplify specifically a portion of a mutL homolog gene.
According to a twenty-sixth embodiment of the invention, there is provided a kit for performing an assay to determine whether a human subject has an abnormal deficiency of a protein involved in a DNA mismatch repair pathway comprising a purified antibody that binds specifically to a mutL homolog protein.
According to a twenty-seventh embodiment of the invention, there is provided a kit for performing an assay to determine whether an individual has a mutation in a DNA mismatch repair gene comprising at least one or more allele-specific oligomer probe that is capable of detecting a mutant allele in a mutL homolog gene.
According to a twenty-eighth embodiment of the invention, there is provided a method of determining whether a person is susceptible to cancer comprising: collecting a sample from the person, contacting the sample with antibodies that bind specifically to a mutL homolog protein, and detecting binding or lack of binding antibodies to mutL homolog protein.
According to a twenty-ninth embodiment of the invention, there is provided a method of determining whether tumor tissue has a deficient amount of a mutL homolog protein comprising obtaining a cross-section of a tumor specimen, contacting the cross-section with a purified antibody that binds specifically to a mutL homolog protein, and determining the presence or absence of antibody bound to a mutL homolog protein in the cross-section.
According to a thirtieth embodiment of the invention, there is provided a method of manufacturing antibodies that bind specifically to a mutL homolog protein comprising synthesizing or isolating at least a portion of a mutL homolog protein, overexpressing the protein in bacteria, purifying the protein, injecting the protein into a mouse, generating a hybridoma by fusing a lymphocyte from the mouse with a myeloma cell, and isolating monoclonal antibodies produced by the hybridoma, wherein the antibodies bind specifically to a mutL homolog protein.
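Several of the embodiments above turn on whether a molecule contains "at least 13 nucleotides the same as any 13 nucleotide sequence" in a reference sequence. Read literally, that is a shared-substring test with a window of 13. The sketch below illustrates that literal reading only; it is not an interpretation of claim scope, the function name is hypothetical, and the reference string is a toy sequence rather than any SEQ ID from this specification.

```python
def shares_kmer(candidate: str, reference: str, k: int = 13) -> bool:
    """Return True if any k consecutive bases of `candidate`
    also occur, in the same order, somewhere in `reference`."""
    candidate, reference = candidate.upper(), reference.upper()
    # Collect every k-base window of the reference once, for fast lookup.
    reference_kmers = {
        reference[i:i + k] for i in range(len(reference) - k + 1)
    }
    return any(
        candidate[i:i + k] in reference_kmers
        for i in range(len(candidate) - k + 1)
    )


# Hypothetical toy reference (27 bases).
REFERENCE = "ATGAAACCGTCGTGTTTTAGCAGGTCA"
# Candidate carrying 13 reference bases inside unrelated flanks.
WITH_13 = "GG" + REFERENCE[5:18] + "CC"
# Candidate carrying only 12 reference bases inside unrelated flanks.
WITH_12 = "G" + REFERENCE[5:17] + "C"
```

With these toy inputs, `shares_kmer(WITH_13, REFERENCE)` is true and `shares_kmer(WITH_12, REFERENCE)` is false, since the longest run shared by the second candidate is 12 bases.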
Description of the Figures
Figure 1 is a flow chart showing an overview of the sequence of experimental steps we used to isolate, characterize and use human and mouse PMS1 and MLH1 genes.
Figure 2 is an alignment of protein sequences for mutL homologs (SEQ ID NOS: 1-3) showing two highly-conserved regions (underlined) which we used to create degenerate PCR oligonucleotides for isolating additional mutL homologs.
Figure 3 shows the entire cDNA nucleotide sequence (SEQ ID NO: 4) for the human MLH1 gene, and the corresponding predicted amino acid sequence (SEQ ID NO: 5) for the human MLH1 protein. The underlined DNA sequences are the regions of cDNA that correspond to the degenerate PCR primers that were originally used to amplify a portion of the MLH1 gene (nucleotides 118-1385 and 343-359).
Figure 4A shows the nucleotide sequences of the 19 exons which collectively correspond to the entire hMLH1 cDNA structure. The exons are flanked by intron boundary structures. Primer sites are underlined. The exons with their flanking intron structures correspond to SEQ ID NOS: 6-24. The exons, shown in non-underlined small case letters, correspond to SEQ ID NOS: 25-43.
Figure 4B shows nucleotide sequences of primer pairs which have been used in PCR to amplify the individual exons. The "second stage" amplification primers (SEQ ID NOS: 83-122) are "nested" primers which are used to amplify target exons from the amplification product obtained with corresponding "first stage" amplification primers (SEQ ID NOS: 44-82). The structures in Figure 4B correspond to the structures in Tables 2 and 3.
Figure 5 is an alignment of the predicted amino acid sequences for human and yeast (SEQ ID NOS: 5 and 123, respectively) MLH1 proteins.
Amino acid identities are indicated by boxes and gaps are indicated by dashes.
Figure 6 is a phylogenetic tree of MutL-related proteins.
Figure 7 is a two-panel photograph. The first panel is a metaphase spread showing hybridization of the hMLH1 gene of chromosome 3.
The second panel is a composite of chromosome 3 from multiple metaphase spreads aligned with a human chromosome 3 ideogram. The region of hybridization is indicated in the ideogram by a vertical bar.
Figure 8 is a comparison of sequence chromatograms from affected and unaffected individuals showing identification of a C to T transition mutation that produces a non-conservative amino acid substitution at position 44 of the hMLH1 protein.
Figure 9 is an amino acid sequence alignment (SEQ ID NOS: 124-131) of the highly-conserved region of the MLH family of proteins surrounding the site of the predicted amino acid substitution. Bold type indicates the position of the predicted serine to phenylalanine amino acid substitution in affected individuals. Also highlighted are the serine or alanine residues conserved at this position in MutL-like proteins. Bullets indicate positions of highest amino acid conservation. For the MLH1 protein, the dots indicate that the sequence has not been obtained. Sequences were aligned as described below in reference to the phylogenetic tree of Figure 6.
Figure 10 shows the entire nucleotide sequence for hPMS1 (SEQ ID NO: 132).
Figure 11 is an alignment of the predicted amino acid sequences for human and yeast PMS1 proteins (SEQ ID NOS: 133 and 134, respectively).
Amino acid identities are indicated by boxes and gaps are indicated by dashes.
Figure 12 is a partial nucleotide sequence of mouse MLH1 (mMLH1) cDNA (SEQ ID NO: 135).
Figure 13 is a comparison of the predicted amino acid sequence for mMLH1 and hMLH1 proteins (SEQ ID NOS: 136 and 5, respectively).
Figure 14 shows the cDNA nucleotide sequence for mouse PMS1 (mPMS1) (SEQ ID NO: 137).
Figure 15 is a comparison of the predicted amino acid sequences for mPMS1 and hPMS1 proteins (SEQ ID NOS: 138 and 133, respectively).
Definitions
PMS1
"PMS1" is the name used throughout the specification, including the claims, for the DNA mismatch repair gene, sequence ID No. 132, shown in Fig. 10. After the filing date of this application, some publications in the field refer to the same gene as "PMS2". Therefore, it should be noted that the mismatch repair gene referred to in some literature as "PMS2" is the same gene referred to and claimed in this patent specification as "PMS1". Similarly, the protein derived from the PMS1 gene, which we refer to in this specification as "PMS1", identified as sequence ID No. 133 and shown in Fig. 15, is referred to in literature published after the filing date of this application as "PMS2".
gene
"Gene" means a nucleotide sequence that contains a complete coding sequence. Generally, "genes" also include nucleotide sequences found upstream (e.g., promoter sequences, enhancers, etc.) or downstream (e.g., transcription termination signals, polyadenylation sites, etc.) of the coding sequence that affect the expression of the encoded polypeptide.
gene product
A "gene product" is either a DNA or RNA (mRNA) copy of a portion of a gene, or a corresponding amino acid sequence translated from mRNA.
wild type
The term "wild type", when applied to nucleic acids and proteins of the present invention, means a version of a nucleic acid or protein that functions in a manner indistinguishable from a naturally-occurring, normal version of that nucleic acid or protein (i.e., a nucleic acid or protein with wild-type activity). For example, a "wild-type" allele of a mismatch repair gene is capable of functionally replacing a normal, endogenous copy of the same gene within a host cell without detectably altering mismatch repair in that cell. Different wild-type versions of the same nucleic acid or protein may or may not differ structurally from each other.
non-wild type
The term "non-wild type", when applied to nucleic acids and proteins of the present invention, means a version of a nucleic acid or protein that functions in a manner distinguishable from a naturally-occurring, normal version of that nucleic acid or protein. Non-wild-type alleles of a nucleic acid of the invention may differ structurally from wild-type alleles of the same nucleic acid in any of a variety of ways including, but not limited to, differences in the amino acid sequence of an encoded polypeptide and/or differences in expression levels of an encoded nucleotide transcript or polypeptide product.
For example, the nucleotide sequence of a non-wild-type allele of a nucleic acid of the invention may differ from that of a wild-type allele by, for example, addition, deletion, substitution, and/or rearrangement of nucleotides.
Similarly, the amino acid sequence of a non-wild-type mismatch repair protein may differ from that of a wild-type mismatch repair protein by, for example, addition, substitution, and/or rearrangement of amino acids.
Particular non-wild-type nucleic acids or proteins that, when introduced into a normal host cell, interfere with the endogenous mismatch repair pathway, are termed "dominant negative" nucleic acids or proteins.
homologous
The term "homologous" refers to nucleic acids or polypeptides that are highly related at the level of nucleotide or amino acid sequence. Nucleic acids or polypeptides that are homologous to each other are termed "homologues".
The term "homologous" necessarily refers to a comparison between two sequences. In accordance with the invention, two nucleotide sequences are considered to be homologous if the polypeptides they encode are at least about 50-60% identical, preferably about 70% identical, for at least one stretch of at least 20 amino acids. Preferably, homologous nucleotide sequences are also characterized by the ability to encode a stretch of at least 4-5 uniquely specified amino acids. Both the identity and the approximate spacing of these amino acids relative to one another must be considered for nucleotide sequences to be considered to be homologous. For nucleotide sequences less than 60 nucleotides in length, homology is determined by the ability to encode a stretch of at least 4-5 uniquely specified amino acids.
upstream/downstream
The terms "upstream" and "downstream" are art-understood terms referring to the position of an element of nucleotide sequence.
"Upstream" signifies an element that is more 5' than the reference element.
"Downstream" refers to an element that is more 3' than a reference element.
intron/exon
The terms "exon" and "intron" are art-understood terms referring to various portions of genomic gene sequences. "Exons" are those portions of a genomic gene sequence that encode protein. "Introns" are sequences of nucleotides found between exons in genomic gene sequences.
affected
The term "affected", as used herein, refers to those members of a kindred that either have developed a characteristic cancer (e.g., colon cancer in an HNPCC lineage) and/or are predicted, on the basis of, for example, genetic studies, to carry an inherited mutation that confers susceptibility to cancer.
unique
A "unique" segment, fragment or portion of a gene or protein means a portion of a gene or protein which is different sequentially from any other gene or protein segment in an individual's genome. As a practical matter, a unique segment or fragment of a gene will typically be a nucleotide sequence of at least about 13 bases in length and will be sufficiently different from other gene segments so that oligonucleotide primers may be designed and used to selectively and specifically amplify the segment. A unique segment of a protein is typically an amino acid sequence which can be translated from a unique segment of a gene.
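The identity threshold in the definition of "homologous" above can be illustrated with a short sketch. It assumes the two polypeptide sequences have already been aligned to equal length (with "-" for gaps), counts a gap as a mismatch, and uses the lower bound of about 50% identity over a stretch of 20 amino acids. The function name and parameters are hypothetical and are not part of the specification.

```python
def has_homologous_stretch(aligned_a, aligned_b, window=20, min_identity=0.5):
    """Return True if any window of `window` aligned positions reaches
    `min_identity` fractional identity.

    Assumptions (illustrative only): the inputs are pre-aligned strings of
    equal length, '-' marks a gap, and a gap column counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(aligned_a) < window:
        return False
    # 1 where both residues are identical and neither is a gap, else 0.
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a, aligned_b)
    ]
    # Sliding-window sum of matches over the alignment.
    count = sum(matches[:window])
    if count / window >= min_identity:
        return True
    for i in range(window, len(matches)):
        count += matches[i] - matches[i - window]
        if count / window >= min_identity:
            return True
    return False
```

Raising `min_identity` to 0.7 applies the "preferably about 70% identical" variant of the same test.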
References The following publications are referred to by number in the text of the application. Each of the publications is incorporated here by reference.
1. Fishel, et al. Cell 75, 1027-1038 (1993).
2. Leach, et al. Cell 75, 1215-1225 (1993).
3. Lindblom, Tannergard, PI, Werelius, B. Nordenskjold, M. Nature Genetics 5, 279-282 (1993).
4. Prolla, Christie, D.M. Liskay, R.M. Molec. and Cell. Biol. 14, 407-415 (1994).
5. Strand, M. Prolla, Liskay, R.M. Petes, T.D. Nature 365, 274-276 (1993).
6. Aaltonen, et al. Science 260, 812-816 (1993).
7. Han, Yanagisawa, Kato, Park, J.G. Nakamura, Y. Cancer Res. 53, 5087-5089 (1993).
8. Ionov, Peinado, M.A. Malkhosyan, Shibata, D. Perucho, M. Nature 363, 558-561 (1993).
9. Risinger, J.I. et al. Cancer Res. 53, 5100-5103 (1993).
10. Thibodeau, Bren, G. Shaid, D. Science 260, 816-819 (1993).
11. Levinson, G. Gutman, G.A. Nucleic Acids Res. 15, 5323-5338 (1987).
12. Parsons, et al. Cell 75, 1227-1236 (1993).
13. Modrich, P. Ann. Rev. of Genet. 25, 229-53 (1991).
14. Reenan, R.A. Kolodner, R.D. Genetics 132, 963-73 (1992).
15. Bishop, Anderson, J. Kolodner, R.D. PNAS 86, 3713-3717 (1989).
16. Kramer, Kramer, Williamson, M.S. Fogel, S. J. Bacteriol. 171, 5339-5346 (1989).
17. Williamson, Game, J.C. Fogel, Genetics 110, 609-646 (1985).
18. Prudhomme, Martin, Mejean, V. Claverys, J. J. Bacteriol. 171, 5332-5338 (1989).
19. Mankovich, J.A., McIntyre, C.A. Walker, G.C. J. Bacteriol. 171, 5325-5331 (1989).
20. Lichter, et al. Science 247, 64-69 (1990).
21. Boyle, Feltquite, Dracopoli, Housman, D. Ward, D.C. Genomics 12, 106-115 (1992).
22. Lyon, M.F. Kirby, Mouse Genome 91, 40-80 (1993).
23. Reenan, R.A. Kolodner, R.D. Genetics 132, 975-85 (1992).
24. Latif, F. et al. Cancer Research 52, 1451-1456 (1992).
25. Naylor, Johnson, Minna, J.D. Sakaguchi, A.Y. Nature 329, 451-454 (1987).
26. Ali, Lidereau, R. Callahan, R. Journal of the National Cancer Institute 81, 1815-1820 (1989).
27. Higgins, Bleasby, A. Fuchs, R. Comput. Appl. Biosci. 8, 189-191 (1992).
28. Fields, S. Song, O.K. Nature 340, 245-246 (1989).
29. Lynch, et al. Gastroenterology 104, 1535-1549 (1993).
30. Elledge, S.J., Mulligan, Ramer, Spottswood, M. Davis, R.W. Proc. Natl. Acad. Sci. U.S.A. 88, 1731-1735 (1991).
31. Frohman, M. Amplifications, a forum for PCR users 1, 11-15 (1990).
32. Powell, et al. New England Journal of Medicine 329, 1982-1987 (1993).
33. Wu, Nozari, G. Schold, Conner, BJ. Wallace, R.B. DNA 8, 135-142 (1989).
34. Mullis, K.E.B. Faloona, F.A. Methods in Enzymology 155, 335-350 (1987).
35. Bishop, Thomas, H. Cancer Sur. 9, 585-604 (1990).
36. Capecchi, M.R. Scientific American 52-59 (March 1994).
37. Erlich, HA. PCR Technology, Principles and Applications for DNA Amplification (1989).
38. Papadopoulos et al. Science 263, 1625-1629 (March 1994).
39. Nicolaides et al. Nature 371, 75-80 (September 1994).
40. Tong et al. Anal. Chem. 64, 2672-2677 (1992).
41. Debuire et al. Clin. Chem. 39, 1682-5 (1993).
42. Wahlberg et al. Electrophoresis 13, 547-551 (1992).
43. Kaneoka et al. Biotechniques 10, 30, 32, 34 (1991).
44. Huhman et al. Biotechniques 10, 84-93 (1991).
45. Hultman et al. Nuc. Acid. Res. 17, 4937-46 (1989).
46. Zu et al. Mutn. Res. 288, 232-248 (1993).
47. Espelund et al. Biotechniques 13, 74-81 (1992).
48. Prolla et al. Science 265, 1091-1093 (1994).
49. Bishop et al. Mol. Cell. Biol. 6, 3401-3409 (1986).
50. Folger et al. Mol. Cell. Biol. 5, 70-74 (1985).
51. T.C. Brown et al. Cell 54, 705-711 (1988).
52. T.C. Brown et al. Genome 31, 578-583 (1989).
53. C. Muster-Nassal et al. Proc. Natl. Acad. Sci. U.S.A. 83, 7618-7622 (1986).
54. I. Varlet et al. Proc. Natl. Acad. Sci. U.S.A. 87, 7883-7887 (1990).
55. D.C. Thomas et al. J. Biol. Chem. 266, 3744-3751 (1991).
56. JJ. Holmes et al. Proc. Natl. Acad. Sci. U.S.A. 87, 5837-5841 (1990).
57. P. Branch et al. Nature 362, 652-654 (1993).
58. A. Kat et al. Proc. Natl. Acad. Sci. U.S.A. 90, 6424-6428 (1993).
59. K. Wiebauer et al. Nature 339, 234-236 (1989).
60. K. Wiebauer et al. Proc. Natl. Acad. Sci. U.S.A. 87, 5842-5845 (1990).
61. P. Neddermann et al. J. Biol. Chem. 268, 21218-24 (1993).
62. Kramer et al. Mol. Cell Biol. 9:4432-40 (1989).
63. Kramer et al. J. Bacteriol. 171:5339-5346 (1989).
Description of the Invention
We have discovered mammalian genes which are involved in DNA mismatch repair. One of the genes, hPMS1, encodes a protein which is homologous to the yeast DNA mismatch repair protein PMS1. We have mapped the locations of hPMS1 to human chromosome 7 and the mouse PMS1 gene to mouse chromosome 5, band G. Another gene, hMLH1 (MutL Homolog), encodes a protein which is homologous to the yeast DNA mismatch repair protein MLH1. We have mapped the locations of hMLH1 to human chromosome 3p21.3-23 and to mouse chromosome 9, band E.
Studies (refs. 1, 2) have demonstrated involvement of a human DNA mismatch repair gene homolog, hMSH2, on chromosome 2p in HNPCC. Based upon linkage data, a second HNPCC locus has been assigned to chromosome 3p21-23 (ref. 3). Examination of tumor DNA from the chromosome 3-linked kindreds revealed dinucleotide repeat instability similar to that observed for other HNPCC families (ref. 6) and several types of sporadic tumors (refs. 7-10). Because dinucleotide repeat instability is characteristic of a defect in DNA mismatch repair (refs. 5, 11, 12), we reasoned that HNPCC linked to chromosome 3p21-23 could result from a mutation in a second DNA mismatch repair gene.
Repair of mismatched DNA in Escherichia coli requires a number of genes including mutS, mutL and mutH, defects in any one of which result in elevated spontaneous mutation rates (ref. 13). Genetic analysis in the yeast Saccharomyces cerevisiae has identified three DNA mismatch repair genes: a mutS homolog, MSH2 (ref. 14), and two mutL homologs, PMS1 (ref. 16) and MLH1 (ref. 4). Each of these three genes plays an indispensable role in DNA replication fidelity, including the stabilization of dinucleotide repeats.
We believe that hMLH1 is the HNPCC gene previously linked to chromosome 3p based upon the similarity of the hMLH1 gene product to the yeast DNA mismatch repair protein, MLH1 (ref. 4), the coincident location of the hMLH1 gene and the HNPCC locus on chromosome 3, and hMLH1 missense mutations which we found in affected individuals from chromosome 3-linked HNPCC families.
Our knowledge of the human and mouse MLH1 and PMS1 gene structures has many important uses. The gene sequence information can be used to screen individuals for cancer risk. Knowledge of the gene structures makes it possible to easily design PCR primers which can be used to selectively amplify portions of hMLH1 and hPMS1 genes for subsequent comparison to the normal sequence and cancer risk analysis. This type of testing also makes it possible to search for and characterize hMLH1 and hPMS1 cancer-linked mutations for the purpose of eventually focusing the cancer screening effort on specific gene loci.
Specific characterization of cancer-linked mutations in hMLH1 and hPMS1 makes possible the production of other valuable diagnostic tools such as allele specific probes which may be used in screening tests to determine the presence or absence of specific gene mutations.
Additionally, the gene sequence information for hMLH1 and/or hPMS1 can be used, for example, in a two-hybrid system, to search for other genes of related function which are candidates for cancer involvement.
The hMLH1 and hPMS1 gene structures are useful for making proteins which are used to develop antibodies directed to specific portions or the complete hMLH1 and hPMS1 proteins. Such antibodies can then be used to isolate the corresponding protein and possibly related proteins for research and diagnostic purposes.
The mouse MLH1 and PMS1 gene sequences are useful for producing mice that have mutations in the respective gene. The mutant mice are useful for studying the gene's function, particularly its relationship to cancer.
Methods for Isolating and Characterizing Mammalian MLH1 and PMS1 Genes
We have isolated and characterized four mammalian genes, i.e., human MLH1 (hMLH1), human PMS1 (hPMS1), mouse MLH1 (mMLH1) and mouse PMS1 (mPMS1). Due to the structural similarity between these genes, the methods we have employed to isolate and characterize them are generally the same. Figure 1 shows, in broad terms, the experimental approach which we used to isolate and characterize the four genes. The following discussion refers to the step-by-step procedure shown in Figure 1.
Step 1 Design of degenerate oligonucleotide pools for PCR
Earlier reports indicated that portions of three MutL-like proteins, two from bacteria, MutL and HexB, and one from yeast, PMS1, are highly conserved (refs. 16, 18, 19). After inspection of the amino acid sequences of HexB, MutL and PMS1 proteins, as shown in Figure 2, we designed pools of degenerate oligonucleotide pairs corresponding to two highly-conserved regions, KELVEN and GFRGEA, of the MutL-like proteins. The sequences (SEQ ID NOS: 139 and 140, respectively) of the degenerate oligonucleotides which we used to isolate the four genes are: 5'-CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' and 5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3'.
The underlined sequences within the primers are XbaI and SacI restriction endonuclease sites, respectively. They were introduced in order to facilitate the cloning of the PCR-amplified fragments. In the design of the oligonucleotides, we took into account the fact that a given amino acid can be coded for by more than one DNA triplet (codon). The degeneracy within these sequences is indicated by multiple nucleotides within parentheses, or by N for the presence of any base at that position.
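The degenerate notation described above can be made concrete with a small sketch. It parses the two primer strings exactly as printed in this section, counts how many distinct oligonucleotides each pool represents, and locates the XbaI (TCTAGA) and SacI (GAGCTC) recognition sequences in the fixed 5' portion. The helper names are hypothetical and serve only as an illustration; they are not part of the described method.

```python
import re
from itertools import product

# Primer strings as printed in the text above (5' to 3').
PRIMERS = {
    "SEQ ID NO: 139": "CTTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC",
    "SEQ ID NO: 140": "AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA",
}

# Standard recognition sequences of the two restriction enzymes.
SITES = {"XbaI": "TCTAGA", "SacI": "GAGCTC"}


def parse_positions(primer):
    """Return one string of allowed bases per primer position."""
    positions = []
    for group, single in re.findall(r"\(([ACGT/]+)\)|([ACGTN])", primer):
        if group:                      # e.g. "T/C" -> "TC"
            positions.append(group.replace("/", ""))
        elif single == "N":            # N = any of the four bases
            positions.append("ACGT")
        else:
            positions.append(single)
    return positions


def degeneracy(primer):
    """Number of distinct sequences represented by the degenerate primer."""
    total = 1
    for allowed in parse_positions(primer):
        total *= len(allowed)
    return total


def expand(primer):
    """Enumerate every concrete sequence in the degenerate pool."""
    return ["".join(bases) for bases in product(*parse_positions(primer))]


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, "length:", len(parse_positions(primer)),
              "pool size:", degeneracy(primer))
```

Because the restriction sites sit in the non-degenerate 5' portion, every member of each expanded pool carries the same cloning site.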
Step 2 Reverse transcription and PCR on poly A+ selected mRNA isolated from human cells
We isolated messenger (poly A+ enriched) RNA from cultured human cells, synthesized double-stranded cDNA from the mRNA, and performed PCR with the degenerate oligonucleotides (ref. 4). After trying a number of different PCR conditions, for example, adjusting the annealing temperature, we successfully amplified a DNA of the size predicted (~210 bp) for a MutL-like protein.
Step 3 Cloning and sequencing of PCR-generated fragments; identification of two gene fragments representing human PMS1 and MLH1
We isolated the PCR amplified material (~210 bp) from an agarose gel and cloned this material into a plasmid (pUC19). We determined the DNA sequence of several different clones. The amino acid sequence inferred from the DNA sequence of two clones showed strong similarity to other known MutL-like proteins (refs. 4, 16, 18, 19). The predicted amino acid sequence for one of the clones was most similar to the yeast PMS1 protein. Therefore we named it hPMS1, for human PMS1. The second clone was found to encode a polypeptide that most closely resembles yeast MLH1 protein and was named hMLH1, for human MLH1.
Step 4 Isolation of complete human and mouse PMS1 and MLH1 cDNA clones using the PCR fragments as probes
We used the 210 bp PCR-generated fragments of the hMLH1 and hPMS1 cDNAs as probes to screen both human and mouse cDNA libraries (from Stratagene, or as described in reference 30). A number of cDNAs were isolated that corresponded to these two genes. Many of the cDNAs were truncated at the 5' end. Where necessary, PCR techniques (ref. 31) were used to obtain the 5' end of the gene, in addition to further screening of cDNA libraries. Complete composite cDNA sequences were used to predict the amino acid sequences of the human and mouse MLH1 and PMS1 proteins.
Step 5 Isolation of human and mouse PMS1 and MLH1 genomic clones
Information on the genomic and cDNA structure of the human MLH1 and PMS1 genes is necessary in order to thoroughly screen for mutations in cancer-prone families. We have used human cDNA sequences as probes to isolate the genomic sequences of human PMS1 and MLH1. We have isolated four cosmids and two P1 clones for hPMS1 that together are likely to contain most, if not all, of the cDNA (exon) sequence. For hMLH1 we have isolated four overlapping λ-phage clones containing 5'-MLH1 genomic sequences and four P1 clones (two full-length clones and two which include the 5' coding end plus portions of the promoter region). PCR analysis using pairs of oligonucleotides specific to the 5' and 3' ends of the hMLH1 cDNA clearly indicates that the P1 clone contains the complete hMLH1 cDNA information.
Similarly, genomic clones for mouse PMS1 and MLH1 genes have been isolated and partially characterized (described in Step 8).
Step 6 Chromosome positional mapping of the human and mouse PMS1 and MLH1 genes by fluorescence in situ hybridization
We used genomic clones isolated from human and mouse PMS1 and MLH1 for chromosomal localization by fluorescence in situ hybridization (FISH) (refs. 20, 21). We mapped the human MLH1 gene to chromosome 3p21.3-23, shown in Figure 7, as discussed in more detail below. We mapped the mouse MLH1 gene to chromosome 9 band E, a region of synteny between mouse and human (ref. 22). In addition to FISH techniques, we used PCR with a pair of hMLH1-specific oligonucleotides to analyze DNA from a rodent/human somatic cell hybrid mapping panel (Coriell Institute for Medical Research, Camden, NJ). Our PCR results with the panel clearly indicate that hMLH1 maps to chromosome 3. The position of hMLH1 at 3p21.3-23 is coincident with a region known to harbor a second locus for HNPCC based upon linkage data.
We mapped the hPMS1 gene, as shown in Figure 12, to the long (q) arm of chromosome 7 (either 7q11 or 7q22) and the mouse PMS1 to chromosome 5, band G, two regions of synteny between the human and the mouse (ref. 22). We performed PCR using oligonucleotides specific to hPMS1 on DNA from a rodent/human cell panel. In agreement with the FISH data, the location of hPMS1 was confirmed to be on chromosome 7. These observations assure us that our human map position for hPMS1 to chromosome 7 is correct. The physical localization of hPMS1 is useful for the purpose of identifying families which may potentially have a cancer-linked mutation in hPMS1.
Step 7 Using genomic and cDNA sequences to identify mutations in hPMS1 and hMLH1 genes from HNPCC families
We have analyzed samples collected from individuals in HNPCC families for the purpose of identifying mutations in hPMS1 or hMLH1 genes. Our approach is to design PCR primers based on our knowledge of the gene structures, to obtain exon/intron segments which we can compare to the known normal sequences. We refer to this approach as "exon-screening".
Using cDNA sequence information we have designed and are continuing to design hPMS1 and hMLH1 specific oligonucleotides to delineate exon/intron boundaries within genomic sequences. The hPMS1 and hMLH1 specific oligonucleotides were used to probe genomic clones for the presence of exons containing that sequence. Oligonucleotides that hybridized were used as primers for DNA sequencing from the genomic clones. Exon-intron junctions were identified by comparing genomic with cDNA sequences.
Amplification of specific exons from genomic DNA by PCR and sequencing of the products is one method to screen HNPCC families for mutations (ref. 12). We have identified genomic clones containing hMLH1 cDNA information and have determined the structures of all intron/exon boundary regions which flank the 19 exons of hMLH1.
We have used the exon-screening approach to examine the MLH1 gene of individuals from HNPCC families showing linkage to chromosome 3 (ref. 3). As will be discussed in more detail below, we identified a mutation in the MLH1 gene of one such family, consisting of a C to T substitution. We predict that the C to T mutation causes a serine to phenylalanine substitution in a highly-conserved region of the protein. We are continuing to identify HNPCC families from whom we can obtain samples in order to find additional mutations in hMLH1 and hPMS1 genes.
We are also using a second approach to identify mutations in hPMS1 and hMLH1. The approach is to design hPMS1 or hMLH1 specific oligonucleotide primers to produce first-strand cDNA by reverse transcription of RNA. PCR using gene-specific primers will allow us to amplify specific regions from these genes. DNA sequencing of the amplified fragments will allow us to detect mutations.
Step 8 Design targeting vectors to disrupt mouse PMS1 and MLH1 genes in ES cells; study mice deficient in mismatch repair.
We constructed a gene targeting vector based on our knowledge of the genomic mouse PMS1 DNA structure. We used the vector to disrupt the PMS1 gene in mouse embryonic stem cells (ref. 36). The cells were injected into mouse blastocysts which developed into mice that are chimeric (mixtures) for cells carrying the PMS1 mutation. The chimeric animals will be used to breed mice that are heterozygous and homozygous for the PMS1 mutation. These mice will be useful for studying the role of the PMS1 gene in the whole organism.
Human MLH1
The following discussion is a more detailed explanation of our experimental work relating to hMLH1. As mentioned above, to clone mammalian MLH genes, we used PCR techniques like those used to identify the yeast MSH1, MSH2 and MLH1 genes and the human MSH2 gene (refs. 1, 2, 4, 14). As template in the PCR, we used double-stranded cDNA synthesized from poly A+ enriched RNA prepared from cultured primary human fibroblasts. The degenerate oligonucleotides were targeted at the N-terminal amino acid sequences KELVEN and GFRGEA (see Figure 2), two of the most conserved regions of the MutL family of proteins previously described for bacteria and yeast (refs. 16, 18, 19). Two PCR products of the predicted size were identified, cloned and shown to encode a predicted amino acid sequence with homology to MutL-like proteins. These two fragments generated by PCR were used to isolate human cDNA and genomic DNA clones.
The oligonucleotide primers which we used to amplify human MutL-related sequences were 5'-CTGATTCTAGAGC(T/C)TCNCCNC(T/G)(A/G)AANCC-3' (SEQ ID NO: 139) and 5'-AGGTCGGAGCTCAA(A/G)GA(A/G)(T/C)TNGTNGANAA-3' (SEQ ID NO: 140). PCR was carried out in 50 µL reactions containing cDNA template, 1.0 µM each primer, 5 U of Taq polymerase, 50 mM KCl, 10 mM Tris buffer pH 7.5 and 1.5 mM MgCl2. PCR was carried out for 35 cycles of 1 minute at 94°C, 1 minute at 43°C and 1.5 minutes at 62°C. Fragments of the expected size, approximately 212 bp, were cloned into pUC19 and sequenced.
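As a quick arithmetic check on the reaction conditions above, the following sketch computes the total programmed cycling time and the amount of each primer per reaction. It assumes a 50 µL reaction volume and a 1.0 µM primer concentration, ignores temperature ramp times and any initial denaturation or final extension steps (none are stated in the text), and uses hypothetical variable names.

```python
# Cycling profile taken from the paragraph above: minutes per step.
CYCLES = 35
STEP_MINUTES = {
    "denature_94C": 1.0,
    "anneal_43C": 1.0,
    "extend_62C": 1.5,
}

# Assumed units: reaction volume in microliters, primer in micromolar.
REACTION_VOLUME_UL = 50.0
PRIMER_CONC_UM = 1.0


def total_cycling_minutes(cycles, step_minutes):
    """Programmed hold time only; ramping between temperatures is ignored."""
    return cycles * sum(step_minutes.values())


def primer_pmol(conc_um, volume_ul):
    """1 uM equals 1 pmol per uL, so uM x uL gives pmol."""
    return conc_um * volume_ul


if __name__ == "__main__":
    print("cycling time (min):", total_cycling_minutes(CYCLES, STEP_MINUTES))
    print("each primer (pmol):", primer_pmol(PRIMER_CONC_UM, REACTION_VOLUME_UL))
```

Under these assumptions the program runs for 122.5 minutes of hold time and each reaction contains 50 pmol of each primer pool.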
The cloned MLH1 PCR products were labeled with a random primer labeling kit (RadPrime, Gibco BRL) and used to probe human cDNA and genomic cosmid libraries by standard procedures. DNA sequencing of double-stranded plasmid DNAs was performed as previously described. The hMLH1 cDNA nucleotide sequence as shown in Figure 3 encodes an open reading frame of 2268 bp. Also shown in Figure 3 is the predicted protein sequence encoded by the hMLH1 cDNA. The underlined DNA sequences are the regions of cDNA that correspond to the degenerate PCR primers that were originally used to amplify a portion of the MLH1 gene (nucleotides 118-135 and 343-359).
Figure 4A shows 19 nucleotide sequences corresponding to portions of hMLH1. Each sequence includes one of the 19 exons, in its entirety, surrounded by flanking intron sequences. Target PCR primer sites are underlined. More details relating to the derivation and uses of the sequences shown in Figure 4A are set forth below.
As shown in Figure 5, the hMLH1 protein consists of 756 amino acids and shares 41% identity with the protein product of the yeast DNA mismatch repair gene, MLH1 (ref. 4). The regions of the hMLH1 protein most similar to yeast MLH1 correspond to amino acids 11 through 317, showing 55% identity, and the last 13 amino acids, which are identical between the two proteins. Figure 5 shows an alignment of the predicted human MLH1 and S. cerevisiae MLH1 protein sequences. Amino acid identities are indicated by boxes, and gaps are indicated by dashes. The pairwise protein sequence alignment was performed with DNAStar MegAlign using the clustal method (ref. 27). Pairwise alignment parameters were a ktuple of 1, gap penalty of 3, window of 5 and diagonals of 5.
Figure 6 shows a phylogenetic tree of MutL-related proteins. The phylogenetic tree was constructed using the predicted amino acid sequences of 7 MutL-related proteins: human MLH1; mouse MLH1; S. cerevisiae MLH1; S. cerevisiae PMS1; E. coli MutL; S. typhimurium MutL and S. pneumoniae HexB.
Required sequences were obtained from GenBank release 7.3. The phylogenetic tree was generated with the PILEUP program of the Genetics Computer Group software using a gap penalty of 3 and a length penalty of 0.1. The recorded DNA sequences of hMLH1 and hPMS1 have been submitted to GenBank.
hMLH1 Intron Location and Intron/Exon Boundary Structures
In our previous U.S. Patent Application No. 08/209,521, we described the nucleotide sequence of a complementary DNA (cDNA) clone of a human gene, hMLH1. The cDNA sequence of hMLH1 (SEQ ID NO: 4) is presented in this application in Figure 3. We note that there may be some variability between individuals' hMLH1 cDNA structures, resulting from polymorphisms within the human population and the degeneracy of the genetic code.
In the present application, we report the results of our genomic sequencing studies. Specifically, we have cloned the human genomic region that includes the hMLH1 gene, with specific focus on individual exons and surrounding intron/exon boundary structures. Toward the ultimate goal of designing a comprehensive and efficient approach to identify and characterize mutations which confer susceptibility to cancer, we believe it is important to know the wild-type sequences of intron structures which flank exons in the hMLH1 gene. One advantage of knowing the sequence of introns near the exon boundaries is that it makes it possible to design primer pairs for selectively amplifying entire individual exons. More importantly, it is also possible that a mutation in an intron region, which, for example, may cause an mRNA splicing error, could result in a defective gene product, i.e., susceptibility to cancer, without showing any abnormality in an exon region of the gene. We believe a comprehensive screening approach requires searching for mutations, not only in the exon or cDNA, but also in the intron structures which flank the exon boundaries.
We have cloned the human genomic region that includes hMLH1 using approaches which are known in the art, and other known approaches could have been used. We used PCR to screen a P1 human genomic library for the hMLH1 gene. We obtained four clones, two that contained the whole gene and two which lacked the C-terminus. We characterized one of the full-length clones by cycle sequencing, which resulted in our definition of all intron/exon junction sequences for both sides of the 19 hMLH1 exons. We then designed multiple sets of PCR primers to amplify each individual exon (first stage primers) and verified the sequence of each exon and flanking intron sequence by amplifying several different genomic DNA samples and sequencing the resulting fragments using an ABI 373 sequencer. In addition, we have determined the sizes of each hMLH1 exon using PCR methods. Finally, we devised a set of nested PCR primers (second stage primers) for reamplification of individual exons. We have used the second stage primers in a multiplex method for analyzing HNPCC families and tumors for hMLH1 mutations. Generally, in the nested PCR primer approach, we perform a first multiplex amplification with four to eight sets of "first stage" primers, each directed to a different exon. We then reamplify individual exons from the product of the first amplification step, using a single set of second stage primers. Examples and further details relating to our use of the first and second stage primers are set forth below.
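The two-stage scheme described above can be sketched as a simple planning routine. The sketch assumes the 19 exons are batched in numerical order into multiplex pools of four to eight exons; the actual grouping used is not stated in the text, so the pool composition, function names and default pool size are purely illustrative.

```python
def make_pools(exons, pool_size=5, min_size=4, max_size=8):
    """Split exons into first-stage multiplex pools of min_size..max_size.

    Assumption: exons are pooled in the order given. A short final pool is
    merged into the previous pool when the merged pool still fits max_size.
    """
    if not min_size <= pool_size <= max_size:
        raise ValueError("pool_size must lie between min_size and max_size")
    pools = [exons[i:i + pool_size] for i in range(0, len(exons), pool_size)]
    if len(pools) > 1 and len(pools[-1]) < min_size:
        merged = pools[-2] + pools[-1]
        if len(merged) > max_size:
            raise ValueError("cannot form pools within the size limits")
        pools[-2:] = [merged]
    return pools


def nested_plan(pools):
    """List the reactions: one multiplex first-stage reaction per pool,
    then one second-stage (nested) reaction per exon in that pool."""
    plan = []
    for pool_number, pool in enumerate(pools, start=1):
        plan.append(("first stage", pool_number, tuple(pool)))
        for exon in pool:
            plan.append(("second stage", pool_number, (exon,)))
    return plan


if __name__ == "__main__":
    hmlh1_exons = list(range(1, 20))  # exons 1 through 19
    for reaction in nested_plan(make_pools(hmlh1_exons)):
        print(reaction)
```

With the default pool size of five, this yields four first-stage multiplex reactions followed by nineteen single-exon second-stage reactions.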
Through our genomic sequencing studies, we have identified all nineteen exons within the hMLH1 gene, and have mapped the intron/exon boundaries. One aspect of the invention, therefore, is the individual exons of the hMLH1 gene. Table 1 presents the nucleotide coordinates (i.e., the point of insertion of each intron within the coding region of the gene) of the hMLH1 exons (SEQ ID NOS: 25-43). The presented coordinates are based on the hMLH1 cDNA sequence, assigning position 1 to the "A" of the start "ATG" (which A is nucleotide 1 in SEQ ID NO: 4).
Table 1

Intron Number    cDNA Sequence Coordinates
intron 1         116 / 117
intron 2         207 / 208
intron 3         306 / 307
intron 4         380 / 381
intron 5         453 / 454
intron 6         545 / 546
intron 7         592 / 593
intron 8         677 / 678
intron 9         790 / 791
intron 10        884 / 885
intron 11        1038 / 1039
intron 12        1409 / 1410
intron 13        1558 / 1559
intron 14        1667 / 1668
intron 15        1731 / 1732
intron 16        1896 / 1897
intron 17        1989 / 1990
intron 18        2103 / 2104

We have also determined the nucleotide sequence of intron regions which flank exons of the hMLH1 gene. SEQ ID NOS: 6-24 are individual exon sequences bounded by their respective upstream and downstream intron sequences. The same nucleotide structures are shown in Fig. 4A, where the exons are numbered from N-terminus to C-terminus with respect to the chromosomal locus. The 5-digit numbers indicate the primers used to amplify the exon. All sequences are numbered assuming the A of the ATG codon is nucleotide 1. The numbers in parentheses are the nucleotide coordinates of the coding sequence found in the indicated exon. Uppercase is intron. Lowercase is exon or non-translated sequences found in the mRNA/cDNA clone. Lowercase and underlined sequences correspond to primers. The stop codon at 2269-2271 is in italics and underlined.
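The intron insertion points in Table 1 can be turned into per-exon coding ranges with a few lines of Python. This is an illustrative sketch only: the names `INTRON_AFTER`, `CDS_END` and `exon_coding_ranges` are ours, and the ranges cover only the coding portion of each exon (exons 1 and 19 also carry non-translated sequence, which is not counted here).

```python
# Intron insertion points from Table 1: each value is the last coding
# nucleotide before the intron (A of the start ATG = position 1).
INTRON_AFTER = [116, 207, 306, 380, 453, 545, 592, 677, 790, 884,
                1038, 1409, 1558, 1667, 1731, 1896, 1989, 2103]
CDS_END = 2271  # last nucleotide of the stop codon (2269-2271)


def exon_coding_ranges(intron_after, cds_end):
    """Return {exon_number: (first_nt, last_nt)} in cDNA coordinates."""
    starts = [1] + [pos + 1 for pos in intron_after]
    ends = list(intron_after) + [cds_end]
    return {i + 1: (s, e) for i, (s, e) in enumerate(zip(starts, ends))}


EXONS = exon_coding_ranges(INTRON_AFTER, CDS_END)

if __name__ == "__main__":
    for number, (first, last) in EXONS.items():
        print(f"exon {number:2d}: {first:4d}-{last:4d} ({last - first + 1} nt coding)")
```

Eighteen introns yield nineteen coding segments whose lengths sum to 2271 nucleotides, which agrees with the stated position of the stop codon.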
Table 2 presents the sequences of primer pairs ("first stage" primers) which we have used to amplify individual exons together with flanking intron structures.
Table 2

EXON NO.   PRIMER LOCATION   PRIMER NO.   PRIMER SEQ ID NO   PRIMER NUCLEOTIDE SEQUENCE
1          upstream          18442        44
1          downstream        19109        45
2          upstream          19689        46
2          downstream        19688        47
3          upstream          19687        48
3          downstream        19786        49
4          upstream          18492        50
4          downstream        18421        51
5          upstream          18313        52
5          downstream        18179        53
6          upstream          18318        54
6          downstream        18317        55
7          upstream          19009        56
7          downstream        19135        57                 cataaccttatctccacc
8          upstream          18197        58
8          downstream        18924        59
9          upstream          18765        60
9          downstream        18198        61
10         upstream          18305        62
10         downstream        18306        63
11         upstream          18182        64
11         downstream        19041        65
12         upstream          18579        66
12         downstream        18178        67
12         downstream        19070        68
13         upstream          18420        69
13         downstream        18443        70
14         upstream          19028        71
14         downstream        18897        72
15         upstream          19025        73
15         downstream        18575        74
16         upstream          18184        75
16         downstream        18314        76
17         upstream          18429        77
17         downstream        18315        78
18         upstream          18444        79
18         downstream        18581        80
19         upstream          18638        81
19         downstream        18637        82

Additionally, we have designed a set of "second stage" amplification primers, the structures of which are shown below in Table 3. We use the second stage primers in conjunction with the first stage primers in a nested amplification protocol, as described below.
Table 3

EXON NO.   PRIMER LOCATION   PRIMER NO.   PRIMER SEQ ID NO   PRIMER NUCLEOTIDE SEQUENCE
1          upstream          19295        83                 gaggtgattggctgaa
1          downstream        19446        84
2          upstream          18685        85                 tagagtagttgcaga
2          downstream        19067        86
3          upstream          18687        87                 aaatgagtaacatgatt
3          downstream        19068        88
4          upstream          19294        89                 cctttggtgaggtga
4          downstream        19077        90
5          upstream          19301        91                 tttccccttgggattag
5          downstream        19046        92                 t
6          upstream          19711        93                 attttcaagtacttctatgaatt
6          downstream        19079        94                 * 5' cagcaactgttcaatgtatgagcact
7          upstream          19293        95                 tgtttttggcaac
7          downstream        19435        96
8          upstream          19329        97                 5' tgtaaaacgacggccagtagccatgagacaataaatccttg
8          downstream        19450        98
9          upstream          19608        99                 ttcagaatctctttt
9          downstream        19449        100
10         upstream          19297        101                gtgtgaatgtacacctgtg
10         downstream        19081        102                tg
11         upstream          19486        103                ctccccctcccacta
11         downstream        19455        104
12         upstream          20546        105
12         downstream        20002        106                5' tgtaaaacgacggccagtgtttgctcagaggctgc
12         upstream          19829        107
12         downstream        19385        108                5' tgtaaaacgacggccagtttattacagaataaaggaggtag
13         upstream          19300        109                cacaaaatttggctaag
13         downstream        19078        110
14         upstream          19456        111
14         downstream        19472        112                tagtagctctgcttg
15         upstream          19697        113                atttgtcccaactggttgta
15         downstream        19466        114                tgaaatgtcagaaagtg
16         upstream          19269        115
16         downstream        19047        116
17         upstream          19298        117                actggagaaatgggatttg
17         downstream        19080        118                aaat
18         upstream          19436        119
18         downstream        19471        120                ggtcctgtcctag
19         upstream          19447        121
19         downstream        19330        122                gaagaacacatcccaca

In Table 3 an asterisk indicates that the 5' nucleotide is biotinylated.
Exons 1-7, 10, 13 and 16-19 can be specifically amplified in PCR reactions containing either 1.5 mM or 3 mM MgCl2. Exons 11 and 14 can only be specifically amplified in PCR reactions containing 1.5 mM MgCl2, and exons 8, 9, 12 and 15 can only be specifically amplified in PCR reactions containing 3 mM MgCl2. With respect to exon 12, the second stage amplification primers have been designed so that exon 12 is reamplified in two halves. The 20546 and 20002 primer set amplifies the N-terminal half. The primer set 19829 and 19835 amplifies the C-terminal half. An alternate primer for 18178 is 19070.
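When choosing which exons to combine in one multiplex reaction, the MgCl2 constraints above can be checked mechanically. The following Python sketch is only an illustration of that bookkeeping; `MG_OK` and `shared_mgcl2` are hypothetical names of our own, and MgCl2 compatibility is only one of several factors that would decide whether a primer combination works in practice.

```python
# MgCl2 concentrations (mM) at which each exon amplifies specifically,
# transcribed from the paragraph above.
BOTH = {1.5, 3.0}
MG_OK = {exon: BOTH for exon in (1, 2, 3, 4, 5, 6, 7, 10, 13, 16, 17, 18, 19)}
MG_OK.update({exon: {1.5} for exon in (11, 14)})
MG_OK.update({exon: {3.0} for exon in (8, 9, 12, 15)})


def shared_mgcl2(exons):
    """Return the sorted MgCl2 concentrations usable for every exon in `exons`."""
    allowed = set(BOTH)  # copy, so the shared BOTH set is never mutated
    for exon in exons:
        allowed &= MG_OK[exon]
    return sorted(allowed)
```

For instance, `shared_mgcl2([1, 11, 14])` leaves only 1.5 mM, while a group containing both exon 8 and exon 11 has no common concentration and would need to be split across reactions.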
The hMLH1 sequence information provided by our studies and disclosed in this application and preceding related applications, may be used to design a large number of different oligonucleotide primers for use in identifying hMLH1 mutations that correlate with cancer susceptibility and/or with tumor development in an individual, including primers that will amplify more than one exon (and/or flanking intron sequences) in a single product band.
One of ordinary skill in the art would be familiar with considerations important to the design of PCR primers for use to amplify the desired fragment or gene.
37 These considerations may be similar, though not necessarily identical, to those involved in design of sequencing primers, as discussed above. Generally it is important that primers hybridize relatively specifically (i.e., have a Tm of greater than about 55°C, and preferably around 60°C). For most cases, primers between about 17 and go nucleotides in length work well. Longer primers can be useful for amplifying longer fragments. In all cases, it is desirable to avoid using primers that are complementary to more than one sequence in the human genome, so that each pair of PCR primers amplifies only a single, correct fragment. Nevertheless, it is only absolutely necessary that the correct band be distinguishable from other product bands in the PCR reaction.
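The length and Tm guidelines above can be screened with a short script. The text does not say how Tm was estimated, so the sketch below assumes the simple Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough approximation for short oligonucleotides; `wallace_tm` and `passes_basic_checks` are illustrative names, and the 20-mer in the usage note is hypothetical rather than an hMLH1 primer.

```python
def wallace_tm(primer):
    """Rough Tm estimate in deg C: 2*(A+T) + 4*(G+C) (Wallace rule, assumed here)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def passes_basic_checks(primer, min_len=17, min_tm=55):
    """Apply the minimum length and Tm guidelines from the text to one primer."""
    return len(primer) >= min_len and wallace_tm(primer) > min_tm
```

Under this rule a hypothetical 20-mer such as `ACGTACGTACGTACGTACGT` scores 60 and passes both checks. Checking that a primer is complementary to only one site in the genome needs a genome-wide search and is outside the scope of this sketch.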
The exact PCR conditions (e.g., salt concentration, number of cycles, type of DNA polymerase, etc.) can be varied as known in the art to improve, for example, yield or specificity of the reaction. In particular, we have found it valuable to use nested primers in PCR reactions in order to reduce the amount of required DNA substrate and to improve amplification specificity. Two examples follow. The first example illustrates use of a first stage primer pair (SEQ ID NOS: 69 and 70) to amplify intron/exon segment (SEQ ID NO: 18). The second example illustrates use of second stage primers to amplify a target intron/exon segment from the product of a first PCR amplification step employing first stage primers.
EXAMPLE 1: Amplification of hMLH1 genomic clones from a P1 phage library

Genomic DNA (or 1 ng of a P1 phage can be used) was used in PCR reactions including:
    0.05 mM dNTPs
    KCl
    3 mM Mg
    Tris-HCl pH
    0.01% gelatin
    primers
Reactions were performed on a Perkin-Elmer Cetus model 9600 thermal cycler. Reactions were incubated at 95°C for 5 minutes, followed by 35 cycles ( cycles from a P1 phage) of:
    94°C for 30 seconds
    55°C for 30 seconds
    72°C for 1 minute.
A final 7 minute extension reaction was then performed at 72°C. Desirable P1 clones were those from which an approximately bp product band was produced.
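The cycling program of Example 1 can be written down as data, which makes it easy to estimate the run time. This is a minimal sketch with names of our own choosing; it sums only the programmed hold times and ignores the time the cycler spends ramping between temperatures, so the real run is longer.

```python
# (temperature in deg C, hold time in seconds), as listed in Example 1
INITIAL_DENATURATION = (95, 5 * 60)
CYCLE_STEPS = [(94, 30), (55, 30), (72, 60)]
FINAL_EXTENSION = (72, 7 * 60)


def total_hold_seconds(n_cycles):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + n_cycles * per_cycle + FINAL_EXTENSION[1]
```

With the 35 cycles used for genomic DNA, the hold times add up to 4920 seconds, or 82 minutes.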
EXAMPLE 2: Amplification of hMLH1 sequences from genomic DNA using nested PCR primers

We performed two-step PCR amplification of hMLH1 sequences from genomic DNA as follows. Typically, the first amplification was performed in a 25 microliter reaction including:
    25 ng of chromosomal DNA
    Perkin-Elmer PCR buffer II (any suitable buffer could be used)
    3 mM MgCl2
    each dNTP
    Taq DNA polymerase
    primers (SEQ ID NOS: 69, 70),
and incubated at 95°C for 5 minutes, followed by 20 cycles of:
    94°C for 30 seconds
    for 30 seconds.
The product band was typically small enough (less than approximately 500 bp) that separate extension steps were not performed as part of each cycle. Rather, a single extension step was performed, at 72°C for 7 minutes, after the cycles were completed. Reaction products were stored at 4°C.

The second amplification reaction, usually 25 or 50 microliters in volume, included:
    1 or 2 microliters (depending on the volume of the reaction) of the first amplification reaction product
    Perkin-Elmer PCR buffer II (any suitable buffer could be used)
    3 mM or MgCl2
    µM each dNTP
    Taq DNA polymerase
    nested primers (SEQ ID NOS: 109, 110),
and was incubated at 95°C for 5 minutes, followed by 20-25 cycles of:
    94°C for 30 seconds
    °C for 30 seconds.
A single extension step was performed, at 72°C for 7 minutes, after the cycles were completed. Reaction products were stored at 4°C.
Any set of primers capable of amplifying a target hMLH1 sequence can be used in the first amplification reaction. We have used each of the primer sets presented in Table 2 to amplify an individual hMLH1 exon in the first amplification reaction. We have also used combinations of those primer sets, thereby amplifying multiple individual hMLH1 exons in the first amplification reaction.
The nested primers used in the second amplification step were designed relative to the primers used in the first amplification reaction. That is, where a single set of primers is used in the first amplification reaction, the primers used in the second amplification reaction should be identical to the primers used in the first reaction except that the primers used in the second reaction should not include the 5'-most nucleotides of the first amplification reaction primers, and should extend sufficiently more at the 3' end that the Tm of the second amplification primers is approximately the same as the Tm of the first amplification reaction primers. Our second reaction primers typically lacked the three 5'-most nucleotides of the first amplification reaction primers, and extended approximately 3-6 nucleotides farther on the 3' end. SEQ ID NOS: 109, 110 are examples of nested primer pairs that could be used in a second amplification reaction when SEQ ID NOS: 69 and 70 were used in the first amplification reaction.
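The design rule just described (drop the 5'-most bases, extend 3-6 bases at the 3' end, keep the Tm roughly unchanged) can be sketched in Python as follows. Several things here are assumptions for illustration: Tm is approximated with the Wallace rule, the function and variable names are ours, and `TEMPLATE` is a made-up sequence, not hMLH1. The sketch handles a primer lying on the given strand; a downstream primer would be treated the same way on the reverse complement.

```python
def wallace_tm(seq):
    """Rough Tm estimate (deg C): 2*(A+T) + 4*(G+C). An assumed stand-in."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def nested_primer(template, start, length, drop_5prime=3, ext_range=(3, 6)):
    """Derive a second stage primer from a first stage primer.

    The first stage primer is template[start:start + length]. The nested
    primer drops `drop_5prime` bases at the 5' end and extends 3-6 bases at
    the 3' end, choosing the extension whose Tm is closest to the original.
    """
    outer = template[start:start + length]
    target = wallace_tm(outer)
    best = None
    for ext in range(ext_range[0], ext_range[1] + 1):
        end = start + length + ext
        if end > len(template):
            break
        candidate = template[start + drop_5prime:end]
        score = abs(wallace_tm(candidate) - target)
        if best is None or score < best[0]:
            best = (score, candidate)
    if best is None:
        raise ValueError("template too short to extend the primer")
    return outer, best[1]


# Hypothetical template, not an hMLH1 sequence.
TEMPLATE = "GATTACAGGCTTACGATCCGTAAGCTTGGCATCGATCGGATCCA"
OUTER, NESTED = nested_primer(TEMPLATE, start=2, length=20)
```

Because the nested primer starts three bases inside the first stage primer, it can only prime on product that contains the intended target, which is one way the second reaction gains specificity.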
We have also found that it can be valuable to include a standard sequence at the 5' end of one of the second amplification reaction primers to prime sequencing reactions. Additionally, we have found it useful to biotinylate the last nucleotide of one or both of the second amplification reaction primers so that the product band can easily be purified using magnetic beads 40 and then sequencing reactions can be performed directly on the bead-associated products.
41 For additional discussion of multiplex amplification and sequencing methods, see References by Zu et al. and Espelund et al. 4 47

hMLH1 Link to Cancer

As a first step to determine whether hMLH1 was a candidate for the HNPCC locus on human chromosome 3p21-23, 3 we mapped hMLH1 by fluorescence in situ hybridization (FISH).
20, 21 We used two separate genomic fragments (data not shown) of the hMLH1 gene in FISH analysis. Examination of several metaphase chromosome spreads localized hMLH1 to chromosome 3p21.3-23.
Panel A of Figure 7 shows hybridization of hMLH1 probes in a metaphase spread. Biotinylated hMLH1 genomic probes were hybridized to banded human metaphase chromosomes as previously described. 20, 21 Detection was performed with fluorescein isothiocyanate (FITC)-conjugated avidin (green signal); chromosomes, shown in blue, were counterstained with 4',6-diamidino-2-phenylindole (DAPI). Images were obtained with a cooled CCD camera, enhanced, pseudocoloured and merged with the following programs: CCD Image Capture; NIH Image 1.4; Adobe Photoshop and Genejoin Maxpix respectively.
Panel B of Figure 7 shows a composite of chromosome 3 from multiple metaphase spreads aligned with the human chromosome 3 ideogram. The region of hybridization (distal portion of 3p21.3-23) is indicated in the ideogram by a vertical bar.
As independent confirmation of the location of hMLH1 on chromosome 3, we used both PCR with a pair of hMLH1-specific oligonucleotides and Southern blotting with a hMLH1-specific probe to analyze DNA from the NIGMS2 rodent/human cell panel (Coriell Inst. for Med. Res., Camden, NJ, USA). Results of both techniques indicated chromosome 3 linkage. We also mapped the mouse MLH1 gene by FISH to chromosome 9 band E. This is a position of synteny to human chromosome 3p.
22 Therefore, the hMLH1 gene localizes to 3p21.3-23, within the genomic region implicated in chromosome 3-linked HNPCC families.
3 Next, we analyzed blood samples from affected and unaffected individuals from two chromosome-3 candidate HNPCC families 3 for mutations.
One family, Family 1, showed significant linkage (lod score 3.01 at recombination fraction of 0) between HNPCC and a marker on 3p. For the second family, Family 2, the reported lod score (1.02) was below the commonly accepted level of significance, and thus only suggested linkage to the same marker on 3p. Subsequent linkage analysis of Family 2 with the microsatellite marker D3S1298 on 3p21.3 gave a more significant lod score of 1.88 at a recombination fraction of 0. Initially, we screened for mutations in two PCR-amplified exons of the hMLH1 gene by direct DNA sequencing (Figure ). We examined these two exons from three affected individuals of Family 1, and did not detect any differences from the expected sequence. In Family 2, we observed that four individuals affected with colon cancer are heterozygous for a C to T substitution in an exon encoding amino acids 41-69, which corresponds to a highly-conserved region of the protein (Figure ). For one affected individual, we screened PCR-amplified cDNA for additional sequence differences. The combined sequence information obtained from the two exons and cDNA of this one affected individual represents 95% (i.e., all but the first 116 bp) of the open reading frame.
We observed no nucleotide changes other than the C to T substitution. In addition, four individuals from Family 2, predicted to be carriers based upon linkage data, and as yet unaffected with colon cancer, were found to be heterozygous for the same C to T substitution. Two of these predicted carriers are below and two are above the mean age of onset (50 years) in this particular family. Two unaffected individuals examined from this same family, both predicted by linkage data to be non-carriers, showed the expected normal sequence at this position. Linkage analysis that includes the C to T substitution in Family 2 gives a lod score of 2.23 at a recombination fraction of 0. Using low stringency cancer diagnostic criteria, we calculated a lod score of 2.53. These data indicate the C to T substitution shows significant linkage to the HNPCC in Family 2.
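For readers unfamiliar with lod scores, the following Python sketch shows how one is computed in the simplest textbook case: phase-known, fully informative meioses scored for a single marker. This simplified model is our assumption and is not the analysis actually performed on Families 1 and 2, whose pedigrees and penetrance assumptions are not given here; `lod_score` is a hypothetical helper name.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """LOD = log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = n_recombinants, n_meioses
    if theta == 0:
        if k > 0:
            return float("-inf")  # any recombinant excludes theta = 0
        likelihood = 1.0
    else:
        likelihood = (theta ** k) * ((1 - theta) ** (n - k))
    return math.log10(likelihood / (0.5 ** n))
```

In this model each non-recombinant meiosis contributes log10(2), about 0.30, at a recombination fraction of 0, so ten such meioses give a lod score of about 3.01, the conventional threshold for significant linkage.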
Figure 8 shows sequence chromatograms indicating a C to T transition mutation that produces a non-conservative amino acid substitution at position 44 of the hMLH1 protein. Sequence analysis of one unaffected (top panels, plus and minus strands) and one affected individual (lower panels, plus and minus strands) is presented. The position of the heterozygous nucleotide is indicated by an arrow. Analysis of the sequence chromatograms indicates that there is sufficient T signal in the C peak and enough A signal in the G peak for the affected individuals to be heterozygous at this site.
To determine whether this C to T substitution was a polymorphism, we sequenced this same exon amplified from the genomic DNA from 48 unrelated individuals and observed only the normal sequence. We have examined an additional 26 unrelated individuals using allele specific oligonucleotide (ASO) hybridization analysis.
33 The ASO sequences (SEQ ID NOS: 141 and 142, respectively) which we used are: 5'-ACTTGTGGATTTGC-3' and 5'-ACTTGTGAATTTTGC-3'.
Based upon direct DNA sequencing and ASO analysis, none of these 74 unrelated individuals carry the C to T substitution. Therefore, the C to T substitution observed in Family 2 individuals is not likely to be a polymorphism. As mentioned above, we did not detect this same C to T substitution in affected individuals from a second chromosome 3-linked family, Family 1.
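One way to put a number on "not likely to be a polymorphism" is to ask how common the T allele could be in the general population and still be missed in every one of the 74 unrelated individuals. The sketch below is our own illustration, not a calculation from the original study; it assumes independent sampling of two chromosomes per individual, and `upper_bound_allele_frequency` is a hypothetical name.

```python
def upper_bound_allele_frequency(n_individuals, confidence=0.95, ploidy=2):
    """Largest allele frequency still compatible, at the given confidence,
    with seeing zero copies among n_individuals * ploidy chromosomes."""
    chromosomes = n_individuals * ploidy
    return 1 - (1 - confidence) ** (1 / chromosomes)
```

For 74 individuals (148 chromosomes) the 95% upper bound is about 0.02, so the data are compatible with the variant being absent or rare, though they cannot by themselves exclude a rare polymorphism.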
3 We are continuing to study individuals of Family 1 for mutations in hMLH1.
Table 4 below summarizes our experimental analysis of blood samples from affected and unaffected individuals from Family 2 and unrelated individuals.

Table 4

Status                              Number of Individuals with C to T Mutation / Number of Individuals Tested
Family 2, Affected                  4/4
Family 2, Predicted Carriers        4/4
Family 2, Predicted Non-carriers    0/2
Unrelated Individuals               0/74

Based on several criteria, we suggest that the observed C to T substitution in the coding region of hMLH1 represents the mutation that is the basis for HNPCC in Family 2.
3 First, DNA sequence and ASO analysis did not detect the C to T substitution in 74 unrelated individuals. Thus, the C to T substitution is not simply a polymorphism. Second, the observed C to T substitution is expected to produce a serine to phenylalanine change at position 44 (See Figure ). This amino acid substitution is a non-conservative change in a conserved region of the protein (Figures 3 and ). Secondary structure predictions using Chou-Fasman parameters suggest a helix-turn-beta sheet structure with position 44 located in the turn. The observed Ser to Phe substitution at position 44 lowers the prediction for this turn considerably, suggesting that the predicted amino acid substitution alters the conformation of the hMLH1 protein. The suggestion that the Ser to Phe substitution is a mutation which confers cancer susceptibility is further supported by our experiments which show that an analogous substitution (alanine to phenylalanine) in a yeast MLH1 gene results in a nonfunctional mismatch repair protein. In bacteria and yeast, a mutation affecting DNA mismatch repair causes comparable increases in the rate of spontaneous mutation including additions and deletions within dinucleotide repeats.
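The link between a single C to T change and a serine to phenylalanine substitution can be checked against the standard genetic code. The sketch below enumerates every serine codon and every possible C to T change within it; the helper names are ours, and the actual codon 44 sequence is not quoted in this passage, so the code only shows which codons are consistent with the reported change.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def c_to_t_changes(from_aa, to_aa):
    """List (codon, mutant_codon) pairs where a single C->T converts from_aa to to_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for i, base in enumerate(codon):
            if base == "C":
                mutant = codon[:i] + "T" + codon[i + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    hits.append((codon, mutant))
    return hits


SER_TO_PHE = c_to_t_changes("S", "F")
```

Only TCT to TTT and TCC to TTC qualify, so the substitution must fall at the second position of a TCT or TCC codon. With the A of the ATG as nucleotide 1, codon 44 spans nucleotides 130-132, which places the change at nucleotide 131.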
4, 5, 11, 13, 14, 15, 16 In humans, mutation of hMSH2 is the basis of chromosome-2 HNPCC, 12 tumors which show microsatellite instability and an apparent defect in mismatch repair.
12 Chromosome 3-linked HNPCC is also associated with instability of dinucleotide repeats.
3 Combined with these observations, the high degree of conservation between the human MLH1 protein and the yeast DNA mismatch repair protein MLH1 suggests that hMLH1 is likely to function in DNA mismatch repair. During isolation of the hMLH1 gene, we identified the hPMS1 gene. This observation suggests that mammalian DNA mismatch repair, like that in yeast, 4 may require at least two MutL-like proteins.
It should be noted that it appears that different HNPCC families show different mutations in the MLH1 gene. As explained above, affected individuals in Family 1 showed "tight linkage" between HNPCC and a locus in the region of 3p21-23. However, affected individuals in Family 1 do not have the C to T mutation found in Family 2. It appears that the affected individuals in Family 1 have a different mutation in their MLH1 gene. Further, we have used the structure information and methods described in this application to find and characterize another hMLH1 mutation which apparently confers cancer susceptibility in heterozygous carriers of the mutant gene in a large English HNPCC family. The hMLH1 mutation in the English family is a +1 T frameshift which is predicted to lead to the synthesis of a truncated hMLH1 protein. Unlike, for example, sickle cell anemia, in which essentially all known affected individuals have the same mutation, multiple hMLH1 mutations have been discovered and linked to cancer. Therefore, knowledge of the entire cDNA sequence for hMLH1 (and probably hPMS1), as well as genomic sequences, particularly those that surround exons, will be useful and important for characterizing mutations in families identified as exhibiting a high frequency of cancer.
Subsequent to our discovery of a cancer conferring mutation in hMLH1, studies by others have resulted in the characterization of at least additional mutations in hMLH1, each of which appears to have conferred cancer susceptibility to individuals in at least one HNPCC family. For example, Papadopoulos et al. identified such a mutation, characterized by an in-frame deletion of 165 base pairs between codons 578 to 632. In another family, Papadopoulos et al. observed an hMLH1 mutation, characterized by a frame shift and substitution of new amino acids, namely, a 4 base pair deletion between codons 727 and 728. Papadopoulos et al. also report an hMLH1 cancer linked mutation, characterized by an extension of the COOH terminus, namely, a 4 base pair insertion between codons 755 and 756.
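Codon numbers such as those quoted above can be converted to cDNA coordinates and compared with the intron positions in Table 1. The short Python sketch below does this for the 165 base pair deletion, reading "codons 578 to 632" as an inclusive range, which is our assumption; `codon_span` and the constant names are illustrative.

```python
# Exon 16 coding range implied by Table 1 (intron 15 at 1731/1732,
# intron 16 at 1896/1897), in cDNA coordinates with A of ATG = 1.
EXON_16 = (1732, 1896)


def codon_span(first_codon, last_codon):
    """Return (first_nt, last_nt) covered by an inclusive run of codons."""
    return (3 * (first_codon - 1) + 1, 3 * last_codon)


DELETION = codon_span(578, 632)
DELETION_LENGTH = DELETION[1] - DELETION[0] + 1
```

Under that reading the deleted interval is nucleotides 1732-1896, exactly 165 bp, and it coincides with the coding range of exon 16. This suggests, although the passage does not say so, that the deletion corresponds to loss of exon 16 from the transcript.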
38 In summary, we have shown that the DNA mismatch repair gene hMLH1 is likely to be the hereditary nonpolyposis colon cancer gene previously localized by linkage analysis to chromosome 3p21-23.
3 Availability of the hMLH1 gene sequence will facilitate the screening of HNPCC families for cancer-linked mutations. In addition, although loss of heterozygosity (LOH) of linked markers is not a feature of either the 2p or 3p forms of HNPCC, 3 6 LOH involving the 3p21.3-23 region has been observed in several human cancers.
2 26 This suggests the possibility that hMLH1 mutation may play some role in these tumors.
Human PMS1

Human PMS1 was isolated using the procedures discussed with reference to Figure 1. Figure 10 shows the entire hPMS1 cDNA nucleotide sequence. Figure 11 shows an alignment of the predicted human and yeast PMS1 protein sequences. We determined by FISH analysis that human PMS1 is located on chromosome 7. Subsequent to our discovery of hPMS1, others have identified mutations in the gene which appear to confer HNPCC susceptibility. 39

Mouse MLH1

Using the procedure outlined above with reference to Figure 1, we have determined a partial nucleotide sequence of mouse MLH1 cDNA, as shown in Figure 12 (SEQ ID NO: 135). Figure 13 shows the corresponding predicted amino acid sequence for mMLH1 protein (SEQ ID NO: 136) in comparison to the predicted hMLH1 protein sequence (SEQ ID NO: ). Comparison of the mouse and human MLH1 proteins, as well as the comparison of hMLH1 with yeast MLH1 proteins, as shown in Figure 9, indicates a high degree of conservation.
Mouse PMS1

Using the procedures discussed above with reference to Figure 1, we isolated and sequenced the mouse PMS1 gene, as shown in Figure 14 (SEQ ID NO: 137). This cDNA sequence encodes a predicted protein of 864 amino acids (SEQ ID NO: 138), as shown in Figure 15, where it is compared to the predicted amino acid sequence for hPMS1 (SEQ ID NO: 133). The degree of identity between the predicted mouse and human PMS1 proteins is high, as would be expected between two mammals. Similarly, as noted above, there is a strong similarity between the human PMS1 protein and the yeast DNA mismatch repair protein PMS1, as shown in Figure 11. The fact that yeast PMS1 and MLH1 function in yeast to repair DNA mismatches strongly suggests that human and mouse PMS1 and MLH1 are also mismatch repair proteins.
Uses for Mouse MLH1 and PMS1

We believe our isolation and characterization of mMLH1 and mPMS1 genes will have many research applications. For example, as already discussed above, we have used our knowledge of the mPMS1 gene to produce antibodies which react specifically with hPMS1. We have already explained that antibodies directed to the human proteins, MLH1 or PMS1, may be used for both research purposes as well as diagnostic purposes.
We also believe that our knowledge of mPMS1 and mMLH1 will be useful for constructing mouse models in order to study the consequences of DNA mismatch repair defects. We expect that mPMS1 or mMLH1 defective mice will be highly prone to cancer because chromosome 2p and 3p-associated HNPCC are each due to a defect in a mismatch repair gene. 12 As noted above, we have already produced chimeric mice which carry an mPMS1 defective gene. We are currently constructing mice heterozygous for mPMS1 or mMLH1 mutation. These heterozygous mice should provide useful animal models for studying human cancer, in particular HNPCC. The mice will be useful for analysis of both intrinsic and extrinsic factors that determine cancer risk and progression. Also, cancers associated with mismatch repair deficiency may respond differently to conventional therapy in comparison to other cancers. Such animal models will be useful for determining if differences exist, and allow the development of regimes for the effective treatment of these types of tumors. Such animal models may also be used to study the relationship between hereditary versus dietary factors in carcinogenesis.
Distinguishing Mutations From Polymorphisms

For studies of cancer susceptibility and for tumor identification and characterization, it is important to distinguish "mutations" from "polymorphisms".
A "mutation" produces a "non-wild-type allele" of a gene. A non-wild-type allele of a gene produces a transcript and/or a protein product that does not function normally within a cell. "Mutations" can be any alteration in nucleotide sequence including insertions, deletions, substitutions, and rearrangements.
"Polymorphisms", on the other hand, are sequence differences that are found within the population of normally-functioning (i.e., "wild-type") genes.
Some polymorphisms result from the degeneracy of the nucleic acid code. That is, given that most amino acids are encoded by more than one triplet codon, many different nucleotide sequences can encode the same polypeptide. Other polymorphisms are simply sequence differences that do not have a significant effect on the function of the gene or encoded polypeptide. For example, polypeptides can often tolerate small insertions or deletions, or "conservative" substitutions in their amino acid sequence without significantly altering function of the polypeptide.
"Conservative" substitutions are those in which a particular amino acid is substituted by another amino acid of similar chemical characteristics. For example, the amino acids are often characterized as "non-polar (hydrophobic)", including alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; "polar neutral", including glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; "positively charged (basic)", including arginine, lysine, and histidine; and "negatively charged (acidic)", including aspartic acid and glutamic acid. A substitution of one amino acid for another amino acid in the same group is generally considered to be "conservative", particularly if the side groups of the two relevant amino acids are of a similar size.
The first step in identifying a mutation or polymorphism in a mismatch repair gene sequence involves identification, using available techniques including those described herein, of a mismatch repair gene (or gene fragment) sequence that differs from a known, normal (i.e., wild-type) sequence of the same mismatch repair gene (or gene fragment). For example, a hMLH1 gene (or gene fragment) sequence could be identified that differs in at least one nucleotide position from a known normal (i.e., wild-type) hMLH1 sequence such as any of SEQ ID NOS: 6-24.
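The comparison step described above amounts to listing the positions at which a sample sequence departs from the reference. A minimal Python sketch follows; it assumes the two sequences are already aligned and of equal length, so insertions and deletions would need a proper alignment step first. `point_differences` is a hypothetical helper, and the two 12-nucleotide strings are invented for illustration rather than taken from SEQ ID NOS: 6-24.

```python
def point_differences(reference, sample):
    """Return [(position, ref_base, sample_base)] for equal-length sequences,
    with 1-based positions."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length; align them first")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
            if r != s]


# Hypothetical sequences for illustration only.
REFERENCE = "ATGTCCTTCGTG"
SAMPLE = "ATGTTCTTCGTG"
DIFFS = point_differences(REFERENCE, SAMPLE)
```

Here the only difference is a C to T change at position 5. Whether such a difference is a mutation or a polymorphism is a separate question, addressed by the family, population and functional criteria discussed in this section.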
Mutations can be distinguished from polymorphisms using any of a variety of methods, perhaps the most direct of which is data collection and correlation with tumor development. That is, for example, a subject might be identified whose hMLH1 gene sequence differs from a sequence reported in SEQ ID NOS: 6-24, but who does not have cancer and has no family history of cancer. Particularly if other, preferably senior, members of that subject's family have hMLH1 gene sequences that differ from SEQ ID NOS: 6-24 in the same way(s), it is likely that subject's hMLH1 gene sequence could be categorized as a "polymorphism". If other, unrelated individuals are identified with the same hMLH1 gene sequence and no family history of cancer, the categorization may be confirmed.
Mutations that are responsible for conferring genetic susceptibility to cancer can be identified because, among other things, such mutations are likely to be present in all tissues of an affected individual and in the germ line of at least one of that individual's parents, and are not likely to be found in unrelated families with no history of cancer.
When distinguishing mutations from polymorphisms, it can sometimes be valuable to evaluate a particular sequence difference in the presence of at least one known mismatch repair gene mutation. In some instances, a particular sequence change will not have a detectable effect (i.e., will appear to be a polymorphism) when assayed alone, but will, for example, increase the penetrance of a known mutation, such that individuals carrying both the apparent polymorphism difference and a known mutation have higher probability of developing cancer than do individuals carrying only the mutation. Sequence differences that have such an effect are properly considered to be mutations, albeit weak ones.
As discussed above and previously (Patent Application Nos. 08/168,877 and 08/209,521), mutations in mismatch repair genes or gene products produce non-wild-type versions of those genes or gene products. Some mutations can therefore be distinguished from polymorphisms by their functional characteristics in in vivo or in vitro mismatch repair assays. Any available mismatch repair assay can be used to analyze these characteristics.
49 It is generally desirable to utilize more than one mismatch repair assay before classifying a sequence change as a polymorphism, since some mutations will have effects that will not be observed in all assays.
For example, a mismatch repair gene containing a mutation would not be expected to be able to replace an endogenous copy of the same gene in a host cell without detectably affecting mismatch repair in that cell; whereas a mismatch repair gene containing a sequence polymorphism would be expected to be able to replace an endogenous copy of the same gene in a host cell without detectably affecting mismatch repair in that cell. We note that for such "replacement" studies, it is generally desirable to introduce the gene to be tested into a host cell of the same (or at least closely related) species as the cell from which the test gene was derived, to avoid complications due to, for example, the inability of a gene product from one species to interact with other mismatch repair gene products from another species. Similarly, a mutant mismatch repair protein would not be expected to function normally in an in vitro mismatch repair system (preferably from a related organism); whereas a polymorphic mismatch repair protein would be expected to function normally.
The methods described herein and previously allow identification of different kinds of mismatch repair gene mutations. The following examples illustrate protocols for distinguishing mutations from polymorphisms in DNA mismatch repair genes.
EXAMPLE 3: We have developed a system for testing in yeast, S. cerevisiae, the functional significance of mutations found in either the hMLH1 or hPMS1 genes. The system is described in this application using, as an example, the serine (SER) to phenylalanine (PHE) causing mutation in hMLH1 that we found in a family with HNPCC, as described above. We have derived a yeast strain that is essentially deleted for its MLH1 gene and hence is a strong mutator (i.e., 1000 fold above the normal rate in a simple genetic marker assay involving reversion from growth dependence on a given amino acid to independence (reversion of the hom3-10 allele, Prolla, Christie and Liskay, Mol Cell Biol, 14:407-415, 1994)). When we placed the normal yeast MLH1 gene (complete with all known control regions) on a yeast plasmid that is stably maintained as a single copy into the MLH1-deleted strain, the mutator phenotype was fully corrected using the reversion to amino acid independence assay. However, if we introduce a deleted copy of the yeast MLH1 there is no correction. We next tested the mutation that in the HNPCC family caused a SER to PHE alteration. We found that the resultant mutant yeast protein cannot correct the mutator phenotype, strongly suggesting that the alteration from the wild-type gene sequence probably confers cancer susceptibility, and is therefore classified as a mutation, not a polymorphism. We subsequently tested proteins engineered to contain other amino acids at the "serine" position and found that most changes result in a fully mutant, or at least partially mutant phenotype.
As other "point" mutations in MLH1 and PMS1 genes are found in cancer families, they can be engineered into the appropriate yeast homolog gene and their consequence on protein function studied. In addition, we have identified a number of highly conserved amino acids in both the MLH1 and PMS1 genes. We also have evidence that hMLH1 interacts with yeast PMS1. This finding raises the possibility that mutations observed in the hMLH1 gene can be more directly tested in the yeast system. We plan to systematically make mutations that will alter the amino acid at these conserved positions and determine which amino acid substitutions are tolerated and which are not. By collecting mutation information relating to hMLH1 and hPMS1, both by determining and documenting actual mutations found in HNPCC families, and by artificially synthesizing mutants for testing in experimental systems, it may eventually be possible to practice a cancer susceptibility testing protocol which, once the individual's hMLH1 or hPMS1 structure is determined, only requires comparison of that structure to known mutation versus polymorphism data.
EXAMPLE 4: Another method, which we have employed to study physical interactions between hMLH1 and hPMS1, can also be used to study whether a particular alteration in a gene product results in a change in the degree of protein-protein interaction. Information concerning changes in protein-protein interaction may demonstrate or confirm whether a particular genomic variation is a mutation or a polymorphism. Following our lab's findings on the interaction between yeast MLH1 and PMS1 proteins in vitro and in vivo (Patent Application Serial No. 08/168,877), the interaction between the human counterparts of these two DNA mismatch repair proteins was tested. The human MLH1 and human PMS1 proteins were tested for in vitro interaction using maltose binding protein (MBP) affinity chromatography. hMLH1 protein was prepared as an MBP fusion protein, immobilized on an amylose resin column via the MBP, and tested for binding to hPMS1, synthesized in vitro. The hPMS1 protein bound to the MBP-hMLH1 matrix, whereas control proteins showed no affinity for the matrix. When the hMLH1 protein, translated in vitro, was passed over an MBP-hPMS1 fusion protein matrix, the hMLH1 protein bound to the MBP-hPMS1 matrix, whereas control proteins did not.
Potential in vivo interactions between hMLH1 and hPMS1 were tested using the yeast "two hybrid" system.
2 Our initial results indicate that hMLH1 and hPMS1 interact in vivo in yeast. The same system can also be used to detect changes in protein-protein interaction which result from changes in gene or gene product structure and which have yet to be classified as either a polymorphism or a mutation which confers cancer susceptibility.
Detection of HNPCC Families and Their Mutation(s)
It has been estimated that approximately 1,000,000 individuals in the United States carry (are heterozygous for) an HNPCC mutant gene.
29 Furthermore, estimates suggest that 50-60% of HNPCC families segregate mutations in the MSH2 gene that resides on chromosome 2p.
12 Another significant fraction appear to be associated with the HNPCC gene that maps to chromosome 3p21-22, presumably due to hMLH1 mutations such as the C to T transition discussed above. Identification of families that segregate mutant alleles of either the hMSH2 or hMLH1 gene, and the determination of which individuals in these families actually have the mutation will be of great utility in the early intervention into the disease. Such early intervention will likely include early detection through screening and aggressive follow-up treatment of affected individuals. In addition, determination of the genetic basis for both familial and sporadic tumors could direct the method of therapy in the primary tumor, or in recurrences.
Initially, HNPCC candidate families will be diagnosed partly through the study of family histories, most likely at the local level, by hospital oncologists. One criterion for HNPCC is the observation of microsatellite instability in an individual's tumors.
36 The presenting patient would be tested for mutations in hMSH2, hMLH1, hPMS1 and other genes involved in DNA mismatch repair as they are identified. This is most easily done by sampling blood from the individual. Also highly useful would be freshly frozen tumor tissue. It is important to note, for the screening procedure, that affected individuals are heterozygous for the offending mutation in their normal tissues.
The available tissues, blood and tumor, are worked up for PCR-based mutation analysis using one or more of the following procedures:
1) Linkage analysis with a microsatellite marker tightly linked to the hMLH1 gene.
One approach to identify cancer prone families with an hMLH1 mutation is to perform linkage analysis with a highly polymorphic marker located within or tightly linked to hMLH1. Microsatellites are highly polymorphic and therefore are very useful as markers in linkage analysis. Because we possess the hMLH1 gene on a single large genomic fragment in a P1 phage clone (100 kbp), it is very likely that one or more microsatellites, i.e., tracts of dinucleotide repeats, exist within, or very close to, the hMLH1 gene. At least one such microsatellite has been reported.
8 Once such markers have been identified, PCR primers will be designed to amplify the stretches of DNA containing the microsatellites. DNA of affected and unaffected individuals from a family with a high frequency of cancer will be screened to determine the segregation of the hMLH1 markers and the presence of cancer. The resulting data can be used to calculate a lod score and hence determine the likelihood of linkage between hMLH1 and the occurrence of cancer. Once linkage is established in a given family, the same polymorphic marker can be used to test other members of the kindred for the likelihood of their carrying the hMLH1 mutation.
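As an illustration of the lod score calculation mentioned above, the following is a minimal sketch, assuming the simplest textbook case of phase-known, fully informative meioses scored for a single marker. The function names and the example counts are hypothetical and are not taken from the family data described in this application. The score compares the likelihood of the observed recombinants at a recombination fraction theta with the likelihood under free recombination (theta = 0.5).

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    Assumption for illustration: each meiosis is scored unambiguously as
    recombinant or non-recombinant between the marker and the disease locus.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    if likelihood_linked == 0.0:
        # A recombinant was observed but theta = 0 forbids recombination.
        return float("-inf")
    # Under no linkage every meiosis has probability 0.5 either way.
    return math.log10(likelihood_linked) - meioses * math.log10(0.5)


def max_lod(recombinants, meioses, steps=50):
    """Scan theta on an even grid from 0 to 0.5 and return (best theta, best lod)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)
```

For example, `max_lod(1, 10)` scans for the recombination fraction that best explains one recombinant in ten meioses. By common convention a lod score of 3 or more is taken as evidence for linkage, although real pedigrees generally require dedicated linkage software that handles unknown phase and incomplete penetrance.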
2) Sequencing of reverse transcribed cDNA.
a) RNA from affected individuals, and from unaffected and unrelated individuals, is reverse transcribed (RT'd), followed by PCR to amplify the cDNA in 4-5 overlapping portions.
3 4 37 It should be noted that for the purposes of PCR, many different oligonucleotide primer pair sequences may potentially be used to amplify relevant portions of an individual's hMLH1 or hPMS1 gene for genetic screening purposes. With the knowledge of the cDNA structures for the genes, it is a straightforward exercise to construct primer pairs which are likely to be effective for specifically amplifying selected portions of the gene. While primer sequences are typically between 20 and 30 bases long, it may be possible to use shorter primers, potentially as small as approximately 13 bases, to amplify specifically selected gene segments. The principal limitation on how small a primer sequence may be is that it must be long enough to hybridize specifically to the targeted gene segment. Specificity of PCR is generally improved by lengthening primers and/or employing nested pairs of primers.
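The relationship between primer length and specificity noted above can be illustrated with a rough estimate. The sketch below assumes a random template with equal base frequencies and counts only exact matches; the function names and the round template size of 3,000,000,000 bases used in the usage note are illustrative assumptions, not values taken from this application.

```python
def expected_random_matches(primer_length, template_length, both_strands=True):
    """Expected number of chance exact matches of one primer in a random template.

    Assumes equal base frequencies and approximates the number of possible
    start positions by the template length.
    """
    if primer_length <= 0 or template_length <= 0:
        raise ValueError("lengths must be positive")
    strands = 2 if both_strands else 1
    return strands * template_length / 4 ** primer_length


def shortest_specific_length(template_length, max_expected=1.0, both_strands=True):
    """Smallest primer length whose expected chance matches fall below max_expected."""
    length = 1
    while expected_random_matches(length, template_length, both_strands) >= max_expected:
        length += 1
    return length
```

Under these assumptions, `expected_random_matches(13, 3_000_000_000)` is about 89, whereas the same call for a 20-base primer is well below one. This is consistent with the preference for longer primers or nested primer pairs on complex templates, while a much less complex template such as a reverse transcribed cDNA pool tolerates shorter primers.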
The PCR products, in total representing the entire cDNA, are then sequenced and compared to known wild-type sequences. In most cases a mutation will be observed in the affected individual. Ideally, the nature of the mutation will indicate that it is likely to inactivate the gene product. Otherwise, it must be determined whether the alteration is a mutation rather than simply a polymorphism.
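The comparison of a sequenced cDNA with the wild-type sequence can be sketched as follows. This is a minimal illustration, assuming two in-frame coding sequences of equal length and the standard genetic code; the function name and the short sequences in the usage note are hypothetical and are not hMLH1 or hPMS1 sequence. A change labeled "nonsense" is likely to inactivate the gene product, whereas a "missense" or "silent" change still needs functional testing to separate mutation from polymorphism.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_changes(wild_type, patient):
    """Compare two in-frame coding sequences codon by codon.

    Returns a list of (codon number, wild-type residue, patient residue, kind),
    where kind is 'silent', 'missense', 'nonsense' or 'stop-loss'.
    """
    wild_type, patient = wild_type.upper(), patient.upper()
    if len(wild_type) != len(patient) or len(wild_type) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for start in range(0, len(wild_type), 3):
        wt_codon = wild_type[start:start + 3]
        pt_codon = patient[start:start + 3]
        if wt_codon == pt_codon:
            continue
        wt_aa, pt_aa = CODON_TABLE[wt_codon], CODON_TABLE[pt_codon]
        if wt_aa == pt_aa:
            kind = "silent"
        elif pt_aa == "*":
            kind = "nonsense"
        elif wt_aa == "*":
            kind = "stop-loss"
        else:
            kind = "missense"
        changes.append((start // 3 + 1, wt_aa, pt_aa, kind))
    return changes
```

For instance, `classify_codon_changes("ATGTCTCAAGGT", "ATGTTTCAAGGT")` reports a serine to phenylalanine missense change at codon 2, the same kind of C to T change discussed earlier for hMLH1. Insertions, deletions and splice-site changes are outside the scope of this sketch.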
b) Certain mutations, those affecting splicing or resulting in translation stop codons, can destabilize the messenger RNA produced from the mutant gene and hence compromise the normal RT-based mutation detection method. One recently reported technique can circumvent this problem by testing whether the mutant cDNA can direct the synthesis of normal length protein in a coupled in vitro transcription/translation system.
32 3) Direct sequencing of genomic DNA.
A second route to detect mutations relies on examining the exons and the intron/exon boundaries by PCR cycle sequencing directly off a DNA template. This method requires the use of oligonucleotide pairs, such as those described in Tables 2 and 3 above, that amplify individual exons for direct PCR cycle sequencing. The method depends upon genomic DNA sequence information at each intron/exon boundary (50bp, or greater, for each boundary).
The advantage of the technique is twofold. First, because DNA is more stable than RNA, the condition of the material used for PCR is not as important as it is for RNA-based protocols. Second, almost any mutation within the actual transcribed region of the gene, including those in an intron affecting splicing, will be detectable.
For each candidate gene, mutation detection may require knowledge of both the entire cDNA structure, and all intron/exon boundaries of the genomic structure. With such information, the type of causal mutation in a particular family can be determined. In turn, a more specific and efficient mutation detection scheme can be adapted for the particular family. Screening for the disease (HNPCC) is complex because it has a genetically heterogeneous basis in the sense that more than one gene is involved, and for each gene, multiple types of mutations are involved.
2 Any given family is highly likely to segregate one particular mutation. However, as the nature of the mutation in multiple families is determined, the spectrum of the most prevalent mutations in the population will be determined. In general, determination of the most frequent mutations will direct and streamline mutation detection.
Because HNPCC is so prevalent in the human population, carrier detection at birth could become part of standardized neonatal testing. Families at risk can be identified and all members not previously tested can be tested. Eventually, all affected kindreds could be determined.
Mode of Mutation Screening and Testing
DNA-based Testing
Initial testing, including identifying likely HNPCC families by standard diagnosis and family history study, will likely be done in local and smaller DNA diagnosis laboratories. However, large scale testing of multiple family members, and certainly population wide testing, will ultimately require large efficient centralized commercial facilities.
Tests will be developed based on the determination of the most common mutations for the major genes underlying HNPCC, including at least the hMSH2 gene on chromosome 2p and the MLH1 gene on chromosome 3p. A variety of tests are likely to be developed. For example, one possibility is a set of tests employing oligonucleotide hybridizations that distinguish the normal vs. mutant alleles.
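A hybridization test that distinguishes normal from mutant alleles relies on a pair of short probes that are identical except at the variant base. The following is a minimal sketch of how such a probe pair could be derived from a known wild-type sequence; the function name, the default probe length of 15 bases and the example sequence are assumptions made for illustration, and hybridization conditions are not modeled.

```python
def allele_specific_probes(wild_type, position, mutant_base, probe_length=15):
    """Build a (normal, mutant) probe pair centered on a single-base variant.

    position is the 0-based index of the variant base in wild_type;
    probe_length must be odd so the variant sits in the middle of the probe.
    """
    wild_type = wild_type.upper()
    mutant_base = mutant_base.upper()
    if probe_length % 2 == 0:
        raise ValueError("probe_length must be odd")
    if mutant_base not in "ACGT" or len(mutant_base) != 1:
        raise ValueError("mutant_base must be one of A, C, G, T")
    half = probe_length // 2
    start, end = position - half, position + half + 1
    if start < 0 or end > len(wild_type):
        raise ValueError("not enough flanking sequence for the requested probe")
    if wild_type[position] == mutant_base:
        raise ValueError("mutant base is identical to the wild-type base")
    normal_probe = wild_type[start:end]
    # Replace only the central base to obtain the mutant-specific probe.
    mutant_probe = normal_probe[:half] + mutant_base + normal_probe[half + 1:]
    return normal_probe, mutant_probe
```

As a usage example, `allele_specific_probes("ACGTACGTACGTACGTACGTA", 10, "A")` returns two 15-base probes that differ only at their central base. In practice the probe length and wash stringency would be chosen empirically so that a single mismatch prevents stable hybridization.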
33 As already noted, our knowledge of the nucleotide structures for the hMLH1, hPMS1 and hMSH2 genes makes possible the design of numerous oligonucleotide primer pairs which may be used to amplify specific portions of an individual's mismatch repair gene for genetic screening and cancer risk analysis.
Our knowledge of the genes' structures also makes possible the design of labeled probes which can be quickly used to determine the presence or absence of all or a portion of one of the DNA mismatch repair genes. For example, allele-specific oligomer probes (ASO) may be designed to distinguish between alleles. ASOs are short DNA segments that are identical in sequence except for a single base difference that reflects the difference between normal and mutant alleles. Under the appropriate DNA hybridization conditions, these probes can recognize a single base difference between two otherwise identical DNA sequences. Probes can be labeled radioactively or with a variety of non-radioactive reporter molecules, for example, fluorescent or chemiluminescent moieties. Labeled probes are then used to analyze the PCR sample for the presence of the disease-causing allele. The presence or absence of several different disease-causing genes can readily be determined in a single sample. The probe must be long enough to avoid non-specific binding to nucleotide sequences other than the target. All tests will depend ultimately on accurate and complete structural information relating to hMLH1, hMSH2, hPMS1 and other DNA mismatch repair genes implicated in HNPCC.
Protein Detection-Based Screening
Tests based on the functionality of the protein product, per se, may also be used. The protein-examining tests will most likely utilize antibody reagents specific to the hMLH1, hPMS1 or hMSH2 proteins, or to other related "cancer" gene products as they are identified.
For example, a frozen tumor specimen can be cross-sectioned and prepared for antibody staining using indirect fluorescence techniques. Certain gene mutations are expected to alter or destabilize the protein structure sufficiently to give an altered or reduced signal after antibody staining.
It is likely that such tests will be performed in cases where gene involvement in a family's cancer has yet to be established. We are in the process of developing diagnostic monoclonal antibodies against the human MLH1 and PMS1 proteins.
We are overexpressing MLH1 and PMS1 human proteins in bacteria. We will purify the proteins, inject them into mice and derive protein specific monoclonal antibodies which can be used for diagnostic and research purposes.
Identification and Characterization of DNA Mismatch Repair Tumors
In addition to their usefulness in diagnosing cancer susceptibility in a subject, nucleotide sequences that are homologous to a bacterial mismatch repair gene can be valuable for, among other things, use in the identification and characterization of mismatch-repair-defective tumors. Such identification and characterization is valuable because mismatch-repair-defective tumors may respond better to particular therapy regimens. For example, mismatch-repair-defective tumors might be sensitive to DNA damaging agents, especially when administered in combination with other therapeutic agents.
Defects in mismatch repair genes need not be present throughout an individual's tissues to contribute to tumor formation in that individual.
Spontaneous mutation of a mismatch repair gene in a particular cell or tissue can contribute to tumor formation in that tissue. In fact, at least in some cases, a single mutation in a mismatch repair gene is not sufficient for tumor development.
In such instances, an individual with a single mutation in a mismatch repair gene is susceptible to cancer, but will not develop a tumor until a secondary mutation occurs. Additionally, in some instances, the same mismatch repair gene mutation that is strictly tumor-associated in an individual will be responsible for conferring cancer susceptibility in a family with a hereditary predisposition to cancer development.
In yet another aspect of the invention, the sequence information we have provided can be used with methods known in the art to analyze tumors (or tumor cell lines) and to identify tumor-associated mutations in mismatch repair genes. Preferably, it is possible to demonstrate that these tumor-associated mutations are not present in non-tumor tissues from the same individual. The information described in this application is particularly useful for the identification of mismatch repair gene mutations within tumors (or tumor cell lines) that display genomic instability of short repeated DNA elements.
The sequence information and testing protocols of the present invention can also be used to determine whether two tumors are related, i.e., whether a second tumor is the result of metastasis from an earlier found first tumor which exhibits a particular DNA mismatch repair gene mutation.
Isolating Additional Genes of Related Function
Proteins that interact physically with hMLH1 and/or hPMS1 are likely to be involved in DNA mismatch repair. By analogy to hMLH1 and hMSH2, mutations in the genes which encode such proteins would be strong candidates for potential cancer linkage. A powerful molecular genetic approach using yeast, referred to as a "two-hybrid system", allows the relatively rapid detection and isolation of genes encoding proteins that interact with a gene product of interest, such as hMLH1.
28 The two-hybrid system involves two plasmid vectors each intended to encode a fusion protein. Each of the two vectors contains a portion, or domain, of a transcription activator. The yeast cell used in the detection scheme contains a "reporter" gene. Neither domain of the activator alone can activate transcription.
However, if the two domains are brought into close proximity then transcription may occur. The cDNA for the protein of interest, hMLH1, is inserted within a reading frame in one of the vectors. This is termed the "bait". A library of human cDNAs, inserted into a second plasmid vector so as to make fusions with the other domain of the transcriptional activator, is introduced into the yeast cells harboring the "bait" vector. If a particular yeast cell receives a library member that contains a human cDNA encoding a protein that interacts with hMLH1 protein, this interaction will bring the two domains of the transcriptional activator into close proximity, activate transcription of the reporter gene and the yeast cell will turn blue. Next, the insert is sequenced to determine whether it is related to any sequence in the data base. The same procedure can be used to identify yeast proteins in DNA mismatch repair or a related process. Performing the yeast and human "hunts" in parallel has certain advantages. The function of novel yeast homologs can be quickly determined in yeast by gene disruption and subsequent examination of the genetic consequences of being defective in the new found gene. These yeast studies will help guide the analysis of novel human "hMLH1- or hPMS1-interacting" proteins in much the same way that the yeast studies on PMS1 and MLH1 have influenced our studies of the human MLH1 and PMS1 genes.
Production of Antibodies
By using our knowledge of the DNA sequences for hMLH1 and hPMS1, we can synthesize all or portions of the predicted protein structures for the purpose of producing antibodies. One important use for antibodies directed to hMLH1 and hPMS1 proteins will be for capturing other proteins which may be involved in DNA mismatch repair. For example, by employing coimmunoprecipitation techniques, antibodies directed to either hMLH1 or hPMS1 may be used to precipitate the target protein along with other associated proteins which are functionally and/or physically related. Another important use for antibodies will be for the purpose of isolating hMLH1 and hPMS1 proteins from tumor tissue. The hMLH1 and hPMS1 proteins from tumors can then be characterized for the purpose of determining appropriate treatment strategies.
We are in the process of developing monoclonal antibodies directed to the hMLH1 and hPMS1 proteins.
EXAMPLE 5: We have also used the following procedure to produce polyclonal antibodies directed to the human and mouse forms of PMS1 protein.
We inserted a 3' fragment of the mouse PMS1 cDNA in the bacterial expression plasmid vector, pET (Novagen, Madison, WI). The expected expressed portion of the mouse PMS1 protein corresponds to a region of approximately 200 amino acids at the end of the PMS1 protein. This portion of the mPMS1 is conserved with yeast PMS1 but is not conserved with either the human or the mouse MLH1 proteins. One reason that we selected this portion of the PMS1 protein for producing antibodies is that we did not want the resulting antibodies to cross-react with MLH1. The mouse PMS1 protein fragment was highly expressed in E. coli and purified from a polyacrylamide gel, and the eluted protein was then prepared for animal injections. Approximately 2 mg of the PMS1 protein fragment was sent to the Pocono Rabbit Farm (PA) for injections into rabbits. Sera from the rabbits were titered multiple times against the PMS1 antigen using standard ELISA techniques. Rabbit antibodies specific to mouse PMS1 protein were affinity-purified using columns containing immobilized mouse PMS1 protein. The affinity-purified polyclonal antibody preparation was tested further using Western blotting and dot blotting. We found that the polyclonal antibodies recognized not only the mouse PMS1 protein, but also the human PMS1 protein, which is very similar. Based upon the Western blots, there is no indication that other proteins were recognized strongly by our antibody, including either the human or mouse MLH1 proteins.
DNA Mismatch Repair Defective Mice
EXAMPLE 6: In order to create an experimental model system for studying DNA mismatch repair defects and resultant cancer in a whole animal system, we have derived DNA mismatch repair defective mice using embryonic stem (ES) cell technology. Using genomic DNA containing a portion of the mPMS1 gene, we constructed a vector that upon homologous recombination causes disruption of the chromosomal mPMS1 gene. Mouse ES cells from the 129 mouse strain were confirmed to contain a disrupted mPMS1 allele. The ES cells were injected into C57/BL6 host blastocysts to produce animals that were chimeric, that is, a mixture of 129 and C57/BL6 cells. The incorporation of the ES cells was determined by the presence of patches of agouti coat coloring (indicative of ES cell contribution). All male chimeras were bred with C57/BL6 female mice.
Subsequently, twelve offspring (F2) were born in which the agouti coat color was detected, indicating the germline transmission of genetic material from the ES cells. Analysis of DNA extracted from the tail tips of the twelve offspring indicated that six of the animals were heterozygous (contained one wild-type and one mutant allele) for the mPMS1 mutation. Of the six heterozygous animals, three were female (animals F2, F2-11 and F2-12) and three were male (F2, F2-10 and F2). Four breeding pens were set up to obtain mice that were homozygous for the mPMS1 mutation, and additional heterozygous mice. Breeding pen #1, which contained animals F2-11 and F2-10, yielded a total of thirteen mice in three litters, four of which have been genotyped. Breeding pen #2 (animals F2-8 and F2-13) gave twenty-two animals in three litters, three of which have been genotyped. Of the seven animals genotyped, three homozygous female animals have been identified. One animal died at six weeks of age from unknown causes. The remaining homozygous females are alive and healthy at twelve weeks of age. The results indicate that mPMS1 homozygous defective mice are viable.
Breeding pens #3 and #4 were used to backcross the mPMS1 mutation into the C57/BL6 background. Breeding pen #3 (animal F2-12 crossed to a C57/BL6 mouse) produced twenty-one animals in two litters, nine of which have been genotyped. Breeding pen #4 (animal F2-6 crossed with a C57/BL6 mouse) gave eight mice. In addition, the original male chimera (breeding pen) has produced thirty-one additional offspring.
To genotype the animals, a series of PCR primers has been developed that are used to identify mutant and wild-type mPMS1 genes. They are (SEQ ID NOS: 143-148, respectively):
Primer 1: 5'-TTCGGTGACAGATTTGTAAATG-3'
Primer 2: 5'-TTTACGGAGCCCTGGC-3'
Primer 3: 5'-TCACCATAAAAATAGTTTCCCG-3'
Primer 4: 5'-TCCTGGATCATATTTTCTGAGC-3'
Primer 5: 5'-TTTCAGGTATGTCCTGTTACCC-3'
Primer 6: 5'-TGAGGCAGCTTTTAAGAAACTC-3'
Primers 1+2
Primers 1+3
Primers 4+5 (3' targeted)
Primers 4+6 (3' untargeted)
The mice we have developed provide an animal model system for studying the consequences of defects in DNA mismatch repair and resultant HNPCC. The long term survival of mice homozygous and heterozygous for the mPMS1 mutation and the types and timing of tumors in these mice will be determined. The mice will be screened daily for any indication of cancer onset as indicated by a hunched appearance in combination with deterioration in coat condition. Mice carrying the mPMS1 mutation will be used to test the effects of other factors, environmental and genetic, on tumor formation. For example, the effect of diet on colon and other types of tumors can be compared for normal mice versus those carrying the mPMS1 mutation in either the heterozygous or homozygous genotype. In addition, the mPMS1 mutation can be put into different genetic backgrounds to learn about interactions between genes of the mismatch repair pathway and other genes involved in human cancer, for example, p53.
Mice carrying mPMS1 mutations will also be useful for testing the efficacy of somatic gene therapy on the cancers that arise in mice, for example, the expected colon cancers. Further, isogenic fibroblast cell lines from the homozygous and heterozygous mPMS1 mice can be established for use in various cellular studies, including the determination of spontaneous mutation rates.
We are currently constructing a vector for disrupting the mouse mMLH1 gene to derive mice carrying a mutation in mMLH1. We will compare mice carrying defects in mPMS1 to mice carrying defects in mMLH1. In addition, we will construct mice that carry mutations in both genes to see whether there is a synergistic effect of having mutations in two HNPCC genes. Other studies on the mMLH1 mutant mice will be as described above for the mPMS1 mutant mice.
SEQUENCE LISTING
GENERAL INFORMATION:
APPLICANT: Liskay, Robert M.
Bronner, C. Eric
Baker, Sean M.
Bollag, Roni J.
Kolodner, Richard D.
(ii) TITLE OF INVENTION: COMPOSITIONS AND METHODS RELATING TO DNA MISMATCH REPAIR GENES
(iii) NUMBER OF SEQUENCES: 148
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: Kolisch, Hartwell, Dickinson, McCormack & Heuser
STREET: 520 S.W. Yamhill Street, Suite 200
CITY: Portland
STATE: Oregon
COUNTRY: U.S.A.
ZIP: 97204
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.25
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Van Rysselberghe, Pierre C.
REGISTRATION NUMBER: 33,557
REFERENCE/DOCKET NUMBER: OHSU 306B
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: (503) 224-6655
TELEFAX: (503) 295-6679
TELEX: 360619
INFORMATION FOR SEQ ID NO:1:
SEQUENCE CHARACTERISTICS:
LENGTH: 361 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
Met Pro Ile Gln Val Leu Pro Pro Gln Leu Ala Asn Gln Ile
Ala Ala Val Gly Glu Val Asn Ser Leu 35 Glv Glv Ala Val Asp Glu Arg Pro Ala Ser Val Lys Glu 25 Arg Ala Gly Ala Thr 40 Ile Val Asp Ile Asp Cys Leu Val Glu Ile Glu Arg Gly Ile Lys Lys Leu Ile 50 Lys Glu Arg Ala Arg Asp Asn Glu Leu Ala Leu 70 Leu Ala Arg His 75 Ser Thr Ser Lys Ile Ala Ser Leu Asp Asp Leu Glu Ala Ile Glu Ala Leu Ala Ser Ile Ser Ser Val Ile 90 Ser Leu Gly Phe Arg Gly Arg Leu Thr 100 Glu 105 Trp Leu Thr Ser 110 Glu Gly Arg Arg Thr Ala Gin Ala Glu Ala Gin Ala Tyr Ala 125 115 120 61 Asp Met Asp Val Thr Val Lys Pro Ala Ala His Pro Val Gly Thr Thr 130 135 140 Leu Glu Val Leu Asp Leu Phe Tyr Asn Thr Pro Ala Arg Arg Lys Phe 145 150 155 160 Met Arg Thr Glu Lys Thr Glu Phe Asn His Ile Asp Glu Ile Ile Arg 165 170 175 Arg Ile Ala Leu Ala Arg Phe Asp Val Thr Leu Asn Leu Ser His Asn 180 185 190 Gly Lys Leu Val Arg Gln Tyr Arg Ala Val Ala Lys Asp Gly Gin Lys 195 200 205 Glu Arg Arg Leu Gly Ala Ile Cys Gly Thr Pro Phe Leu Glu Gin Ala 210 215 220 Leu Ala Ile Glu Trp Gin His Gly Asp Lys Thr Lys Arg Gly Trp Val 225 230 235 240 Ala Asp Pro Asn His Thr Thr Thr Ala Leu Thr Glu Ile Gin Tyr Cys 245 250 255 Tyr Val Asn Gly Arg Met Met Arg Asp Arg Leu Ile Asn His Ala Ile 260 265 270 Arg Gin Ala Cys Glu Asp Lys Leu Gly Ala Asp Gin Gin Pro Ala Phe 275 280 285 Val Leu Tyr Leu Glu Ile Asp Pro His Gin Val Asp Val Asn Val His 290 295 300 Pro Ala Lys His Glu Val Arg Phe His Gin Ser Arg Leu Val His Asp 305 310 315 320 Phe Ile Tyr Gin Gly Val Leu Ser Val Leu Gin Gin Gin Thr Glu Thr 325 330 335 Ala Leu Pro Leu Glu Glu Ile Ala Pro Ala Pro Arg His Val Gin Glu 340 345 350 Asn Arg Ile Ala Ala Gly Arg Asn His 355 360 INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 538 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Ser His Ile Ile Glu Leu Pro Glu Met Leu Ala Asn Gin Ile Ala 1 5 10 Ala Gly Glu Val Ile Glu Arg Pro Ala Ser Val Cys Lys Glu Leu Val 25 Glu Asn Ala Ile Asp Ala Gly Ser Ser Gin Ile 
Ile Ile Glu Ile Glu 40 62 Glu Ala Gly Leu Lys Lys Val Gin Ile Thr Asp Asn Gly His Gly Ile 55 Ala His Asp Glu Val Giu Leu Ala Leu Arg Arg His Ala Thr Ser Lys 70 75 Ile Lys Asn Gin Ala Asp Leu Phe Arg Ile Arg Thr Leu Gly Phe Arg 90 Gly Giu Ala Leu Pro Ser Ile Ala Ser Val Ser Val Leu Thr Leu Leu 100 105 110 Thr Ala Val Asp Gly Ala Ser His Gly Thr Lys Leu Val Ala Arg Gly 115 120 125 Gly Giu Val Giu Giu Val lie Pro Ala Thr Ser Pro Val Gly Thr Lys 130 135 140 Val Cys Val Giu Asp Leu Phe Phe Asn Thr Pro Ala Arg Leu Lys Tyr 145 150 155 160 Met Lys Ser Gin Gin Ala Giu Leu Ser His Ile Ile Asp Ile Val Asn 165 170 175 Arg Leu Gly Leu Ala His Pro Giu Ile Ser Phe Ser Leu Ile Ser Asp *180 185 190 Gly Lys Glu Met Thr Arg Thr Ala Gly Thr Gly Gin Leu Arg Gin Ala 195 200 205 Ile Ala Gly Ile Tyr Gly Leu Val Ser Ala Lys Lys Met Ile Giu Ile 210 215 220 a a a.
a a Giu 225 Giu Arg Giy His Giu 305 Ala Giu Thr His Asp 385 Ser Ile Pro Gin Giu 465 Val As n Leu Tyr Ser Ile 290 Vai Ile Asn Ile Giy 370 Gin Ile Phe Leu Phe 450 Ser Ser Arg Leu Gly 530 Ser Thr Ile Lys 275 Asp Arg Aia Leu Leu 355 Thr His Gly Giu Leu 435 Ile Gly Ile Ser Leu 515 Arg Asp Arg Lys 260 Leu Pro Ile Asn Ala 340 Pro Tyr Ala Asn Phe 420 Giu Leu Ile Lys Ile 500 Tyr Pro Leu Ala 245 Asn Met Tyr Ser Ser 325 Lys Leu Leu Ala Val 405 Pro Giu Arg Tyr Lys 485 Lys Gin Val Asp 230 Asn Phe Val Leu Lys 310 Leu Ser Ser Phe Gin 390 Asp Ala Val Giu Giu 470 Tyr Ala Leu Leu Phe Arg Leu Gly Ala 295 Gi u Lys Thr Phe Ala 375 Giu Gin Asp Gly His 455 Met Arg Asn Ser Val1 535 63 Giu Asn Leu Arg 280 Asp Lys Giu Val Pro 360 Gin Arg Ser Asp Val1 440 Pro Cys Ala His Gin 520 His Ile Tyr Asn 265 Phe Val Giu Gin Arg 345 Glu Gly Val Gin Ala 425 Phe Ile Asp Giu Arg 505 Cys Phe Ser Ile 250 Arg Pro Asn Leu Thr 330 Asn Leu Arg Lys Gin 410 Leu Leu Trp Met Leu 490 Ile Asp Thr Gly 235 Ser Ala Leu Val Met 315 Leu Arg Giu Asp Tyr 395 Gin Arg Ala Met Leu 475 Ala Asp Asn Phe Leu Ile Ala His 300 Thr Ile Giu Phe Gly 380 Giu Leu Leu Giu Ala 460 Leu Ile Asp Pro Val Phe Leu Vai 285 Pro Leu Pro Lys Phe 365 Leu Giu Leu Lys Tyr 445 Giu Leu Met His Tyr 525 Ser Ile Asp 270 Ile Thr Vai Asp Val 350 Gly Tyr Tyr Val Giu 430 Gly Giu Thr Met Ser 510 Asn Leu Asn 255 Gly His Lys Ser Ala 335 Giu Gin Ile Arg Pro 415 Arg Giu Giu Lys Ser 495 Ala Cys Pro 240 Giy Phe Ile Gin Giu 320 Leu Gin Met Ile Giu 400 Tyr Met Asn Ile Giu 480 Cys Arg Pro Ly s Gin His 64 INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 607 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Giu Lys Arg Cys 1 5 10 Lys Gin Lys Glu Gin Arg Tyt Ile Pro Val Lys Tyr Leu Phe ser met 25 Thr Gin Ile His Gin Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser 40 Gly Gin Val Ile Thr Asp Leu Thr Thr Ala Val Lys Giu Leu Val Asp 
55 Asn Ser Ile Asp Ala Asn Ala Asn Gi~n Ile Glu Ile Ile Phe Lys Asp **65 70 75 :Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Giy Asp Gly Ile Asp 90 Pro Ser Asn Tyr Glu Phe Leu Aia Leu Lys His Tyr Thr Ser Lys Ile 100 105 110 Ala Lys Phe Gin Asp Val Ala Lys Vai Gin Thr Leu Gly Phe Arg Giy ***115 120 125 a,.aGlu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr 130 135 140 Thr Thr Ser Pro Pro Lys Ala Asp Lys Glu Leu Tyr Asp Met Val Gly 145 150 155 160 His Ile Thr Ser Lys Thr Thr Thr Ser Arg Asn Lys Gly Thr Thr Val 165 170 175 Leu Val Ser Gin Leu Phe His Asn Leu Pro Val Arg Gin Lys Giu Phe 180 185 190 Ser Lys Thr Phe Lys Arg Gin Phe Thr Lys Cys Leu Thr Val Ile Gin 195 200 205 Gly Tyr Ala Ile Ile Asn Ala Ala Ile Lys Phe Ser Val Trp Asn Ile 210 215 220 Thr Pro Lys Gly Lys Lys Asn Leu Ile Leu Ser Thr Met Arg Asn Ser 225 230 235 240 Ser Met Arg Lys Asn Ile Ser Ser Val Phe Gly Ala Gly Gly Met Arg 245 250 255 Gly Glu Leu Glu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn 260 265 270 Arg Met Leu Gly Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Leu Asp 275 280 285 Tyr Lys Ile Arg Val Lys Gly Tyr Ile Ser Gin Asn Ser Phe Gly Cys 290 295 300 Gly Arg Asn Ser Lye Asp Arg Gin Phe Ile Tyr Val Asn Lys Arg Pro 305 310 315 320 Val Giu Tyr Ser Thr Leu Leu Lys Cys Cys Asn Giu Val Tyr Lys Thr 325 330 335 Phe Asn Asn Val Gin Phe Pro Ala Val Phe Leu Asn Leu Giu Leu Pro 340 345 350 Met Ser Leu Ile Asp Val Asn Val Thr Pro Asp Lys Arg Val Ile Leu 355 360 365 Leu His Asn Giu Arg Ala Val Ile Asp Ile Phe Lys Thr Thr Leu Ser 370 375 380 Asp Tyr Tyr Aen Arg Gin Giu Leu Ala Leu Pro Lys Arg Met Cys Ser 385 390 395 400 Gin Ser Giu Gin Gin Ala Gin Lye Arg Leu Leu Thr Giu Vai Phe Asp 405 410 415 ***Asp Asp Phe Lys Lys Met Giu Val Val Giy Gin Phe Asn Leu Gly Phe *420 425 430 Ile Ile Val Thr Arg Lye Val Asp Asn Lye Ser Asp Leu Phe Ile Val *435 440 445 Asp Gin His Ala Ser Asp Giu Lys Tyr Aen Phe Giu Thr Leu Gin Ala 450 455 460 *Val Thr Vai Phe Lys Ser Gin Lye Leu Ile Ile Pro Gin Pro Val Giu *465 470 
475 480 Leu Ser Val Ile Asp Giu Leu Val Vai Leu Asp Asn Leu Pro Vai Phe *.:485 490 495 Giu Lys Asn Gly Phe Lye Leu Lye Ile Asp Giu Giu Giu Giu Phe Giy 500 505 510 Ser Arg Val Lye Leu Leu Ser Leu Pro Thr Ser Lye Gin Thr Leu Phe 0 515 520 525 *.Asp Leu Gly Asp Phe Aen Giu Leu Ile His Leu Ile Lye Giu Asp Gly 530 535 540 Gly Leu Arg Arg Asp Aen Ile Arg Cys Ser Lye Ile Arg Ser Met Phe 545 550 555 560 Ala Met Arg Ala Cys Arg Ser Ser Ile Met Ile Gly Lye Pro Leu Asn 565 570 575 Lye Lye Thr Met Thr Arg Val Val His Asn Leu Ser Giu Leu Asp Lys 580 585 590 Pro Trp Aen Cys Pro His Giy Arg Pro Thr Met Arg His Leu Met 595 600 605 66 INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 2484 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG 00O* 00*0 *0@ 0000 60*0 0.
ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GAGATGATTG AGAACTGTTT AGATGCAAAA TCCACAAGTA GGAGGCCTGA AGTTGATTCA GATCCAAGAC AATGGCACCG GATATTGTAT GTGAAAGGTT CACTACTAGT AAACTGCAGT ATTTCTACCT ATGGCTTTCG AGGTGAGGCT TTGGCCAGCA ACTATTACAA CGAAAACAGC TGATGGAAAG TGTGCATACA AAACTGAAAG CCCCTCCTAA ACCATGTGCT GGCAATCAAG GACCTTTTTT ACAACATAGC CACGAGGAGA AAAGCTTTAA GGGAAAATTT TGGAAGTTGT TGGCAGGTAT TCAGTACACA GTTAAAAAAC AAGGAGAGAC AGTAGCTGAT GTTAGGACAC GACAATATTC GCTCCATCTT TGGAAATGCT GTTAGTCGAG GAGGATAAAA CCCTAGCCTT CAAAATGAAT GGTTACATAT AAGAAGTGCA TCTTCTTACT CTTCATCAAC CATCGTCTGG AAAGCCATAG AAACAGTGTA TGCAGCCTAT TTGCCCAAAA CTCAGTTTAG AAATCAGTCC CCAGAATGTG GATGTTAATG GTTCACTTCC TGCACGAGGA GAGCATCCTG GAGCGGGTGC CTCCTGGGCT CCAATTCCTC CAGGATGTAC TTCACCCAGA GGCCCCTCTG GGGAGATGGT TAAATCCACA ACAAGTCTGA AGTAGTGATA AGGTCTATGC CCACCAGATG GTTCGTACAG GATGCATTTC TGCAGCCTCT GAGCAAACCC CTGTCCAGTC GAGGATAAGA CAGATATTTC TAGTGGCAGG GCTAGGCAGC CTCCCAGCCC CTGCTGAAGT GGCTGCCAAA AATCAGAGCT GGGACTTCAG AAATGTCAGA GAAGAGAGGA CCTACTTCCA CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT AATGAGCAGG GACATGAGGT TCTCCGGGAG ATGTTGCATA GTGAATCCTC AGTGGGCCTT GGCACAGCAT CAAACCAAGT AAGCTTAGTG AAGAACTGTT CTACCAGATA CTCATTTATG CTCAGGTTAT CGGAGCCAGC ACCGCTCTTT GACCTTGCCA GAGAGTGGCT GGACAGAGGA AGATGGTCCC AAAGAAGGAC TTTCTGAAGA AGAAGGCTGA GATGCTTGCA GACTATTTCT GGGAACCTGA TTGGATTACC CCTTCTGATT GACAACTATG CCTATCTTCA TTCTTCGACT AGCCACTGAG GTGAATTGGG GAAAGCCTCA GTAAAGAATG CGCTATGTTC TATTCCATCC GAGTCGACCC TCTCAGGCCA GCAGAGTGAA GTGCCTGGCT TGGACTGTGG AACACATTGT CTATAAAGCC TTGCGCTCAC GGCCAGCTAA TGCTATCAAA TTCAAGTGAT TGTTAAAGAG GGATCAGGAA AGAAGATCTG CCTTTGAGGA TTTAGCCAGT TAAGCCATGT GGCTCATGTT GAGCAAGTTA CTCAGATGGA GGACCCAGAT CACGGTGGAG AAAATCCAAG TGAAGAATAT ATGCAGGCAT TAGTTTCTCA TACCCAATGC CTCAACCGTG AACTGATAGA AATTGGATGT CCAATGCAAA CTACTCAGTG TAGAATCAAC TTCCTTGAGA ACACACACCC ATTCCTGTAC TGCACCCCAC AAAGCATGAA AGCAGCACAT CGAGAGCAAG CTTTGCTACC AGGACTTGCT CCTCGTCTTC TACTTCTGGA ATTCCCGGGA 
ACAGAAGCTT AGCCCCAGGC CATTGTCACA AAGATGAGGA GATGCTTGAA TGGAGGGGGA TACAACAAAG GCAACCCCAG AAAGAGACAT GAAAGGAAAT GACTGCAGCT TGAGTCTCCA GGAAGAAATT ACCACTCCTT CGTGGGCTGT TATACCTTCT CAACACCACC ATTTTGCCAA TTTTGGTGTT TGCTTGCCTT AGATAGTCCA TTGCTGAATA CATTGTTGAG CTTTGGAAAT TGATGAGGAA TGCCCCCTTT GGAGGGACTG ACGAAGAAAA GGAATGTTTT GGAAGCAGTA CATATCTGAG CCATTCCAAA CTCCTGGAAG ACATTCTGCC TCCTAAACAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220
TTCACAGAAG
GAGAGGTGTT
CGATACAAAG
CACTTAAGAC
AATAAATAGA
ATGGAAATAT CCTGCAGCTT GCTAACCTGC CTGATCTATA CAAAGTCTTT AAATATGGTT ATTTATGCAC TGTGGGATGT GTTCTTCTTT CTCTGTATTC TGTTGTATCA AAGTGTGATA TACAAAGTGT ACCAACATAA GTGTTGGTAG TTATACTTGC CTTCTGATAG TATTCCTTTA TACACAGTGG ATTGATTATA TGTGTCTTAA CATA 2280 2340 2400 2460 2484 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 756 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met 1 Asn Ser Phe Val Ala Gly Val 5 Ile Arg Arg 10 Val Ile Gln Leu Asp Giu Thr Val Val Arg Ile Ala 20 Glu Met Ile Ala Gly Glu Arg Pro Ala 25 Lvs Glu Asn Cys 35 Val Ile Val Leu 40 Leu Asp Ala Lys Ser Thr Ile Asn Ala Ile Ser Ile Gin Gin Asp Asn Lys Giu Gly Lys Leu Ile Gly Thr Gin Val Gly Ile Arg Thr Lys 70 Gin Asp Leu Asp Cys Glu Arg Phe Thr Ser Lys Leu Ser Phe Glu Asp 90 Ser Ala Ser Ile Ser Thr Tyr Gly Phe Val Thr Ile 115 Ser Tyr Ser Arg 100 Thr Gly Giu Ala Leu Ala 105 Asp Ile Ser His Val Ala His 110 Tyr Arg Ala Thr Lys Thr Ala 120 Lys Gly Lys Cys Ala 125 Pro Asp Gly Lys 130 Asn Gin Leu 135 Thr Ala Pro Pro Lys 140 Phe Cys Ala Gly Gly Thr Gin 145 Thr Arg Arg Lys Ile 150 Leu Val Glu Asp Tyr Asn Ile Ala 160 Ala 165 Lys Asn Pro Ser 170 His Glu Tyr Gly Lys Ile 175 Leu Giu Val Ser Vai Lys 195 Asn Ala Ser Val 180 Lys Gly Arg Tyr Ser Asn Ala Gly Gin Gly Glu Thr 200 Ile Ala Asp Val Arg 205 Gly Ile Ser Phe 190 Thr Leu Pro Asn Ala Val Thr Val Asp Arg Ser Ile 210 Ser Arg 225 Phe 220 Lys Glu Leu Ile Glu 230 Gly Cys Glu Thr Leu Ala Phe 240 Lys Ile Arg His Val 305 Ser Ser Ala Ser Arg 385 Ser Thr Giu Gly Thr 465 Met Arg Ile Ser Thr 545 Tyr Ser Met Phe Lys Pro 290 Asn Ile Asn Giy Ser 370 Thr Lys Asp Leu Asp 450 Ser Val Arg Asn Phe 530 Lys Gin Giu Asn Leu Ala 275 Phe Val Leu Ser Pro 355 Thr Asp Pro Ile Pro 435 Thr Ser Giu Arg Giu 515 Vai Leu Ile Pro Gly Leu 260 Ile Leu His Giu Ser 340 Ser Ser Ser Leu Ser 420 Ala Thr Asn Asp Ile 500 Gin Gly Tyr Leu Ala 580 Tyr 245 Phe Giu Tyr Pro Arg 325 Arg Giy Giy Arg Ser 405 Ser Pro Lys Pro Asp 485 Ile Gly 
Cys Leu Ile 565 Pro Ile Ile Thr Leu Thr 310 Val Met Giu Ser Giu 390 Ser Gly Aia Giy Arg 470 Ser Asn His Vali Leu 550 Tyr Leu Ser Asn Val1 Ser 295 Lys Gin Tyr Met Ser 375 Gin Gin Arg Giu Thr 455 Lys Arg Leu Giu Asn 535 Asn Asp Phe 68 Asn His Tyr 280 Leu His Gin Phe Val1 360 Asp Lys Pro Ala Val 440 Ser Arg Ly s Thr Val1 520 Pro Thr Phe Asp Al a Arg 265 Ala Giu Giu His Thr 345 Lys Lys Leu Gin Arg 425 Ala Giu His Giu Ser 505 Leu Gin Thr Al a Leu 585 Asn 250 Leu Ala Ile Val Ile 330 Gin Ser Val Asp Ala 410 Gin Ala Met Arg Met 490 Vai Arg Trp Lys Asn 570 Ala Tyr Val Tyr Ser His 315 Giu Thr Thr Tyr Ala 395 Ile Gin Lys Ser Giu 475 Thr Leu Glu Ala Leu 555 Phe Met Ser Giu Leu Pro 300 Phe Ser Leu Thr Ala 380 Phe Vai Asp Asn Giu 460 Asp Ala Ser Met Leu 540 Ser Gly Leu Vai Lys Ser Thr 270 Pro Lys 285 Gin Asn Leu His Lys Leu Leu Pro 350 Ser Leu 365 His Gin Leu Gin Thr Giu Glu Giu 430 Gin Ser 445 Lys Arg Ser Asp Ala Cys Leu Gin 510 Leu His 525 Ala Gin Giu Giu Val Leu Ala Leu 590 Lys 255 Ser Asn Val Giu Leu 335 Gly Thr Met Pro Asp 415 Met Leu Gly Val1 Thr 495 Giu Asn His Leu Arg 575 Asp Cys Leu Thr Asp Giu 320 Gly Leu Ser Val Leu 400 Lys Leu Giu Pro Giu 480 Pro Glu His Gin Phe 560 Leu Ser Pro Giu Ser 595 Glu Tvr Ile Gly Trp Thr Glu Val Giu Phe 610 Tyr Phe Leu 615 Asp Glu Asp 600 Lys Lys Glu Glu Gly Pro Lys Lys Ala Glu 620 Gly Asn Leu Glu 605 Met Gly Leu Ala Leu Ala Asp Ser Leu Glu 625 Leu Ile 630 Tyr Ile Gly Leu Pro 640 Leu Ile Asp Val Pro Pro Leu 650 Trp Gly Leu Pro Ile Phe 655 Glu Cys Ile Leu Arg Phe Giu Ser 675 Gin Tyr Ile Leu 660 Thr Giu Val Asn 665 Asp Giu Glu Leu Ser Lys Glu Cys Ala 680 Thr 'Leu Met Phe Tyr 690 Pro Gly Ser Ser Giu Glu Ile Pro Asn 710 Leu Arg Ser Ser 695 Ser Ser Gly Ser Ile Arg 685 Gin Ser Glu Glu His Ile Trp Lys Trp 705 Tyr Lys Val Vai 720 Thr 715 Pro Lys Ala His Ile Leu Pro 730 Leu Lys His Phe Thr Glu 735 Lys Val Asp Gly Asn Phe Giu Arg 755 Ile 740 Cys Gin Leu Ala Pro Asp Leu Tyr 750 INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 397 base 
pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: TGGCTGGATG CTAAGCTACA GCTGAAGGAA GAACGTGAGC ACGAGGCACT GAGGTGATTG GCTGAAGGCA CTTCCGTTGA GCATCTAGAC GTTTCCTTGG CTCTTCTGGC GCCAAAATGT CGTTCGTGGC AGGGGTTATT CGGCGGCTGG ACGAGACAGT GGTGAACCGC ATCGCGGCGG GGGAAGTTAT CCAGCGGCCA GCTAATGCTA TCAAAGAGAT GATTGAGAAC TGGTACGGAG GGAGTCGAGC CGGGCTCACT TAAGGGCTAC GACTTAACGG GCCGCGTCAC TCAATGGCGC GGACACGCCT CTTTCCCCGG GCAGAGGCAT GTACAGCGCA TGCCCACAAC GGCGGAGGCC GCCGGGTTCC CTACGTGCCA TAAGCCTTCT CCTTTTC 120 180 240 300 360 397
INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 393 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: AAACACGTTA ATGAGGCACT ATTGTTTGTA TTTGGAGTTT GTTATCATTG CTTGGCTCAT ATTAAAATAT GTACATTAGA GTAGTTGCAG ACTGATAAAT TATTTTCTGT TTGATTTGCC AGTTTAGATG CAAAATCCAC AAGTATTCAA GTGATTGTTA AAGAGGGAGG CCTGAAGTTG ATTCAGATCC AAGACAATGG CACCGGGATC AGGGTAAGTA AAACCTCAAA GTAGCAGGAT GTTTGTGCGC TTCATGGAAG AGTCAGGACC TTTCTCTGTT CTGGAAACTA GGCTTTTGCA GATGGGATTT TTTCACTGAA AAATTCAACA CCAACAATAA ATATTTATTG AGTACCTATT ATTTGCGGGG CACTGTTCAG GGGATGTGTC AGT 120 180 240 300 360 393
INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 352 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: TTTCCTGGAT TAATCAAGAA ATGGAATTCA AAGAGATTTG GAAAATGAGT AACATGATTA TTTACTCATC TTTTTGGTAT CTAACAGAAA GAAGATCTGG ATATTGTATG TGAAAGGTTC ACTACTAGTA AACTGCAGTC CTTTGAGGAT TTAGCCAGTA TTTCTACCTA TGGCTTTCGA GGTGAGGTAA GCTAAAGATT CAAGAAATGT GTAAAATATC CTCCTGTGAT GACATTGTCT GTCATTTGTT AGTATGTATT TCTCAACATA GATAAATAAG GTTTGGTACC TTTTACTTGT TAAATGTATG CAAATCTGAG CAAACTTAAT GAACTTTAAC TTTCAAAGAC TG 120 180 240 300 352
INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 287 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
TGGAAGCAGC AGCAGATAAC CTTTCCCTTT GGTGAGGTGA CAGTGGGTGA CCCAGCAGTG AGTTTTTCTT TCAGTCTATT TTCTTTTCTT CCTTAGGCTT TGGCCAGCAT AAGCCATGTG GCTCATGTTA CTATTACAAC GAAAACAGCT GATGGAAAGT GTGCATACAG GTATAGTGCT GACTTCTTTT ACTCATATAT ATTCATTCTG AAATGTATTT TGGGCCTAGG TCTCAGAGTA ATCCTGTCTC AACACCAGTG TTATCTTTGG CAGAGATCTT GAGTACG 120 180 240 287
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 336 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID TTGATATGAT TTTCTCTTTT CCCCTTGGGA TTAGTATCTA TCTCTCTACT GGATATTAAT TTGTTATATT TTCTCATTAG AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA CCATGTGCTG GCAATCAAGG GACCCAGATC ACGGTAAGAA TGGTACATGG GAGAGTAAAT TGTTGAAGCT TTGTTTGTAT AAATATTGGA ATAAAAAATA AAATTGCTTC TAAGTTTTCA GGGTAATAAT AAAATGAATT TGCACTAGTT AATGGAGGTC CCAAGATATC CTCTAAGCAA GATAAATGAC TATTGGCTTT TTGGCATGGC AGCCTG 120 180 240 300 336
INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS: LENGTH: 275 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: GCTTTTGCCA GGACCATCTT GGGTTTTATT TTCAAGTACT TCTATGAATT TACAAGAAAA ATCAATCTTC TGTTCAGGTG GAGGACCTTT TTTACAACAT AGCCACGAGG AGAAAAGCTT TAAAAAATCC AAGTGAAGAA TATGGGAAAA TTTTGGAAGT TGTTGGCAGG TACAGTCCAA AATCTGGGAG TGGGTCTCTG AGATTTGTCA TCAAAGTAAT GTGTTCTAGT GCTCATACAT TGAACAGTTG CTGAGCTAGA TGGTGAAAAG TAAAA 120 180 240 275
INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 389 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: CAGCAACCTA TAAAAGTAGA GAGGAGTCTG TGTTTTGACG CAGCACCTTT AGCATTTTTA TTTGGATGAA GTTTCTGCTG GTTTATTTTT CTGTGGGTAA AATATTAATA GGCTGTATGG AGATATTTTT CTTTATATGT ACCTTTGTTT AGATTACTCA ACTCCACTAA TTTATTTAAC TAAAACGGGG CTCTGACATC TAGTGTGTGT TTTTGGCAAC TCTTTTCTTA CTCTTTTGTT TTTCTTTTCC AGGTATTCAG TACACAATGC AGGCATTAGT TTCTCAGTTA AAAAAGTAAG TTCTTGGTTT ATGGGGGATG GTTTTGTTTT ATGAAAAGAA AAAAGGGGAT TTTTAATAGT TTGCTGGTGG AGATAAGGTT ATGATGTTT
120 180 240 300 360 389 INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 381 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: ATGTTTCAGT CTCAGCCATG AGACAATAAA TCCTTGTGTC TTCTGCTGTT TGTTTATCAG CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT CGCTCCATCT TTGGAAATGC TGTTAGTCGG TATGTCGATA ACCTATATAA AAAAATCTTT TACATTTATT ATCTTGGTTT ATCATTCCAT CACATTATTT GGGAACCTTT CAAGATATTA TGTGTGTTAA GAGTTTGCTT TAGTCAAATA CACAGGCTTG TTTTATGCTT CAGATTTGTT AATGGAGTTC TTATTTCACG TAATCAACAC TTTCTAGGTG TATGTAATCT CCTAGATTCT GTGGCGTGAA TCATGTGTTC T 120 180 240 300 360 381 a INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 526 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
ACTGAGTAGG
GGATGGGTGG
ACCTCAAATG
CCTTGTGTTT
TTCAGAATCT
CCTTCAAAAT
TACTCTTCAT
CCACAGGGAA
GTAGGTGGGT
GTGAATGGGT
GACCAAGTCT
TTAAATTCTG
CTTTTCTAAT
GAATGGTTAC
CAACCGTAAG
TTTTATGGGA
GAGTGGGTGG GTGGGTGGGT GAACAGACAA ATGGATGGAT TCGGGGCCCT CATTTCACAA ATTCTTTTGT AATGTTTGAG AGAGAACTGA TAGAAATTGG ATATCCAATG CAAACTACTC TTAAAAAGAA CCACATGGGA CCATGGAAAA ATTTCTGAGT GGGTGGATGG ATGGATGGGA GAATGGACAG GCACAGGAGG AGTTAGTTTA TGGGAAGGAA TTTTGAGTAT TTTCAAAAGC ATGTGAGGAT AAAACCCTAG AGTGAAGAAG TGCATCTTCT AATCCACTCA CAGGAAACAC CCATAGGTTT GATTAAACAT 120 180 240 300 360 420 480 526 GGAGAAACCT CATGGCAAAG TTTGGTTTTA TTGGGAAGCA TGTATA INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 434 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID ATAGTGGGCT GGAAAGTGGC CACAGGTAAA GGTGCACCTT TCTTCCTGGG GATGTGATGT GCATATCACT ACAGAAATGT CTTTCCTGAG GTGATGTCAT GACTTTGTGT GAATGTACAC CTGTGACCTC ACCCCTCAGG ACAGTTTTGA ACTGGTTGCT TTCTTTTTAT TGTTTAGATC 120 180 .0 0 GTCTGGTAGA ATCAACTTCC TTGAGAAAAG CCATAGAAAC AGTGTATGCA GCCTATTTGC CCAAAAACAC ACACCCATTC CTGTACCTCA GGTALATGTAG CACCAAACTC CTCAACCAAG ACTCACAAGG AACAGATGTT CTATCAGGCT CTCC!TCTTTG AAAGAGATGA GCATGCTAAT AGTACAATCA GAGTGAATCC CATACACCAC TGGCAAAAGG ATGTTCTGTC CCTTCTTACA GGTACAAGGC ACAG INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 458 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: CTTACGCAAA GCTACACAGC TCTTAAGTAG, CAGTGCCAAT ATTTGAACAC ACTCAGACTC GAGCCTGAGG TTTTGACCAC TGTGTCATCT GGCCTCAAAT CTTCTGGCCA CCACATACAC CATATGTGGG CTTTTTCTCC CCCTCCCACT ATCTAAGGTA ATTGTTCTCT CTTATTTTCC TGACAGTTTA GAAATCAGTC CCCAGAATGT GGATGTTAAT GTGCACCCCA CAAAGCATGA AGTTCACTTC CTGCACGAGG, AGAGCATCCT GGAGCGGGTG CAGCAGCACA TCGAGAGCAA GCTCCTGGGC TCCAATTCCT CCAGGATGTA CTTCACCCAG GTCAGGGCGC TTCTCATCCA GCTACTTCTC TGGGGCCTTT GAAATGTGCC CGGCCAGACG TGAGAGCCCA GATTTTTGCT GTTATTTAGG AACTTTTTTT GAAGTATTAC CTGGATAG INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 618 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: 
DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 240 300 360 420 434 120 180 240 300 360 420 458
GATAATTATA
CTATACTTCT
TTTTTTTTTT
AAATCCACAA
CACCAGATGG
AGCAAACCCC
AGTGGCAGGG
GCTGCCAAAA
AAGAGAGGAC
TCCTTTATTC
TCTCTCTGCC
CCTCATACTA
TATTCTGAGT
TAATACAGAC
CAAGTCTGAC
TTCGTACAGA
TGTCCAGTCA
CTAGGCAGCA
GCTTCTTTCT
CTCTCCACTA
TTTGCTACCA
CTCGTCTTCT
TTCCCGGGAA
GCCCCAGGCC
AGATGAGGAG
TAGTACTGCT CCATTTGGGG ACCTGTATAT TATATATATA TATATATATA TTTTTTTTTT GGACTTGCTG GCCCCTCTGG GGAGATGGTT ACTTCTGGAA GTAGTGATAA GGTCTATGCC CAGAAGCTTG ATGCATTTCT GCAGCCTCTG ATTGTCACAG AGGATAAGAC AGATATTTCT ATGCTTGAAC TCCCAGCCCC TGCTGAAGTG ACAACAAAGG GGACTTCAGA AATGTCAGAG TATGGCCTTT TGGGAAAAGT ACAGCCTACC ACTTTGGCTT TTCATGAATC ACTTGCATCT 120 180 240 300 360 420 480 540 600 618 ATCAGAGCTT GGAGGGGGAT CTACTTCCAG CAACCCCAGG TGTAATAAAA CTGCCTTCTA
GACTTCCC
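Because each entry declares a LENGTH and prints its residues in groups of ten with running position markers, a transcribed entry can be sanity-checked mechanically. The following is a minimal sketch, not part of the original listing: the function names are hypothetical, and the input is SEQ ID NO:11 copied from the listing above with its trailing position markers. It strips whitespace and digits, counts the residues, and reports any character outside A, C, G and T, which is how scanning damage usually shows up.

```python
import re

# SEQ ID NO:11 as printed in the listing: groups of 10 bases followed by
# the running position markers (120 180 240 275). Declared LENGTH: 275.
SEQ_ID_11_RAW = """
GCTTTTGCCA GGACCATCTT GGGTTTTATT TTCAAGTACT TCTATGAATT TACAAGAAAA
ATCAATCTTC TGTTCAGGTG GAGGACCTTT TTTACAACAT AGCCACGAGG AGAAAAGCTT
TAAAAAATCC AAGTGAAGAA TATGGGAAAA TTTTGGAAGT TGTTGGCAGG TACAGTCCAA
AATCTGGGAG TGGGTCTCTG AGATTTGTCA TCAAAGTAAT GTGTTCTAGT GCTCATACAT
TGAACAGTTG CTGAGCTAGA TGGTGAAAAG TAAAA 120 180 240 275
"""


def clean_listing_sequence(raw):
    """Return (residues, invalid_characters) for one printed nucleotide entry.

    Whitespace and digits (the position markers) are dropped; anything that
    is then not A, C, G or T is reported as invalid. Hypothetical helper.
    """
    residues = re.sub(r"[\s\d]+", "", raw).upper()
    invalid = sorted(set(residues) - set("ACGT"))
    return residues, invalid


def check_declared_length(raw, declared_length):
    """True when the cleaned entry has the declared length and no invalid characters."""
    residues, invalid = clean_listing_sequence(raw)
    return len(residues) == declared_length and not invalid


if __name__ == "__main__":
    residues, invalid = clean_listing_sequence(SEQ_ID_11_RAW)
    print(len(residues), invalid)
    print(check_declared_length(SEQ_ID_11_RAW, 275))
```

A group damaged by scanning, for example one ending in `P(` instead of a base, would be reported in the invalid-character list and would make the check fail, so it can be corrected against an overlapping entry before the sequence is used.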
INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 478 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
CTGTGCTCCA
AAATGCAACC
CTTTTCTTCA
ATTCCCGAAA
GTGTTTTGAG
GGCCTGCCTG
ATGTTGTCAG
GCACAGGTCA
CACAAAATTT
TTGCAGAAAG
GGAAATGACT
TCTCCAGGAA
GGATGCATAG
AGTGGCACTA
TCCAGCTCTG TAGACCAGCG GGCTAAGTTT AAAAACAAGA AGACATCGGG AAGATTCTGA GCAGCTTGTA CCCCCCGGAG GAAATTAATG AGCAGGGACA GGCCTCAACT GCCAAGGTTT CAGTTTTGAT GGGCAAGCTC CAGAGAAGTT GCTTGCTCCC ATAATAATGA TCTGCACTTC TGTGGAAATG GTGGAAGATG AAGGATCATT AACCTCACTA TGAGGGTACG TAAACGCTGT TGGAAATGGA GAAAGCAGTC CTCTTCCTTT ACTAACCCAC a.
a AATAGCATCA GCTTAAAGAC AATTTTTGAT TGGGAGAAAA GGGAGAAAAT AATCTCTG INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 377 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: CAGTTTTCAC CAGGAGGCTC AAATCAGGCC TTTGCTTACT TGGTGTCTCT AGTTCTGGTG CCTGGTGCTT TGGTCAATGA AGTGGGGTTG GTAGGATTCT ATTACTTACC TGTTTTTTGG TTTTATTTTT TGTTTTGCAG TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG TGTGAATCCT CAGTGGGCCT TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC CAAGCTTAGG TAAATCAGCT GAGTGTGTGA ACAAGCAGAG CTACTACAAC AATGGTCCAG GGAGCACAGG CACAAAAGCT AAGGAGAGCA GCATGAAGGT AGTTGGGAAG GGCACAGGCT TTGGAGTCAG CACATGT INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 325 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID CCCCTGGTTG AAGCGTTGGA ATCCCACTCT TTGGAAGATT GTGTTAGACT GTTAACCAGA TTCCACAGCC AGGCAGAACT ATGTCTGTCT CATCCATGTG TCAGGGATTA CGTCTCCCAT TTGTCCCAAC TGGTTGTATC TCAAGCATGA ATTCAGCTTT TCCTTAAAGT CACTTCATTT TTATTTTCAG TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG 120 180 240 300 360 420 478 120 180 240 300 360 377 120 180 240 TTCTCAGGTT ATCGGTAAGT TTAGATCCTT TTCACTTCTG ACATTTCAAC TGACCGCCCC 300 GCAAACAGTA GCTCTCCACT AAATA 325 INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 341 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: CATTTATGGT TTCTCACCTG CCATTCTGAT AGTGGATTCT TGGGAATTCA GGCTTCATTT GGATGCTCCG TTAAAGCTTG CTCCTTCATG TTCTTGCTTC TTCCTAGGAG CCAGCACCGC 120 TCTTTGACCT TGCCATGCTT GCCTTAGATA GTCCAGAGAG TGGCTGGACA GAGGAAGATG 180 GTCCCAAAGA AGGACTTGCT GAATACATTG TTGAGTTTCT GAAGAAGAAG GCTGAGATGC 240 TTGCAGACTA TTTCTCTTTG GAAATTGATG AGGTGTGACA GCCATTCTTA TACTTCTGTT 300 GTATTCTCCA AATAAAATTT CCAGCCGGGT GCATTGGCTC A 341 INFORMATION FOR SEQ ID NO:22: SEQUENCE CHARACTERISTICS: LENGTH: 260 base pairs TYPE: nucleic acid 
STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) SEQUENCE DESCRIPTION: SEQ ID NO:22: CAGATAGGAG GCACAAGGCC TGGGAAAGGC ACTGGAGAAA TGGGATTTGT TTAAACTATG ACAGCATTAT TTCTTGTTCC CTTGTCCTTT TTCCTGCAAG CAGGAAGGGA ACCTGATTGG 120 ATTACCCCTT CTGATTGACA ACTATGTGCC CCCTTTGGAG GGACTGCCTA TCTTCATTCT 180 TCGACTAGCC ACTGAGGTCA GTGATCAAGC AGATACTAAG CATTTCGGTA CATGCATGTG 240 TGCTGGAGGG AAAGGGCAAA 260 INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 340 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: CTATATCTTC CCAGCAATAT TCACAGTCCG TTTACAGTTT TAACGCCTAA AGTATCACAT TTCGTTTTTT AGCTTTAAGT AGTCTGTGAT CTCCGTTTAG AATGAGAATG TTTAAATTCG 120 TACCTATTTT GAGGTATTGA ATTTCTTTGG ACCAGGTGAA TTGGGACGAA GAAAAGGAAT 180 GTTTTGAAAG CCTCAGTAAA GAATGCGCTA TGTTCTATTC CATCCGGAAG CAGTACATAT 240 CTGAGGAGTC GACCCTCTCA GGCCAGCAGG TACAGTGGTG ATGCACACTG GCACCCCAGG 300 ACTAGGACAG GACCTCATAC ATCTTAGGAG ATGAAACTTG 340 INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 563 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: AATCCTCTTG TGTTCAGGCC TGTGGATCCC TGAGAGGCTA GCCCACAAGA TCCACTTCAA AAGCCCTAGA TAACACCAAG TCTTTCCAGA CCCAGTGCAC ATCCCATCAG CCAGGACACC 120 AGTGTATGTT GGGATGCAAA CAGGGAGGCT TATGACATCT AATGTGTTTT CCAGAGTGAA 180 GTGCCTGGCT CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC 240 TTGCGCTCAC ACATTCTGCC TCCTAAACAT TTCACAGAAG ATGGAAATAT CCTGCAGCTT 300 GCTAACCTGC CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC 360 TGTGGGATGT GTTCTTCTTT CTCTGTATTC CGATACAAAG TGTTGTATCA AAGTGTGATA 420 TACAAAGTGT ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG 480 TATTCCTTTA TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAATTTCT 540 TATTTAATTT TATTATGTAT ATA 563 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 137 base pairs TYPE: nucleic acid STRANDEDNESS: single 
TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID CTTGGCTCTT CTGGCGCCAA AATGTCGTTC GTGGCAGGGG TTATTCGGCG GCTGGACGAG ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC GGCCAGCTAA TGCTATCAAA 120 GAGATGATTG AGAACTG 137 INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 91 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: TTTAGATGCA AAATCCACAA GTATTCAAGT GATTGTTAAA GAGGGAGGCC TGAAGTTGAT TCAGATCCAA GACAATGGCA CCGGGATCAG G 91 77 INFORMATION FOR SEQ ID NO:27: SEQUENCE CHARACTERISTICS: LENGTH: 99 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: AAAGAAGATC TGGATATTGT ATGTGAAAGG TTCACTACTA GTAAACTGCA GTCCTTTGAG GATTTAGCCA GTATTTCTAC CTATGGCTTT CGAGGTGAG 99 INFORMATION FOR SEQ ID NO:28: SEQUENCE CHARACTERISTICS: LENGTH: 74 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: GCTTTGGCCA GCATAAGCCA TGTGGCTCAT GTTACTATTA CAACGAAAAC AGCTGATGGA SAAGTGTGCAT ACAG 74 INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 73 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: AGCAAGTTAC TCAGATGGAA AACTGAAAGC CCCTCCTAAA CCATGTGCTG GCAATCAAGG GACCCAGATC ACG 73 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 92 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID GTGGAGGACC TTTTTTACAA CATAGCCACG AGGAGAAAAG CTTTAAAAAA TCCAAGTGAA GAATATGGGA AAATTTTGGA AGTTGTTGGC AG 92 78 INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: 43 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: GTATTCAGTA CACAATGCAG GCATTAGTTT CTCAGTTAAA AAA 43 INFORMATION FOR SEQ ID 
NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 89 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: CAAGGAGAGA CAGTAGCTGA TGTTAGGACA CTACCCAATG CCTCAACCGT GGACAATATT CGCTCCATCT TTGGAAATGC TGTTAGTCG 89 INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 113 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: AGAACTGATA GAAATTGGAT GTGAGGATAA AACCCTAGCC TTCAAAATGA ATGGTTACAT ATCCAATGCA AACTACTCAG TGAAGAAGTG CATCTTCTTA CTCTTCATCA ACC 113 INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 94 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: ATCGTCTGGT AGAATCAACT TCCTTGAGAA AAGCCATAGA AACAGTGTAT GCAGCCTATT TGCCCAAAAA CACACACCCA TTCCTGTACC TCAG 94 79 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 154 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA.
(xi) SEQUENCE DESCRIPTION: SEQ ID TTTAGAAATC AGTCCCCAGA ATGTGGATGT TAATGTGCAC CCCACAAAGC ATGAAGTTCA CTTCCTGCAC GAGGAGAGCA TCCTGGAGCG GGTGCAGCAG CACATCGAGA GCAAGCTCCT GGGCTCCAAT TCCTCCAGGA TGTACTTCAC CCAG
INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 371 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: ACTTTGCTAC CAGGACTTGC TGGCCCCTCT GGGGAGATGG TTAAATCCAC AACAAGTCTG ACCTCGTCTT CTACTTCTGG AAGTAGTGAT AAGGTCTATG CCCACCAGAT GGTTCGTACA GATTCCCGGG AACAGAAGCT TGATGCATTT CTGCAGCCTC TGAGCAAACC CCTGTCCAGT CAGCCCCAGG CCATTGTCAC AGAGGATAAG ACAGATATTT CTAGTGGCAG GGCTAGGCAG CAAGATGAGG AGATGCTTGA ACTCCCAGCC CCTGCTGAAG TGGCTGCCAA AAATCAGAGC TTGGAGGGGG ATACAACAAA GGGGACTTCA GAAATGTCAG AGAAGAGAGG ACCTACTTCC AGCAACCCCA G 120 180 240 300 360 371
INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 149 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: AAAGAGACAT CGGGAAGATT CTGATGTGGA AATGGTGGAA GATGATTCCC GAAAGGAAAT GACTGCAGCT TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT TGAGTCTCCA GGAAGAAATT AATGAGCAGG GACATGAGG 120 149
INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 109 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: TTCTCCGGGA GATGTTGCAT AACCACTCCT TCGTGGGCTG TGTGAATCCT CAGTGGGCCT TGGCACAGCA TCAAACCAAG TTATACCTTC TCAACACCAC CAAGCTTAG 109
INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 64 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: TGAAGAACTG TTCTACCAGA TACTCATTTA TGATTTTGCC AATTTTGGTG TTCTCAGGTT ATCG 64
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 165 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID GAGCCAGCAC CGCTCTTTGA CCTTGCCATG CTTGCCTTAG ATAGTCCAGA GAGTGGCTGG ACAGAGGAAG ATGGTCCCAA AGAAGGACTT GCTGAATACA TTGTTGAGTT TCTGAAGAAG 120 AAGGCTGAGA TGCTTGCAGA CTATTTCTCT TTGGAAATTG ATGAG 165
INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 93 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: GAAGGGAACC TGATTGGATT ACCCCTTCTG ATTGACAACT ATGTGCCCCC TTTGGAGGGA CTGCCTATCT TCATTCTTCG ACTAGCCACT GAG 93
INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 114 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: GTGAATTGGG ACGAAGAAAA GGAATGTTTT GAAAGCCTCA GTAAAGAATG CGCTATGTTC TATTCCATCC GGAAGCAGTA CATATCTGAG GAGTCGACCC
TCTCAGGCCA GCAG 114 INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 360 base pairs TYPE: nucleic acid STRANDEDNESS: single S(D) TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: AGTGAAGTGC CTGGCTCCAT TCCAAACTCC TGGAAGTGGA CTGTGGAACA CATTGTCTAT AAAGCCTTGC GCTCACACAT TCTGCCTCCT AAACATTTCA CAGAAGATGG AAATATCCTG 120 CAGCTTGCTA ACCTGCCTGA TCTATACAAA GTCTTTGAGA GGTGTTAAAT ATGGTTATTT 180 ATGCACTGTG GGATGTGTTC TTCTTTCTCT GTATTCCGAT ACAAAGTGTT GTATCAAAGT 240 GTGATATACA AAGTGTACCA ACATAAGTGT TGGTAGCACT TAAGACTTAT ACTTGCCTTC 300 TGATAGTATT CCTTTATACA CAGTGGATTG ATTATAAATA AATAGATGTG TCTTAACATA 360 INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: AGGCACTGAG GTGATTGGC 19 82 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID TCGTAGCCCT TAAGTGAGC 19 INFORMATION FOR SEQ ID NO:46: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: AATATGTACA TTAGAGTAGT TG 22
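The primer entries are described as directed to genomic intron DNA, so each pair should bracket an exon within one of the genomic fragments. The sketch below is an illustration only, with hypothetical function names: it locates the forward primer of SEQ ID NO:44 and the reverse complement of the 19-base primer listed immediately after it in the genomic fragment of SEQ ID NO:6, and reports the span they enclose. Coordinates are 0-based and half-open, which is a choice made for this example and not a convention of the listing.

```python
# Genomic fragment SEQ ID NO:6 (397 bases), copied from the listing.
SEQ_ID_6 = "".join("""
TGGCTGGATG CTAAGCTACA GCTGAAGGAA GAACGTGAGC ACGAGGCACT GAGGTGATTG
GCTGAAGGCA CTTCCGTTGA GCATCTAGAC GTTTCCTTGG CTCTTCTGGC GCCAAAATGT
CGTTCGTGGC AGGGGTTATT CGGCGGCTGG ACGAGACAGT GGTGAACCGC ATCGCGGCGG
GGGAAGTTAT CCAGCGGCCA GCTAATGCTA TCAAAGAGAT GATTGAGAAC TGGTACGGAG
GGAGTCGAGC CGGGCTCACT TAAGGGCTAC GACTTAACGG GCCGCGTCAC TCAATGGCGC
GGACACGCCT CTTTCCCCGG GCAGAGGCAT GTACAGCGCA TGCCCACAAC GGCGGAGGCC
GCCGGGTTCC CTACGTGCCA TAAGCCTTCT CCTTTTC
""".split())

FORWARD_PRIMER = "AGGCACTGAGGTGATTGGC"  # SEQ ID NO:44
REVERSE_PRIMER = "TCGTAGCCCTTAAGTGAGC"  # the 19-mer listed directly after SEQ ID NO:44

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def locate_amplicon(template, forward, reverse):
    """Return (start, end, length) of the region bounded by a primer pair.

    The forward primer is searched as written; the reverse primer is searched
    as its reverse complement, downstream of the forward primer. Returns None
    when either primer is not found. Hypothetical helper for illustration.
    """
    start = template.find(forward)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    site_start = template.find(reverse_site, start)
    if site_start < 0:
        return None
    end = site_start + len(reverse_site)
    return start, end, end - start


if __name__ == "__main__":
    print(locate_amplicon(SEQ_ID_6, FORWARD_PRIMER, REVERSE_PRIMER))
```

The same check can be repeated for the other primer pairs against their genomic fragments once those fragments have been transcribed without gaps; a primer that cannot be located usually points to a transcription error in the fragment or in the primer.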
INFORMATION FOR SEQ ID NO:47: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: miscfeature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: CAGAGAAAGG TCCTGACTC 19 83 INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: AGAGATTTGG AAAATGAGTA AC 22 INFORMATION FOR SEQ ID NO:49: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: miscfeature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: ACAATGTCAT CACAGGAGG 19 S(2) INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID AACCTTTCCC TTTGGTGAGG 84 INFORMATION FOR SEQ ID NO:51: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: GATTACTCTG AGACCTAGGC INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic :intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: GATTTTCTCT TTTCCCCTTG GG 22 S: INFORMATION FOR SEQ ID NO:53: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid 
STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: CAAACAAAGC TTCAACAATT TAC 23 INFORMATION FOR SEQ ID NO:54: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: miscfeature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: GGGTTTTATT TTCAAGTACT TCTATG 26 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID GCTCAGCAAC TGTTCAATGT ATGAGC 26
INFORMATION FOR SEQ ID NO:56: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: CTAGTGTGTG TTTTTGGC 18
INFORMATION FOR SEQ ID NO:57: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: CATAACCTTA TCTCCACC 18
INFORMATION FOR SEQ ID NO:58: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: CTCAGCCATG AGACAATAAA TCC 23
INFORMATION FOR SEQ ID NO:59: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: GGTTCCCAAA TAATGTGATG G 21
INFORMATION FOR SEQ ID NO:60: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: CAAAAGCTTC AGAATCTC 18
INFORMATION FOR SEQ ID NO:61: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: CTGTGGGTGT TTCCTGTGAG TGG 23
INFORMATION FOR SEQ ID NO:62: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: CATGACTTTG TGTGAATGTA CACC 24
INFORMATION FOR SEQ ID NO:63: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: GAGGAGAGCC TGATAGAACA TCTG 24
INFORMATION FOR SEQ ID NO:64: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: GGGCTTTTTC TCCCCCTCCC 20
INFORMATION FOR SEQ ID NO:65: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: AAAATCTGGG CTCTCACG 18
INFORMATION FOR SEQ ID NO:66: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: AATTATACCT CATACTAGC 19
INFORMATION FOR SEQ ID NO:67: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: GTTTTATTAC AGAATAAAGG AGG 23
INFORMATION FOR SEQ ID NO:68: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: AAGCCAAAGT TAGAAGGCA 19
INFORMATION FOR SEQ ID NO:69: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: TGCAACCCAC AAAATTTGGC 20
INFORMATION FOR SEQ ID NO:70: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: CTTTCTCCAT TTCCAAAACC 20
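Each primer entry in this listing declares a LENGTH and then gives the sequence in space-separated groups of ten bases. As a minimal sketch, the declared lengths of SEQ ID NOS:66 to 69 can be checked against the sequences as printed. The helper name `count_bases` and the `entries` dictionary are illustrative assumptions and are not part of the listing.

```python
def count_bases(seq_text: str) -> int:
    """Count A/C/G/T characters, ignoring the spaces between ten-base groups."""
    return sum(1 for ch in seq_text.upper() if ch in "ACGT")


# Declared LENGTH and sequence text copied from the entries for SEQ ID NOS:66-69.
# The dictionary layout itself is a hypothetical convenience, not a listing format.
entries = {
    66: (19, "AATTATACCT CATACTAGC"),
    67: (23, "GTTTTATTAC AGAATAAAGG AGG"),
    68: (19, "AAGCCAAAGT TAGAAGGCA"),
    69: (20, "TGCAACCCAC AAAATTTGGC"),
}

# Collect any entry whose declared length disagrees with the counted length.
mismatches = {
    seq_id: (declared, count_bases(text))
    for seq_id, (declared, text) in entries.items()
    if declared != count_bases(text)
}

if __name__ == "__main__":
    for seq_id, (declared, text) in entries.items():
        print(seq_id, declared, count_bases(text))
```

A check of this kind is mainly useful for spotting transcription damage, since a dropped or duplicated base changes the count.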
INFORMATION FOR SEQ ID NO:71: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: TGGTGTCTCT AGTTCTGG 18
INFORMATION FOR SEQ ID NO:72: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: CATTGTTGTA GTAGCTCTGC 20
INFORMATION FOR SEQ ID NO:73: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: CCCATTTGTC CCAACTGG 18
INFORMATION FOR SEQ ID NO:74: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: CGGTCAGTTG AAATGTCAG 19
INFORMATION FOR SEQ ID NO:75: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: CATTTGGATG CTCCGTTAAA GC 22
INFORMATION FOR SEQ ID NO:76: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:76: CACCCGGCTG GAAATTTTAT TTG 23
INFORMATION FOR SEQ ID NO:77: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:77: GGAAAGGCAC TGGAGAAATG GG 22
INFORMATION FOR SEQ ID NO:78: SEQUENCE CHARACTERISTICS: LENGTH: 25 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: CCCTCCAGCA CACATGCATG TACCG 25
INFORMATION FOR SEQ ID NO:79: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: TAAGTAGTCT GTGATCTCCG 20
INFORMATION FOR SEQ ID NO:80: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:80: ATGTATGAGG TCCTGTCC 18
INFORMATION FOR SEQ ID NO:81: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:81: GACACCAGTG TATGTTGG 18
INFORMATION FOR SEQ ID NO:82: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:82: GAGAAAGAAG AACACATCCC 20
INFORMATION FOR SEQ ID NO:83: SEQUENCE CHARACTERISTICS: LENGTH: 38 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: TGTAAAACGA CGGCCAGTCA CTGAGGTGAT TGGCTGAA 38
INFORMATION FOR SEQ ID NO:84: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: TAGCCCTTAA GTGAGCCCG 19
INFORMATION FOR SEQ ID NO:85: SEQUENCE CHARACTERISTICS: LENGTH: 38 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:85: TGTAAAACGA CGGCCAGTTA CATTAGAGTA GTTGCAGA 38
INFORMATION FOR SEQ ID NO:86: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:86: AGGTCCTGAC TCTTCCATG 19
INFORMATION FOR SEQ ID NO:87: SEQUENCE CHARACTERISTICS: LENGTH: 40 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:87: TGTAAAACGA CGGCCAGTTT GGAAAATGAG TAACATGATT 40
INFORMATION FOR SEQ ID NO:88: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:88: TGTCATCACA GGAGGATAT 19
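Several of the primers above (for example SEQ ID NOS:83, 85 and 87) begin with the same 18 bases, TGTAAAACGA CGGCCAGT, which the listing also gives on its own as SEQ ID NO:115. The sketch below separates that shared 5' segment from the remainder of a primer using plain string comparison. The function name `split_shared_prefix` is a hypothetical helper introduced only for illustration.

```python
# SEQ ID NO:115 as printed in the listing, with the group spacing removed.
SHARED_PREFIX = "TGTAAAACGA CGGCCAGT".replace(" ", "")


def split_shared_prefix(primer_text: str, prefix: str = SHARED_PREFIX):
    """Return (prefix, remainder) if the primer starts with the prefix,
    otherwise ("", whole_sequence). Spaces between base groups are ignored."""
    seq = primer_text.replace(" ", "").upper()
    if seq.startswith(prefix):
        return prefix, seq[len(prefix):]
    return "", seq


if __name__ == "__main__":
    # SEQ ID NO:83 carries the shared prefix; SEQ ID NO:84 does not.
    print(split_shared_prefix("TGTAAAACGA CGGCCAGTCA CTGAGGTGAT TGGCTGAA"))
    print(split_shared_prefix("TAGCCCTTAA GTGAGCCCG"))
```

Splitting on an exact prefix is deliberately strict: a primer with even one damaged base in the first 18 positions is reported as untagged rather than guessed at.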
INFORMATION FOR SEQ ID NO:89: SEQUENCE CHARACTERISTICS: LENGTH: 38 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:89: TGTAAAACGA CGGCCAGTCT TTCCCTTTGG TGAGGTGA 38
INFORMATION FOR SEQ ID NO:90: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:90: TACTCTGAGA CCTAGGCCCA 20
INFORMATION FOR SEQ ID NO:91: SEQUENCE CHARACTERISTICS: LENGTH: 40 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:91: TGTAAAACGA CGGCCAGTTC TCTTTTCCCC TTGGGATTAG 40
INFORMATION FOR SEQ ID NO:92: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:92: ACAAAGCTTC AACAATTTAC TCT 23
INFORMATION FOR SEQ ID NO:93: SEQUENCE CHARACTERISTICS: LENGTH: 46 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:93: TGTAAAACGA CGGCCAGTGT TTTATTTTCA AGTACTTCTA TGAATT 46
INFORMATION FOR SEQ ID NO:94: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:94: CAGCAACTGT TCAATGTATG AGCACT 26
INFORMATION FOR SEQ ID NO:95: SEQUENCE CHARACTERISTICS: LENGTH: 36 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:95: TGTAAAACGA CGGCCAGTGT GTGTGTTTTT GGCAAC 36
INFORMATION FOR SEQ ID NO:96: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:96: AACCTTATCT CCACCAGC 18
INFORMATION FOR SEQ ID NO:97: SEQUENCE CHARACTERISTICS: LENGTH: 41 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:97: TGTAAAACGA CGGCCAGTAG CCATGAGACA ATAAATCCTT G 41
INFORMATION FOR SEQ ID NO:98: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:98: TCCCAAATAA TGTGATGGAA TG 22
INFORMATION FOR SEQ ID NO:99: SEQUENCE CHARACTERISTICS: LENGTH: 37 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:99: TGTAAAACGA CGGCCAGTAA GCTTCAGAAT CTCTTTT 37
INFORMATION FOR SEQ ID NO:100: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:100: TGGGTGTTTC CTGTGAGTGG ATT 23
INFORMATION FOR SEQ ID NO:101: SEQUENCE CHARACTERISTICS: LENGTH: 42 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:101: TGTAAAACGA CGGCCAGTAC TTTGTGTGAA TGTACACCTG TG 42
INFORMATION FOR SEQ ID NO:102: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:102: GAGAGCCTGA TAGAACATCT GTTG 24
INFORMATION FOR SEQ ID NO:103: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:103: TGTAAAACGA CGGCCAGTCT TTTTCTCCCC CTCCCACTA 39
INFORMATION FOR SEQ ID NO:104: SEQUENCE CHARACTERISTICS: LENGTH: 17 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:104: TCTGGGCTCT CACGTCT 17
INFORMATION FOR SEQ ID NO:105: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:105: CTTATTCTGA GTCTCTCC 18
INFORMATION FOR SEQ ID NO:106: SEQUENCE CHARACTERISTICS: LENGTH: 35 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:106: TGTAAAACGA CGGCCAGTGT TTGCTCAGAG GCTGC 35
INFORMATION FOR SEQ ID NO:107: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:107: GATGGTTCGT ACAGATTCCC G 21
INFORMATION FOR SEQ ID NO:108: SEQUENCE CHARACTERISTICS: LENGTH: 41 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:108: TGTAAAACGA CGGCCAGTTT ATTACAGAAT AAAGGAGGTA G 41
INFORMATION FOR SEQ ID NO:109: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:109: TGTAAAACGA CGGCCAGTAA CCCACAAAAT TTGGCTAAG 39
INFORMATION FOR SEQ ID NO:110: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:110: TCTCCATTTC CAAAACCTTG 20
INFORMATION FOR SEQ ID NO:111: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:111: TGTCTCTAGT TCTGGTGC 18
INFORMATION FOR SEQ ID NO:112: SEQUENCE CHARACTERISTICS: LENGTH: 38 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:112: TGTAAAACGA CGGCCAGTTG TTGTAGTAGC TCTGCTTG 38
INFORMATION FOR SEQ ID NO:113: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:113: ATTTGTCCCA ACTGGTTGTA 20
INFORMATION FOR SEQ ID NO:114: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:114: TGTAAAACGA CGGCCAGTTC AGTTGAAATG TCAGAAGTG 39
INFORMATION FOR SEQ ID NO:115: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:115: TGTAAAACGA CGGCCAGT 18
INFORMATION FOR SEQ ID NO:116: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:116: CCGGCTGGAA ATTTTATTTG GAG 23
INFORMATION FOR SEQ ID NO:117: SEQUENCE CHARACTERISTICS: LENGTH: 41 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:117: TGTAAAACGA CGGCCAGTAG GCACTGGAGA AATGGGATTT G 41
INFORMATION FOR SEQ ID NO:118: SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:118: TCCAGCACAC ATGCATGTAC CGAAAT 26
INFORMATION FOR SEQ ID NO:119: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primer directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:119: GTAGTCTGTG ATCTCCGTTT 20
INFORMATION FOR SEQ ID NO:120: SEQUENCE CHARACTERISTICS: LENGTH: 36 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:120: TGTAAAACGA CGGCCAGTTA TGAGGTCCTG TCCTAG 36
INFORMATION FOR SEQ ID NO:121: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:121: ACCAGTGTAT GTTGGGATG 19
INFORMATION FOR SEQ ID NO:122: SEQUENCE CHARACTERISTICS: LENGTH: 39 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: misc_feature LOCATION: 1 OTHER INFORMATION: /note= "primers directed to genomic intron DNA" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:122: TGTAAAACGA CGGCCAGTGA AAGAAGAACA CATCCCACA 39
INFORMATION FOR SEQ ID NO:123: SEQUENCE CHARACTERISTICS: LENGTH: 770 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:123: Met Ser Leu Arg Ile Lys Ala Leu Asp Ala Ser Val Val Asn Lys Ile 1 5 10 Ala Ala Gly Glu Ile Ile Ile Ser Pro Val Asn Ala Leu Lys Glu Met 25 Met Glu Asn Ser Ile Asp Ala Asn Ala Thr Met Ile Asp Ile Leu Val 35 40 Lys Glu Gly Gly Ile Lys Val Leu Gln Ile Thr Asp Asn Gly Ser Gly 50 55 Ile Asn Lys Ala Asp Leu Pro Ile Leu Cys Glu Arg Phe Thr Thr Ser 65 70 75 Lys Leu Gln Lys Phe Glu Asp Leu Ser Gln Ile Gln Thr Tyr Gly Phe 90 Arg Gly Glu Ala Leu Ala Ser Ile Ser His Val Ala Arg Val Thr Val 100 105 110 Thr Thr Lys Val Lys Glu Asp Arg Cys Ala Trp Arg Val Ser Tyr Ala 115 120 125 Glu
Gly Lys Met Leu Glu Ser Pro Lys Pro Val Ala Gly Lys Asp Gly 130 135 140 Thr Thr Ile Leu Val Glu Asp Leu Phe Phe Asn Ile Pro Ser Arg Leu 145 150 155 160 Arg Ala Leu Arg Ser His Asn Asp Glu Tyr Ser Lys Ile Leu Asp Val 165 170 175 Val Gly Arg Tyr Ala Ile His Ser Lys Asp Ile Gly Phe Ser Cys Lys 180 185 190 Lys Phe Gly Asp Ser Asn Tyr Ser Leu Ser Val Lys Pro Ser Tyr Thr 195 200 205 Val Gln Asp Arg Ile Arg Thr Val Phe Asn Lys Ser Val Ala Ser Asn 210 215 220 Leu Ile Thr Phe His Ile Ser Lys Val Glu Asp Leu Asn Leu Glu Ser 225 230 235 240 Val Asp Gly Lys Val Cys Asn Leu Asn Phe Ile Ser Lys Lys Ser Ile 245 250 255 Ser Leu Ile Phe Phe Ile Asn Asn Arg Leu Val Thr Cys Asp Leu Leu 260 265 270 Arg Arg Ala Leu Asn Ser Val Tyr Ser Asn Tyr Leu Pro Lys Gly Phe 275 280 285 Arg Pro Phe Ile Tyr Leu Gly Ile Val Ile Asp Pro Ala Ala Val Asp 290 295 300 Val Asn Val His Pro Thr Lys Arg Glu Val Arg Phe Leu Ser Gln Asp 305 310 315 320 Glu Ile Ile Glu Lys Ile Ala Asn Gln Leu His Ala Glu Leu Ser Ala 325 330 335 Ile Asp Thr Ser Arg Thr Phe Lys Ala Ser Ser Ile Ser Thr Asn Lys 340 345 350 Pro Glu Ser Leu Ile Pro Phe Asn Asp Thr Ile Glu Ser Asp Arg Asn 355 360 365 Arg Lys Ser Leu Arg Gln Ala Gln Val Val Glu Asn Ser Tyr Thr Thr 370 375 380 Ala Asn Ser Gln Leu Arg Lys Ala Lys Arg Gln Glu Asn Lys Leu Val 385 390 395 400 Arg Ile Asp Ala Ser Gln Ala Lys Ile Thr Ser Phe Leu Ser Ser Ser 405 410 415 Gln Gln Phe Asn Phe Glu Gly Ser Ser Thr Lys Arg Gln Leu Ser Glu 420 425 430 Pro Lys Val Thr Asn Val Ser His Ser Gln Glu Ala Glu Lys Leu Thr 435 440 445 Leu Asn Glu Ser Glu Gln Pro Arg Asp Ala Asn Thr Ile Asn Asp Asn 450 455 460 Asp Leu Lys Asp Gln Pro Lys Lys Lys Gln Lys Gln Leu Gly Asp Tyr 465 470 475 480 Lys Val Pro Ser Ile Ala Asp Asp Glu Lys Asn Ala Leu Pro Ile Ser 485 490 495 Lys Asp Gly Tyr Ile Arg Val Pro Lys Glu Arg Val Asn Val Asn Leu 500 505 510 Thr Ser Ile Lys Lys Leu Arg Glu Lys Val Asp Asp Ser Ile His Arg 515 520 525 Glu Leu Thr Asp Ile Phe Ala Asn Leu Asn Tyr Val Gly Val Val Asp 530
535 540 Glu Glu Arg Arg Leu Ala Ala Ile Gln His Asp Leu Lys Leu Phe Leu 545 550 555 560 Ile Asp Tyr Gly Ser Val Cys Tyr Glu Leu Phe Tyr Gln Ile Gly Leu 565 570 575 Thr Asp Phe Ala Asn Phe Gly Lys Ile Asn Leu Gln Ser Thr Asn Val 580 585 590 Ser Asp Asp Ile Val Leu Tyr Asn Leu Leu Ser Glu Phe Asp Glu Leu 595 600 605 Asn Asp Asp Ala Ser Lys Glu Lys Ile Ile Ser Lys Ile Trp Asp Met 610 615 620 Ser Ser Met Leu Asn Glu Tyr Tyr Ser Ile Glu Leu Val Asn Asp Gly 625 630 635 640 Leu Asp Asn Asp Leu 645 Ile Lys Ser Val Lys Leu 650 Lys Ser Leu Pro Leu Leu 655 Leu Lys Gly Arg Leu Gly 675 Gly Ile Leu Tyr 660 Pro Ser Leu Val 665 Glu Lys Leu Pro Phe Lys Glu Val Asp Trp 680 Leu Asp Glu Gln Glu 685 Asp Phe Ile Tyr 670 Cys Leu Asp Met Val Pro Arg Glu Ile 690 Lys Val 705 Ala 695 Ala Leu Tyr Ile Asp Thr Leu Asp 710 Glu Ser Leu Ser Glu 715 Leu Glu Lys Ala Gln 720 Leu Phe Ile Asn Arg Lys 725 His Ile Ser Ser 730 Ala Leu Glu His Val 735 Phe Pro Cys Asp Val Val 755 Arg Cys 770 Ile 740 Lys Arg Arg Phe Leu 745 Pro Pro Arg His Ile Leu Lys 750 Val Phe Glu Glu Ile Ala Asn Leu 760 Asp Leu Tyr Lys 765
INFORMATION FOR SEQ ID NO:124: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:124: Val Asn Arg Ile Ala Ala Gly Glu Val Ile Gln 1 5 10 Arg Pro Ala Asn Ala Ile Lys Glu Met Ile Glu Asn Cys Leu Asp Ala Lys Phe 25 Gln Val Ile Val Lys Glu Gly Gly Leu Lys Leu Ile Gln 40 Thr Ser Ile Ile Gln Asp Cys Glu Arg Asn Gly Thr Gly Ile Arg Lys Glu Asp Leu Asp 55
INFORMATION FOR SEQ ID NO:125: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein Val Ile (xi) SEQUENCE DESCRIPTION: SEQ ID NO:125: Val Asn Arg Ile Ala Ala Gly Glu Val Ile 1 5 10 Ile Lys Glu Met Ile Glu Asn Cys Leu Asp 25 Gln Val Ile Val Lys Glu Gly Gly Leu Lys Gln Arg Pro Ala Asn Ala Ala Lys Ser Leu Ile 40 Asn Gly Thr Gly Ile Arg Lys Glu Gln Val Thr Ser Ile Ile
Gln Asp Cys Glu Arg Asp Leu Asp
INFORMATION FOR SEQ ID NO:126: SEQUENCE CHARACTERISTICS: LENGTH: 52 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:126: Pro Ala Asn Ala Ile Lys Glu Met Ile Glu 1 5 10 Ser Thr Asn Ile Gln Val Val Val Lys Glu Asn Cys Leu Asp Gly Gly Leu Gln Ile Gln Asp Asn Gly Thr Gly 40 Val Cys Glu Arg 25 Ile Lys Leu Ala Lys Leu Ile Asp Ile Arg Lys Glu Asp
INFORMATION FOR SEQ ID NO:127: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:127: Val Asn Lys Ile Ala Ala Gly Glu Ile Ile Ile 1 5 10 Ser Pro Val Asn Ala Leu Lys Glu Met Met Glu Asn Ser Ile Asp Ala Asn 25 Asp Ile Leu Val Lys Glu Gly Gly Ile Lys Val Leu 40 Asn Gly Ser Gly Ile Asn Lys Ala Asp Leu Pro Ile 55 Ala Thr Met Ile Gln Ile Thr Asp Leu Cys Glu Arg
INFORMATION FOR SEQ ID NO:128: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:128: Val His Arg Ile Thr Ser Gly Gln Val Ile 1 5 10 Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp Asn Ser Glu Ile Ile Phe Lys Asp Tyr Gly 40 Asn Gly Asp Gly Ile Asp Pro Ser 55 Ile 25 Leu Asp Ala Asn Ala Glu Ser Ile Glu Leu Asn Gln Ile Cys Ser Asp Ala Leu Lys Asn Tyr Glu Phe
INFORMATION FOR SEQ ID NO:129: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:129: Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro 1 Val Lys Glu Leu Val Glu Asn Ser Asp Ala Gly Ala Ala Ser Val Thr Arg Ile Ile Arg Asp Leu Ala Arg Asp Ile Asp Ile Glu Arg Gly Gly 40 Asn Gly Cys Gly Ile Lys Lys Asp Lys Leu Ile Arg Ala Glu Leu Ala
INFORMATION FOR SEQ ID NO:130: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY:
linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:130: Ala Asn Gln Ile Ala Ala Gly Glu Val Val Glu Arg Pro Ala Ser Val 1 5 10 Val Lys Glu Leu Val Glu Asn Ser Asp Ile Asp Ile Glu Arg Gly Gly 40 Asn Gly Cys Gly Ile Lys Lys Glu 55 Leu Asp Ala 25 Ala Lys Leu Glu Leu Ala Gly Ala Thr Arg Val Ile Arg Ile Arg Asp Leu Ala Leu Ala Arg
INFORMATION FOR SEQ ID NO:131: SEQUENCE CHARACTERISTICS: LENGTH: 64 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:131: Ala Asn Gln Ile Ala Ala Gly Glu Val Ile 1 5 10 Cys Lys Glu Leu Val Glu Asn Ala Ile Asp
Glu Arg Pro Ala Ser Val Ala Gly Ser 25 Ile Ile Glu Ile Glu Glu Ala Gly Leu Lys Lys Val 35 40 Asn Gly His Gly Ile Ala His Asp Glu Val Glu Leu 55
INFORMATION FOR SEQ ID NO:132: SEQUENCE CHARACTERISTICS: LENGTH: 2687 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (viii) POSITION IN GENOME: MAP POSITION: 7q (xi) SEQUENCE DESCRIPTION: SEQ ID NO:132: Gln Ala Ser Gln Ile Ile Thr Asp Leu Arg Arg CCATGGAGCG AGCTGAGAGC TCGAGTACAG GGAAGTCAGT CCATCAGATT TGCTCTGGGC AGGAGTTAGT AGAAAACAGT CTGGATGCTG ACTATGGAGT GGATCTTATT GAAGTTTCAG TCGAAGGCTT AACTCTGAAA CATCACACAT AGGTTGAAAC TTTTGGCTTT CGGGGGGAAG TCACCATTTC TACCTGCCAC GCATCGGCGA ATGGGAAAAT TATCCAGAAA ACCCCCTACC AGCAGTTATT TTCCACACTA CCTGTGCGCC AGTATGCCAA AATGGTCCAG GTCTTACATG TAAGTTGCAC CAATCAGCTT GGACAAGGAA
AACCTGCTAA
AGGTGGTACT
GTGCCACTAA
ACAATGGATG
CTAAGATTCA
CTCTGAGCTC
AGGTTGGAAC
CCCGCCCCAG
ATAAGGAATT
CATACTGTAT
AACGACAGCC
GGCCATCAAA CCTATTGATC GAGTCTAAGC ACTGCGGTAA TATTGATCTA AAGCTTAAGG TGGGGTAGAA GAAGAAAACT AGAGTTTGCC GACCTAACTC ACTTTGTGCA CTGAGCGATG TCGACTGATG TTTGATCACA AGGGACCACA GTCAGCGTGC TCAAAGGAAT ATTAAGAAGG CATTTCAGCA GGCATCCGTG TGTGGTATGC ACAGGTGGAA 120 180 240 300 360 420 480 540 600 660
GCCCCAGCAT
TTCCTTTTGT
CGGATGCTCT
TTGGAAGGAG
CAAAGGTCTG
TTGTTGTTCT
AAAGGCAAAT
TAGGAATGTT
TTGAAGGTAA
AGGATCAATC
TGCGAGAGGC
CAGAACCAAG
GTGCCATCTC
GACCCAGTGA
CCGTGGATTC
CGGCCAGCTC
CTGAAACTGA
GTAAATTTCG
AAAAAGAAGA
TGTCAGCCTC
TTTCTATGAG
AAGGGGAACA
CCGAAGATGA
GTCAGTTTAA
AGCATGCCAC
GGCAGAGGCT
TAGAAAATCT
CAGTCACTGA
CCCAGGACGT
CTTCCCGAGT
CTGCTCTCAA
CTGGAACTGT
CAGAACTGAC
AAAGACAGAG
AAAGGAAAAT
TCAGCTGCCC
GCATAATCTT
TTCAACAGAC
CAGACTCGTG
TAACATTTCT
TTTGCTACAA
TGATAGTGAT
CTTAATAAAA
CCCTTCATTA
CTTTTCTCTT
AAGGAGCCCT
TGACAAAGGC
CCCTACGGAC
TGAGGGGTTC
CCCAGGGGAC
CGACTCTTTT
AGTTTTGCCT
AATTCTTTCC
TCAGGTTGAT
TTCTTTAGCT
GAATTACAGG
ACTAAGAAAA
CCTGGGATTT
GGACGAGAAG
CATAGCACCT
GGAAATATTT
AAGGGCTAAA
CGATGAACTG
CAAGCAGATG
CACAAGCGAA
ATCGGCTCTG
CCTAGTGACT
TTTTACATCT
AGACAGTTTT
AATGAGGTCT
GTTGATTCAG
GAGGAAAAGC
GTCAACAAGC
ATGCATGCAG
AGGACTGGAG
CGTCACACAA
CTAGGACAGA
GTCCTGAGAT
AGAGCGGAGG
AGCATCCCAG
AGGGGCTCGC
TCAGATGTGG
CAGCCAACTA
CAGGTTTCAT
TCTTTATCAA
ACCACATGTA
AATGCGTTGA
TTTTGTTGGC
TAAATGTCAG
CGGATTTGGA
AAGAAAAAAA
CAGAGAACAA
AAAGGGGTAT
CTCAGAAAGA
TGGAGAAGGA
ACACGGGCAG
AGGAACATGT
ACTGCCATTC
ATCTCGCAAC
TTCACAATGC
CCGGCGGCCT
TAATCGACAC
TATCAATGTT
AGTTTTAAAG
TCAGCAGCCA
AAAGCCCATG
AGACGTGTCC
GCCTCACAGC
GCTGTCTTCT
GGCAGTGAGT
CTCGGGGCAC
TCACTGCAGC
GGACTCTCAG
AAACCAGGAA
CCCAAACACA
GTTAGTAAAT
GAAAGTTGTG
TCATGAAGCA
TCCTGGAGAA
TGCAGAAATG
GGATATCTTC
GCAGCACACC
TGTTAATGAA
TGTTATCGAT
TAAAAACTGG
CCCTGGGGTC
TGTTTGGGCA GAAGCAGTTG CCGTGTGTGA AGAGTACGGT
CAAAGCCTCA
TTGAGCTGTT
ACGCATGGAG
TGTGACCCAG
CAGTATCCAT
ACTCCAGATA
ACCTCTTTGA
CTGCTGGATG
GTAGAAAAGC
ATTTCCAGAC
CCAAAGACTC
AGCACTTCAG
TCCAGTCACG
GGCAGCACTT
AGCGAGTATG
GAGAAAGCGC
GATACCGGAT
AAGCGTTTTA
ACTCAGGACA
CCCCTGGACT
CAGCAAAGTG
AATCAAGCAG
GAAATCATTG
ATAGTGGACC
GTGCTCCAGG
GCTGTTCTGA
GAAAATGCTC
ACCTTCGGAC
ATGTGCCGCC
ATGATTGGGA
TGGGCCACCC
TGTCATTTCT
TTATGTTTTG
720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2687 AGTTCTGACA TTTGTCAAAA TGAGCTGTGA AAATTAATAA AAACGAATAA AGCAGTTACA AAGTTTAGGG CAAAGATTTG GAGATAAGTA AAACGATGTT ATAATAACCA AACTGAATGA TATAACTTCG AGATGCTGCA CAGACTCTCA ACTTAACTGC AGAAAGAATG GCTTTGATTT CTGATTTCCT TGCCAACTAG ATCTTCATGC TGAGCGACAG TTTGCCTCCA GAGCCTGCCG GAAGTCGGTG TGAAGAAACT GATCACCCAC ATGGGGGAGA CCCCATGGAA GGCCACCATG AGACACATCG CCAACCTGGG CGTAGTCACT GTATGGAATA ATTGGTTTTA TCGCAGATTT TCTTCACTAA CCTTTTTTGT TTTAAAATGA AACCTGC
INFORMATION FOR SEQ ID NO:133: SEQUENCE CHARACTERISTICS: LENGTH: 862 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:133: Met Glu Arg Ala Glu Ser Ser Ser Thr Glu Pro Ala Lys Ala Ile Lys 1 5 10 Pro Ile Asp Arg Lys Ser Val His Gln Ile Cys Ser Gly Gln Val Val 25 Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Val Glu Asn Ser Leu Asp 40 Ala Gly Ala Thr Asn Ile Asp Leu Lys Leu Lys Asp Tyr Gly Val Asp 55 Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val Glu Glu Glu Asn Phe 70 75 Glu Gly Leu Thr Leu Lys His His Thr Ser Lys Ile Gln Glu Phe Ala 90 Asp Leu Thr Gln Val Glu Thr Phe Gly Phe Arg Gly Glu Ala Leu Ser 100 105 110 Ser Leu Cys Ala Leu Ser Asp Val Thr Ile Ser Thr Cys His Ala Ser 404 115 120 125 Lys Val Gly Thr Arg Leu Met Phe Asp His Asn Gly Lys Ile Ile 130 135 140 Gln Lys Thr Pro Tyr Pro Arg Pro Arg Gly Thr Thr Val Ser Val Gln 145 150 155 160 Gln Leu Phe Ser Thr Leu Pro Val Arg His Lys Glu Phe Gln Arg Asn 165 170 175 Ile Lys Lys Glu Tyr Ala Lys Met Val Gln Val Leu His Ala Tyr Cys 180 185 190 Ile Ile Ser Ala Gly Ile Arg Val Ser Cys Thr Asn Gln Leu Gly Gln 195 200 205 Gly Lys Arg Gln Pro Val Val Cys Ile Gly Gly Ser Pro Ser Ile Lys 210 215 220 Glu Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile 225 230 235 240 Pro Phe Val Gln Leu Pro Pro Ser Asp Ser Val Cys Glu Glu Tyr Gly 245 250 255
Leu Ser Cys Ser Asp Ala Leu His Asn Leu Phe Tyr Ile Ser Gly Phe 260 265 270 Ile Ser Gln Cys Thr His Gly Val Gly Arg Ser Ser Thr Asp Arg Gln 275 280 285 Phe Phe Phe Ile Asn Arg Arg Pro Cys Asp Pro Ala Lys Val Cys Arg 290 295 300 Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg His Gln Tyr Pro Phe 305 310 315 320 Val Val Leu Asn Ile Ser Val Asp Ser Glu Cys Val Asp Ile Asn Val 325 330 335 Thr Pro Asp Lys Arg Gln Ile Leu Leu Gln Glu Glu Lys Leu Leu Leu 340 345 350 Ala Val Leu Lys Thr Ser Leu Ile Gly Met Phe Asp Ser Asp Val Asn 355 360 365 Lys Leu Asn Val Ser Gln Gln Pro Leu Leu Asp Val Glu Gly Asn Leu 370 375 380 Ile Lys Met His Ala Ala Asp Leu Glu Lys Pro Met Val Glu His Gln 385 390 395 400 Asp Gln Ser Pro Ser Leu Arg Ile Gly Glu Glu Lys Lys Asp Val Ser 405 410 415 Ile Ser Arg Leu Arg Glu Ala Phe Ser Leu Arg His Thr Thr Glu Asn 420 425 430 Lys Pro His Ser Pro Lys Thr Pro Glu Pro Arg Arg Ser Pro Leu Gly 435 440 445 Gln Lys Arg Gly Met Leu Ser Ser Ser Thr Ser Gly Ala Ile Ser Asp 450 455 460 Lys Gly Val Leu Arg Ser Gln Lys Glu Ala Val Ser Ser Ser His Gly 465 470 475 480 Pro Ser Asp Pro Thr Asp Arg Ala Glu Val Glu Lys Asp Ser Gly His 485 490 495 Gly Ser Thr Ser Val Asp Ser Glu Gly Phe Ser Ile Pro Asp Thr Gly 500 505 510 Ser His Cys Ser Ser Glu Tyr Ala Ala Ser Ser Pro Gly Asp Arg Gly 515 520 525 Ser Gln Glu His Val Asp Ser Gln Glu Lys Ala Pro Glu Thr Asp Asp 530 535 540 Ser Phe Ser Asp Val Asp Cys His Ser Asn Gln Glu Asp Thr Gly Cys 545 550 555 560 Lys Phe Arg Val Leu Pro Gln Pro Ile Asn Leu Ala Thr Pro Asn Thr 565 570 575 Lys Arg Phe Lys Lys Glu Glu Ile Leu Ser Ser Ser Asp Ile Cys Gln 580 585 590 Lys Leu Val Asn Thr Gln Asp Met Ser Ala Ser Gln Val Asp Val Ala 595 600 605 Val Lys Ile Asn Lys Lys Val Val Pro Leu Asp Phe Ser Met Ser Ser 610 615 620 Leu Ala Lys Arg Ile Lys Gln Leu His His Glu Ala Gln Gln Ser Glu 625 630 635 640 Gly Glu Gln Asn Tyr Arg Lys Phe Arg Ala Lys Ile Cys Pro Gly Glu 645 650 655 Asn Gln Ala Ala Glu Asp Glu Leu Arg Lys Glu Ile Ser Lys Thr Met 660 665
670 Phe Ala Glu Met Glu Ile Ile Gly Gin Phe Asn Leu Gly Phe Ile Ile 675 680 685 Thr Lys Leu Asn Glu Asp Ile Phe Ile Val Asp Gin His Ala Thr Asp 690 695 700 Glu Lys Tyr Asn Phe Glu Met Leu Gin Gin His Thr Val Leu Gin Gly 705 715 720 117 Gin Arg Leu Ile Ala Pro Gin Thr Leu Asn Leu Thr Ala Val Asn Glu 725 730 735 Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly Phe Asp 740 745 750 Phe Val Ile Asp Glu Asn Ala Pro Val Thr Glu Arg Ala Lys Leu Ile 755 760 765 Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gin Asp Val Asp 770 775 780 Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys Arg Pro 785 790 795 800 Ser Arg Val Lys Gln Met Phe Ala Ser Arg Ala Cys Arg Lys Ser Val 805 810 815 Met Ile Gly Thr Ala Leu Asn Thr Ser Glu Met Lys Lys Leu Ile Thr 820 825 830 His Met Gly Glu Met Gly His Pro Trp Asn Cys Pro His Gly Arg Pro 835 840 845 Thr Met Arg His Ile Ala Asn Leu Gly Val Ile Ser Gin Asn S850 855 860 INFORMATION FOR SEQ ID NO:134: SEQUENCE CHARACTERISTICS: LENGTH: 903 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:134: Met Phe His His Ile Glu Asn Leu Leu Ile Glu Thr Glu Lys Arg Cys 1 5 10 Lys Gin Lys Glu Gin Arg Tyr Ile Pro Val Lys Tyr Leu Phe Ser Met 25 Thr Gin Ile His Gin Ile Asn Asp Ile Asp Val His Arg Ile Thr Ser 40 Gly Gin Val Ile Thr Asp Leu Thr Thr Ala Val Lys Glu Leu Val Asp 55 Asn Ser Ile Asp Ala Asn Ala Asn Gin Ile Glu Ile Ile Phe Lys Asp 70 75 Tyr Gly Leu Glu Ser Ile Glu Cys Ser Asp Asn Gly Asp Gly Ile Asp 90 Pro Ser Asn Tyr Glu Phe Leu Ala Leu Lys His Tyr Thr Ser Lys Ile 100 105 110 Ala Lys Phe Gin Asp Val Ala Lys Val Gin Thr Leu Gly Phe Arg Gly 115 120 125 Glu Ala Leu Ser Ser Leu Cys Gly Ile Ala Lys Leu Ser Val Ile Thr 130 135 140 118 Thr Thr Ser Pro Pro Lys Ala Asp Lys Leu Giu Tyr Asp Met Val Gly 145 150 155 160 His Ile Thr Ser Lys Thr Thr Ser Arg Asn Lys Gly Thr Thr Val Leu 165 170 175 Vai Ser Gin Leu Phe His Asn Leu Pro Val Arg Gin Lys Giu Phe Ser 180 185 190 Lys 
Thr Phe Lys Arg Gin Phe Thr Lye CyB Leu Thr Val Ile Gin Gly 195 200 205 Tyr Ala Ile Ile Aen Ala Ala Ile Lys Phe Ser Val Trp Asn Ile Thr 210 215 220 Pro Lys Giy Lys Lys Aen Lei Ile Leu Ser Thr Met Arg Asn Ser Ser 225 230 235 240 Met Arg Lys Asn Ile Ser Ser Vai Phe Gly Ala Giy Giy Met Phe Gly 245 250 255 Leu Giu Giu Val Asp Leu Val Leu Asp Leu Asn Pro Phe Lys Asn Arg 260 265 270 Met Leu Giy Lys Tyr Thr Asp Asp Pro Asp Phe Leu Asp Lou Asp Tyr .:275 280 285 Lys Ile Arg Val Lys Gly Tyr Ile Ser Gin Asn Ser Phe Gly Cys Giy 290 295 300 Arg Asn Ser Lye Asp Arg Gin Phe Ile Tyr Vai Asn Lys Arg Pro Vai *305 310 315 320 Giu Tyr Ser Thr Lou Leu Lys Cys Cys Asn Giu Vai Tyr Lys Thr Phe 325 330 335 Asn Asn Vai Gin Phe Pro Ala Vai Phe Leu Asn Leu Giu Leu Pro Met :340 345 350 Ser Leu Ile Asp Vai Asn Val Thr Pro Asp Lys Arg Val Ile Leu Leu 355 360 365 His Asn Giu Arg Ala Vai Ile Asp Ile Phe Lys Thr Thr Lou Ser Asp *370 375 380 Tyr Tyr Asn Arg Gin Giu Leu Ala Leu Pro Lys Arg Met Cys Ser Gin 385 390 395 400 Ser Giu Gin Gin Ala Gin Lys Arg Leu Lys Thr Giu Val Phe Asp Asp 405 410 415 Arg Ser Thr Thr His Giu Ser Asp Asn Giu Asn Tyr His Thr Ala Arg 420 425 430 Ser Giu Ser Asn Gin Ser Asn His Ala His Phe Asn Ser Thr Thr Giy 435 440 445 Val Ile Asp Lys Ser Asn Giy Thr Giu Leu Thr Ser Val Met Asp Gly 450 455 460 Asn Tyr Thr Asn Val Thr Asp Val Ile Giy Ser Giu Cys Giu Vai Ser 465 470 475 480 Val Asp Ser Ser Val Val Lou Asp Giu Gly Asn Ser Ser Thr pro Thr 485 490 495 S.
*5 Lys Asn Asp Asp 545 Asp Cys Giu Arg Arg 625 Leu Gly Giu Giu Giu 705 Val G iu Gin Leu Leu 785 Ser Giu Ile Lys Leu Lys 530 Ile Gly Cys Aia Thr 610 Ser Giu Lys Asn Lys 690 Vai Asp Lys Lys Vali 770 Lys Leu Leu Arg Leu Asn 515 Aia Asp Leu His Asp 595 Pro Leu Tyr Gin Ile 675 Tyr Val Asn Tyr Leu 755 Val Ile Pro Ile Cys 835 Pro 500 Asn Arg Gly Vai Gin 580 Ser Leu Ser Asn Met 660 Ile Leu Gly Lys Asn 740 Ile Leu Asp Thr His 820 Ser Ser Phe Ser Giu Phe 565 Giu Ile Lys Asp Leu 645 Ser Lys Thr Gin Ser 725 Phe Ile Asp Giu Ser 805 Leu Lys Ile Ser Leu Lys 550 Val Arg Tyr Asn Giy 630 Ser Ser Asn Leu Phe 710 Lys Giu Pro Asn Giu 790 Lys Ile Ile Ly s As n Giu 535 Phe Asp Akg Ala Ser 615 Leu Thr Ile Lys Thr 695 Asn Leu Thr Gin Leu 775 Giu Gin Lys Arg 119 Thr Pro 520 Lys Gin Asn Giy Giu 600 Arg Thr Lys Ile Asp 680 Vai Leu Phe Leu Pro 760 Pro Giu Thr Giu Ser 840 Asp 505 Giu Vai Giu Giu Ser 585 Ile Lys His Asn Ser 665 Giu Ser Giy Ile Gin 745 Val1 Vali Phe Leu Asp 825 Met Ser Phe Vai Lys Cys3 570 Thr Giu Ser Arg Phe 650 Lys Leu Lys Phe Vai 730 Aia G iu Phe Giy Phe 810 Gly Phe Gin Gin Giu Aia 555 His Asp Pro Ile Lys 635 Lys Arg Giu Asn Ile 715 Asp Val1 Leu Giu Ser 795 Asp Giy Ala Aen Asn Giu 540 Vai Giu Ile Vali Ser 620 Phe Giu Lys Asp Asp 700 Ile Gin Thr Ser Lys 780 Arg Leu Leu Met Leu Ser 510 Ile Thr 525 Pro Vai Leu Ser His Thr Giu Gin 590 Giu Ile 605 Lys Asp Giu Asp Ile Ser Ser Giu 670 Phe Giu 685 Phe Lys Val Thr His Aia Val Phe 750 Vai Ile 765 Asn Gly Vai Lys Giy Asp Arg Arg 830 Arg Ala 845 Asp Ser Tyr Gin Asn 575 Asp Asn Asn Giu Lys 655 Ala Gin Lys Arg Ser 735 Lys Asp Phe Leu Phe 815 Asp cy 5 Leu Pro Phe Ala 560 Asp Asp Val Tyr Ile 640 Asn Gin Gly Met Lys 720 Asp Ser Giu Lys Leu 800 Asn Asn Arg Ser Ser Ile Met Ile Gly Lys Pro Leu Asn 850 855 Vai Val His Asn Leu Ser Glu Leu Asp Lys 865 870 Gly Arg Pro Thr Met Arg His Leu Met Glu 885 890 Met Thr Arg Lys Lys Thr 860 Pro Trp Asn Cys Pro 875 Ile His 880 Ser Arg Asp Trp Ser 895 Phe Ser Lys Asp Tyr Giu Ile 900 INFORMATION FOR SEQ ID 
NO:135: SEQUENCE CHARACTERISTICS: LENGTH: 2577 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:135:
TTCCGGCCAA
TTCAAGTGGT
GAATCAGGAA
CTTTTGAGGA
TAAGTCATGT
GAGCAAGTTA
GCACCCTGAT
AAAATCCAAG
ATTCAGGCAT
TGCCCAATGC
AACTGATAGA
CGAATGCAAA
TAGAATCAGC
CACACACCCA
ACCCCACCAA
AGCACATTGA
TGCTTCCAGG
CTCATCCACT
CCGGGATCAG
TGCTATCAAA
TGTTAAGGAA
GGAAGATCTG
TTTAGCCAGT
GGCCCATGTC
CTCAGATGGA
CACGGTGGAA
TGAAGAGTAC
TAGTATCTCA
CACAACCGTG
AGTTGGGTGT
GTATTCAGTG
TGCCTTGAGA
TTCCTGTACC
GACAGAAGTT
GAGCAAGCTG
ACTTGCTGGG
AGTGGAAGTG
AAGCTTGACG
GAGATGATAG
GGTGGCCTGA
GATATTGTGT
ATTTCTACCT
ACTATTACAA
AAGCTGCAAG
GACCTTTTTT
GGAAAAATTT
GTTAAAAAAC
GACAACATTC
GAGGATAAAA
AAGAAGTGCA
AAAGCCATTG
TCAGTTTGAA
CATTTTCTGC
CTGGGCTCCA
CCTCTGGGGA
GCGACAAGGT
CCTTTCTGCA
AAAACTGTTT
AGCTAATTCA
GTGAGAGGTT
ATGGCTTTCG
CCAAAACAGC
CCCCTCCTAA
ACAACATAAT
TGGAAGTTGT
AAGGTGAGAC
GCTCCATCTT
CCCTAGCTTT
TTTTCCTACT
AAACTGTATA
ATCAGCCCTC
ACGAGGAGAG
ATTCCTCCAG
GGCAGCTAGA
CTACGCTTAC
GCCTGTAACC
GACAGAGGGC
CCCCGCTGAA
AGACGCAGCC
GGAGGACTCT
CTACCCCAGG
TGAGCGGTGC
GAATCCTCAG
GCTCAGTGAA
GAGGTTATCG
AGTGGCTGGA
AGATGCAAAA TCTACAAATA GATCCAAGAC AATGGCACTG CACTACGAGT AAACTGCAGA TGGTGAGCAT TTGGCAAGCA TGATGGGAAA TGTGCGTACA ACCCTGTGCA GGCAACCAGG CACAAGGAGG AAAGCTTTAA TGGCAGGTAT TCAATACACA AGTATCTGAT GTCAGAACAC TGGAAATGCG GTTAGTCGAG CAAAATGAAT GGCTATATAT CTTCATCAAC CACCGTCTGG TGCAGCATAC TTGCCAAAAA AGAACGTGAC GTCAATGTAC CATTCTGCAG CGTGTGCAGC GATGTATTTC ACCCAGACCT CCCACGACAG GGGTGGCTTC CAGATGTCGC GTACGGACTC AGCCTTGTGC CCAGCCAGCC TCTCCTGAAA GGGCCACGCG GCAGCTGCTG AGAGTGAGAA CAGAAAGCGG CACCCACTTC GATGTGGAAA TGGTGGAAAA AGGAGGATCA TTAACCTCAC CATGAGACTC TCCGGGAGAT TGGGCCTTGG CACAGCACCA GAGCTGTTCT ACCAGATACT GAACCAGCGC CACTCTTCGA CAGAGGACGA CGGCCCGAAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 CCAGGACCCT CCCCCTGTCC GAGGGGCCAG GGAGGATGAG GAGATGCTTG CTCTCCCAGC CTTGGAGAGG GAATCACTAA TGGAGACTTC CAGTCCAGGA AGCTCCAGAA AGAGTCATCG TGCTTCCGGG AAGGAAATGA CAGCTGCTTG CAGCGTCTTG AGTCTCCAGG AAGAGATTAG, ACTCCGTAAC CATTCCTTTG TGGGCTGTGT GACCAAGCTA TACCTCCTCA ACACTACCAA CATTTATGAT TTTGCCAACT TTGGTGTTCT CCTGGCCATG CTGGCTTAGA CAGTCCTGAA
AAGGGCTTGC
CTCTGTGAGA
CACCTTTGGA
AGAAAAGGAG
GCAGTATATA
GTCAAAGCCC
CCTACCTCCG
TCTATACAAA
CCATCCAAGG
AGCTCCAGGG
TTAATGTACT
TTCCTGCAGC
TTTGTTCCCT
CTGTGTGAAA
AGAGTACATT
TCGATGAGAA
GGGACTGCCT
TGTTTTGAAA
CTGGAGGAGT
TGGAAGTGGA
AAGCATTTCA
GTCTTTGAGC
CGAAGTGTAT
TTTTCAGTGC
TCACCTGTGG
CCGGGGGATC
TTAGTGAGGG
TTGTTATCCG
GTCGAGTTTC
GGGAACCTGA
ATCTTCATTC
GTCTCAGTAA
CGACCCTCTC
CTGTGGAGCA
CAGAAGATGG
GGTGTTAAAT
GGTACTAATC
TCACTATTCT
ATTGGCTGCA
CACTAGTTCT
TTAATTTCGA
CTCACAATTC
TGAAGAGAAG, CGAGATGCTT GCAGACTATT TTGATTACTC TTCTGATGAC AGCTATGTGC TTCGACTGGC CACTGAGGTG AATTGGGTGA AGAATGTGCT ATGTTTTACT CCATTCGGAA AGGCCAGCAG AGTGACATGC CTGGCTCCAC CATTATCTAT AAAGCCTTCC GCTCACACCT CA.ATGTCCTG CAGCTTGCCA ACCTGCCAGA ACAATCATAG CCACCGTAGA GACTGCATGA TGGAAGCCAC AGAATAGGAC ACTTGGTTTC TGTTCTGTAT CCCAGTATTG GTGCTGCA.AC AATAAACTCA CGTGTATTGG AAAAAAGGAA AGAGCGGCCG CCACCGGTGG AGCTCCAGCT GCTTGGCGTA ATCATGGTCA TAGCTGTTTC CACACAACAT ACGAGCCGGA AGCATAA 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2577 0* INFORMATION FOR SEQ ID NO:136: SEQUENCE CHARACTERISTICS: LENGTH: 728 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:136: Pro 1 Ser Gin Ala Asn Ala Ile Lys Giu Met Ile Giu 10 Giu Asn Cys Leu Asp Ala Lys Leu Ile Thr Asn Ile 20 Ile Gin Asp Val Val Val Lys Ile Giy Giy Leu Asn Gly Thr Giy Arg Lys Giu 35 Glu 40 Ser Asp Phe Leu Asp Ile Giu Asp Leu Val Cys Ala Ser Arg Phe Thr Thr 55 Gly Lys Leu Gin Thr His Ile Ser Thr Ser Tyr 70 Val Phe Arg Gly Leu Ala Ser Ile His Vai Ala Thr Ile Thr Thr Lys Thr Ala Asp Gly Lys Cys Ala Tyr Lys Pro Cys 115 Phe Tyr Asn Arg 100 Ala Ser Tyr Ser Asp 105 Thr Giy Lys Leu Gin Gly Asn Gin Gly 120 Arg Leu Ile Thr Ala Pro Pro 110 Giu Asp Leu Pro Ser Giu Ile Ile Thr 130 Giu Tyr 145 Ser Gly Arg 135 Giu Lys Ala Leu Lys 140 Tyr Gly Lys Ile Ile Ser Ile 165 Leu 150 Ser Val Val Gly Ser Ile His Asn 160 Asp Val Lys Lys Giu Thr Val Ser 175 122 Val Arg Thr Leu Pro Asn Ala Thr Thr Val Asp Asn Ile Arg Ser Ile 180 185 190 Phe Gly Asn Ala Val Ser Arg Glu Leu Ile Giu Val Gly Cys Glu Asp 195 200 205 Lys Thr Leu Ala Phe Lys Met Asn Gly Tyr Ile Ser Asn Ala Lys Tyr 210 215 220 Ser Val Lys Lys Cys Ile Phe Leu Leu Phe Ile Asn His Arg Leu Vai 225 230 235 240 Giu Ser Ala Ala Leu Arg Lys Ala Ilie Giu Thr Val Tyr Ala Ala Tyr 245 250 255 Leu Pro Lys Thr His Thr Hid Ser Cys Thr Ser Val Gix Asn Gin Pro 260 265 270 Ser Giu Arg Asp Val Asn Val His Pro Thr Lys Thr Giu Val 
His Phe 275 280 285 Leu His Glu Giu Ser Ile Leu Gin Arg Val Gin Gin His Ile Giu Ser 290 295 300 ***Lys Leu Leu Giy Ser Asn Ser Ser Arg Met Vai Phe His Pro Asp Leu *.:305 310 315 320 Ala Ser Arg Thr Cys Trp, Ala Ser Giy Glu Ala Ala Arg Pro Thr Thr 325 330 335 Gly Val Ala Ser Ser Ser Thr Ser Gly Ser Gly Asp Lys Val Tyr Ala 340 345 350 Tyr Gin Met Ser Arg Thr Asp Ser Arg Asp Gin Lys Leu Asp Ala Phe 355 360 365 Leu Gin Pro Val Ser Ser Leu Val Pro Ser Gin Pro Gin Asp Pro Arg *.:370 375 380 Pro Val Arg Gly Ala Arg Thr Giu Gly Ser Pro Glu Arg Ala Thr Arg 385 390 395 400 Glu Asp Giu Giu Met Leu Ala Leu Pro Ala Pro Ala Glu Ala Ala Ala *.*405 410 415 *.Glu Ser Giu Asn Leu Giu Arg Giu Ser Leu Met Glu Thr Ser Asp Ala 420 425 430 Ala Gin Lys Ala Ala Pro Thr Ser Ser Pro Gly Ser Ser Arg Lys Ser 435 440 445 His Arg Glu Asp Ser Asp Val Giu Met Val Giu Asn Ala Ser Gly Lys 450 455 460 Giu Met Thr Ala Ala Cys Tyr Pro Arg Arg Arg Ile Ile Asn Leu Thr 465 470 475 480 Ser Val Leu Ser Leu Gin Giu Giu Ile Ser Giu Arg Cys His Giu Thr 485 490 495 Leu Arg Giu Ile Leu Arg Asn His Ser Phe Val Gly Cys Val Asn Pro 500 505 510 Gin Trp Ala Leu Ala Gin His Gin Thr Lys Leu Tyr Leu Leu Asn Thr 515 520 525 123 Thr Lys 530 Ala Asn Leu Ser Giu Glu Leu 535 Arg Phe Tyr Gin Ile Leu Ser Giu Pro 555 Leu 540 Ala Ile Tyr Asp Phe Phe Gly Val Leu 550 G ix Pro Leu Phe Asp 560 Leu Ala Met Leu Ala 565 Arg Thr Val Leu Val Ala Giy Gin Arg Thr 575 Thr Ala Arg Lys Arg Asp 595 Pro Asp Gix Arg 580 Ala Ala Cys Arg Val1 585 Ser Cys Arg Val Cys Arg Leu Phe 600 Glx Val Arg Ser Ser Giu Giu 590 Arg Arg Glu Phe Giy Gly Leu Leu Phe 610 Thr Ala G lx 615 Ser Gin Leu Cys Ala 620 Giy Tyr Leu His Thr Gly His Gix Giu Leu Giy 625 Glu 635 G iu Lys Giu Cys Phe 645 Gin Ser Leu Ser Cys Aia Met Phe Tyr 655 Ser Ile Arg Gin Ser Asp 675 Giu His Ile Lys 660 Tyr Ile Leu Giu 665 Ser Ser Thr Leu Ser Gly Gin 670 Trp Thr Vai Met Pro Giy Ser Thr 680 Phe Lys Pro Trp Lys 685 Leu Ile Tyr Lys 690 His Phe 705 Ala 695 Arg Ser His Leu 700 Ala Pro Pro Lys Thr 
Glu Asp Gly Asn Val 710 Giu Arg Cys Leu Gin Leu 715 Asn Leu Pro Leu Tyr Lys Val Phe 725
C
INFORMATION FOR SEQ ID NO:137: SEQUENCE CHARACTERISTICS: LENGTH: 3065 base pairs TYPE:'nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:137: CGGTGAAGGT CCTGAAGAAT GTCGTCAGGT AACGATGGTG TCCCGAGAGC GGCACCGCAA TGGAGCAAAC CGAAGGCGTG AGTCAGTCCA TCAAATTTGT AGTTGATAGA AAATAGTGTA ATGGGGTGGA CCTCATTGAA AAGGTCTAGC TCTGAAACAT TTGAAACTTT CGGCTTTCGG CTATATCTAC CTGCCACGGG TTCCAGATTC CTGAGTATCA TTGGAGGAGA CAGATAACCT TATATGCAAC AGAAATGGGT CTCTCCCGCG GTGACTGTGA AGTACAGAAT GTGCTAAGGC TCTGGGCAGG TGATACTCAG GATGCTGGTG CTACTACTAT GTTTCAGACA ATGGATGTGG CACACATCTA AGATTCAAGA GGGGAAGCTC TGAGCTCTCT TCTGCAAGCG TTGGGACTCG
GTTCCTGGAG
CTGGAGGAGT
CATCAAGCCT
TTTAAGCACC
TGATCTAAGG
GGTAGAAGAA
GTTTGCCGAC
GTGTGCACTA
ACTGGTGTTT
ACGCGTCTTT
CCTGCATCCA
ATTGATGGGA
GCTGTGAAGG
CTTAAAGACT
GAAAACTTTG
CTCACGCAGG
AGTGATGTCA
GACCATAATG
120 180 240 300 360 420 480 540 600 660 GGAAAATCAC CCAGAAAACT CCCTACCCCC GACCTAAAGG AACCACAGTC AGTGTGCAGC ACTTATTTTA TACACTACCC GTGCGTTACA AAGAGTTTCA ATTCCAAAAT GGTGCAGGTC TTACAGGCGT ACTGTATCAT GCTGCACTAA TCAGCTCGGA CAGGGGAAGC GGCACGCTGT CTGGCATGAA GGAAAATATC GGGTCTGTGT TTGGCCAGAA CTTTTGTTCA GCTGCCCCCT AGTGACGCTG TGTGTGAAGA GACGCCACAA AACCTTTTCT ACGTTTTCGG GCTTCATTTC GGAGGAGTGC AACAGACAGG CAGTTTTTCT TCATCAATCA AGGTCTCTAA GCTTGTCAAT GAGGTTTATC ACATGTATAA TCGTCCTTAA CGTTTCCGTT GACTCAGAAT GTGTGGATAT GGCAAATTCT ACTACAAGAA GAGAAGCTAT TGCTGGCCGT GAATGTTTGA CAGTGATGCA AACAAGCTTA ATGTCAACCA AAGGTAACTT AGTAAAGTCG CATACTGCAG AACTAGAAAA ATAACTCTCC TTCACTGAAG AGCACAGCAG ACGAGAAAAG TGAGAGAGGC CTTTTCTCTT CATCCTACTA AAGAGATCAA CTGAACTGAC ACGGAGTTTT CCAAGTGAGA AAAGGGGCGT ACGTCATCTC TTACAGAGGC CTCCGTGGCT CGCAGGACAA GCCCTGGTGA CTGTATGGAC AGAGAGAAAA TAGAAAAAGA CAGCTGGCTC TGAGGAAGAG TTCAGCACCC CAGAAGTGGC ATAACGTGAG CTCCCTAGAA GACAGACCTT CTCAGGAAAC TGCCGTCCTC CAGGTACAGG ACAGTCCTTG AAGCCAGAAG GCTCTACCTC TAGCTCGTCT GTCACCCACA AATGCCAAGC CCTCAAATGT CAACATATCT CAAAGATTGC CTGGTCCTCA TCGATGTAGC CATAAAAATG AATAAGAGAT CGTGCTCCTC TGAAGCAGTT ACAGCACCTA AAGGCGCAGA ACAAACATGA GGGCCAAGAT TTGCCCTGGA GAAAACCAAG CAGCAGAAGA GAGGAACATT AAAAAGGAGT CTCAGCAGGC GTCCGTGTAA GGTGTGCACA AGCGGCACGT GCAGTTGCAA AGCCTCATTC GTACGGCCTG AGCACTTCAG ACAGTGCACG CACGGCGCCG GAGGCCCTGT GACCCAGCAA CCGGCATCAG TACCCATTTG TAATGTAACT CCAGATAAAA TTTAAAGACC TCCTTGATAG GCAGCCACTG CTAGATGTTG GCCTGTGCCA GGAAAGCAAG GGTAGCATCC ATCTCCAGGC GTCTAGGGGT CCAGAGACTG GTTATCCTCT TATCCTTCAG ATTGGTGAGT CCCACGGACA CTCAGGGCTC AGCAGCACCT CAGTAGCTTT AGCAGTGACT CATAAACTGT GGTGACCTGC ACCATGGATA TCAATGCAAA GCTTCAAGAC AGAGGAAGAC GAGCACCTCA GCAGCTGAGG GAGTTCTCTA GCTAAGCGAA ACTGAGTTAC AGAAAATTTA TGAACTCAGA AAAGAGATTA TAACCTGGGA TTTATAGTAA 720 780 840 900 960 1020 1080 11-40 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3065 GTAAATCGAT 
GTTTGCAGAG CCAAACTGAA AGAGGACCTC TTGAGATGCT GCAGCAGCAC CAGGCTTCAG AGTTCCCAGA TGATAGAAAA TCTGGAAATA CTCCAGTCAC TGAAAGGGCT GACCCCAAGA TATAGATGAA GGCCCTCACG AGTCAGACAG GAACGGCGCT CAATGCGAGC ACCCCTGGAA CTGCCCCCAC TCTCTCAGAA CTGACACACC AAGAGAAGGT TTTAAGTAAT TACTGGATCC ATTTAAAAGC CTACTTGGGT GATCCGGTGG ACATTCATGA GACTCAATTC
AAAAA
ATGGAGATCT TGGGTCAGTT TTCCTGGTGG ACCAGCATGC ACGGTGCTCC AGGCGCAGAG
CCCCAGACTC
TTCAGAAAGA
AAATTGATTT
CTGATCTTTA
ATGTTTGCTT
GAGATGAAGA
GGCAGGCCAA
CCTTGTAGCA
CTGATTATCG
AGTGTTAAGG
GAGCTCATGT
AAGGACAAAA
TGAACTTAAC
ATGGCTTTGA
CCTTACCAAC
TGTTAAGTGA
CCAGAGCCTG
AGCTCATCAC
CCATGAGGCA
TAGAGTTTAT
TTGTACAAAA
CAGGCATGAT
GAGCCCAGGA
AAAAAAAGAT
TGCGGATGAG
GCTCATCACG
TGCTGTCAAT
CTTTGTCATT
TAGTAAAAAC
CAGCCCTGGG
TCGGAAGTCA
CCACATGGGT
CGTTGCCAAT
TACAGATTGT
ATTAGCATGC
GGAGTGTTCC
AAGTACAACT
TGGGTGCACA
GAAGCTGTAC
GATGAGGATG
TGGACCTTTG
GTCATGTGCC
GTGATGATTG
GAGATGGACC
CTGGATGTCA
TCGGTTCGCA
TGCTTTAATG
TCTAGCTCAG
CTTTGAGACC ACTCCGAGCC ATTTTTGAAG CCTTTThAAA 125 INFORMATION FOR SEQ ID NO:138: SEQUENCE CHARACTERISTICS: LENGTH: 864 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:138: Met Giu Gin Thr Giu Gly Val Ser Thr Giu Cys Ala Lys Ala Ile Lys 1 5 10 Pro Ile Asp Gly Lys Ser Vdl His Gin Ile Cys Ser Gly Gin Val. Ile 25 Leu Ser Leu Ser Thr Ala Val Lys Giu Leu Ile Glu Asn Ser Val Asp 40 Ala Gly Ala Thr Thr Ile Asp Leu Arg Leu Lys Asp Tyr Gly Val Asp 55 *Leu Ile Giu Val Ser Asp Asn Gly Cys Gly Val Giu Giu Giu Asn Phe 70 75 Glu Gly Leu Ala Leu Lys His His Thr Ser Lys Ile Gin Giu Phe Ala 90 *Asp Leu Thr Gin Val Giu Thr Phe Gly Phe Arg Gly Giu Ala Leu Ser 100 105 110 Ser Leu Cys Aia Leu Ser Asp Vai Thr Ile Ser Thr Cys His Gly Ser 115 120 125 Ala Ser Val Gly Thr Arg Leu Val Phe Asp His Asn Gly Lys Ile Thr *:130 135 140 Gin Lys Thr Pro Tyr Pro Arg Pro Lys Gly Thr Thr Val Ser Val Gin 145 150 155 160 His Leu Phe Tyr Thr Leu Pro Vai Arg Tyr Lys Giu Phe Gin Arg Asn :::165 170 175 I:.:le Lys Lys Giu Tyr Ser Lys Met Vai Gin Vai Leu Gin Ala Tyr Cys 180 185 190 Ile Ile Ser Aia Gly Vai Arg Val Ser Cys Thr Asn Gin Leu Gly Gin 195 200 205 Giy Lys Arg His Ala Vai Vai Cys Thr Ser Gly Thr Ser Gly Met Lys 210 215 220 Giu Asn Ile Gly Ser Vai Phe Gly Gin Lys Gin Leu Gin Ser Leu Ile 225 230 235 240 Pro Phe Val Gin Leu Pro Pro Ser Asp Ala Val Cys Giu Giu Tyr Gly 245 250 255 Leu Ser Thr Ser Gly Arg His Lys Thr Phe Ser Thr Phe Ser Gly Phe 260 265 270 Ile Ser Gin Cys Thr His Giy Ala Giy Arg Ser Aia Thr Asp Arg Gin 275 280 285 Phe Leu 305 Val Thr Ala Lys Val 385 Asp Ser Ile Ser Tyr 465 Ser Leu Val Arg Arg 545 Ser Asp Pro Lys Gin 625 Phe 290 Val Vai Pro Val Leu 370 Lys Asn Ile Lys Giu 450 Arg Pro Ser Aia Pro 530 Tyr Ser Arg Gin Arg 610 His Phe Asn Leu Asp Leu 355 Asn Ser Ser Ser Ser 435 Lys Gly Gly Ser Ser 515 Ser Arg Thr Giy Ser 595 Ser Leu Ile G iu Asn Lys 340 Lys Vai His Pro Arg 420 Arg Arg Leu Asp Thr 500 Ser Gin Thr Ser 
Arg 580 Thr Cys Lys Gin Tyr 310 Ser Gin Ser Gin Aia 390 Leu Arg Pro Val Gly 470 Met Ala Ser Thr Leu 550 Ser Ser Ala Ser Gin 630 Arg 295 His Val Ile Leu GiQ; 375 Giu Lys Giu Giu Leu 455 Ser Asp Giy Ser Ile 535 Giu Ser As n Ala Ser 615 Asn 126 Pro Met Asp Leu Ile 360 Pro Leu Ser Ala Thr 440 Ser Gin Arg Ser Asp 520 Asn Ala Val1 Val1 Glu 600 Ser Lys CY13 Tyr Ser Leu 345 Gly Leu Giu Thr Phe 425 Ala Ser Asp Giu Glu 505 Tyr Cys Arg Thr As n 585 Val1 Leu His Asp Aen Glu 330 Gin Met Leu Lys Ala 410 Ser Glu Tyr Lys Lys 490 Glu Asn Gly Arg His 570 Ile Asp Ala Glu Pro Arg 315 CyB Glu Phe Asp Pro 395 Asp Leu Leu Pro Leu 475 Ile Glu Val Asp Pro 555 Lye Ser Val Lye Leu 635 Ala 300 His Val Giu Asp Val 380 Val G iu His Thr Ser 460 Val Giu Phe Ser Leu 540 Trp Cys Gin Ala Arg 620 Ser Lye Gin Asp Lye Ser 365 Giu Pro Lye Pro Arg 445 Asp Ser Lye Ser Ser 525 Leu Ile Gin Arg Ile 605 Met Tyr Val Tyr Ile Leu 350 Asp Gly Giy Arg Thr 430 Ser Val Pro Asp Thr 510 Leu Pro Ser Ala Leu 590 Lye Lye, Arg Ser Pro Asn 335 Leu Ala Aen Lye Val 415 Lye Phe Ile Thr Ser 495 Pro Giu Ser Met Leu 575 Pro Met 'Gin Lys Lys Phe 320 Val Leu Asn Leu Gin 400 Ala Giu Pro Ser Asp 480 Gly Giu Asp Ser Gin 560 Gin Giy Asn Leu Phe 640 127 Arg Ala Lys Ile Cys Pro Gly Glu Asn Gin Ala Ala Glu Asp Glu Leu 645 650 655 Arg Lys Glu Ile Ser Lys Ser Met Phe Ala Glu Met Glu Ile Leu Gly 660 665 670 Gin Phe Asn Leu Gly Phe Ile Val Thr Lys Leu Lys Glu Asp Leu Phe 675 680 685 Leu Val Asp Gin His Ala Ala Asp Glu Lys Tyr Asn Phe Glu Met Leu 690 695 700 Gin Gin His Thr Val Leu Gin Ala Gin Arg Leu Ile Thr Trp Val His 705 710 715 720 Thr Gly Phe Arg Val Pro Arg Pro Gin Thr Leu Asn Leu Thr Ala Val 725 730 735 Asn Glu Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly 740 745 750 Phe Asp Phe Val Ile Asp Glu Asp Ala Pro Val Thr Glu Arg Ala Lys 755 760 765 Leu Ile Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro Gin Asp 770 775 780 Ile Asp Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly Val Met Cys 785 790 795 800 Arg Pro Ser Arg Val Arg Gin 
Met Phe Ala Ser Arg Ala Cys Arg Lys 805 810 815 Ser Val Met Ile Gly Thr Ala Leu Asn Ala Ser Glu Met Lys Lys Leu 820 825 830 Ile Thr His Met Gly Glu Met Asp His Pro Trp Asn Cys Pro His Gly 835 840 845 Arg Pro Thr Met Arg His Val Ala Asn Leu Asp Val Ile Ser Gln Asn 850 855 860
INFORMATION FOR SEQ ID NO:139: SEQUENCE CHARACTERISTICS: LENGTH: 29 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:139: CTTGATTCTA GAGCYTCNCC NCKRAANCC 29
INFORMATION FOR SEQ ID NO:140: SEQUENCE CHARACTERISTICS: LENGTH: 29 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:140: AGGTCGGAGC TCAARGARYT NGTNGANAA 29
INFORMATION FOR SEQ ID NO:141: SEQUENCE CHARACTERISTICS: LENGTH: 15 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:141: ACTTGTGGAT TTTGC
INFORMATION FOR SEQ ID NO:142: SEQUENCE CHARACTERISTICS: LENGTH: 15 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:142: ACTTGTGAAT TTTGC
INFORMATION FOR SEQ ID NO:143: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:143: TTCGGTGACA GATTTGTAAA TG 22
INFORMATION FOR SEQ ID NO:144: SEQUENCE CHARACTERISTICS: LENGTH: 16 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:144: TTTACGGAGC CCTGGC 16
INFORMATION FOR SEQ ID NO:145: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:145: TCACCATAAA AATAGTTTCC CG 22
INFORMATION FOR SEQ ID NO:146: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:146: TCCTGGATCA TATTTTCTGA GC 22
INFORMATION FOR SEQ ID NO:147: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:147: TTTCAGGTAT GTCCTGTTAC CC 22
INFORMATION FOR SEQ ID NO:148: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:148: TGAGGCAGCT TTTAAGAAAC TC 22

Claims (85)

1. An isolated nucleic acid molecule including a segment having a sequence selected from the group consisting of SEQ ID NOS: 6-24.
2. An isolated nucleic acid molecule including a segment having a sequence selected from the group consisting of SEQ ID NOS: 25-43.
3. An isolated polynucleotide including a unique segment of a human mutL homolog gene, wherein said segment is selected from at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 6-24 or 26-43.
4. The polynucleotide of claim 3, wherein the mutL homolog gene is hMLH1 or hPMS1.
5. The polynucleotide of claim 4, wherein the segment is homologous to the DNA sequence between the two sets of underlined bases in Figure 3.
6. An isolated human nucleic acid molecule comprising a segment of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 6-24.
7. An isolated polynucleotide comprising a segment of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 7-24.
8. An isolated polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43.
9. A composition comprising isolated nucleic acid molecules containing a human sequence encoding hMLH1, wherein said hMLH1 sequence is selected from the group consisting of: SEQ ID NOS: 6-24, nucleic acid sequences complementary to and nucleic acid sequences containing a segment of at least 13 nucleotides the same as any 13 nucleotide sequence in or
10. An isolated nucleotide sequence encoding an hMLH1 protein having an amino acid sequence as set forth in SEQ ID NO:
11. An isolated polynucleotide comprising a segment having the same nucleotide sequence as shown in SEQ ID NO: 4 between nucleotide numbers 135-312.
12. The nucleic acid molecule of claim 6 further comprising a detectable label-moiety attached to the sequence.
13. The nucleic acid molecule of claim 12, wherein the label-moiety has a property selected from the group consisting of fluorescent, radioactive and chemiluminescent.
14. The polynucleotide of claim 3, comprising a sequence encoding an hMLH1 protein having an amino acid sequence, as set forth in SEQ ID NO:
15. The polynucleotide of claim 3, comprising a segment having the same nucleotide sequence as shown in SEQ ID NO: 4 between nucleotide numbers 135-312.
16. The molecule of claim 6, wherein the molecule includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 6-24, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
17. The polynucleotide of claim 7, wherein the polynucleotide includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 7-24, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
18. The polynucleotide of claim 8, wherein the polynucleotide includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
19. The composition of claim 9, wherein the hMLH1 sequence includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in or wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
20. The polynucleotide of claim 11, wherein the polynucleotide includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NO: 4 between nucleotide numbers 135-312, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
21. The polynucleotide of claim 3, wherein the polynucleotide includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 6-24, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
22. The polynucleotide of claim 3, wherein the polynucleotide includes at least two separate segments, each segment having a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hMLH1.
23. Copies of a DNA sequence amplified from DNA or RNA, each copy comprising a segment of at least 13 nucleotides of hPMS1 as shown in SEQ ID NO: 132.
24. The polynucleotide of claim 4, wherein the segment has a sequence of at least 13 nucleotides of hPMS1 as shown in SEQ ID NO: 132.
25. The polynucleotide of claim 24, wherein the sequence has the nucleotide structure shown in SEQ ID NO: 132.
26. An isolated human gene comprising exons having a combined nucleotide sequence as shown in SEQ ID NO: 4, and introns intervening the exons.
27. Isolated DNA comprising a first strand containing a first segment of at least 13 nucleotides of hPMS1 cDNA as shown in SEQ ID NO: 132, and a second strand containing a second segment which is complementary to at least 13 nucleotides of hPMS1 cDNA as shown in SEQ ID NO: 132, wherein the sequences of the segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hPMS1.
28. The polynucleotide of claim 24, further comprising a second segment of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NO: 132, wherein the sequences of the first and second segments can be used to design a pair of oligonucleotide primers for amplifying specifically at least a portion of hPMS1.
29. The polynucleotide of claim 24, wherein the sequence is cDNA.
30. The polynucleotide of claim 24, wherein the sequence is labeled.
31. An isolated human polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NO: 132, wherein the polynucleotide contains at least one mutation that causes a failure to produce a functionally normal hPMS1 protein.
32. An isolated human polynucleotide comprising a sequence of at least 13 nucleotides the same as any 13 nucleotide sequence in SEQ ID NOS: 26-43, wherein the polynucleotide contains at least one mutation that causes a failure to produce a functionally normal hMLH1 protein.
33. An isolated segment of a human gene, comprising an intron portion and an exon portion, the exon portion corresponding sequentially to a fragment of hMLH1 cDNA, wherein said fragment comprises at least 13 nucleotides the same as any 13 nucleotide sequence in any one of SEQ ID NOS: 26-43.
34. The segment of claim 33, wherein the segment contains at least one mutation that causes a failure to produce a functionally normal hMLH1 protein.
35. The nucleic acid molecule of claim 6, comprising a segment of at least 20 nucleotides the same as any 20 nucleotide sequence in any one of SEQ ID NOS: 6-24.
36. The polynucleotide of claim 3, comprising a segment of at least 20 nucleotides the same as any 20 nucleotide sequence in SEQ ID NOS: 26-43.
37. The polynucleotide of claim 3, comprising a segment of at least 20 nucleotides the same as any 20 nucleotide sequence in SEQ ID NO: 132.
38. The segment of claim 34, wherein the mutation is located in the intron portion.
39. A method of diagnosing cancer susceptibility in a subject comprising detecting a mutation in a mutL homolog gene or gene product in a tissue of the subject, and diagnosing cancer susceptibility in the subject based on the detected mutation.
40. A method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of any one of SEQ ID NOS: 6-24 and 132.
41. The method of claim 40 further comprising sequencing hMLH1 or a fragment thereof.
42. The method of claim 40 further comprising sequencing hPMS1 or a fragment thereof.
43. The method of claim 40 further comprising a step of ascertaining whether any difference between the sample sequence and the wild type sequence is a polymorphism.
44. The method of claim 40 further comprising a step of ascertaining whether any difference between the sample sequence and the wild type sequence results in a defective gene product.
45. The method of claim 40, wherein the determining step includes the step of collecting and sequencing a DNA sample from the person.
46. The method of claim 40, wherein the determining step includes the step of collecting and sequencing an RNA sample from the person.
47. The method of claim 40 further comprising collecting a sample containing hMLH1 or a fragment thereof from the person.
48. The method of claim 40 further comprising collecting a sample containing hPMS1 or a fragment thereof from the person.
49. The method of claim 40, wherein the person is suspected to be at increased risk of developing cancer based on family history, further comprising collecting the sample sequence from phenotypically normal tissue of the person.
50. The method of claim 40, wherein the person has been diagnosed as having cancer, further comprising collecting a sample from a tumor in the person.
51. A method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of SEQ ID NOS: 26 and 27.
52. A method of determining whether a person has a mutation in a DNA mismatch repair gene comprising determining a sample sequence of at least a segment of a mutL homolog gene from the person, and comparing the sample sequence to a wild type sequence of at least a segment of the nucleotide sequence shown in SEQ ID NOS: 26-43.
53. A method of determining whether there is an alteration in a mammalian DNA mismatch repair pathway comprising isolating a biological specimen from a mammal, testing the specimen for an alteration in a mutL homolog nucleotide sequence or its expression product, and comparing the results obtained in step with the results obtained from a wild-type control.
54. The method of claim 53, wherein the mammal is a human.
55. The method of claim 53, wherein the mammal is a mouse.
56. The method of claim 53, wherein an alteration is indicative of a predisposition to malignant growth of cells in the mammal.
57. The method of claim 53, wherein the nucleotide sequence is a gene.
58. The method of claim 57, wherein the gene is hMLH1.
59. The method of claim 57, wherein the gene is hPMS1.
60. The method of claim 53, wherein the expression product is mRNA.
61. The method of claim 53, wherein the specimen is tested for an alteration in the expression product and the expression product is a protein.
62. The method of claim 53, wherein the alteration in the pathway is in the nucleotide sequence of the DNA.
63. The method of claim 62, wherein the alteration is detected using a method of DNA amplification.
64. The method of claim 63, wherein the method of DNA amplification detects an alteration in at least one intron or exon.
65. The method of claim 64, wherein the alteration is detected in the hMLH1 gene using a pair of oligonucleotide primers.
66. The method of claim 65, wherein the oligonucleotide primers are selected from the group consisting of SEQ ID NOS: 44-122.
67. A pair of oligonucleotide primers selected from the group consisting of SEQ ID NOS: 44-122.
68. A method of diagnosing a DNA mismatch repair abnormality in a human subject, comprising the steps of collecting a sample from a human subject, and detecting whether there is an abnormal deficiency of a mutL homolog protein in the sample.
69. The method of claim 68, wherein the collecting step includes the step of obtaining a sample from a tumor in the human subject.
70. The method of claim 68 further comprising the step of determining whether there is an abnormal deficiency of hMLH1 protein in the sample.
71. The method of claim 68, wherein the detecting step includes the step of contacting a sample with an antibody that binds specifically to a mutL homolog protein.
72. The method of claim 68, wherein the detecting step includes the step of contacting a sample with a monoclonal antibody that binds specifically to a mutL homolog protein.
73. The method of claim 68, wherein the detecting step includes the step of contacting a sample with a polyclonal antibody that binds specifically to a mutL homolog protein.
74. The method of claim 68, wherein the detecting step includes the step of contacting a sample with an antibody that binds specifically to a protein selected from the group consisting of hMLH1 and hPMS1.
75. The method of claim 68, wherein the detecting step includes the step of contacting a sample with an antibody that binds specifically to a protein, at least a portion of which has a deduced amino acid sequence as shown in SEQ ID NOS: 26-43.
76. The method of claim 68, wherein the detecting step includes the step of contacting a sample with an antibody that binds specifically to a protein, at least a portion of which has a deduced amino acid sequence as shown in SEQ ID NOS: 26 and 27.
77. The method of claim 68, wherein the detecting step includes the step of contacting a sample with an antibody that is conjugated to a fluorescent compound.
78. A method of diagnosing a tumor associated with defective DNA mismatch repair in a human comprising isolating a tissue suspected of being a tumor from said human, and detecting an alteration in a mutL homolog gene or its expression product, wherein said alteration is indicative of a tumor associated with defective DNA mismatch repair.
79. The method of claim 78, wherein the tumor associated with defective DNA mismatch repair is selected from the group of tumors consisting of colorectal, ovarian, endometrial, gastric and breast.
80. A method of diagnosing cancer in an individual, comprising comparing a polynucleotide sequence of a mutL homolog gene from a cancer cell from an individual with a polynucleotide sequence of the mutL homolog gene from a non-cancer cell from the individual, and determining whether there is a difference between the polynucleotide sequence from the cancer cell in comparison to the polynucleotide sequence from the non-cancer cell.
81. A kit for performing an assay to determine whether an individual has a mutation in a DNA mismatch repair gene comprising a set of oligonucleotide primers which can be used to amplify specifically a portion of a mutL homolog gene.
82. The kit of claim 81, wherein the oligonucleotide primers can be used to amplify specifically at least a segment of a mutL homolog gene selected from the group consisting of hMLH1 and hPMS1.
83. The kit of claim 81, wherein the oligonucleotide primers can be used to amplify specifically at least a segment of SEQ ID NOS: 26-43.
84. The kit of claim 81, wherein the oligonucleotide primers can be used to amplify specifically at least a segment of SEQ ID NOS: 25-43.
85. The kit of claim 81, wherein the oligonucleotide primers can be used to amplify specifically at least a segment of SEQ ID NOS: 6-24.
86. The kit of claim 81, wherein the oligonucleotide primers are selected from the group consisting of SEQ ID NOS: 44-82.
87. A kit for performing an assay to determine whether a human subject has an abnormal deficiency of a protein involved in a DNA mismatch repair pathway comprising a purified antibody that binds specifically to a mutL homolog protein.
88. The kit of claim 87, wherein the antibody is a monoclonal antibody.
89. The kit of claim 87, wherein the antibody is a polyclonal antibody.
90. The kit of claim 87, wherein the antibody is conjugated to a fluorescent compound.
91. A kit for performing an assay to determine whether an individual has a mutation in a DNA mismatch repair gene comprising at least one or more allele-specific oligomer probe that is capable of detecting a mutant allele in a mutL homolog gene.
92. The kit of claim 91, wherein said at least one allele-specific oligomer probe is capable of detecting a mutant allele in a DNA mismatch repair gene selected from the group consisting of hMLH1 and hPMS1.
93. A composition comprising purified antibodies that bind specifically to a human mutL homolog protein.
94. The composition of claim 93, wherein the antibodies bind specifically to hMLH1 protein encoded by a unique fragment of SEQ ID NOS: 26-43.
95. The composition of claim 93, wherein the composition includes antibodies that bind specifically to hPMS1 protein as shown in SEQ ID NO: 133.
96. The composition of claim 93, wherein the antibodies are polyclonal.
97. The composition of claim 93, wherein the antibodies are monoclonal.
98. A method of determining whether a person is susceptible to cancer comprising: collecting a sample from the person, contacting the sample with antibodies that bind specifically to a mutL homolog protein, and detecting binding or lack of binding antibodies to mutL homolog protein.
99. The method of claim 98 comprising specific binding of antibodies to hMLH1 protein encoded by a unique fragment of SEQ ID NOS: 26-43.
100. A composition comprising purified antibodies that bind specifically to a human mutL homolog protein.
101. The composition of claim 100, wherein the antibodies are monoclonal antibodies.
102. The composition of claim 100, wherein the protein is hMLH1 or hPMS1.
103. A method of determining whether tumor tissue has a deficient amount of a mutL homolog protein comprising obtaining a cross-section of a tumor specimen, contacting the cross-section with a purified antibody that binds specifically to a mutL homolog protein, and determining the presence or absence of antibody bound to a mutL homolog protein in the cross-section.
104. The method of claim 103, wherein the antibody is a monoclonal antibody.
105. The method of claim 103, wherein the antibody is a polyclonal antibody.
106. The method of claim 103, wherein the antibody is conjugated to a fluorescent compound.
107. The method of claim 103 further comprising prescribing a therapy regimen based at least partially on results of the determining step.
108. The method of claim 103 further comprising eluting unbound antibody from the cross-section.
109. The method of claim 103, wherein the antibody binds specifically to hMLH1 protein.
110. A method of manufacturing antibodies that bind specifically to a mutL homolog protein comprising synthesizing or isolating at least a portion of a mutL homolog protein, overexpressing the protein in bacteria, purifying the protein, injecting the protein into a mouse, generating a hybridoma by fusing a lymphocyte from the mouse with a myeloma cell, and isolating monoclonal antibodies produced by the hybridoma, wherein the antibodies bind specifically to a mutL homolog protein.
111. The method of claim 110, wherein the monoclonal antibodies bind specifically to hMLH1 protein as shown in SEQ ID NO:

Dated this fourth day of December 2001
OREGON HEALTH SCIENCES UNIVERSITY
DANA-FARBER CANCER INSTITUTE
By their Patent Attorneys
CULLEN CO.
AU17284/99A 1993-12-17 1999-02-15 Compositions and methods relating to DNA mismatch repair genes Ceased AU743457B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU17284/99A AU743457B2 (en) 1993-12-17 1999-02-15 Compositions and methods relating to DNA mismatch repair genes

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US168877 1993-12-17
US352902 1994-12-09
AU14424/95A AU1442495A (en) 1993-12-17 1994-12-16 Compositions and methods relating to dna mismatch repair genes
US209521 1998-12-10
AU17284/99A AU743457B2 (en) 1993-12-17 1999-02-15 Compositions and methods relating to DNA mismatch repair genes

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU14424/95A Division AU1442495A (en) 1993-12-17 1994-12-16 Compositions and methods relating to dna mismatch repair genes

Publications (2)

Publication Number Publication Date
AU1728499A AU1728499A (en) 1999-05-13
AU743457B2 AU743457B2 (en) 2002-01-24

Family

ID=3704554

Family Applications (1)

Application Number Title Priority Date Filing Date
AU17284/99A Ceased AU743457B2 (en) 1993-12-17 1999-02-15 Compositions and methods relating to DNA mismatch repair genes

Country Status (1)

Country Link
AU (1) AU743457B2 (en)

Also Published As

Publication number Publication date
AU1728499A (en) 1999-05-13

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)