AU774440B2 - Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof - Google Patents


Publication number
AU774440B2
Authority
AU
Australia
Prior art keywords
tbc
polynucleotide
seq
sequence
nucleotides
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU51878/99A
Other versions
AU5187899A (en
Inventor
Marta Blumenfeld
Lydie Bougueleret
Ilya Chumakov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merck Biodevelopment SAS
Original Assignee
Genset SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genset SA
Publication of AU5187899A
Assigned to GENSET S.A. (amend patent request/document other than specification (104)); assignors: GENSET
Application granted
Publication of AU774440B2
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Plant Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Description

WO 00/08209 PCT/IB99/01444 Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof.
FIELD OF THE INVENTION
The invention concerns genomic and cDNA sequences of the human TBC-1 gene. The invention also concerns polypeptides encoded by the TBC-1 gene. The invention also deals with antibodies directed specifically against such polypeptides that are useful as diagnostic reagents.
The invention further encompasses biallelic markers of the TBC-1 gene useful in genetic analysis.
BACKGROUND OF THE INVENTION
The incidence of prostate cancer has dramatically increased over the last decades. It averages 30-50/100,000 males in Western European countries as well as within the US White male population. In these countries, it has recently become the most commonly diagnosed malignancy, being one of every four cancers diagnosed in American males. Prostate cancer's incidence is very much population specific, since it varies from 2/100,000 in China, to over 80/100,000 among African-American males.
In France, the incidence of prostate cancer is 35/100,000 males and it is increasing by 10/100,000 per decade. Mortality due to prostate cancer is also growing accordingly. It is the second cause of cancer death among French males, and the first one among French males aged over This makes prostate cancer a serious burden in terms of public health.
Prostate cancer is a latent disease. Many men carry prostate cancer cells without overt signs of disease. Autopsies of individuals dying of other causes show prostate cancer cells in 30% of men at age 50 and in 60% of men at age 80. Furthermore, prostate cancer can take up to 10 years to kill a patient after the initial diagnosis.
The progression of the disease usually goes from a well-defined mass within the prostate to a breakdown and invasion of the lateral margins of the prostate, followed by metastasis to regional lymph nodes, and metastasis to the bone marrow. Cancer metastasis to bone is common and often associated with uncontrollable pain.
Unfortunately, in 80% of cases, diagnosis of prostate cancer is established when the disease has already metastasized to the bones. Of special interest is the observation that prostate cancers frequently grow more rapidly in sites of metastasis than within the prostate itself.
Early-stage diagnosis of prostate cancer mainly relies today on Prostate Specific Antigen (PSA) dosage, and allows the detection of prostate cancer seven years before clinical symptoms become apparent. The effectiveness of PSA dosage diagnosis is however limited, due to its inability to discriminate between malignant and non-malignant affections of the organ and because not all prostate cancers give rise to an elevated serum PSA concentration. Furthermore, PSA dosage and other currently available approaches such as physical examination, tissue biopsy and bone scans are of limited value in predicting disease progression.
Therefore, there is a strong need for a reliable diagnostic procedure which would enable a more systematic early-stage prostate cancer prognosis.
Although an early-stage prostate cancer prognosis is important, the possibility of measuring the period of time during which treatment can be deferred is also of interest, as currently available medicaments are expensive and cause significant adverse effects. However, the aggressiveness of prostate tumors varies widely. Some tumors are relatively aggressive, doubling every six months, whereas others are slow-growing, doubling once every five years. In fact, the majority of prostate cancers grow relatively slowly and never become clinically manifest. Very often, affected patients are among the elderly and die from another disease before prostate cancer actually develops. Thus, a significant question in treating prostate carcinoma is how to discriminate between tumors that will progress and those that will not progress during the expected lifetime of the patient.
Hence, there is also a strong need for detection means which may be used to evaluate the aggressiveness or the development potential of prostate cancer tumors once diagnosed.
Furthermore, at the present time, there is no means to predict prostate cancer susceptibility.
It would also be very beneficial to detect individual susceptibility to prostate cancer. This could allow preventive treatment and a careful follow up of the development of the tumor.
A further consequence of the slow growth rate of prostate cancer is that few cancer cells are actively dividing at any one time, rendering prostate cancer generally resistant to radiation and chemotherapy. Surgery is the mainstay of treatment but it is largely ineffective and removes the ejaculatory ducts, resulting in impotence. Oral oestrogens and luteinizing hormone-releasing hormone analogs are also used for treatment of prostate cancer. These hormonal treatments provide marked improvement for many patients, but they only provide temporary relief. Indeed, most of these cancers soon relapse with the development of hormone-resistant tumor cells, and the oestrogen treatment can lead to serious cardiovascular complications. Consequently, there is a strong need for preventive and curative treatment of prostate cancer.
Efficacy/tolerance prognosis could be valuable in prostate cancer therapy. Indeed, hormonal therapy, the main treatment currently available, has significant side effects. The use of chemotherapy is limited because of the small number of patients with chemosensitive tumors.
Furthermore, the age profile of the prostate cancer patient and intolerance to chemotherapy make the systematic use of this treatment very difficult.
Therefore, a reliable assessment of the eventual efficacy of a medicament to be administered to a prostate cancer patient, as well as of the patient's eventual tolerance to it, may make it possible to improve the benefit/risk ratio of prostate cancer treatment.
It is known today that there is a familial risk of prostate cancer. Clinical studies in the 1950s had already demonstrated a familial aggregation in prostate cancer. Case-control clinical studies have been conducted more recently to attempt to evaluate the incidence of the genetic risk factors in the disease. Thus Steinberg et al., 1990, and McWhorter et al., 1992 confirm that the risk of prostate cancer is increased in subjects having one or more relatives already affected by the disease, and when the relatives were diagnosed at an early age.
It is now well established that cancer is a disease caused by the deregulation of the expression of certain genes. In fact, the development of a tumor necessitates an important succession of steps. Each of these steps comprises the deregulation of an important gene intervening in the normal metabolism of the cell and the emergence of an abnormal cellular sub-clone which overwhelms the other cell types because of a proliferative advantage. The genetic origin of this concept has found confirmation in the isolation and the characterization of genes which could be responsible. These genes, commonly called "cancer genes", have an important role in the normal metabolism of the cell and are capable of intervening in carcinogenesis following a change.
Recent studies have identified three groups of genes which are frequently mutated in cancer. The first group of genes, called oncogenes, are genes whose products activate cell proliferation. The normal non-mutant versions are called proto-oncogenes. The mutated forms are excessively or inappropriately active in promoting cell proliferation, and act in the cell in a dominant way in that a single mutant allele is enough to affect the cell phenotype. Activated oncogenes are rarely transmitted as germline mutations since they would probably be lethal when expressed in all the cells. Therefore oncogenes can only be investigated in tumor tissues.
The second group of genes which are frequently mutated in cancer, called tumor suppressor genes, are genes whose products inhibit cell growth. Mutant versions in cancer cells have lost their normal function, and act in the cell in a recessive way in that both copies of the gene must be inactivated in order to change the cell phenotype. Most importantly, the tumor phenotype can be rescued by the wild type allele, as shown by cell fusion experiments first described by Harris and colleagues (1969). Germline mutations of tumor suppressor genes may be transmitted and thus studied in both constitutional and tumor DNA from familial or sporadic cases. The current family of tumor suppressors includes DNA-binding transcription factors (e.g., p53, WT1), transcription regulators (e.g., RB, APC, probably BRCA1), protein kinase inhibitors (e.g., p16), among others (for review, see Haber D and Harlow E, 1997).
The third group of genes which are frequently mutated in cancer, called mutator genes, are responsible for maintaining genome integrity and/or low mutation rates. Loss of function of both alleles increases cell mutation rates, and as a consequence, proto-oncogenes and tumor suppressor genes may be mutated. Mutator genes can also be classified as tumor suppressor genes, except for the fact that tumorigenesis caused by this class of genes cannot be suppressed simply by restoration of a wild-type allele, as described above. Genes whose inactivation may lead to a mutator phenotype include mismatch repair genes (e.g., MLH, MSH2), DNA helicases (e.g., BLM, WRN) or other genes involved in DNA repair and genomic stability (e.g., p53, possibly BRCA1 and BRCA2) (for review, see Haber D and Harlow E, 1997; Fishel R and Wilson T, 1997; Ellis NA, 1997).
There is growing evidence that a critical event in the progression of a tumor cell from a non-metastatic to metastatic phenotype is the loss of function of metastasis-suppressor genes. These genes specifically suppress the ability of a cell to metastasize.
Work from several groups has demonstrated that human chromosomes 8, 10, 11 and 17 encode prostate cancer metastasis suppressor activities. However, other human chromosomes such as chromosomes 1, 7, 13, 16, and 18 may also be associated with prostate cancer.
It thus remains to localize and to identify the genes specifically involved in the development and the progression of prostate cancers starting from the genetic analysis of the hereditary and the non-hereditary forms and to define their clinical implications in terms of prognosis and therapeutic innovations.
SUMMARY OF THE INVENTION
The present invention pertains to nucleic acid molecules comprising the genomic sequence of a novel human gene which encodes a TBC-1 protein. The TBC-1 genomic sequences comprise regulatory sequences located upstream (5'-end) and downstream (3'-end) of the transcribed portion of said gene, these regulatory sequences being also part of the invention. The human TBC-1 genomic sequence is included in a previously unknown candidate region of prostate cancer located on chromosome 4.
The invention also deals with the two complete cDNA sequences encoding the TBC-1 protein, as well as with the corresponding translation product.
Oligonucleotide probes or primers hybridizing specifically with a TBC-1 genomic or cDNA sequence are also part of the present invention, as well as DNA amplification and detection methods using said primers and probes.
The invention also consists of recombinant vectors comprising any of the nucleic acid sequences described above, and in particular of recombinant vectors comprising a TBC-1 regulatory sequence or a sequence encoding a TBC-1 protein, as well as of cell hosts and transgenic non-human animals comprising said nucleic acid sequences or recombinant vectors.
The invention also concerns a TBC-1 related biallelic marker and the use thereof.
Finally, the invention is directed to methods for the screening of substances or molecules that inhibit the expression of TBC-1, as well as to methods for the screening of substances or molecules that interact with a TBC-1 polypeptide.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1: An amino acid alignment of a portion of the amino acid sequence of the TBC-1 protein of SEQ ID No 5 with other proteins sharing amino acid homology with TBC-1. The amino acid numbering refers to the murine TBC-1.
Brief Description of the sequences provided in the Sequence Listing
SEQ ID No 1 contains a first part of the TBC-1 genomic sequence comprising the regulatory sequence and the exons 1, 1bis, and 2.
SEQ ID No 2 contains a second part of the TBC-1 genomic sequence comprising the last 12 exons of the TBC-1 gene and the 3' regulatory sequence.
SEQ ID No 3 contains a first cDNA sequence of the TBC-1 gene.
SEQ ID No 4 contains a second cDNA sequence of the TBC-1 gene.
SEQ ID No 5 contains the amino acid sequence encoded by the cDNAs of SEQ ID Nos 3 and 4.
SEQ ID No 6 contains a primer containing the additional PU 5' sequence described further in Example 3.
SEQ ID No 7 contains a primer containing the additional RP 5' sequence described further in Example 3.
In accordance with the regulations relating to Sequence Listings, the following codes have been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences and to identify each of the alleles present at the polymorphic base. The code "R" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine.
The code "Y" in the sequences indicates that one allele of the polymorphic base is a thymine, while the other allele is a cytosine. The code "M" in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a cytosine. The code "K" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine.
The code "S" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a cytosine. The code "W" in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a thymine. The nucleotide code of the original allele for each biallelic marker is the following:

Biallelic marker    Original allele
99-430-352          G
99-20508-456        C
99-20469-213        C
5-254-227           A
5-257-353           C
99-20511-32         T
99-20511-221        A
99-20504-90         G
99-20493-238        A
99-20499-221        G
99-20499-364        A
99-20499-399        A
5-249-304           G
99-20485-269        A
99-20481-131        G
99-20481-419        T
99-20480-233        A

DETAILED DESCRIPTION OF THE INVENTION
The present invention concerns polynucleotides and polypeptides related to the human TBC-1 gene (also termed "TBC-1 gene" throughout the present specification), which is potentially involved in the regulation of the differentiation of various cell types in mammals. A deregulation or an alteration of TBC-1 expression, or alternatively an alteration in the amino acid sequence of the TBC-1 protein, may be involved in the generation of a pathological state related to cell differentiation in a patient, more particularly to abnormal cell proliferation leading to cancer states, such as prostate cancer.
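The two-allele codes described in the notes on the Sequence Listing can be illustrated with the standard IUPAC nucleotide ambiguity codes, which assign one letter to each pair of bases. The following is a minimal Python sketch under the assumption that the Sequence Listing follows that convention; the function names are hypothetical and are not part of the specification.

```python
# IUPAC two-base ambiguity codes (standard nomenclature). That the Sequence
# Listing uses exactly these letters is an assumption of this sketch.
IUPAC_BIALLELIC = {
    "R": ("G", "A"),  # guanine / adenine
    "Y": ("T", "C"),  # thymine / cytosine
    "M": ("A", "C"),  # adenine / cytosine
    "K": ("G", "T"),  # guanine / thymine
    "S": ("G", "C"),  # guanine / cytosine
    "W": ("A", "T"),  # adenine / thymine
}


def alleles_for(code):
    """Return the two alleles denoted by a two-base ambiguity code."""
    return IUPAC_BIALLELIC[code.upper()]


def code_for(allele1, allele2):
    """Return the ambiguity code for a pair of distinct alleles."""
    wanted = {allele1.upper(), allele2.upper()}
    for code, pair in IUPAC_BIALLELIC.items():
        if set(pair) == wanted:
            return code
    raise ValueError(f"no two-base code for alleles {allele1!r}/{allele2!r}")


def alternative_allele(code, original):
    """Given a code and the original allele, return the other allele."""
    first, second = alleles_for(code)
    if original.upper() == first:
        return second
    if original.upper() == second:
        return first
    raise ValueError(f"{original!r} is not an allele of code {code!r}")
```

For a polymorphic base annotated with the guanine/adenine code and an original allele G, `alternative_allele("R", "G")` gives the alternative allele A.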
Definitions
Before describing the invention in greater detail, the following definitions are set forth to illustrate and define the meaning and scope of the terms used to describe the invention herein.
The term "TBC-1 gene", when used herein, encompasses mRNA and cDNA sequences encoding the TBC-1 protein. In the case of a genomic sequence, the TBC-1 gene also includes native regulatory regions which control the expression of the coding sequence of the TBC-1 gene.
The term "functionally active fragment" of the TBC-1 protein is intended to designate a polypeptide carrying at least one of the structural features of the TBC-1 protein involved in at least one of the biological functions and/or activity of the TBC-1 protein.
A "heterologous" or "exogenous" polynucleotide designates a purified or isolated nucleic acid that has been placed, by genetic engineering techniques, in the environment of unrelated nucleotide sequences, such that the final polynucleotide construct does not occur naturally. An illustrative, but not limiting, embodiment of such a polynucleotide construct may be represented by a polynucleotide comprising a regulatory polynucleotide derived from the TBC-1 gene sequence and a polynucleotide encoding a cytokine, for example GM-CSF. The polypeptide encoded by the heterologous polynucleotide will be termed a heterologous polypeptide for the purpose of the present invention.
By a "biologically active fragment or variant" of a regulatory polynucleotide according to the present invention is intended a polynucleotide comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host.
For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said regulatory polynucleotide contains nucleotide sequences which contain transcriptional and translational regulatory information, and such sequences are "operatively linked" to nucleotide sequences which encode the desired polypeptide or the desired polynucleotide. An operable linkage is a linkage in which the regulatory nucleic acid and the DNA sequence sought to be expressed are linked in such a way as to permit gene expression.
A "promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell required to initiate the specific transcription of a gene.
A sequence which is "operably linked" to a regulatory sequence such as a promoter means that said regulatory element is in the correct location and orientation in relation to the nucleic acid to control RNA polymerase initiation and expression of the nucleic acid of interest.
As used herein, the term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship. For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence. More precisely, two DNA molecules (such as a polynucleotide containing a promoter region and a polynucleotide encoding a desired polypeptide or polynucleotide) are said to be "operably linked" if the nature of the linkage between the two polynucleotides does not (1) result in the introduction of a frame-shift mutation or (2) interfere with the ability of the polynucleotide containing the promoter to direct the transcription of the coding polynucleotide. The promoter polynucleotide would be operably linked to a polynucleotide encoding a desired polypeptide or a desired polynucleotide if the promoter is capable of effecting transcription of the polynucleotide of interest.
The term "primer" denotes a specific oligonucleotide sequence which is complementary to a target nucleotide sequence and used to hybridize to the target nucleotide sequence. A primer serves as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA polymerase or reverse transcriptase.
The term "probe" denotes a defined nucleic acid segment (or nucleotide analog segment, e.g., polynucleotide as defined hereinbelow) which can be used to identify a specific polynucleotide sequence present in samples, said nucleic acid segment comprising a nucleotide sequence complementary to the specific polynucleotide sequence to be identified.
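The complementarity on which the definitions of "primer" and "probe" rely can be illustrated with a short Python sketch. It assumes perfect Watson-Crick pairing over the full length of the oligonucleotide, whereas actual hybridization tolerates mismatches depending on stringency; the function names and the sequences used are invented for the illustration and are unrelated to the TBC-1 sequences.

```python
# Watson-Crick pairing for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary_to(oligo, target_strand):
    """Return True if the oligonucleotide can pair, base for base, with
    some stretch of the target strand (both written 5'->3').

    Simplifying assumption: only perfect, full-length matches count.
    """
    return reverse_complement(oligo).upper() in target_strand.upper()


# Invented example sequences, for illustration only.
target = "ATGGCCTTAGCAGGT"
primer = reverse_complement("TTAGCAGG")  # designed against an internal stretch
print(primer, is_complementary_to(primer, target))
```

Both strands are written 5' to 3', so the oligonucleotide is reverse-complemented before it is searched for in the target strand.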
The terms "sample" or "material sample" are used herein to designate a solid or a liquid material suspected to contain a polynucleotide or a polypeptide of the invention. A solid material may be, for example, a tissue slice or biopsy in which one searches for the presence of a polynucleotide encoding a TBC-1 protein (either a DNA or an RNA molecule), of a native or a mutated TBC-1 protein, or alternatively of a desired protein of interest whose expression has been placed under the control of a TBC-1 regulatory polynucleotide. A liquid material may be, for example, any body fluid such as serum or urine, or a liquid solution resulting from the extraction of nucleic acid or protein material of interest from a cell suspension or from cells in a tissue slice or biopsy. The term "biological sample" is also used and is more precisely defined within the Section dealing with DNA extraction.
As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. As an example, purification from 0.1% concentration to 10% concentration is two orders of magnitude.
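The arithmetic behind the example above can be made explicit: the number of orders of magnitude is the base-10 logarithm of the ratio of final to starting concentration. The following Python sketch is illustrative only, and its function name is hypothetical.

```python
import math


def orders_of_magnitude(start_pct, end_pct):
    """Return the fold-purification expressed in orders of magnitude,
    i.e. log10(final concentration / starting concentration).

    Both concentrations are percentages and must be positive.
    """
    if start_pct <= 0 or end_pct <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_pct / start_pct)


# Example from the text: 0.1% -> 10% is a 100-fold enrichment,
# that is, about two orders of magnitude.
print(round(orders_of_magnitude(0.1, 10.0), 6))
```

On this measure, a result of at least 1 corresponds to the "at least one order of magnitude" threshold stated in the definition.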
The term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a composition and still be isolated in that the vector or composition is not part of its natural environment.
The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides, for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
The term "recombinant polypeptide" is used herein to refer to polypeptides that have been artificially designed and which comprise at least two polypeptide sequences that are not found as contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides which have been expressed from a recombinant polynucleotide.
The term "purified" is used herein to describe a polypeptide of the invention which has been separated from other compounds including, but not limited to nucleic acids, lipids, carbohydrates and other proteins. A polypeptide is substantially pure when at least about 50%, preferably 60 to of a sample exhibits a single polypeptide sequence. A substantially pure polypeptide typically comprises about 50%, preferably 60 to 90% weight/weight of a protein sample, more usually about and preferably is over about 99% pure. Polypeptide purity or homogeneity is indicated by a number of means well known in the art, such as polyacrylamide gel electrophoresis of a sample, followed by visualizing a single polypeptide band upon staining the gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in the art.
As used herein, the term "non-human animal" refers to any non-human vertebrate, birds and more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used to refer to any vertebrate, preferable a mammal. Both the terms "animal" and "mammal" expressly embrace human subjects unless preceded with the term "non-human".
As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which are comprised of at least one binding domain, where an antibody binding domain is formed from the folding of variable domains of an antibody molecule to form three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies include recombinant proteins comprising the binding domains, as wells as fragments, including Fab, Fab', F(ab) 2 and F(ab') 2 fragments.
As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case a TBC-1 polypeptide, that determines the specificity of the antigen-antibody reaction. An "epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining the amino acids which make up an epitope include x-ray crystallography, 2dimensional nuclear magnetic resonance, and epitope mapping e.g. the Pepscan method described by Geysen et al. 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.
Throughout the present specification, the expression "nucleotide sequence" may be employed to designate indifferently a polynucleotide or an oligonucleotide or a nucleic acid. More precisely, the expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to the sequence information (i.e. the succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA or RNA molecule.
As used interchangeably herein, the terms "oligonucleotides" and "polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide.
The term "nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one modification: an alternative linking group, an analogous form of purine, an analogous form of pyrimidine, or an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064.
However, the polynucleotides of the invention are preferably comprised of greater than conventional deoxyribose nucleotides, and most preferably greater than 90% conventional deoxyribose nucleotides. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.
The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a population which are heterozygous at a particular allele. In a biallelic system, the heterozygosity rate is on average equal to 2 Pa(1-Pa), where Pa is the frequency of the least common allele. In order to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity to allow a reasonable probability that a randomly selected person will be heterozygous.
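The relationship between allele frequency and heterozygosity rate in a biallelic system can be checked numerically. The following is a minimal sketch in Python, assuming only the formula 2 Pa(1-Pa) given above; the function name `heterozygosity_rate` is illustrative and is not part of the specification.

```python
def heterozygosity_rate(least_common_allele_freq: float) -> float:
    """Return 2 * Pa * (1 - Pa) for a biallelic system.

    Pa is the frequency of the least common allele, so it is
    expected to lie between 0 and 0.5 inclusive (an assumption
    enforced here for illustration).
    """
    pa = least_common_allele_freq
    if not 0.0 <= pa <= 0.5:
        raise ValueError("Pa must be between 0 and 0.5")
    return 2.0 * pa * (1.0 - pa)


if __name__ == "__main__":
    # Print the heterozygosity rate for a few allele frequencies.
    for pa in (0.1, 0.2, 0.3, 0.5):
        print(f"Pa = {pa:.2f} -> heterozygosity = {heterozygosity_rate(pa):.2f}")
```

With this formula, a least common allele frequency of 20% gives a heterozygosity rate of 0.32 and a frequency of 30% gives 0.42, which agrees with the values quoted in the definition of a biallelic marker.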
The term "genotype" as used herein refers to the identity of the alleles present in an individual or a sample. In the context of the present invention a genotype preferably refers to the description of the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample or an individual for a biallelic marker consists of determining the specific allele or the specific nucleotide carried by an individual at a biallelic marker.
The term "polymorphism" as used herein refers to the occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals. "Polymorphic" refers to the condition in which two or more variants of a specific genomic sequence can be found in a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide polymorphism is a single base pair change. Typically a single nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide polymorphisms. In the context of the present invention "single nucleotide polymorphism" preferably refers to a single nucleotide substitution. However, the polymorphism can also involve an insertion or a deletion of at least one nucleotide, preferably between 1 and 5 nucleotides.
Typically, between different genomes or between different individuals, the polymorphic site may be occupied by two different nucleotides.
The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker site. Typically, the frequency of the less common allele of the biallelic markers of the present invention has been validated to be greater than preferably the frequency is greater than more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic marker".
The location of nucleotides in a polynucleotide with respect to the center of the polynucleotide is described herein in the following manner. When a polynucleotide has an odd number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even number of nucleotides, there would be a bond and not a nucleotide at the center of the polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1 nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would be considered to be "within 2 nucleotides of the center", and so on. For polymorphisms which involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or biallelic marker is "at the center" of a polynucleotide if the difference between the distance from the substituted, inserted, or deleted polynucleotides of the polymorphism and the 3' end of the polynucleotide, and the distance from the substituted, inserted, or deleted polynucleotides of the polymorphism and the 5' end of the polynucleotide is zero or one nucleotide. If this difference is 0 to 3, then the polymorphism is considered to be "within 1 nucleotide of the center." If the difference is 0 to 5, the polymorphism is considered to be "within 2 nucleotides of the center." If the difference is 0 to 7, the polymorphism is considered to be "within 3 nucleotides of the center," and so on.
As used herein the terminology "defining a biallelic marker" means that a sequence includes a polymorphic base from a biallelic marker. The sequences defining a biallelic marker may be of any length consistent with their intended use, provided that they contain a polymorphic base from a biallelic marker. The sequence is between 1 and 500 nucleotides in length, preferably between 5, 10, 15, 20, 25, or 40 and 200 nucleotides and more preferably between 30 and nucleotides in length. Each biallelic marker therefore corresponds to two forms of a polynucleotide sequence included in a gene, which, when compared with one another, present a nucleotide modification at one position. Preferably, the sequences defining a biallelic marker include a polymorphic base selected from the group consisting of the biallelic markers A1 to A19 and the complements thereof. In some embodiments the sequences defining a biallelic marker comprise one of the sequences selected from the group consisting of P1 to P7, P9 to P13, P15 to P19 and the complementary sequences thereto. Likewise, the term "marker" or "biallelic marker" requires that the sequence is of sufficient length to practically (although not necessarily unambiguously) identify the polymorphic allele, which usually implies a length of at least 4, 5, 6, 10, 15, 20, 25, or nucleotides.
The term "upstream" is used herein to refer to a location which is toward the 5' end of the polynucleotide from a specific reference point.
The terms "base paired" and "Watson Crick base paired" are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (See Stryer, Biochemistry, 4th edition, 1995).
The terms "complementary" or "complement thereof" are used herein to refer to the sequences of polynucleotides which are capable of forming Watson Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base.
Complementary bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym for "complementary polynucleotide", "complementary nucleic acid" and "complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their sequences and not any particular set of conditions under which the two polynucleotides would actually bind.
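Because complementarity is defined here on sequence alone, it can be tested without reference to hybridization conditions. The sketch below is a minimal Python illustration under stated assumptions: sequences are written 5' to 3', pairing is antiparallel (so the complement is computed as the reverse complement), and only the unambiguous bases A, C, G, T and U are handled. The function names are illustrative.

```python
# Map each base to its Watson Crick partner; U is treated like T.
_PAIRING = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(sequence: str) -> str:
    """Return the DNA complement of `sequence`, read 5' to 3'."""
    return sequence.upper().translate(_PAIRING)[::-1]


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with a base of `second`.

    Both sequences are written 5' to 3'; uracil in `second` is
    normalized to thymine before comparison.
    """
    normalized = second.upper().replace("U", "T")
    return reverse_complement(first) == normalized


if __name__ == "__main__":
    print(reverse_complement("AAGC"))        # complement of a short DNA sequence
    print(is_complementary("AAGC", "GCUU"))  # DNA against an RNA strand
```

The check requires pairing over the entire length of both sequences, in line with the definition that each base in the first polynucleotide is paired with its complementary base.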
Variants and fragments
1. Polynucleotides
The invention also relates to variants and fragments of the polynucleotides described herein, particularly of a TBC-1 gene containing one or more biallelic markers according to the invention.
Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally.
Such non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, including those applied to polynucleotides, cells or organisms. Generally, differences are limited so that the nucleotide sequences of the reference and the variant are closely similar overall and, in many regions, identical.
Variants of polynucleotides according to the invention include, without being limited to, nucleotide sequences that are at least 95% identical to any of SEQ ID Nos 1-4 or the sequences complementary thereto or to any polynucleotide fragment of at least 8 consecutive nucleotides of any of SEQ ID Nos 1-4 or the sequences complementary thereto, and preferably at least 98% identical, more particularly at least 99.5% identical, and most preferably at least 99.9% identical to any of SEQ ID Nos 1-4 or the sequences complementary thereto or to any polynucleotide fragment of at least 8 consecutive nucleotides of any of SEQ ID Nos 1-4 or the sequences complementary thereto.
Changes in the nucleotide of a variant may be silent, which means that they do not alter the amino acids encoded by the polynucleotide.
However, nucleotide changes may also result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence. The substitutions, deletions or additions may involve one or more nucleotides. The variants may be altered in coding or non-coding regions or both. Alterations in the coding regions may produce conservative or non-conservative amino acid substitutions, deletions or additions.
In the context of the present invention, particularly preferred embodiments are those in which the polynucleotides encode polypeptides which retain substantially the same biological function or activity as the mature TBC-1 protein.
A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of a TBC-1 gene, and variants thereof. The fragment can be a portion of an exon or of an intron of a TBC-1 gene. It can also be a portion of the regulatory sequences of the TBC-1 gene. Preferably, such fragments comprise the polymorphic base of a biallelic marker selected from the group consisting of the biallelic markers A1 to A19 and the complements thereof.
Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or they may be comprised within a single larger polynucleotide of which they form a part or region.
However, several fragments may be comprised within a single larger polynucleotide.
As representative examples of polynucleotide fragments of the invention, there may be mentioned those which have from about 4, 6, 8, 15, 20, 25, 40, 10 to 20, 10 to 30, 30 to 55, 50 to 100, 75 to 100 or 100 to 200 nucleotides in length. Preferred are those fragments having about 49 nucleotides in length, such as those of P1 to P7, P9 to P13, P15 to P19 or the sequences complementary thereto and containing at least one of the biallelic markers of a TBC-1 gene which are described herein.
2. Polypeptides.
The invention also relates to variants, fragments, analogs and derivatives of the polypeptides described herein, including mutated TBC-1 proteins.
The variant may be 1) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid residues includes a substituent group, or 3) one in which the mutated TBC-1 is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol), or 4) one in which the additional amino acids are fused to the mutated TBC-1, such as a leader or secretory sequence or a sequence which is employed for purification of the mutated TBC-1 or a preprotein sequence. Such variants are deemed to be within the scope of those skilled in the art.
More particularly, a variant TBC-1 polypeptide comprises amino acid changes ranging from 1, 2, 3, 4, 5, 10 to 20 substitutions, additions or deletions of one amino acid, preferably from 1 to more preferably from 1 to 5 and most preferably from 1 to 3 substitutions, additions or deletions of one amino acid. The preferred amino acid changes are those which have little or no influence on the biological activity or the capacity of the variant TBC-1 polypeptide to be recognized by antibodies raised against a native TBC-1 protein.
By homologous peptide according to the present invention is meant a polypeptide containing one or several amino acid additions, deletions and/or substitutions in the amino acid sequence of a TBC-1 polypeptide. In the case of an amino acid substitution, one or several consecutive or non-consecutive amino acids are replaced by equivalent amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be substituted for one of the amino acids having similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. Generally, the following groups of amino acids represent equivalent changes: Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; Cys, Ser, Tyr, Thr; Val, Ile, Leu, Met, Ala, Phe; Lys, Arg, His; Phe, Tyr, Trp, His.
By an equivalent amino acid according to the present invention is also meant the replacement of a residue in the L-form by a residue in the D-form or the replacement of a glutamic acid residue by a pyro-glutamic acid compound. The synthesis of peptides containing at least one residue in the D-form is, for example, described by Koch (1977).
A specific, but not restrictive, embodiment of a modified peptide molecule of interest according to the present invention, which consists in a peptide molecule which is resistant to proteolysis, is a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH) reduced bond, a (NHCO) retro inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S) thiomethylene bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2) hydroxyethylene bond, a bound, an E-alkene bond or also a -CH=CH- bond.
The polypeptide according to the invention could have post-translational modifications. For example, it can present the following modifications: acylation, disulfide bond formation, prenylation, carboxymethylation and phosphorylation.
A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part but not all of a given polypeptide sequence, preferably a polypeptide encoded by a TBC-1 gene and variants thereof. Preferred fragments include those regions possessing antigenic properties and which can be used to raise antibodies against the TBC-1 protein.
Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or they may be comprised within a single larger polypeptide of which they form a part or region.
However, several fragments may be comprised within a single larger polypeptide.
As representative examples of polypeptide fragments of the invention, there may be mentioned those which comprise at least about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to amino acids of the TBC-1. In some embodiments, the fragments contain at least one amino acid mutation in the TBC-1 protein.
Identity Between Nucleic Acids Or Polypeptides
The terms "percentage of sequence identity" and "percentage homology" are used interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e. gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996; Altschul et al., 1993). In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST") which is well known in the art (see, Karlin and Altschul, 1990; Altschul et al., 1990, 1993, 1997).
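The percentage calculation described above (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be sketched as follows. This is a minimal Python illustration that assumes the two sequences have already been optimally aligned by some other program and that gaps are written as "-"; it does not perform the alignment itself, and the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Both inputs must be the same length (the window), with gaps
    already inserted by the aligner. A column counts as matched only
    when both sequences carry the same base or residue; a gap never
    counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Six-column window with one mismatch and one gap: 4 of 6 positions match.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```

Counting gap columns in the denominator follows the wording "total number of positions in the window of comparison"; other conventions exist and would give different percentages.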
In particular, five specific BLAST programs are used to perform the following tasks: (1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database; (2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database; (3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database; (4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and (5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence database.
High-scoring segment pairs are preferably identified (i.e. aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250 matrices may also be used (see, Schwartz and Dayhoff, eds., 1978). The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, Karlin and Altschul, 1990).
Stringent Hybridization Conditions
By way of example and not limitation, procedures using conditions of high stringency are as follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in buffer composed of 6 x SSC, 50 mM Tris-HCl (pH 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65°C, the preferred hybridization temperature, in prehybridization mixture containing 100 µg/ml denatured salmon sperm DNA and 5-20 x 10^6 cpm of 32P-labeled probe. Alternatively, the hybridization step can be performed at 65°C in the presence of SSC buffer, 1 x SSC corresponding to 0.15 M NaCl and 0.05 M Na citrate. Subsequently, filter washes can be done at 37°C for 1 h in a solution containing 2 x SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1 x SSC at 50°C for 45 min. Alternatively, filter washes can be performed in a solution containing 2 x SSC and 0.1% SDS, or 0.5 x SSC and 0.1% SDS, or 0.1 x SSC and 0.1% SDS at 68°C for minute intervals. Following the wash steps, the hybridized probes are detectable by autoradiography. Other conditions of high stringency which may be used are well known in the art and as cited in Sambrook et al., 1989; and Ausubel et al., 1989, are incorporated herein in their entirety. These hybridization conditions are suitable for a nucleic acid molecule of about nucleotides in length. It goes without saying that the hybridization conditions described above are to be adapted according to the length of the desired nucleic acid, following techniques well known to the one skilled in the art. The suitable hybridization conditions may for example be adapted according to the teachings disclosed in the book of Hames and Higgins (1985) or in Sambrook et al. (1989).
Candidate Region On The Chromosome 4 (Linkage Analysis).
In order to localize the prostate cancer gene(s) starting from families, a systematic familial study of genetic link research is carried out using markers of the microsatellite type described at the Genethon laboratory by the Jean Weissenbach team (Dib et al., 1996).
The studies of genetic link or of "linkage" are based on the principle according to which two neighboring sequences on a chromosome do not present (or very rarely present) recombinations by crossing-over during meiosis. To do this, microsatellite DNA sequences (chromosomal markers) constantly co-inherited with the disease studied are searched for in a family having a predisposition for this disease. These DNA sequences, organized in the form of a repetition of di-, tri- or tetranucleotides, are systematically present along the genome, and thus allow the identification of chromosomal fragments harboring them. More than 5000 microsatellite markers have been localized with precision on the genome as a result of the first studies on the genetic map carried out at Genethon under the supervision of Jean Weissenbach, and on the physical map (using the "Yeast Artificial Chromosomes"), work conducted by Daniel Cohen at C.E.P.H. and at Genethon (Chumakov et al., 1995). Genetic link analysis calculates the probabilities of recombinations of the target gene with the chromosomal markers used, according to the genealogical tree, the transmission of the disease, and the transmission of the markers. Thus if a particular allele of a given marker is transmitted with the disease more often than chance would have it (recombination level of between 0 and it is possible to deduce that the target gene in question is found in the neighborhood of the marker. Using this technique, it has been possible to localize several genes of genetic predisposition to familial cancers. In order to be able to be included in a genetic link study, the families affected by a hereditary form of the disease must satisfy the "informativeness" criteria: several affected subjects (and whose constitutional DNA is available) per generation, and at best having a large number of siblings.
By linkage analysis, the inventors have identified a candidate region for prostate cancer on chromosome 4. Indeed, the LOD scores at 2 points between the disease and the markers on a total population of approximately fifty families present a value of 2.49 for marker D4S398 which indicates a probable genetic link with this marker. The curve of the variation of the LOD score on a map of 5 markers is centered on D4S398 and the value higher than 3.3 indicates that a gene involved in familial prostate cancer is probably found in the region located between markers D4S2978 and D4S3018, or a space of approximately 9.7 cM.
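For orientation, a two-point LOD score compares the likelihood of the observed transmissions at a given recombination fraction with the likelihood under free recombination (a fraction of 0.5). The Python sketch below covers only the simplest textbook case, in which every meiosis is fully informative and phase-known; it is not the multi-family pedigree analysis used by the inventors, and the function name and example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n ),
    where r is the number of recombinant meioses out of n, and
    theta is the assumed recombination fraction (0 to 0.5).
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    non_recombinants = meioses - recombinants
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        # The data are impossible at this theta (e.g. theta = 0 with recombinants).
        return float("-inf")
    return math.log10(linked / unlinked)


if __name__ == "__main__":
    # Hypothetical counts: 1 recombinant among 12 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.5):
        print(theta, round(two_point_lod(1, 12, theta), 2))
```

A positive score favors linkage at the tested recombination fraction, and the score is zero at a fraction of 0.5 by construction.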
Homologies Of The Novel Human Gene Translation Product With A Known Murine Protein.
A novel human gene was found in this candidate region. It presents a good probability to be involved in cancer. Database homology searches have allowed the inventors to determine that the translation product of this novel human gene has significant identity with a murine protein called tbc1. The novel human gene of the invention has therefore been called TBC-1 throughout the present specification. TBC-1 comprises an open reading frame that encodes a novel protein, the TBC-1 protein. Based on sequence similarity in an alignment of a portion of the TBC-1 amino acid sequence with the known tbc1 murine protein, it is expected that the TBC-1 protein may play a role in the cell cycle and in differentiation of various tissues. Indeed, the TBC-1 protein contains a 200 amino acid domain called the TBC domain that is homologous to regions in the tre2-oncogene and in the yeast regulators of mitosis BUB2 and cdc16.
The cDNA of the murine tbc1 gene has been described in US Patent No. 5,700,927 and it encodes a putative protein product of 1141 amino acids. The N-terminus of the murine tbc1 protein contains stretches of cysteines and histidines which may form zinc finger structures in the mature polypeptides. The N-terminus also comprises short stretches of basic amino acids which may be involved in a nuclear localization signal. The TBC domain of the murine tbc1 protein contains several tyrosine residues which are conserved in BUB2 and cdc16. The C-terminus of the murine tbc1 protein contains a long stretch of evenly spaced leucine residues which may form a leucine zipper motif.
The murine tbc1 gene has been shown to be highly expressed in testis and kidney. However, lower levels of expression have also been identified in lung, spleen, brain, and heart. Moreover, murine tbc1 is a nuclear protein which is expressed in a cell- and stage-specific manner.
Studies of murine bone marrow have demonstrated that erythroid cells and megakaryocytes expressed substantial levels of the murine tbc1 protein, but none was detected in mature neutrophils.
Similarly, spermatogonia do not express murine tbc1, but primary and secondary spermatocytes express abundant tbc1. Later in the differentiation of the germ cells, the tbc1 levels appear to decrease in spermatids and active sperm. The differentiation program of spermatogonia to spermatocytes therefore involves a significant upregulation of murine tbc1 expression.
The general distribution of murine tbc1 is not tissue-specific, but is cell-specific within individual tissues and intimately linked to tissue differentiation. The developmental expression of murine tbc1, particularly in hematopoietic and germ cells, suggests that this gene plays a role in the terminal differentiation program of several tissues.
Consequently, an alteration in the expression of the TBC-1 gene or in the amino acid sequence of the TBC-1 protein leading to an altered biological activity of the latter is likely to cause, directly or indirectly, cell proliferation disorders and thus diseases related to an abnormal cell proliferation such as cancer, particularly prostate cancer.
Genomic Sequence Of TBC-1
The present invention concerns the genomic sequence of TBC-1. The present invention encompasses the TBC-1 gene, or TBC-1 genomic sequences consisting of, consisting essentially of, or comprising a sequence selected from the group consisting of SEQ ID Nos 1 and 2, a sequence complementary thereto, as well as fragments and variants thereof. These polynucleotides may be purified, isolated, or recombinant.
The inventors have sequenced two portions of the TBC-1 genomic sequence. The first portion of the TBC-1 gene sequence contains the first three exons of the TBC-1 gene, designated as Exon 1, Exon 1bis and Exon 2, and the 5' regulatory sequence located upstream of the transcribed sequences. The sequence of the first portion of the genomic sequence is disclosed in SEQ ID No 1.
The second portion contains the last twelve exons of the TBC-1 gene, designated as exons A, B, C, D, E, F, G, H, I, J, K, and L, and the 3' regulatory sequence which is located downstream of the transcribed sequences.
The exon positions in SEQ ID Nos 1 and 2 are detailed below in Table A.
Table A

Position in SEQ ID No 1
Exon    Beginning   End      Intron   Beginning   End
1       2001        2077     1        2078        12739
1bis    12292       12373    1bis     12374       12739
2       12740       13249    2        13250       at least 17590

Position in SEQ ID No 2
Exon    Beginning   End      Intron   Beginning   End
A       4661        4789     A        4790        6115
B       6116        6202     B        6203        9918
C       9919        10199    C        10200       14520
D       14521       14660    D        14661       50256
E       50257       50442    E        50443       56255
F       56256       56417    F        56418       63325
G       63326       63484    G        63485       76035
H       76036       76280    H        76281       78363
I       78364       78523    I        78524       85294
J       85295       85464    J        85465       93416
K       93417       93590    K        93591       97475
L       97476       97960             97476       97960

Intron 1 refers to the nucleotide sequence located between Exon 1 and Exon 2; Intron 1bis refers to the nucleotide sequence located between Exon 1bis and Exon 2; Intron A refers to the nucleotide sequence located between Exon A and Exon B; and so on. The position of the introns is detailed in Table A.
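The intron positions listed for SEQ ID No 2 follow directly from the exon positions: each intron begins one nucleotide after its exon ends and stops one nucleotide before the next exon begins. The Python sketch below illustrates this with the exon coordinates from Table A. The variable and function names are illustrative, and the label "L" for the last exon is an assumption taken from the exon list A to L given in the text.

```python
# Exon coordinates in SEQ ID No 2, as (name, beginning, end), from Table A.
# The name "L" for the last entry is inferred from the text, not from the table.
EXONS_SEQ_ID_2 = [
    ("A", 4661, 4789), ("B", 6116, 6202), ("C", 9919, 10199),
    ("D", 14521, 14660), ("E", 50257, 50442), ("F", 56256, 56417),
    ("G", 63326, 63484), ("H", 76036, 76280), ("I", 78364, 78523),
    ("J", 85295, 85464), ("K", 93417, 93590), ("L", 97476, 97960),
]


def derive_introns(exons):
    """Return {intron name: (beginning, end)} for consecutive exons.

    Intron X is the sequence between exon X and the following exon,
    using 1-based inclusive coordinates.
    """
    introns = {}
    for (name, _begin, end), (_next_name, next_begin, _next_end) in zip(exons, exons[1:]):
        introns[name] = (end + 1, next_begin - 1)
    return introns


if __name__ == "__main__":
    for name, (begin, end) in derive_introns(EXONS_SEQ_ID_2).items():
        print(f"Intron {name}: {begin} to {end} ({end - begin + 1} nucleotides)")
```

The derived values for Introns A to K agree with the intron columns of Table A, which offers a simple consistency check on the table.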
The TBC-1 introns defined hereinafter for the purpose of the present invention are not exactly what is generally understood as "introns" by the one skilled in the art and will consequently be further defined below.
Generally, an intron is defined as a nucleotide sequence that is present both in the genomic DNA and in the unspliced mRNA molecule, and which is absent from the mRNA molecule which has already gone through splicing events. In the case of the TBC-1 gene, the inventors have found that at least two different spliced mRNA molecules are produced when this gene is transcribed, as will be described in detail in a further section of the specification. The first spliced mRNA molecule comprises Exons 1 and 2. Thus, the genomic nucleotide sequence comprised between Exon 1 and Exon 2 is an intronic sequence as regards this first mRNA molecule, despite the fact that this intronic sequence contains Exon 1bis. In contrast, Exon 1bis is of course an exonic nucleotide sequence as regards the second TBC-1 mRNA molecule.
For the purpose of the present invention and in order to make a clear and unambiguous designation of the different nucleic acids encompassed, it has been postulated that the polynucleotides contained both in any of the nucleotide sequences of SEQ ID Nos 1 or 2 and in any of the nucleotide sequences of SEQ ID Nos 3 or 4 are considered as exonic sequences. Conversely, the polynucleotides contained in any of the nucleotide sequences of SEQ ID Nos 1 or 2 but which are absent both from the nucleotide sequence of SEQ ID No 3 and from the nucleotide sequence of SEQ ID No 4 are considered as intronic sequences.
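The convention above can be illustrated with a short Python sketch for SEQ ID No 1. It uses the exon coordinates of Table A as a stand-in for "present in SEQ ID No 3 or 4", which is an assumption made for illustration; the names `EXONS_SEQ1` and `classify` are hypothetical.

```python
# Exon coordinates in SEQ ID No 1, taken from Table A (1-based, inclusive).
EXONS_SEQ1 = {"1": (2001, 2077), "1bis": (12292, 12373), "2": (12740, 13249)}

FIRST_EXON_START = 2001      # first nucleotide of Exon 1
LAST_LISTED_POSITION = 17590  # Intron 2 extends to "at least" this position


def classify(position):
    """Label a SEQ ID No 1 position as 'exonic' or 'intronic'.

    A position is treated as exonic if it lies in an exon of either
    transcript, so Exon 1bis counts as exonic even though it sits
    inside Intron 1 of the first transcript.
    """
    if not FIRST_EXON_START <= position <= LAST_LISTED_POSITION:
        raise ValueError("position is outside the exon/intron span of Table A")
    for start, end in EXONS_SEQ1.values():
        if start <= position <= end:
            return "exonic"
    return "intronic"
```

For example, `classify(12300)` returns `"exonic"` because position 12300 lies in Exon 1bis, although it also falls within the span listed for Intron 1.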
The nucleic acids defining the TBC-1 introns described above, as well as their fragments and variants, may be used as oligonucleotide primers or probes in order to detect the presence of a copy of the TBC-1 gene in a test sample, or alternatively in order to amplify a target nucleotide sequence within the TBC-1 intronic sequences.
Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group consisting of the 15 exons of the TBC-1 gene which are described in the present invention, or a sequence complementary thereto. The invention also deals with purified, isolated, or recombinant nucleic acids comprising a combination of at least two exons of the TBC-1 gene, wherein the polynucleotides are arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic acid, in the same order as in SEQ ID Nos 1 and 2.
Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group consisting of the introns of the TBC-1 gene, or a sequence complementary thereto.
The invention also encompasses a purified, isolated, or recombinant polynucleotide comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with a sequence selected from the group consisting of SEQ ID Nos 1 and 2 or a complementary sequence thereto or a fragment thereof. The nucleotide differences as regards to the nucleotide sequence of SEQ ID Nos 1 or 2 may be generally randomly distributed throughout the entire nucleic acid.
Nevertheless, preferred nucleic acids are those wherein the nucleotide differences with regard to the nucleotide sequence of SEQ ID Nos 1 or 2 are predominantly located outside the coding sequences contained in the exons. These nucleic acids, as well as their fragments and variants, may be used as oligonucleotide primers or probes in order to detect the presence of a copy of the TBC-1 gene in a test sample, or alternatively in order to amplify a target nucleotide sequence within the TBC-1 sequences.
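As a rough illustration of the identity thresholds recited above, the following Python sketch computes percent identity between two sequences that are assumed to be already aligned, without gaps and of equal length. This is a simplification for illustration only: the specification defines how identity is determined elsewhere, and the function names here are hypothetical.

```python
# Identity levels recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 85, 90, 95)


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def thresholds_met(identity):
    """Return the recited thresholds that a given identity value reaches."""
    return [t for t in THRESHOLDS if identity >= t]
```

For instance, two 10-nucleotide sequences differing at a single position give 90% identity, which reaches every recited threshold except 95%.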
Another object of the invention consists of a purified, isolated, or recombinant nucleic acid that hybridizes with a sequence selected from the group consisting of SEQ ID Nos 1 and 2 or a complementary sequence thereto or a variant thereof, under the stringent hybridization conditions as defined above.
Particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and 2, or the complements thereof. Additionally preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000, 6001-7000, 7001-8000, 8001-9000, 9001-10000, 10001-11000, 11001-12000, 12001-13000, 13001-14000, 14001-15000, 15001-16000, 16001-17000, and 17001-17590. Other preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 2: 1-5000, 5001-10000, 10001-15000, 15001-20000, 20001-25000, 25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000, 55001-60000, 60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000, 85001-90000, 90001-95000, and 95001-99960.
While this section is entitled "Genomic Sequences of TBC-1," it should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section, flanking the genomic sequences of TBC-1 on either side or between two or more such genomic sequences.
TBC-1 cDNA Sequences
The inventors have discovered that the expression of the TBC-1 gene leads to the production of at least two mRNA molecules, respectively a first and a second TBC-1 transcription product, as the result of alternative splicing events. They result from two distinct first exons, namely Exon 1 and Exon 1bis.
The first transcription product comprises Exons 1, 2, A, B, C, D, E, F, G, H, I, J, K, and L.
This cDNA of SEQ ID No 3 includes a 5'-UTR region, spanning the whole Exon 1 and part of Exon 2. This 5'-UTR region starts from the nucleotide at position 1 and ends at the nucleotide at position 170 of the nucleotide sequence of SEQ ID No 3. The cDNA of SEQ ID No 3 includes a 3'-UTR region starting from the nucleotide at position 3726 and ending at the nucleotide at position 3983 of the nucleotide sequence of SEQ ID No 3. This first transcription product harbors a polyadenylation signal located between the nucleotide at position 3942 and the nucleotide at position 3947 of the nucleotide sequence of SEQ ID No 3.
The second TBC-1 transcription product comprises Exons 1bis, 2, A, B, C, D, E, F, G, H, I, J, K, and L. This cDNA of SEQ ID No 4 includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide at position 175 of the nucleotide sequence of SEQ ID No 4.
This second cDNA also includes a 3'-UTR region starting from the nucleotide at position 3731 and ending at the nucleotide at position 3988 of the nucleotide sequence of SEQ ID No 4. This second transcription product harbors a polyadenylation signal located between the nucleotide at position 3947 and the nucleotide at position 3952 of the nucleotide sequence of SEQ ID No 4.
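The coordinates given for the two cDNAs appear to differ by a constant offset, which is consistent with the two transcripts differing only in their first exon. The Python sketch below checks this arithmetic using the exon positions of Table A and the UTR and polyadenylation positions stated above. The variable names are illustrative, and the interpretation that the offset arises solely from the difference in length between Exon 1 and Exon 1bis is an assumption.

```python
# First-exon coordinates in SEQ ID No 1, from Table A (1-based, inclusive).
EXON_1 = (2001, 2077)
EXON_1BIS = (12292, 12373)


def span_length(start, end):
    """Length of a 1-based inclusive interval."""
    return end - start + 1


# Exon 1bis is 82 nucleotides long and Exon 1 is 77, so the shift is 5.
SHIFT = span_length(*EXON_1BIS) - span_length(*EXON_1)

# Feature positions stated in the text for each cDNA.
SEQ3 = {"utr5_end": 170, "utr3_start": 3726, "utr3_end": 3983,
        "polyA_start": 3942, "polyA_end": 3947}
SEQ4 = {"utr5_end": 175, "utr3_start": 3731, "utr3_end": 3988,
        "polyA_start": 3947, "polyA_end": 3952}

# Shift every SEQ ID No 3 coordinate by the first-exon length difference.
SEQ3_SHIFTED = {name: pos + SHIFT for name, pos in SEQ3.items()}
```

With these values, every shifted SEQ ID No 3 coordinate coincides with the corresponding SEQ ID No 4 coordinate.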
The 5'-end sequence of this second TBC-1 mRNA, more particularly the nucleotide sequence located between the nucleotide at position 1 and the nucleotide at position 458 of the nucleic acid of SEQ ID No 4, corresponds to the nucleotide sequence of a 5'-EST that has been obtained from a human pancreas cDNA library and characterized following the teachings of PCT Application No WO 96/34981. This 5'-EST is also part of the invention.
Another object of the invention consists of a purified or isolated nucleic acid comprising a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 and 4 and nucleic acid fragments thereof.
Preferred nucleic acid fragments of the nucleotide sequences of SEQ ID Nos 3 and 4 consist of polynucleotides comprising their respective Open Reading Frames encoding the TBC-1 protein.
Other preferred nucleic acid fragments of the nucleotide sequences of SEQ ID Nos 3 and 4 consist of polynucleotides comprising at least a part of their respective 5'-UTR or 3'-UTR regions.
The invention also pertains to a purified or isolated nucleic acid having at least 95% nucleotide identity with any one of the nucleotide sequences of SEQ ID Nos 3 and 4, or a fragment thereof.
Another object of the invention consists of purified, isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with any one of the nucleotide sequences of SEQ ID Nos 3 and 4, or a sequence complementary thereto or a fragment thereof.
The invention also relates to isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos 3 and 4, or the complements thereof. Particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 3: 1-500, 501-1000, 1001-1500, 1501-2000, 2001-2500, 2501-3000, 3001-3500, and 3501-3983. Additionally preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 4 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 4: 1-500, 501-1000, 1001-1500, 1501-2000, 2001-2500, 2501-3000, 3001-3500, and 3501-3988. Such a nucleic acid is notably useful as a polynucleotide probe or primer specific for the TBC-1 gene or the TBC-1 mRNAs and cDNAs.
While this section is entitled "TBC-1 cDNA Sequences," it should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section, flanking the genomic sequences of TBC-1 on either side or between two or more such genomic sequences.
Coding Regions
The TBC-1 open reading frame is contained in the two TBC-1 mRNA molecules of about 4 kilobases isolated by the inventors.
More precisely, the effective TBC-1 coding sequence is comprised between the nucleotide at position 171 and the nucleotide at position 3725 of SEQ ID No 3, and between the nucleotide at position 176 and the nucleotide at position 3730 of the nucleotide sequence of SEQ ID No 4.
The invention further provides a purified or isolated nucleic acid comprising a polynucleotide selected from the group consisting of a polynucleotide comprising a nucleic acid sequence located between the nucleotide at position 171 and the nucleotide at position 3725 of SEQ ID No 3, and a polynucleotide comprising a nucleic acid sequence located between the nucleotide at position 176 and the nucleotide at position 3730 of SEQ ID No 4 or a variant or fragment thereof or a sequence complementary thereto.
The present invention concerns a purified or isolated nucleic acid encoding a human TBC-1 protein, wherein said TBC-1 protein comprises an amino acid sequence of SEQ ID No 5, a nucleotide sequence complementary thereto, a fragment or a variant thereof. The present invention also embodies isolated, purified, and recombinant polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5. In a preferred embodiment, the present invention embodies isolated, purified, and recombinant polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 5 wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following amino acid positions in SEQ ID No 5: 1-300, 301-600, 601-900, and 901-1168.
The above disclosed polynucleotide that contains only coding sequences derived from the TBC-1 ORF may be expressed in a desired host cell or a desired host organism, when said polynucleotide is placed under the control of suitable expression signals. Such a polynucleotide, when placed under the suitable expression signals, may be inserted in a vector for its expression.
Regulatory Sequences Of TBC-1
The invention further deals with a purified or isolated nucleic acid comprising the nucleotide sequence of a regulatory region which is located either upstream of the first exon of the TBC-1 gene and which is contained in the TBC-1 genomic sequence of SEQ ID No 1, or downstream of the last exon of the TBC-1 gene and which is contained in the TBC-1 genomic sequence of SEQ ID No 2.
The 5' regulatory sequence of the TBC-1 gene is localized between the nucleotide in position 1 and the nucleotide in position 2000 of the nucleotide sequence of SEQ ID No 1. The 3' regulatory sequence of the TBC-1 gene is localized between nucleotide position 97961 and nucleotide position 99960 of SEQ ID No 2.
Polynucleotides derived from the 5' and 3' regulatory regions are useful in order to detect the presence of at least a copy of a nucleotide sequence of SEQ ID Nos 1 or 2 or a fragment thereof in a test sample.
The promoter activity of the 5' regulatory regions contained in TBC-1 can be assessed as described below.
Genomic sequences lying upstream of the TBC-1 Exons are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, beta galactosidase, or green fluorescent protein. The sequences upstream of the TBC-1 coding region are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning site. The presence of an elevated expression level in the vector containing the insert with respect to the control vector indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer for increasing transcription levels from weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert indicates that a promoter sequence is present in the inserted upstream sequence.
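The comparison between the insert-containing vector and the insert-free control vector can be expressed as a fold change. The Python sketch below is illustrative only: the function names are hypothetical, the signal values are whatever units the chosen reporter assay produces, and the default cut-off of 2.0 is an arbitrary assumption, since the text does not specify what level of elevation counts as significant.

```python
def fold_over_control(insert_signal, empty_vector_signal):
    """Ratio of reporter signal with the insert to signal from the insert-free vector."""
    if empty_vector_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / empty_vector_signal


def suggests_promoter(insert_signal, empty_vector_signal, min_fold=2.0):
    """True if the insert raises reporter signal by at least min_fold.

    min_fold is a hypothetical cut-off chosen for illustration; an
    appropriate value depends on the assay and its variability.
    """
    return fold_over_control(insert_signal, empty_vector_signal) >= min_fold
```

In practice the same comparison would be made for each orientation of the insert and for each host cell type tested.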
Promoter sequences within the upstream genomic DNA may be further defined by constructing nested deletions in the upstream DNA using conventional techniques such as Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate potential transcription factor binding sites within the promoter, individually or in combination. The effects of these mutations on transcription levels may be determined by inserting the mutations into the cloning sites in the promoter reporter vectors.
Thus, the minimal size of the promoter of the TBC-1 gene can be determined through the measurement of TBC-1 expression levels. For this assay, expression vectors are used which comprise promoter fragments of decreasing size, generally ranging from 2 kb to 100 bp, with a constant 3' end, operably linked to the TBC-1 coding sequence or to a reporter gene. Cells, which are preferably prostate cells and more preferably prostate cancer cells, are transfected with these vectors and the expression level of the gene is assessed.
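A deletion series with a constant 3' end can be described by its coordinates in the 5' regulatory region (positions 1 to 2000 of SEQ ID No 1). The following Python sketch computes such coordinates. The function name and the particular intermediate fragment lengths are hypothetical; the text only states that sizes generally range from 2 kb to 100 bp.

```python
REGION_END = 2000  # last nucleotide of the 5' regulatory region in SEQ ID No 1


def deletion_series(lengths, region_end=REGION_END):
    """Return (start, end) coordinates of fragments that share the same 3' end.

    Fragments are listed from longest to shortest; coordinates are
    1-based and inclusive.
    """
    fragments = []
    for length in sorted(lengths, reverse=True):
        if not 1 <= length <= region_end:
            raise ValueError("fragment length must fit inside the region")
        fragments.append((region_end - length + 1, region_end))
    return fragments


# Illustrative lengths only (assumption): 2 kb down to 100 bp.
EXAMPLE_LENGTHS = [2000, 1000, 500, 200, 100]
EXAMPLE_FRAGMENTS = deletion_series(EXAMPLE_LENGTHS)
```

Each fragment in the series would then be linked to the coding sequence or reporter gene and assayed, the shortest fragment that retains expression giving an estimate of the minimal promoter.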
The strength and the specificity of the promoter of the TBC-1 gene can be assessed through the expression levels of the gene operably linked to this promoter in different types of cells and tissues. In one embodiment, the efficacy of the promoter of the TBC-1 gene is assessed in normal and cancer cells. In a preferred embodiment, the efficacy of the promoter of the TBC-1 gene is assessed in normal prostate cells and in prostate cancer cells which can present different degrees of malignancy.
Polynucleotides carrying the regulatory elements located both at the 5' end and at the 3' end of the TBC-1 cDNAs may be advantageously used to control the transcriptional and translational activity of a heterologous polynucleotide of interest.
Thus, the present invention also concerns a purified or isolated nucleic acid comprising a polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions, or a sequence complementary thereto or a biologically active fragment or variant thereof.
"5' regulatory region" refers to the nucleotide sequence located between positions 1 and 2000 of SEQ ID No 1.
"3' regulatory region" refers to the nucleotide sequence located between positions 97961 and 99960 of SEQ ID No 2.
The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, or a sequence complementary thereto or a variant thereof or a biologically active fragment thereof.
Another object of the invention consists of purified, isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with a polynucleotide selected from the group consisting of the nucleotide sequences of the 5' and 3' regulatory regions, or a sequence complementary thereto or a variant thereof or a biologically active fragment thereof.
The 5'-UTR and 3'-UTR regions of a gene are of particular importance in that they often comprise regulatory elements which can play a role in providing appropriate expression levels, particularly through the control of mRNA stability.
A 5' regulatory polynucleotide of the invention may include the 5'-UTR located between the nucleotide at position 1 and the nucleotide at position 170 of SEQ ID No 3, or a biologically active fragment or variant thereof.
Alternatively, a 5' regulatory polynucleotide of the invention may include the 5'-UTR located between the nucleotide at position 1 and the nucleotide at position 175 of SEQ ID No 4, or a biologically active fragment or variant thereof.
A 3' regulatory polynucleotide of the invention may include the 3'-UTR located between the nucleotide at position 3726 and the nucleotide at position 3983 of SEQ ID No 4, or a biologically active fragment or variant thereof.
Thus, the invention also pertains to a purified or isolated nucleic acid which is selected from the group consisting of: a) a nucleic acid comprising the nucleotide sequence of the 5' regulatory region; b) a nucleic acid comprising a biologically active fragment or variant of the nucleic acid of the 5' regulatory region.
Preferred fragments of the nucleic acid of the 5' regulatory region have a length of about 1000 nucleotides, more particularly of about 400 nucleotides, more preferably of about 200 nucleotides and most preferably about 100 nucleotides. More particularly, the invention further includes specific elements within this regulatory region, these elements preferably including the promoter region.
Preferred fragments of the 3' regulatory region are at least 50, 100, 150, 200, 300 or 400 bases in length.
By a "biologically active fragment or variant" of a TBC-1 regulatory polynucleotide according to the present invention is intended a polynucleotide comprising or alternatively consisting in a fragment of said polynucleotide which is functional as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host.
For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said regulatory polynucleotide contains nucleotide sequences which contain transcriptional and translational regulatory information, and if such sequences are "operatively linked" to nucleotide sequences which encode the desired polypeptide or the desired polynucleotide. An operable linkage is a linkage in which the regulatory nucleic acid and the DNA sequence sought to be expressed are linked in such a way as to permit gene expression.
In order to identify the relevant biologically active polynucleotide derivatives of the 5' or 3' regulatory region, one skilled in the art will refer to the book of Sambrook et al. (Sambrook, 1989) in order to use a recombinant vector carrying a marker gene (beta galactosidase, chloramphenicol acetyl transferase, etc.) the expression of which will be detected when placed under the control of a biologically active derivative polynucleotide of the 5' or 3' regulatory region.
Regulatory polynucleotides of the invention may be prepared from any of the nucleotide sequences of SEQ ID Nos 1 or 2 by cleavage using suitable restriction enzymes, one skilled in the art being guided by the book of Sambrook et al. (1989). Regulatory polynucleotides may also be prepared by digestion of any of the nucleotide sequences of SEQ ID Nos 1 or 2 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986). These regulatory polynucleotides can also be prepared by chemical synthesis, as described elsewhere in the specification, when the synthesis of oligonucleotide probes or primers is disclosed.
The regulatory polynucleotides according to the invention may be advantageously part of a recombinant expression vector that may be used to express a coding sequence in a desired host cell or host organism. The recombinant expression vectors according to the invention are described elsewhere in the specification.
The invention also encompasses a polynucleotide comprising:
a) a nucleic acid comprising a regulatory nucleotide sequence of the 5' regulatory region, or a biologically active fragment or variant thereof;
b) a polynucleotide encoding a desired polypeptide or nucleic acid, operably linked to the nucleic acid comprising a regulatory nucleotide sequence of the 5' regulatory region, or its biologically active fragment or variant;
c) optionally, a nucleic acid comprising a 3' regulatory polynucleotide, preferably a 3' regulatory polynucleotide of the invention.
The desired polypeptide encoded by the above described nucleic acid may be of various nature or origin, encompassing proteins of prokaryotic or eukaryotic origin. Among the polypeptides that may be expressed under the control of a TBC-1 regulatory region are bacterial, fungal or viral antigens. Also encompassed are eukaryotic proteins such as intracellular proteins, such as "house keeping" proteins, membrane-bound proteins, like receptors, and secreted proteins like the numerous endogenous mediators such as cytokines.
The desired nucleic acid encoded by the above described polynucleotide, usually an RNA molecule, may be complementary to a TBC-1 coding sequence and thus useful as an antisense polynucleotide.
Such a polynucleotide may be included in a recombinant expression vector in order to express a desired polypeptide or a desired polynucleotide in a host cell or in a host organism. Suitable recombinant vectors that contain a polynucleotide such as described hereinbefore are disclosed elsewhere in the specification.
TBC-1 Polypeptide And Peptide Fragments Thereof
It is now easy to produce proteins in high amounts by genetic engineering techniques through expression vectors such as plasmids, phages or phagemids. The polynucleotide that codes for one of the polypeptides of the present invention is inserted in an appropriate expression vector in order to produce the polypeptide of interest in vitro.
Thus, the present invention also concerns a method for producing one of the polypeptides described herein, and especially a polypeptide of SEQ ID No 5 or a fragment or a variant thereof, wherein said method comprises the steps of:
a) culturing, in an appropriate culture medium, a cell host previously transformed or transfected with the recombinant vector comprising a nucleic acid encoding a TBC-1 polypeptide, or a fragment or a variant thereof;
b) harvesting the culture medium thus conditioned or lysing the cell host, for example by sonication or by an osmotic shock;
c) separating or purifying, from the said culture medium, or from the pellet of the resultant host cell lysate, the thus produced polypeptide of interest;
d) optionally characterizing the produced polypeptide of interest.
In a specific embodiment of the above method, step a) is preceded by a step wherein the nucleic acid coding for a TBC-1 polypeptide, or a fragment or a variant thereof, is inserted in an appropriate vector, optionally after an appropriate cleavage of this amplified nucleic acid with one or several restriction endonucleases. The nucleic acid coding for a TBC-1 polypeptide or a fragment or a variant thereof may be the resulting product of an amplification reaction using a pair of primers according to the invention (by SDA, TAS, 3SR, NASBA, TMA etc.).
The polypeptides according to the invention may be characterized by binding onto an immunoaffinity chromatography column on which polyclonal or monoclonal antibodies directed to a polypeptide of SEQ ID No 5, or a fragment or a variant thereof, have previously been immobilized.
Purification of the recombinant proteins or peptides according to the present invention may be carried out by passage onto a Nickel or Copper affinity chromatography column. The Nickel chromatography column may contain the Ni-NTA resin (Porath et al., 1975).
The polypeptides or peptides thus obtained may be purified, for example by high performance liquid chromatography, such as reverse phase and/or cationic exchange HPLC, as described by Rougeot et al. (1994). The reason to prefer this kind of peptide or protein purification is the lack of byproducts found in the elution samples, which renders the resultant purified protein or peptide more suitable for a therapeutic use.
Another object of the present invention consists in a purified or isolated TBC-1 polypeptide or a fragment or a variant thereof.
In a preferred embodiment, the TBC-1 polypeptide comprises an amino acid sequence of SEQ ID No 5 or a fragment or a variant thereof. The present invention also embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 amino acids of SEQ ID No 5. The present invention also embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 amino acids of SEQ ID No 5, wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following amino acid positions: 1-200, 201-400, 401-600, 601-800, 801-1000, 1001-1168.
The invention also encompasses purified, isolated, or recombinant polypeptides comprising an amino acid sequence having at least 90, 95, 98 or 99% amino acid identity with the amino acid sequence of SEQ ID No 5 or a fragment thereof.
The TBC-1 polypeptide of the invention possesses amino acid homologies with regard to the murine TBC-1 protein of 1141 amino acids in length which is described in US Patent No US 5,700,927. The TBC-1 protein of the invention also possesses some homologies with two other proteins: the Pollux Drosophila protein (Zhang et al., 1996) and the CDC16 protein from Caenorhabditis elegans (Wilson et al., 1994). Figure 1 represents an amino acid alignment of a portion of the amino acid sequence of the TBC-1 protein of SEQ ID No 5 with other proteins sharing amino acid homology with TBC-1. The upper line shows the whole amino acid sequence of the murine tbc-1 protein described in US Patent No US 5,700,927; the second line represents part of the amino acid sequence of the TBC-1 protein of SEQ ID No 5; the third line (Genbank access No dmuS0542) depicts the amino acid sequence of the Pollux protein mentioned above; the fourth line (Genbank access No celf35h12) shows the amino acid sequence of the C. elegans protein mentioned above; the fifth line presents positions in which consensus amino acids are identified, i.e. amino acids shared by the sequences presented in the four upper lines, when present.
The TBC-1 polypeptide of the amino acid sequence of SEQ ID No 5 is 1168 amino acids in length. The TBC-1 polypeptide includes a "TBC domain" spanning from the amino acid in position 786 to the amino acid in position 974 of the amino acid sequence of SEQ ID No 5. This TBC domain is represented in Figure 1 as a grey area spanning from the amino acid numbered 758 to the amino acid numbered 949. This TBC domain is likely to regulate protein-protein interactions.
Moreover, the TBC-1 TBC domain includes the amino acid sequence EVGYCQGL, spanning from the amino acid in position 886 to the amino acid in position 893 of the amino acid sequence of SEQ ID No 5. The EVGYCQGL amino acid sequence spans from the amino acid numbered 861 to the amino acid numbered 868 of Figure 1. This site may interact with a kinase. Based on the structural similarity to cdc16, a yeast regulator of mitosis, TBC-1 is likely to regulate mitosis and cytokinesis by interacting with other proteins which also participate in the regulation of mitosis, cytokinesis and septum formation.
Preferred polypeptides of the invention comprise the TBC domain of TBC-1, or alternatively at least the EVGYCQGL amino acid sequence motif.
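The positions given for the TBC domain and the EVGYCQGL motif in SEQ ID No 5 can be checked for internal consistency, and the motif can be located in any candidate polypeptide. The Python sketch below uses the positions stated in the text; the helper names are hypothetical, and the short test peptide in the usage note is invented for illustration and is not taken from SEQ ID No 5.

```python
# Positions in SEQ ID No 5 as stated in the text (1-based, inclusive).
TBC_DOMAIN = (786, 974)
MOTIF = "EVGYCQGL"
MOTIF_SPAN = (886, 893)


def motif_positions(protein, motif=MOTIF):
    """Return the 1-based inclusive (start, end) of the first motif occurrence, or None."""
    index = protein.find(motif)
    if index < 0:
        return None
    return (index + 1, index + len(motif))


def within(span, region):
    """True if span lies entirely inside region (both 1-based, inclusive)."""
    return region[0] <= span[0] and span[1] <= region[1]
```

For example, `motif_positions("MKEVGYCQGLAA")` returns `(3, 10)` for this invented peptide, and `within(MOTIF_SPAN, TBC_DOMAIN)` confirms that the stated motif positions fall inside the stated TBC domain.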
A further object of the present invention concerns a purified or isolated polypeptide which is encoded by a nucleic acid comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 1, 2, 3, and 4 or fragments or variants thereof.
A single variant molecule of the TBC-1 protein is explicitly excluded from the scope of the present invention, namely a polypeptide having the same amino acid sequence as the murine tbc1 protein described in US Patent No 5,700,927.
Amino acid deletions, additions or substitutions in the TBC-1 protein are preferably located outside of the TBC domain as defined above. Most preferably, a mutated TBC-1 protein has an intact "EVGYCQGL" amino acid motif.
Such a mutated TBC-1 protein may be the target of diagnostic tools, such as specific monoclonal or polyclonal antibodies, useful for detecting the mutated TBC-1 protein in a sample.
The invention also encompasses a TBC-1 polypeptide or a fragment or a variant thereof in which at least one peptide bond has been modified as described in the "Definitions" section.
Antibodies That Bind TBC-1 Polypeptides of the Invention
Any TBC-1 polypeptide or whole protein may be used to generate antibodies capable of specifically binding to an expressed TBC-1 protein or fragments thereof as described.
One antibody composition of the invention is capable of specifically binding, or specifically binds, to the variant of the TBC-1 protein of SEQ ID No 5. For an antibody composition to specifically bind to TBC-1, it must demonstrate at least a 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for TBC-1 protein than for another protein in an ELISA, RIA, or other antibody-based binding assay.
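The specificity criterion above amounts to a simple comparison of two binding measurements. The Python sketch below is one possible reading, assuming that both affinities are expressed on the same positive scale from the same assay and that "X% greater" means the TBC-1 value is at least (1 + X/100) times the value for the other protein. The function name and this interpretation are assumptions, not definitions from the specification.

```python
def is_specific(affinity_tbc1, affinity_other, min_excess=0.10):
    """True if TBC-1 binding exceeds binding to another protein by at least min_excess.

    min_excess is a fraction: 0.10 corresponds to the 10% level recited in
    the text, 1.00 to the 100% level. Both affinities must be positive and
    measured on the same scale.
    """
    if affinity_tbc1 <= 0 or affinity_other <= 0:
        raise ValueError("affinities must be positive")
    return affinity_tbc1 >= affinity_other * (1 + min_excess)
```

For instance, a signal of 120 for TBC-1 against 100 for a control protein satisfies the 10% level but not the 50% level.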
In a preferred embodiment, the invention concerns antibody compositions, either polyclonal or monoclonal, capable of selectively binding, or which selectively bind, to an epitope-containing polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 amino acids of SEQ ID No 5. Optionally, said epitope comprises at least 1, 2, 3, 5 or 10 of the following amino acid positions: 1-200, 201-400, 401-600, 601-800, 801-1000, 1001-1168.
The invention also concerns a purified or isolated antibody capable of specifically binding to a mutated TBC-1 protein or to a fragment or variant thereof comprising an epitope of the mutated TBC-1 protein. In another preferred embodiment, the present invention concerns an antibody capable of binding to a polypeptide comprising at least 10 consecutive amino acids of a TBC-1 protein and including at least one of the amino acids which can be encoded by the trait-causing mutations.
In a preferred embodiment, the invention concerns the use in the manufacture of antibodies of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 amino acids of SEQ ID No 5. Optionally, said polypeptide comprises at least 1, 2, 3, 5 or 10 of the following amino acid positions: 1-200, 201-400, 401-600, 601-800, 801-1000, 1001-1168.
The antibodies of the invention may be labeled by any one of the radioactive, fluorescent or enzymatic labels known in the art.
The TBC-1 polypeptide of SEQ ID No 5 or a fragment thereof can be used for the preparation of polyclonal or monoclonal antibodies.
The TBC-1 polypeptide expressed from a DNA sequence comprising at least one of the nucleic acid sequences of SEQ ID Nos 1, 2, 3 and 4 may also be used to generate antibodies capable of specifically binding to the TBC-1 polypeptide of SEQ ID No 5 or a fragment thereof.
Preferred antibodies according to the invention are prepared using TBC-1 peptide fragments that do not comprise the EVGYCQGL amino acid motif.
Other preferred antibodies of the invention are prepared using TBC-1 peptide fragments that do not comprise the TBC domain defined elsewhere in the specification.
The antibodies may be prepared from hybridomas according to the technique described by Kohler and Milstein in 1975. The polyclonal antibodies may be prepared by immunization of a mammal, especially a mouse or a rabbit, with a polypeptide according to the invention that is combined with an adjuvant of immunity, and then by purifying the specific antibodies contained in the serum of the immunized animal on an affinity chromatography column on which the polypeptide that has been used as the antigen has previously been immobilized.
The present invention also includes chimeric single chain Fv antibody fragments (Martineau et al., 1998), antibody fragments obtained through phage display libraries (Ridder et al., 1995; Vaughan et al., 1995) and humanized antibodies (Reinmann et al., 1997; Leger et al., 1997).
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
Consequently, the invention is also directed to a method for detecting specifically the presence of a TBC-1 polypeptide according to the invention in a biological sample, said method comprising the following steps: a) bringing into contact the biological sample with a polyclonal or monoclonal antibody that specifically binds a TBC-1 polypeptide comprising an amino acid sequence of SEQ ID No 5, or to a peptide fragment or variant thereof; and b) detecting the antigen-antibody complex formed.
The invention also concerns a diagnostic kit for detecting in vitro the presence of a TBC-1 polypeptide according to the present invention in a biological sample, wherein said kit comprises: a) a polyclonal or monoclonal antibody that specifically binds a TBC-1 polypeptide comprising an amino acid sequence of SEQ ID No 5, or to a peptide fragment or variant thereof, optionally labeled; b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent carrying optionally a label, or being able to be recognized itself by a labeled reagent, more particularly in the case when the above-mentioned monoclonal or polyclonal antibody is not labeled by itself.
TBC-1-Related Biallelic Markers
The inventors have discovered nucleotide polymorphisms located within the genomic DNA containing the TBC-1 gene, among them SNPs, which are also termed biallelic markers. The biallelic markers of the invention can be used, for example, for the generation of genetic maps, linkage analysis, and association studies.
A- Identification Of TBC-1-related Biallelic Markers
There are two preferred methods through which the biallelic markers of the present invention can be generated. In a first method, DNA samples from unrelated individuals are pooled together, following which the genomic DNA of interest is amplified and sequenced. The nucleotide sequences thus obtained are then analyzed to identify significant polymorphisms.
One of the major advantages of this method resides in the fact that the pooling of the DNA samples substantially reduces the number of DNA amplification reactions and sequencing which must be carried out. Moreover, this method is sufficiently sensitive so that a biallelic marker obtained therewith usually shows a sufficient degree of informativeness for conducting association studies.
In a second method for generating biallelic markers, the DNA samples are not pooled and are therefore amplified and sequenced individually. The resulting nucleotide sequences obtained are then also analyzed to identify significant polymorphisms.
It will readily be appreciated that when this second method is used, a substantially higher number of DNA amplification reactions must be carried out. It will further be appreciated that including such potentially less informative biallelic markers in association studies to identify potential genetic associations with a trait may allow in some cases the direct identification of causal mutations, which may, depending on their penetrance, be rare mutations. This method is usually preferred when biallelic markers need to be identified in order to perform association studies within candidate genes.
In both methods, the genomic DNA samples from which the biallelic markers of the present invention are generated are preferably obtained from unrelated individuals corresponding to a heterogeneous population of known ethnic background, or from familial cases.
The number of individuals from whom DNA samples are obtained can vary substantially, preferably from about 10 to about 1000, more preferably from about 50 to about 200 individuals. It is usually preferred to collect DNA samples from at least about 100 individuals in order to have sufficient polymorphic diversity in a given population to generate as many markers as possible and to generate statistically significant results.
As for the source of the genomic DNA to be subjected to analysis, any test sample can be foreseen without any particular limitation. The preferred source of genomic DNA used in the context of the present invention is the peripheral venous blood of each donor.
The techniques of DNA extraction are well-known to the skilled technician. Details of a preferred embodiment are provided in Example 2.
DNA samples can be pooled or unpooled for the amplification step. DNA amplification techniques are well-known to those skilled in the art.
Amplification techniques that can be used in the context of the present invention include, but are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic acid sequence based amplification (NASBA) described in Guatelli et al.(1990) and in Compton J.(1991), Q-beta amplification as described in European Patent Application No 4544610, strand displacement amplification as described in Walker et al.(1996) and EP A 684 315, and target mediated amplification as described in PCT Publication WO 9322461.
LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase to join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs are used which include two primary (first and second) and two secondary (third and fourth) probes, all of which are employed in molar excess to target. The first probe hybridizes to a first segment of the target strand and the second probe hybridizes to a second segment of the target strand, the first and second segments being contiguous so that the primary probes abut one another in 5' phosphate-3' hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two probes into a fused product. In addition, a third (secondary) probe can hybridize to a portion of the first probe and a fourth (secondary) probe can hybridize to a portion of the second probe in a similar abutting fashion. Of course, if the target is initially double stranded, the secondary probes also will hybridize to the target complement in the first instance. Once the ligated strand of primary probes is separated from the target strand, it will hybridize with the third and fourth probes, which can be ligated to form a complementary, secondary ligated product. It is important to realize that the ligated products are functionally equivalent to either the target or its complement. By repeated cycles of hybridization and ligation, amplification of the target sequence is achieved. A method for multiplex LCR has also been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the probes are not adjacent but are separated by 2 to 3 bases.
For amplification of mRNAs, it is within the scope of the present invention to reverse transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or, to use a single enzyme for both steps as described in U.S. Patent No. 5,322,770 or, to use Asymmetric Gap LCR (RT-AGLCR) as described by Marshall et al.(1994). AGLCR is a modification of GLCR that allows the amplification of RNA.
The PCR technology is the preferred amplification technique used in the present invention.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see White (1997) and the publication entitled "PCR Methods and Applications" (1991, Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites. PCR has further been described in several patents including US Patents 4,683,195; 4,683,202; and 4,965,188.
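The outcome of the cycle described above, an amplified fragment spanning the sequence between the two primer sites, can be sketched in silico. The following is a minimal illustration only: it assumes exact primer matching on an uppercase A/C/G/T template, ignores annealing temperature and mispriming, and uses hypothetical function names and toy sequences that are not taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the amplicon bounded by the two primers, or None.

    `template` is the top strand, 5'->3'. The forward primer matches the
    top strand directly; the reverse primer anneals to the top strand, so
    its reverse complement is searched for downstream of the forward site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy sequences, not taken from SEQ ID Nos 1 to 4.
toy_template = "TTTACGTACGGGGCCCAATTGGTTT"
amplicon = in_silico_pcr(toy_template, forward="ACGTAC", reverse="CCAATT")
print(amplicon, len(amplicon))
```

The returned fragment includes both primer sites, consistent with the description of an amplified fragment containing the nucleic acid sequence between them.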
The PCR technology is the preferred amplification technique used to identify new biallelic markers. A typical example of a PCR reaction suitable for the purposes of the present invention is provided in Example 3.
One of the aspects of the present invention is a method for the amplification of a TBC-1 gene, particularly the genomic sequences of SEQ ID Nos 1 and 2 or of the cDNA sequence of SEQ ID Nos 3 or 4 or a fragment or variant thereof in a test sample, preferably using the PCR technology. The method comprises the steps of contacting a test sample suspected of containing the target TBC-1 sequence or portion thereof with amplification reaction reagents comprising a pair of amplification primers.
Thus, the present invention also relates to a method for the amplification of a TBC-1 gene sequence, particularly of a fragment of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ ID No 2 or 3, or a fragment or a variant thereof in a test sample, said method comprising the steps of: a) contacting a test sample suspected of containing the targeted TBC-1 gene sequence or portion thereof with amplification reaction reagents comprising a pair of amplification primers located on either side of the TBC-1 region to be amplified, and b) optionally, detecting the amplification products.
The invention also concerns a kit for the amplification of a TBC-1 gene sequence, particularly of a portion of the genomic sequence of SEQ ID Nos 1 or 2, or of the cDNA sequence of SEQ ID Nos 3 or 4, or a variant thereof in a test sample, wherein said kit comprises: a) a pair of oligonucleotide primers located on either side of the TBC-1 region to be amplified; b) optionally, the reagents necessary for performing the amplification reaction.
In one embodiment of the above amplification method and kit, the amplification product is detected by hybridization with a labeled probe having a sequence which is complementary to the amplified region. In another embodiment of the above amplification method and kit, primers comprise a sequence which is selected from the group consisting of B1 to B15, C1 to C15, D1 to D19, and E1 to E19.
In a first embodiment of the present invention, biallelic markers are identified using genomic sequence information generated by the inventors. Sequenced genomic DNA fragments are used to design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP software (Hillier L. and Green 1991). All primers may contain, upstream of the specific target bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are familiar with primer extensions, which can be used for these purposes.
Preferred primers, useful for the amplification of genomic sequences encoding the candidate genes, focus on promoters, exons and splice sites of the genes. A biallelic marker presents a higher probability to be an eventual causal mutation if it is located in these functional regions of the gene. Preferred amplification primers of the invention include the nucleotide sequences of B1 to B15 and C1 to C15 further detailed in Example 3.
The amplification products generated as described above with the primers of the invention are then sequenced using methods known and available to the skilled technician. Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle sequencing protocol. Following gel image analysis and DNA sequence extraction, sequence data are automatically processed with adequate software to assess sequence quality.
A polymorphism analysis software is used that detects the presence of biallelic sites among individual or pooled amplified fragment sequences. Polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern. These peaks, which present distinct colors, correspond to two different nucleotides at the same position on the sequence. The polymorphism has to be detected on both strands for validation.
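The specification does not describe the polymorphism analysis software itself. The following is a minimal sketch of the stated principle only (superimposed peaks at one position, confirmed on both strands). The data layout, the function names and the 0.3 secondary-to-primary peak ratio are assumptions chosen for illustration, and the reverse-strand trace is assumed to have already been converted to forward-strand coordinates and bases.

```python
def top_two(peaks: dict):
    """Return the two highest (base, height) pairs at one trace position."""
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    return ranked[0], ranked[1]


def candidate_sites(forward: list, reverse: list, min_ratio: float = 0.3):
    """Flag positions showing two superimposed peaks on BOTH strands.

    `forward` and `reverse` are lists of {base: peak height} dicts, one per
    position. `min_ratio` (an assumed threshold) is the minimum height of
    the second peak relative to the first.
    """
    sites = []
    for position, traces in enumerate(zip(forward, reverse), start=1):
        allele_pairs = []
        for peaks in traces:
            (base1, height1), (base2, height2) = top_two(peaks)
            if height1 > 0 and height2 / height1 >= min_ratio:
                allele_pairs.append(frozenset((base1, base2)))
        # Validation: both strands must show the same pair of nucleotides.
        if len(allele_pairs) == 2 and allele_pairs[0] == allele_pairs[1]:
            sites.append((position, tuple(sorted(allele_pairs[0]))))
    return sites


# Hypothetical peak heights for three positions.
fwd = [{"A": 100, "C": 2, "G": 1, "T": 3},
       {"A": 60, "C": 1, "G": 50, "T": 2},
       {"A": 1, "C": 90, "G": 2, "T": 40}]
rev = [{"A": 98, "C": 1, "G": 2, "T": 1},
       {"A": 55, "C": 2, "G": 48, "T": 1},
       {"A": 2, "C": 95, "G": 1, "T": 5}]
print(candidate_sites(fwd, rev))
```

In this toy data the third position shows a second peak on the forward strand only, so it is not reported, which mirrors the two-strand validation rule.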
Nineteen biallelic markers were found in the TBC-1 gene. They are detailed in Table 2. They are located in intronic regions.
B- Genotyping Of TBC-1-Related Biallelic Markers
The polymorphisms identified above can be further confirmed and their respective frequencies can be determined through various methods using the previously described primers and probes. These methods can also be useful for genotyping either new populations in association studies or linkage analysis or individuals in the context of detection of alleles of biallelic markers which are known to be associated with a given trait. The genotyping of the biallelic markers is also important for the mapping. Those skilled in the art should note that the methods described below can be equally performed on individual or pooled DNA samples.
Once a given polymorphic site has been found and characterized as a biallelic marker as described above, several methods can be used in order to determine the specific allele carried by an individual at the given polymorphic base.
The identification of biallelic markers described previously allows the design of appropriate oligonucleotides, which can be used as probes and primers, to amplify a TBC-1 gene containing the polymorphic site of interest and for the detection of such polymorphisms.
The biallelic markers according to the present invention may be used in methods for the identification and characterization of an association between alleles for one or several biallelic markers of the sequence of the TBC-1 gene and a trait.
The identified polymorphisms, and consequently the biallelic markers of the invention, may be used in methods for the detection in an individual of TBC-1 alleles associated with a trait, more particularly a trait related to cell differentiation or abnormal cell proliferation disorders, and most particularly a trait related to cancer diseases, specifically prostate cancer.
In one embodiment the invention encompasses methods of genotyping comprising determining the identity of a nucleotide at a TBC-1-related biallelic marker or the complement thereof in a biological sample; optionally, wherein said TBC-1-related biallelic marker is selected from the group consisting of A1 to A19, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said biological sample is derived from a single subject; optionally, wherein the identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome; optionally, wherein said biological sample is derived from multiple subjects; optionally, the genotyping methods of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination; optionally, said method is performed in vitro; optionally, further comprising amplifying a portion of said sequence comprising the biallelic marker prior to said determining step; optionally, wherein said amplifying is performed by PCR, LCR, or replication of a recombinant vector comprising an origin of replication and said fragment in a host cell; optionally, wherein said determining is performed by a hybridization assay, a sequencing assay, a microsequencing assay, or an enzyme-based mismatch detection assay.
Source of Nucleic Acids for Genotyping
Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described above. While nucleic acids for use in the genotyping methods of the invention can be derived from any mammalian source, the test subjects and individuals from which nucleic acid samples are taken are generally understood to be human.
Amplification Of DNA Fragments Comprising Biallelic Markers
Methods and polynucleotides are provided to amplify a segment of nucleotides comprising one or more biallelic markers of the present invention. It will be appreciated that amplification of DNA fragments comprising biallelic markers may be used in various methods and for various purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not all, require the previous amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic assays may also rely on amplification of DNA segments carrying a biallelic marker of the present invention. Amplification of DNA may be achieved by any method known in the art. Amplification techniques are described above in the section entitled, "Identification of TBC-1-related biallelic markers." Some of these amplification methods are particularly suited for the detection of single nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the identification of the polymorphic nucleotide as it is further described below.
The identification of biallelic markers as described above allows the design of appropriate oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic markers of the present invention. Amplification can be performed using the primers initially used to discover new biallelic markers which are described herein or any set of primers allowing the amplification of a DNA fragment comprising a biallelic marker of the present invention.
In some embodiments the present invention provides primers for amplifying a DNA fragment containing one or more biallelic markers of the present invention. Preferred amplification primers are listed in Example 2. It will be appreciated that the primers listed are merely exemplary and that any other set of primers which produce amplification products containing one or more biallelic markers of the present invention are also of use.
The spacing of the primers determines the length of the segment to be amplified. In the context of the present invention, amplified segments carrying biallelic markers can range in size from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical, fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It will be appreciated that amplification primers for the biallelic markers may be any sequence which allows the specific amplification of any DNA fragment carrying the markers. Amplification primers may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and primers".
Methods of Genotyping DNA samples for Biallelic Markers
Any method known in the art can be used to identify the nucleotide present at a biallelic marker site. Since the biallelic marker allele to be detected has been identified and specified in the present invention, detection will prove simple for one of ordinary skill in the art by employing any of a number of techniques. Many genotyping methods require the previous amplification of the DNA region carrying the biallelic marker of interest. While the amplification of target or signal is often preferred at present, ultrasensitive detection methods which do not require amplification are also encompassed by the present genotyping methods. Methods well-known to those skilled in the art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et al.(1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch cleavage detection, and other conventional techniques as described in Sheffield et al.(1991), White et al.(1992), Grompe et al.(1989 and 1993). Another method for determining the identity of the nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant nucleotide derivative as described in US patent 4,656,127.
Preferred methods involve directly determining the identity of the nucleotide present at a biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization assay. The following is a description of some preferred methods. A highly preferred method is the microsequencing technique. The term "sequencing" is generally used herein to refer to polymerase extension of duplex primer/template complexes and includes both traditional sequencing and microsequencing.
1) Sequencing Assays
The nucleotide present at a polymorphic site can be determined by sequencing methods. In a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide Polymorphisms".
Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification of the base present at the biallelic marker site.
2) Microsequencing Assays
In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is detected by a single nucleotide primer extension reaction. This method involves appropriate microsequencing primers which hybridize just upstream of the polymorphic base of interest in the target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the identity of the incorporated nucleotide is determined in any suitable way.
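The single nucleotide primer extension principle can be sketched as follows. This is a simplified model under stated assumptions: the primer is written in the same sense as the top strand and matches it exactly just upstream of the polymorphic base, so the primer anneals to the bottom strand and the incorporated ddNTP equals the top-strand base at the polymorphic site. Function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def microsequence(top_strand: str, primer: str):
    """Return (incorporated ddNTP base, template base it pairs with).

    The primer must match `top_strand` exactly and end immediately 5' of
    the polymorphic base; exact matching is an illustrative simplification.
    """
    start = top_strand.find(primer)
    if start == -1:
        raise ValueError("primer does not anneal to this target")
    site = start + len(primer)
    if site >= len(top_strand):
        raise ValueError("no base downstream of the primer 3' end")
    incorporated = top_strand[site]
    return incorporated, COMPLEMENT[incorporated]


def genotype(copies: list, primer: str) -> str:
    """Combine the bases incorporated on both copies of the marker."""
    return "/".join(sorted({microsequence(copy, primer)[0] for copy in copies}))


# Hypothetical alleles differing at the base right after the primer.
allele_1 = "GATTACAGGCT"
allele_2 = "GATTACATGCT"
print(microsequence(allele_1, "TTACA"))
print(genotype([allele_1, allele_2], "TTACA"))
```

A heterozygous sample yields two different incorporated bases, which is what the detection step (fluorescence, mass, or another label) then has to distinguish.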
Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing machines to determine the identity of the incorporated nucleotide as described in EP 412 883, the disclosure of which is incorporated herein by reference in its entirety. Alternatively capillary electrophoresis can be used in order to process a higher number of assays simultaneously. An example of a typical microsequencing procedure that can be used in the context of the present invention is provided in Example 4.
Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous phase detection method based on fluorescence resonance energy transfer has been described by Chen and Kwok (1997) and Chen et al.(1997). In this method, amplified genomic DNA fragments containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in the presence of allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq polymerase. The dye-labeled primer is extended one base by the dye-terminator specific for the allele present on the template. At the end of the genotyping reaction, the fluorescence intensities of the two dyes in the reaction mixture are analyzed directly without separation or purification. All these steps can be performed in the same tube and the fluorescence changes can be monitored in real time. Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry. The base at the polymorphic site is identified by the mass added onto the microsequencing primer (see Haff and Smirnov, 1997).
Microsequencing may be achieved by the established microsequencing method or by developments or derivatives thereof. Alternative methods include several solid-phase microsequencing techniques. The basic microsequencing protocol is the same as described previously, except that the method is conducted as a heterogeneous phase assay, in which the primer or the target molecule is immobilized or captured onto a solid support. To simplify the primer separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid supports or are modified in such ways that permit affinity separation as well as polymerase extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the incorporated terminator reagent. This eliminates the need for physical or size separation. More than one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if more than one affinity group is used. This permits the analysis of several nucleic acid species or more nucleic acid sequence information per extension reaction. The affinity group need not be on the priming oligonucleotide but could alternatively be present on the template. For example, immobilization can be carried out via an interaction between biotinylated DNA and streptavidin-coated microtitration wells or avidin-coated polystyrene particles. In the same manner, oligonucleotides or templates may be attached to a solid support in a high-density format. In such solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvänen, 1994) or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be achieved through scintillation-based techniques.
The detection of fluorescein-linked ddNTPs can be based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet another alternative solid-phase microsequencing procedure, Nyren et al.(1993) described a method relying on the detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA).
Pastinen et al.(1997) describe a method for multiplex detection of single nucleotide polymorphism in which the solid phase minisequencing principle is applied to an oligonucleotide array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are further described below.
In one aspect the present invention provides polynucleotides and methods to genotype one or more biallelic markers of the present invention by performing a microsequencing assay.
Preferred microsequencing primers include the nucleotide sequences D1 to D15 and E1 to E15. It will be appreciated that the microsequencing primers listed in Example 5 are merely exemplary and that any primer having a 3' end immediately adjacent to the polymorphic nucleotide may be used.
Similarly, it will be appreciated that microsequencing analysis may be performed for any biallelic marker or any combination of biallelic markers of the present invention. One aspect of the present invention is a solid support which includes one or more microsequencing primers listed in Example 5, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50 consecutive nucleotides thereof, to the extent that such lengths are consistent with the primer described, and having a 3' terminus immediately upstream of the corresponding biallelic marker, for determining the identity of a nucleotide at a biallelic marker site.
3) Mismatch detection assays based on polymerases and ligases In one aspect the present invention provides polynucleotides and methods to determine the allele of one or more biallelic markers of the present invention in a biological sample, by mismatch WO 00/08209 PCT/IB99/01444 41 detection assays based on polymerases and/or ligases. These assays are based on the specificity of polymerases and ligases. Polymerization reactions places particularly stringent requirements on correct base pairing of the 3' end of the amplification primer and the joining of two oligonucleotides hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site, especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments comprising biallelic markers of the present invention are further described above in "Amplification Of DNA Fragments Comprising Biallelic Markers".
Allele Specific Amplification Primers
Discrimination between the two alleles of a biallelic marker can also be achieved by allele specific amplification, a selective strategy, whereby one of the alleles is amplified without amplification of the other allele. For allele specific amplification, at least one member of the pair of primers is sufficiently complementary with a region of a TBC-1 gene comprising the polymorphic base of a biallelic marker of the present invention to hybridize therewith and to initiate the amplification. Such primers are able to discriminate between the two alleles of a biallelic marker.
This is accomplished by placing the polymorphic base at the 3' end of one of the amplification primers. Because extension proceeds from the 3' end of the primer, a mismatch at or near this position has an inhibitory effect on amplification. Therefore, under appropriate amplification conditions, these primers only direct amplification on their complementary allele.
Determining the precise location of the mismatch and the corresponding assay conditions is well within the ordinary skill in the art.
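To illustrate the design principle of placing the polymorphic base at the 3' end of a primer, the following Python sketch builds one candidate primer per allele from the sequence immediately upstream of the marker. The function name, the default length of 20 nucleotides, and the example sequence are assumptions made for illustration; actual primer choice and assay conditions remain a matter of routine optimization.

```python
def allele_specific_primers(upstream, alleles, length=20):
    """Return {allele: primer}; each primer ends (3') on the polymorphic base.

    upstream -- sequence immediately 5' of the marker, written 5'->3'
    alleles  -- the two alternative bases at the marker, e.g. ("A", "G")
    length   -- total primer length, including the polymorphic base
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    if len(upstream) < length - 1:
        raise ValueError("upstream sequence is shorter than the primer stem")
    # Shared stem: the last (length - 1) bases before the marker.
    stem = upstream[-(length - 1):]
    # The allele-specific base is appended at the 3' end.
    return {allele: stem + allele for allele in alleles}


# Toy example (hypothetical sequence): 10-nucleotide primers for an A/G marker.
primers = allele_specific_primers("ACGTACGTACGTACGTACGTAGC", ("A", "G"), length=10)
```

The two primers share the same stem and differ only at their last base, so each is perfectly matched at its 3' end to only one allele.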
Ligation/Amplification Based Methods
The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target molecule.
One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA.
Other amplification methods which are particularly suited for the detection of single nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are described above in "DNA Amplification". LCR uses two pairs of probes to exponentially amplify a specific target. The sequence of each pair of oligonucleotides is selected to permit the pair to hybridize to abutting sequences of the same strand of the target. Such hybridization forms a substrate for a template-dependent ligase. In accordance with the present invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of the same strand of a biallelic marker site. In one embodiment, either oligonucleotide will be designed to include the biallelic marker site. In such an embodiment, the reaction conditions are selected such that the oligonucleotides can be ligated together only if the target molecule either contains or lacks the specific nucleotide that is complementary to the biallelic marker on the oligonucleotide. In an alternative embodiment, the oligonucleotides will not include the biallelic marker, such that when they hybridize to the target molecule, a "gap" is created as described in WO 90/01069. This gap is then "filled" with complementary dNTPs (as mediated by DNA polymerase), or by an additional pair of oligonucleotides. Thus at the end of each cycle, each single strand has a complement capable of serving as a target during the next cycle and exponential allele-specific amplification of the desired sequence is obtained.
Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods
A preferred method of determining the identity of the nucleotide present at a biallelic marker site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used in such reactions, preferably include the probes defined herein. Any hybridization assay may be used including Southern hybridization, Northern hybridization, dot blot hybridization and solid-phase hybridization (see Sambrook et al., 1989).
Hybridization refers to the formation of a duplex structure by two single stranded nucleic acids due to complementary base pairing. Hybridization can occur between exactly complementary nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch.
Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other and therefore are able to discriminate between different allelic forms. Allele-specific probes are often used in pairs, one member of a pair showing perfect match to a target sequence containing the original allele and the other showing a perfect match to the target sequence containing the alternative allele. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions, under which a probe will hybridize only to the exactly complementary target sequence are well known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Although such hybridization can be performed in solution, it is preferred to employ a solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present invention may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample is determined by detecting the presence or the absence of stable hybrid duplexes formed between the probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of methods. Various detection assay formats are well known which utilize detectable labels bound to either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected.
Those skilled in the art will recognize that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the primers and probes.
Two recently developed assays allow hybridization-based allele discrimination with no need for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al., 1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used for allele discriminations. Molecular beacons are hairpin-shaped oligonucleotide probes that report the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets they undergo a conformational reorganization that restores the fluorescence of an internally quenched fluorophore (Tyagi et al., 1998).
The polynucleotides provided herein can be used to produce probes which can be used in hybridization assays for the detection of biallelic marker alleles in biological samples. These probes are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are sufficiently complementary to a sequence comprising a biallelic marker of the present invention to hybridize thereto and preferably sufficiently specific to be able to discriminate the targeted sequence for only one nucleotide variation. A particularly preferred probe is 25 nucleotides in length. Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In particularly preferred probes, the biallelic marker is at the center of said polynucleotide.
Preferred probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 47, or 50 consecutive nucleotides and containing a polymorphic base. Preferred probes comprise a nucleotide sequence selected from the group consisting of P1 to P7, P9 to P13, P15 to P19 and the sequences complementary thereto. In preferred embodiments the polymorphic base(s) are within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more preferably at the center of said polynucleotide.
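The probe geometry described above (a probe of fixed length with the biallelic marker at its center, used as a pair with one probe per allele) can be sketched as follows. The function names and the toy sequence are hypothetical, and the sketch does not reproduce probes P1 to P19; it only shows how a centered pair of allele-specific probes could be cut from a known sequence.

```python
def centered_probe(sequence, marker_index, length=25):
    """Return a probe of odd `length` with the marker exactly at its center."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    half = length // 2
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the marker")
    return sequence[start:end]


def allele_probe_pair(sequence, marker_index, alleles, length=25):
    """Return {allele: probe}, one centered probe per allele of the marker."""
    probes = {}
    for allele in alleles:
        # Substitute the allele at the marker position, then cut the probe.
        variant = sequence[:marker_index] + allele + sequence[marker_index + 1:]
        probes[allele] = centered_probe(variant, marker_index, length)
    return probes


# Toy example (hypothetical sequence): C/T marker at index 14, 25-mer probes.
toy = "GA" + "A" * 12 + "C" + "T" * 12 + "GA"
pair = allele_probe_pair(toy, 14, ("C", "T"), length=25)
```

Each probe in the pair is a perfect match to one allele and carries a single central mismatch to the other, which is the position at which a mismatch is generally expected to destabilize a short duplex most.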
Preferably the probes of the present invention are labeled or immobilized on a solid support.
Labels and solid supports are further described in "Oligonucleotide Probes and Primers". The probes can be non-extendable as described in "Oligonucleotide Probes and Primers".
By assaying the hybridization to an allele specific probe, one can detect the presence or absence of a biallelic marker allele in a given sample. High-throughput parallel hybridization in array format is specifically encompassed within "hybridization assays" and is described below.
5) Hybridization To Addressable Arrays Of Oligonucleotides
Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization stability of short oligonucleotides to perfectly matched and mismatched target sequence variants. Efficient access to polymorphism information is obtained through a basic structure comprising high-density arrays of oligonucleotide probes attached to a solid support (the chip) at selected positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes arranged in a grid-like pattern and miniaturized to the size of a dime.
The chip technology has already been applied with success in numerous cases. For example, the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains, and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al., 1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene Laboratories.
In general, these methods employ arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from an individual, which target sequences include a polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide probes which is made up of a sequence complementary to the target sequence of interest, as well as preselected variations of that sequence, for example substitution of one or more given positions with one or more members of the basis set of nucleotides. Tiling strategies are further described in PCT application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of specific, identified biallelic marker sequences. In particular, the array is tiled to include a number of detection blocks, each detection block being specific for a specific biallelic marker or a set of biallelic markers. For example, a detection block may be tiled to include a number of probes, which span the sequence segment that includes a specific polymorphism. To ensure probes that are complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker.
In addition to the probes differing at the polymorphic base, monosubstituted probes are also generally tiled within the detection block. These monosubstituted probes have bases at and up to a certain number of bases in either direction from the polymorphism, substituted with the remaining nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will include substitutions of the sequence positions up to and including those that are 5 bases away from the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to distinguish actual hybridization from artefactual cross-hybridization. Upon completion of hybridization with the target sequence and washing of the array, the array is scanned to determine the position on the array to which the target sequence hybridizes. The hybridization data from the scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in the sample. Hybridization and scanning may be carried out as described in PCT application Nos. WO 92/10092 and WO 95/11995 and US patent No. 5,424,186.
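A tiled detection block of the kind described above can be enumerated programmatically. The Python sketch below is a simplified illustration: the function name, the dictionary layout, and the restriction to the DNA bases A, C, G and T are assumptions, and the sketch does not reproduce the tiling schemes of EP 785280 or WO 95/11995. It produces the pair of allele probes plus every monosubstituted control probe within a window of 5 bases on either side of the marker.

```python
BASES = "ACGT"  # DNA bases only; an assumption of this sketch


def tile_detection_block(probe, marker_pos, alleles, window=5):
    """Enumerate the probes of one detection block.

    probe      -- reference probe sequence carrying one allele at marker_pos
    marker_pos -- 0-based index of the biallelic marker within the probe
    alleles    -- the two alleles of the marker, e.g. "AG"
    window     -- number of positions tiled on each side of the marker
    """
    if probe[marker_pos] not in alleles:
        raise ValueError("reference probe must carry one of the alleles")
    block = {"allele": [], "monosubstituted": []}
    # Pair of probes differing only at the biallelic marker.
    for allele in alleles:
        block["allele"].append(probe[:marker_pos] + allele + probe[marker_pos + 1:])
    # Control probes: a single substitution at or near the marker.
    first = max(0, marker_pos - window)
    last = min(len(probe) - 1, marker_pos + window)
    for pos in range(first, last + 1):
        for base in BASES:
            if base == probe[pos]:
                continue  # identical to the reference probe
            if pos == marker_pos and base in alleles:
                continue  # already present as an allele probe
            block["monosubstituted"].append(probe[:pos] + base + probe[pos + 1:])
    return block


# Toy 25-mer with an A/G marker at its center (index 12).
reference = "ACGTACGTACGTACGTACGTACGTA"
block = tile_detection_block(reference, 12, "AG")
```

For this toy probe the block holds 2 allele probes and 32 control probes: 3 substitutions at each of the 10 flanking positions, plus the 2 non-allelic bases at the marker itself.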
Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an array including at least one of the sequences selected from the group consisting of amplicons listed in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 47, or 50 consecutive nucleotides and containing a polymorphic base. In preferred embodiments the polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports and polynucleotides of the present invention attached to solid supports are further described in "Oligonucleotide Probes And Primers".
6) Integrated Systems
Another technique, which may be used to analyze polymorphisms, includes multicomponent integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary electrophoresis reactions in a single functional device. An example of such a technique is disclosed in US patent 5,589,136, which describes the integration of PCR amplification and capillary electrophoresis in chips.
Integrated systems can be envisaged mainly when microfluidic systems are used. These systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer included on a microchip. The movements of the samples are controlled by electric, electroosmotic or hydrostatic forces applied across different areas of the microchip to create functional microscopic valves and pumps with no moving parts.
For genotyping biallelic markers, the microfluidic system may integrate nucleic acid amplification, microsequencing, capillary electrophoresis and a detection method such as laser-induced fluorescence detection.
Association Studies With The Biallelic Markers Of The TBC-1 Gene
The identification of genes involved in suspected heterogeneous, polygenic and multifactorial traits such as cancer can be carried out through two main strategies currently used for genetic mapping: linkage analysis and association studies. Association studies examine the frequency of marker alleles in unrelated trait positive individuals compared with trait negative controls, and are generally employed in the detection of polygenic inheritance. Association studies as a method of mapping genetic traits rely on the phenomenon of linkage disequilibrium.
If two genetic loci lie on the same chromosome, then sets of alleles of these loci on the same chromosomal segment (called haplotypes) tend to be transmitted as a block from generation to generation. When not broken up by recombination, haplotypes can be tracked not only through pedigrees but also through populations. The resulting phenomenon at the population level is that the occurrence of pairs of specific alleles at different loci on the same chromosome is not random, and the deviation from random is called linkage disequilibrium (LD).
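The deviation from random co-occurrence described above is commonly quantified by the coefficient D = p(AB) - p(A)p(B) and by the normalized measure r². These formulas are the standard population-genetics definitions and are not stated in this specification; the function name and the haplotype counts below are illustrative assumptions. A minimal Python sketch for two biallelic loci with alleles A/a and B/b:

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from the counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes counted")
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total    # frequency of allele B at the second locus
    d = p_AB - p_A * p_B           # deviation from random association
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d, d * d / denominator


# Alleles always inherited together: complete disequilibrium.
d_full, r2_full = linkage_disequilibrium(50, 0, 0, 50)
# Alleles combined at random: no disequilibrium.
d_none, r2_none = linkage_disequilibrium(25, 25, 25, 25)
```

A value of D close to zero indicates that the alleles at the two loci occur together about as often as expected by chance, whereas r² close to 1 indicates that one marker is an almost perfect proxy for the other.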
If a specific allele in a given gene is directly involved in causing a particular trait T, its frequency will be statistically increased in a trait positive population when compared to the frequency in a trait negative population. As a consequence of the existence of linkage disequilibrium, the frequency of all other alleles present in the haplotype carrying the trait-causing allele (TCA) will also be increased in trait positive individuals compared to trait negative individuals. Therefore, association between the trait and any allele in linkage disequilibrium with the trait-causing allele will suffice to suggest the presence of a trait-related gene in that particular allele's region. Linkage disequilibrium allows the relative frequencies in trait positive and trait negative populations of a limited number of genetic polymorphisms (specifically biallelic markers) to be analyzed as an alternative to screening all possible functional polymorphisms in order to find trait-causing alleles.
The general strategy to perform association studies using biallelic markers derived from a candidate region is to scan two groups of individuals (trait positive and trait negative control individuals which are characterized by a well defined phenotype as described below) in order to measure and statistically compare the allele frequencies of such biallelic markers in both groups.
If a statistically significant association with a trait is identified for one or more of the analyzed biallelic markers, one can assume that either the associated allele is directly responsible for causing the trait (associated allele is the trait-causing allele), or the associated allele is in linkage disequilibrium with the trait-causing allele. If the evidence indicates that the associated allele within the candidate region is most probably not the trait-causing allele but is in linkage disequilibrium with the real trait-causing allele, then the trait-causing allele, and by consequence the gene carrying the trait-causing allele, can be found by sequencing the vicinity of the associated marker.
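The statistical comparison of allele frequencies between the two groups can be illustrated with a Pearson chi-square test on a 2x2 table of allele counts. The choice of this particular test, the function name, and the counts used below are assumptions made for illustration; the specification does not prescribe a single statistical method at this point. The sketch uses only the Python standard library and computes the p-value for one degree of freedom from the complementary error function.

```python
import math


def allele_association(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on allele counts.

    case_counts, control_counts -- (count of allele 1, count of allele 2)
    in trait positive and trait negative individuals respectively.
    Returns (chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("each group and each allele must be observed")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, i.e. 200 alleles per group.
chi2, p = allele_association((60, 140), (40, 160))
```

With these hypothetical counts the allele frequency is 30% in cases and 20% in controls, which gives a chi-square statistic of 16/3 and a p-value of about 0.02. In practice, corrections for multiple testing would be needed when several markers are analyzed.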
Collection of DNA samples from trait positive (trait +) and trait negative (trait -) individuals (inclusion criteria)
In order to perform efficient and significant association studies such as those described herein, the trait under study should preferably follow a bimodal distribution in the population under study, presenting two clear non-overlapping phenotypes, trait positive and trait negative.
Nevertheless, even in the absence of such a bimodal distribution (as may in fact be the case for more complex genetic traits), any genetic trait may still be analyzed by the association method proposed here by carefully selecting the individuals to be included in the trait positive and trait negative phenotypic groups. The selection procedure involves selecting individuals at opposite ends of the non-bimodal phenotype spectra of the trait under study, so as to include in these trait positive and trait negative populations individuals which clearly represent extreme, preferably non-overlapping phenotypes.
The definition of the inclusion criteria for the trait positive and trait negative populations is an important aspect of the present invention. The selection of drastically different but relatively uniform phenotypes enables efficient comparisons in association studies and the possible detection of marked differences at the genetic level, provided that the sample sizes of the populations under study are significant enough.
Generally, trait positive and trait negative populations to be included in association studies such as proposed in the present invention consist of phenotypically homogeneous populations of individuals each representing 100% of the corresponding trait if the trait distribution is bimodal.
A first group of between 50 and 300 trait positive individuals, preferably about 100 individuals, can be recruited according to clinical inclusion criteria.
In each case, a similar number of trait negative individuals, preferably more than 100 individuals, are included in such studies who are preferably both ethnically- and age-matched to the trait positive cases. They are checked for the absence of the clinical criteria defined above. Both trait positive and trait negative individuals should correspond to unrelated cases.
Genotyping of trait positive and trait negative individuals
Allelic frequencies of the biallelic markers in each of the above described populations can be determined using one of the methods described above under the heading "Methods of Genotyping DNA samples for biallelic markers". Analyses are preferably performed on amplified fragments obtained by genomic PCR performed on the DNA samples from each individual in similar conditions as those described above for the generation of biallelic markers.
In a preferred embodiment, amplified DNA samples are subjected to automated microsequencing reactions using fluorescent ddNTPs (specific fluorescence for each ddNTP) and the appropriate microsequencing oligonucleotides which hybridize just upstream of the polymorphic base.
Genotyping is further described in the Examples. Association studies can be carried out by the skilled technician using the biallelic markers of the invention defined above, with different trait positive and trait negative populations. Suitable examples of association studies using biallelic markers of the TBC-1 gene, including the biallelic markers A1 to A19, involve studies on the following populations: a trait positive population suffering from a cancer, preferably prostate cancer, and a healthy unaffected population; or a trait positive population suffering from prostate cancer treated with agents acting against prostate cancer and suffering from side-effects resulting from this treatment and a trait negative population suffering from prostate cancer treated with the same agents without any substantial side-effects; or a trait positive population suffering from prostate cancer treated with agents acting against prostate cancer showing a beneficial response and a trait negative population suffering from prostate cancer treated with the same agents without any beneficial response; or a trait positive population suffering from prostate cancer presenting highly aggressive prostate cancer tumors and a trait negative population suffering from prostate cancer with prostate cancer tumors devoid of aggressiveness.
It is another object of the present invention to provide a method for the identification and characterization of an association between an allele of one or more biallelic markers of a TBC-1 gene and a trait. The method comprises the steps of: genotyping a marker or a group of biallelic markers according to the invention in trait positive individuals; genotyping a marker or a group of biallelic markers according to the invention in trait negative individuals; and establishing a statistically significant association between one allele of at least one marker and the trait.
Preferably, the trait positive and trait negative individuals are selected from non-overlapping phenotypes with regard to the trait under study. In one embodiment, the biallelic markers are selected from the group consisting of the biallelic markers A1 to A19.
In a preferred embodiment, the trait is cancer, prostate cancer, an early onset of prostate cancer, a susceptibility to prostate cancer, the level of aggressiveness of prostate cancer tumors, a modified expression of the TBC-1 gene, a modified production of the TBC-1 protein, or the production of a modified TBC-1 protein.
In a further embodiment, the trait negative population can be replaced in the association studies by a random control population.
The step of testing for and detecting the presence of DNA comprising specific alleles of a biallelic marker or a group of biallelic markers of the present invention can be carried out as described further below.
Oligonucleotide Probes And Primers
The invention relates also to oligonucleotide molecules useful as probes or primers, wherein said oligonucleotide molecules hybridize specifically with a nucleotide sequence comprised in the TBC-1 gene, particularly the TBC-1 genomic sequence of SEQ ID Nos 1 and 2 or the TBC-1 cDNA sequences of SEQ ID Nos 3 and 4. More particularly, the present invention also concerns oligonucleotides for the detection of alleles of biallelic markers of the TBC-1 gene. These oligonucleotides are useful either as primers for use in various processes such as DNA amplification and microsequencing or as probes for DNA recognition in hybridization analyses. Polynucleotides derived from the TBC-1 gene are useful in order to detect the presence of at least a copy of a nucleotide sequence of SEQ ID Nos 1-4, or a fragment, complement, or variant thereof in a test sample.
Particularly preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and 2, or the complements thereof. Additionally preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000, 6001-7000, 7001-8000, 8001-9000, 9001-10000, 10001-11000, 11001-12000, 12001-13000, 13001-14000, 14001-15000, 15001-16000, 16001-17000, and 17001-17590. Other preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 2: 1-5000, 5001-10000, 10001-15000, 15001-20000, 20001-25000, 25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000, 55001-60000, 60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000, 85001-90000, 90001-95000, and 95001-99960.
Moreover, preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos 3 and 4, or the complements thereof. Particularly preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 3: 1-500, 501-1000, 1001-1500, 1501-2000, 2001-2500, 2501-3000, 3001-3500, and 3501-3983.
Additional preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 4 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 4: 1-500, 501-1000, 1001-1500, 1501-2000, 2001-2500, 2501-3000, 3001-3500, and 3501-3988.
Thus, the invention also relates to nucleic acid probes characterized in that they hybridize specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-4 or a variant thereof or a sequence complementary thereto.
In one embodiment the invention encompasses isolated, purified, and recombinant polynucleotides consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1 and 2 and the complement thereof, wherein said span includes a TBC-1-related biallelic marker in said sequence; optionally, wherein said TBC-1-related biallelic marker is selected from the group consisting of A1 to A19, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein said contiguous span is 18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said polynucleotide; optionally, wherein said polynucleotide consists of said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at the center of said polynucleotide; optionally, wherein the 3' end of said contiguous span is present at the 3' end of said polynucleotide; and optionally, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said polynucleotide.
In a preferred embodiment, said probes comprise, consist of, or consist essentially of a sequence selected from the following sequences: P1 to P7, P9 to P13, P15 to P19 and the complementary sequences thereto.
In another embodiment the invention encompasses isolated, purified and recombinant polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID Nos 1 and 2, or the complements thereof, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is located within 20 nucleotides upstream of a TBC-1-related biallelic marker in said sequence; optionally, wherein said TBC-1-related biallelic marker is selected from the group consisting of A1 to A19, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, wherein the 3' end of said polynucleotide is located 1 nucleotide upstream of said TBC-1-related biallelic marker in said sequence; and optionally, wherein said polynucleotide consists essentially of a sequence selected from the following sequences: D1 to D19 and E1 to E19.
In a further embodiment, the invention encompasses isolated, purified, or recombinant polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the following sequences: B1 to B15 and C1 to C15. In an additional embodiment, the invention encompasses polynucleotides for use in hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for determining the identity of the nucleotide at a TBC-1-related biallelic marker in SEQ ID Nos 1 and 2, or the complements thereof, as well as polynucleotides for use in amplifying segments of nucleotides comprising a TBC-1-related biallelic marker in SEQ ID Nos 1 and 2, or the complements thereof; optionally, wherein said TBC-1-related biallelic marker is selected from the group consisting of A1 to A19, and the complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith.
A probe or a primer according to the invention is between 8 and 1000 nucleotides in length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 nucleotides in length. More particularly, the length of these probes and primers can range from 8, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence and generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to form hairpin structures. The appropriate length for primers and probes under a particular set of assay conditions may be empirically determined by one of skill in the art. A preferred probe or primer consists of a nucleic acid comprising a polynucleotide selected from the group of the nucleotide sequences of P1 to P7, P9 to P13, P15 to P19 and the complementary sequence thereto, B1 to B15, C1 to C15, D1 to D19, E1 to E19, for which the respective locations in the sequence listing are provided in Tables 2, 3 and 4.
The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C content. The higher the G+C content of the primer or probe, the higher the melting temperature, because G:C pairs are held by three H bonds whereas A:T pairs have only two. The GC content in the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %, and more preferably between 40 and 55 %. The primers and probes can be prepared by any suitable method, including, for example, cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979), the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described in EP 0 707 592.
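As an illustration of how G+C content and length bear on Tm, the following Python sketch computes the GC percentage of an oligonucleotide and a rough Tm estimate. The specification does not prescribe a formula; the Wallace rule of thumb used here (2 °C per A or T, 4 °C per G or C) is an assumption introduced for illustration, is only meaningful for short oligonucleotides, and ignores ionic strength. The function names and the example sequence are hypothetical.

```python
def gc_content(oligo: str) -> float:
    """Percentage of G and C bases in the oligonucleotide."""
    s = oligo.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def wallace_tm(oligo: str) -> int:
    """Rough Tm in degrees Celsius by the Wallace rule (assumption, not from
    the specification): 2 per A/T plus 4 per G/C. Ignores salt concentration."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def in_gc_window(oligo: str, low: float = 35.0, high: float = 60.0) -> bool:
    """True if the GC percentage falls inside the given window
    (defaults follow the 'preferably between 35 and 60 %' range)."""
    return low <= gc_content(oligo) <= high


example = "ATGC" * 5  # hypothetical 20-mer, not a sequence of the invention
print(gc_content(example), wallace_tm(example), in_gc_window(example))
```

Because each G or C contributes twice as much as each A or T in this estimate, two oligonucleotides of equal length can differ substantially in estimated Tm, which is consistent with the dependence on G+C content stated above.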
Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs such as, for example, peptide nucleic acids, which are disclosed in International Patent Application WO 92/20702, and morpholino analogs, which are described in U.S. Patents Numbered 5,185,444; 5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable, and nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of the probe can be functionalized with the capture or detection label to thereby consume or otherwise block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or modified. U.S. Patent Application Serial No. 07/049,061 filed April 19, 1993 describes modifications which can be used to render a probe non-extendable.
Any of the polynucleotides of the present invention can be labeled, if desired, by incorporating any label known in the art to be detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive substances (including 32P, 35S, 125I), fluorescent dyes (including fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described in the French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In addition, the probes according to the present invention may have structural characteristics that allow signal amplification, such structural characteristics being, for example, branched DNA probes such as those described by Urdea et al. in 1991 or in the European patent No. EP 0 225 807 (Chiron).
A label can also be used to capture the primer, so as to facilitate the immobilization of either the primer or a primer extension product, such as amplified DNA, on a solid support. A capture label is attached to the primers or probes and can be a specific binding member which forms a binding pair with the solid phase reagent's specific binding member (e.g., biotin and streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a probe, it may be employed to capture or to detect the target DNA. Further, it will be understood that the polynucleotides, primers or probes provided herein may themselves serve as the capture label.
For example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it may be selected such that it binds a complementary portion of a primer or probe to thereby immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself serves as the binding member, those skilled in the art will recognize that the probe will contain a sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer itself serves as the capture label, at least a portion of the primer will be free to hybridize with a nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.
The probes of the present invention are useful for a number of purposes. They can be notably used in Southern hybridization to genomic DNA. The probes can also be used to detect PCR amplification products. They may also be used to detect mismatches in the TBC-1 gene or mRNA using other techniques.
Any of the polynucleotides, primers and probes of the present invention can be conveniently immobilized on a solid support. Solid supports are known to those skilled in the art and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction. The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor which has the ability to attract and immobilize the capture reagent. The additional receptor can include a charged substance that is oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to the capture reagent. As yet another alternative, the receptor molecule can be any specific binding member which is immobilized upon (attached to) the solid support and which has the ability to immobilize the capture reagent through a specific binding reaction. The receptor molecule enables the indirect binding of the capture reagent to a solid support material before the performance of the assay or during the performance of the assay.
The solid phase thus can be a plastic, derivatized plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other configurations known to those of ordinary skill in the art. The polynucleotides of the invention can be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition, polynucleotides other than those of the invention may be attached to the same solid support as one or more polynucleotides of the invention.
Consequently, the invention also deals with a method for detecting the presence of a nucleic acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-4, a fragment or a variant thereof and a complementary sequence thereto in a sample, said method comprising the following steps:
a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-4, a fragment or a variant thereof and a complementary sequence thereto and the sample to be assayed; and
b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.
The invention further concerns a kit for detecting the presence of a nucleic acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-4, a fragment or a variant thereof and a complementary sequence thereto in a sample, said kit comprising:
a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-4, a fragment or a variant thereof and a complementary sequence thereto; and
b) optionally, the reagents necessary for performing the hybridization reaction.
In a first preferred embodiment of this detection method and kit, said nucleic acid probe or the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the plurality of nucleic acid probes comprise either a sequence which is selected from the group consisting of the nucleotide sequences of P1 to P7, P9 to P13, P15 to P19 and the complementary sequence thereto, B1 to B15, C1 to C15, D1 to D19, E1 to E19 or a biallelic marker selected from the group consisting of A1 to A19 and the complements thereto.
Oligonucleotide Arrays
A substrate comprising a plurality of oligonucleotide primers or probes of the invention may be used either for detecting or amplifying targeted sequences in the TBC-1 gene and may also be used for detecting mutations in the coding or in the non-coding sequences of the TBC-1 gene.
Any polynucleotide provided herein may be attached in overlapping areas or at random locations on the solid support. Alternatively the polynucleotides of the invention may be attached in an ordered array wherein each polynucleotide is attached to a distinct region of the solid support which does not overlap with the attachment site of any other polynucleotide. Preferably, such an ordered array of polynucleotides is designed to be "addressable" where the distinct locations are recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. The knowledge of the precise location of each polynucleotide makes these "addressable" arrays particularly useful in hybridization assays. Any addressable array technology known in the art can be employed with the polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is known as the Genechips™, and has been generally described in US Patent 5,143,854; PCT publications WO 90/15070 and 92/10092. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the development of a technology generally identified as "Very Large Scale Immobilized Polymer Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a solid surface of a chip.
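The notion of an "addressable" array, in which each probe occupies a distinct, recorded location, can be sketched as a simple lookup structure. The following Python example is only a conceptual illustration and does not represent any particular array technology; the function name, the grid layout and the probe names and sequences are hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]  # (row, column) on the support


def build_addressable_array(probes: Dict[str, str], n_cols: int) -> Dict[Address, Tuple[str, str]]:
    """Assign each probe to its own non-overlapping (row, column) address,
    filling the grid row by row, and record the assignment."""
    layout: Dict[Address, Tuple[str, str]] = {}
    for index, (name, sequence) in enumerate(probes.items()):
        layout[divmod(index, n_cols)] = (name, sequence)
    return layout


# Hypothetical probe names and sequences, for illustration only.
probes = {
    "probe_a": "ACGTACGTACGTACG",
    "probe_b": "TTGCAATTGCAATTG",
    "probe_c": "GGATCCGGATCCGGA",
}
layout = build_addressable_array(probes, n_cols=2)

# Because the locations are recorded, a signal read at a given address
# can be traced back to the probe attached there.
name, sequence = layout[(1, 0)]
```

The recorded mapping is what allows a hybridization signal observed at a location to be interpreted in terms of the probe sequence attached there.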
Examples of VLSIPS™ technologies are provided in US Patents 5,143,854 and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which describe methods for forming oligonucleotide arrays through techniques such as light-directed synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports, further presentation strategies were developed to order and display the oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence information. Examples of such presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.
In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide probe matrix may advantageously be used to detect mutations occurring in the TBC-1 gene and preferably in its regulatory region. For this particular purpose, probes are specifically designed to have a nucleotide sequence allowing their hybridization to the genes that carry known mutations (either by deletion, insertion or substitution of one or several nucleotides). By known mutations, it is meant, mutations on the TBC-1 gene that have been identified according, for example to the technique used by Huang et al.(1996) or Samson et al.(1996).
Another technique that is used to detect mutations in the TBC-1 gene is the use of a high-density DNA array. Each oligonucleotide probe constituting a unit element of the high density DNA array is designed to match a specific subsequence of the TBC-1 genomic DNA or cDNA.
Thus, an array consisting of oligonucleotides complementary to subsequences of the target gene sequence is used to determine the identity of the target sequence with the wild-type gene sequence, measure its amount, and detect differences between the target sequence and the reference wild-type gene sequence of the TBC-1 gene. One such design, termed a 4L tiled array, implements a set of four probes (A, C, G, T), preferably 15-nucleotide oligomers. In each set of four probes, the perfect complement will hybridize more strongly than mismatched probes. Consequently, a nucleic acid target of length L is scanned for mutations with a tiled array containing 4L probes, the whole probe set containing all the possible mutations in the known wild-type reference sequence. The hybridization signals of the 15-mer probe set tiled array are perturbed by a single base change in the target sequence. As a consequence, there is a characteristic loss of signal or a "footprint" for the probes flanking a mutation position. This technique was described by Chee et al. in 1996.
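The 4L tiling scheme can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the design of Chee et al.: the interrogated base is placed at the center of each 15-mer, probes are generated as reverse complements of the reference window, and positions too close to either end of the reference to fit a full window are skipped, so the sketch yields 4 x (L - 14) probes rather than exactly 4L. All function names and the toy reference are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tiled_4l_array(reference: str, probe_len: int = 15):
    """For each interrogated position, build four probes that differ only
    at the central base (A, C, G or T substituted in the reference window).

    Returns a list of (position, {substituted_base: probe}) tuples, where
    position is the 0-based index of the interrogated base in the reference.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so that a central base exists")
    ref = reference.upper()
    half = probe_len // 2
    tiles = []
    for centre in range(half, len(ref) - half):
        window = ref[centre - half:centre + half + 1]
        probes = {}
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            probes[base] = reverse_complement(variant)
        tiles.append((centre, probes))
    return tiles


def call_base(signals: dict) -> str:
    """Pick the base whose probe gave the strongest hybridization signal."""
    return max(signals, key=signals.get)


toy_reference = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt reference
tiles = tiled_4l_array(toy_reference)
```

In each set of four, the probe keyed by the reference base is the perfect complement of the wild-type window. A single base change in the target alters the match for every window overlapping that position, which is the origin of the "footprint" described above.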
Consequently, the invention concerns an array of nucleic acid molecules comprising at least one polynucleotide described above as probes and primers. Preferably, the invention concerns an array of nucleic acid comprising at least two polynucleotides described above as probes and primers.
A further object of the invention consists of an array of nucleic acid sequences comprising either at least one of the sequences selected from the group consisting of P1 to P7, P9 to P13, P15 to P19, B1 to B15, C1 to C15, D1 to D19, E1 to E19, the sequences complementary thereto, a fragment thereof of at least 8, 10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides thereof, and at least one sequence comprising a biallelic marker selected from the group consisting of A1 to A19 and the complements thereto.
The invention also pertains to an array of nucleic acid sequences comprising either at least two of the sequences selected from the group consisting of P1 to P7, P9 to P13, P15 to P19, B1 to B15, C1 to C15, D1 to D19, E1 to E19, the sequences complementary thereto, a fragment thereof of at least 8 consecutive nucleotides thereof, and at least two sequences comprising a biallelic marker selected from the group consisting of A1 to A19 and the complements thereof.
Vectors For The Expression Of A Regulatory Or A Coding Polynucleotide Of TBC-1.
Any of the regulatory polynucleotides or the coding polynucleotides of the invention may be inserted into recombinant vectors for expression in a recombinant host cell or a recombinant host organism.
Thus, the present invention also encompasses a family of recombinant vectors that contains either a regulatory polynucleotide selected from the group consisting of any one of the regulatory polynucleotides derived from the TBC-1 genomic sequences of SEQ ID Nos 1 and 2, or a polynucleotide comprising the TBC-1 coding sequence, or both.
In a first preferred embodiment, a recombinant vector of the invention is used as an expression vector: the TBC-1 regulatory sequence comprised therein drives the expression of a coding polynucleotide operably linked thereto; the TBC-1 coding sequence is operably linked to regulation sequences allowing its expression in a suitable cell host and/or host organism.
In a second preferred embodiment, a recombinant vector of the invention is used to amplify the inserted polynucleotide derived from the TBC-1 genomic sequences of SEQ ID Nos 1 and 2 or TBC-1 cDNAs in a suitable cell host, this polynucleotide being amplified every time the recombinant vector replicates.
More particularly, the present invention relates to expression vectors which include nucleic acids encoding a TBC-1 protein, preferably the TBC-1 protein of the amino acid sequence of SEQ ID No 5 described therein, under the control of a regulatory sequence selected among the TBC-1 regulatory polynucleotides, or alternatively under the control of an exogenous regulatory sequence.
A recombinant expression vector comprising a nucleic acid selected from the group consisting of 5' and 3' regulatory regions, or biologically active fragments or variants thereof, is also part of the present invention.
The invention also encompasses a recombinant expression vector comprising:
a) a nucleic acid comprising the 5' regulatory polynucleotide of the nucleotide sequence SEQ ID No 1, or a biologically active fragment or variant thereof;
b) a polynucleotide encoding a polypeptide or a polynucleotide of interest operably linked with said nucleic acid; and
c) optionally, a nucleic acid comprising a 3'-regulatory polynucleotide, preferably a 3'-regulatory polynucleotide of the invention, or a biologically active fragment or variant thereof.
The nucleic acid comprising the 5' regulatory polynucleotide or a biologically active fragment or variant thereof may also comprise the 5'-UTR sequence from either of the two cDNAs of the invention or a biologically active fragment or variant thereof.
The invention also pertains to a recombinant expression vector useful for the expression of the TBC-1 coding sequence, wherein said vector comprises a nucleic acid selected from the group consisting of SEQ ID Nos 3 and 4 or a nucleic acid having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 and 4.
Another recombinant expression vector of the invention consists in a recombinant vector comprising a nucleic acid comprising the nucleotide sequence beginning at the nucleotide in position 176 and ending in position 3730 of the polynucleotide of SEQ ID No 4.
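As a quick consistency check on the coordinates given above, the following Python sketch computes the length of a span defined by 1-based inclusive positions and shows how such a span would be sliced from a sequence. The helper name `extract_span` and the toy sequence are hypothetical; the actual sequence of SEQ ID No 4 is not reproduced here, and whether the span includes a stop codon is not stated in this passage.

```python
def extract_span(sequence: str, start: int, end: int) -> str:
    """Return the subsequence from position `start` to position `end`,
    both 1-based and inclusive (assumed convention)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("span lies outside the sequence")
    return sequence[start - 1:end]


# Coordinates taken from the text: positions 176 to 3730 of SEQ ID No 4.
SPAN_START, SPAN_END = 176, 3730
span_length = SPAN_END - SPAN_START + 1   # number of nucleotides in the span
whole_codons, remainder = divmod(span_length, 3)

# Toy demonstration of the slicing convention on a hypothetical sequence.
toy = "ACGTACGT"
toy_span = extract_span(toy, 2, 4)
```

The span covers 3555 nucleotides, which divides evenly into 1185 triplets, as expected for an open reading frame.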
Generally, a recombinant vector of the invention may comprise any of the polynucleotides described herein, including regulatory sequences, and coding sequences, as well as any TBC-1 primer or probe as defined above. More particularly, the recombinant vectors of the present invention can comprise any of the polynucleotides described in the "TBC-1 cDNA Sequences" section, the "Coding Regions" section, the "Genomic sequence of TBC-1" section and the "Oligonucleotide Probes And Primers" section.
Some of the elements which can be found in the vectors of the present invention are described in further detail in the following sections.
a) Vectors
A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, a plasmid or even a linear DNA molecule which may consist of a chromosomal, non-chromosomal and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising an assembly of:
(1) a genetic element or elements having a regulatory role in gene expression, for example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in length, that act on the promoter to increase the transcription;
(2) a structural or coding sequence which is transcribed into mRNA and eventually translated into a polypeptide; and
(3) appropriate transcription initiation and termination sequences.
Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where a recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
Generally, recombinant expression vectors will include origins of replication, selectable markers permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably a leader sequence capable of directing secretion of the translated protein into the periplasmic space or the extracellular medium.
The selectable marker genes for selection of transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria.
As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and a bacterial origin of replication derived from commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, WI, USA).
Large numbers of suitable vectors and promoters are known to those of skill in the art, and commercially available, such as bacterial vectors pQE70, pQE60, pQE-9 (Qiagen), pbs, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); or eukaryotic vectors pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); baculovirus transfer vector pVL1392/1393 (Pharmingen); pQE-30 (QIAexpress).
A suitable vector for the expression of the TBC-1 polypeptide of SEQ ID No 5 is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC No CRL 1711) which is derived from Spodoptera frugiperda.
Other suitable vectors for the expression of the TBC-1 polypeptide of SEQ ID No 5 in a baculovirus expression system include those described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).
Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter, enhancer, splice and polyadenylation sites, may be used to provide the required nontranscribed genetic elements.
b) Promoters
The suitable promoter regions used in the expression vectors according to the present invention are chosen taking into account the cell host in which the heterologous gene has to be expressed.
A suitable promoter may be heterologous with respect to the nucleic acid for which it controls the expression or alternatively can be endogenous to the native polynucleotide containing the coding sequence to be expressed. Additionally, the promoter is generally heterologous with respect to the recombinant vector sequences within which the construct promoter/coding sequence has been inserted.
Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase promoters, the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc promoter.
Promoter regions can be selected from any desired gene using, for example, CAT (chloramphenicol transferase) vectors and more preferably pKK232-8 and pCM7 vectors.
Particularly preferred bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter is well within the level of ordinary skill in the art.
The choice of a promoter is well within the ability of a person skilled in the field of genetic engineering. For example, one may refer to the book of Sambrook et al. (1989) or also to the procedures described by Fuller et al. (1996).
The vector containing the appropriate DNA sequence as described above, more preferably a TBC-1 gene regulatory polynucleotide, a polynucleotide encoding the TBC-1 polypeptide of SEQ ID No 5 or both of them, can be utilized to transform an appropriate host to allow the expression of the desired polypeptide or polynucleotide.
c) Other types of vectors
The in vivo expression of a TBC-1 polypeptide of SEQ ID No 5 may be useful in order to correct a genetic defect related to the expression of the native gene in a host organism or to the production of a biologically inactive TBC-1 protein.
Consequently, the present invention also deals with recombinant expression vectors mainly designed for the in vivo production of the TBC-1 polypeptide of SEQ ID No 5 by the introduction of the appropriate genetic material in the organism of the patient to be treated. This genetic material may be introduced in vitro in a cell that has been previously extracted from the organism, the modified cell being subsequently reintroduced in the said organism, directly in vivo into the appropriate tissue.
By vector according to this specific embodiment of the invention is intended either a circular or a linear DNA molecule.
One specific embodiment for a method for delivering a protein or peptide to the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the interior of the cell and has a physiological effect.
In a specific embodiment, the invention provides a composition for the in vivo production of the TBC-1 protein or polypeptide described herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide.
Compositions comprising a polynucleotide are described in PCT application No WO 90/11092 (Vical Inc.) and also in PCT application No WO 95/11307 (Institut Pasteur, INSERM, Université d'Ottawa) as well as in the articles of Tascon et al. (1996) and of Huygen et al. (1996).
The amount of vector to be injected into the desired host organism varies according to the site of injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected into an animal body, preferably a mammal body, for example a mouse body.
In another embodiment of the vector according to the invention, it may be introduced in vitro in a host cell, preferably in a host cell previously harvested from the animal to be treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the vector coding for the desired TBC-1 polypeptide or the desired fragment thereof is reintroduced into the animal body in order to deliver the recombinant protein within the body either locally or systemically.
In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (see French patent application No FR-93.05954).
Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host. Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (Roth J.A. et al., 1996), PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al., 1989, Julan et al., 1992 and Neda et al., 1991.
Yet another viral vector system that is contemplated by the invention consists in the adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.
Other compositions containing a vector of the invention advantageously comprise an oligonucleotide fragment of a nucleic sequence selected from the group consisting of SEQ ID Nos 3 or 4 as an antisense tool that inhibits the expression of the corresponding TBC-1 gene. Preferred methods using antisense polynucleotide according to the present invention are the procedures described by Sczakiel et al. (1995) or those described in PCT Application No WO 95/24223.
Host cells
Another object of the invention consists in host cells that have been transformed or transfected with one of the polynucleotides described herein, and more precisely a polynucleotide comprising either a TBC-1 regulatory polynucleotide or the coding sequence of the TBC-1 polypeptide having the amino acid sequence of SEQ ID No 5. Included are host cells that are transformed (prokaryotic cells) or transfected (eukaryotic cells) with a recombinant vector such as one of those described above.
A recombinant host cell of the invention comprises any one of the polynucleotides or the recombinant vectors described herein. More particularly, the cell hosts of the present invention can comprise any of the polynucleotides described in the "TBC-1 cDNA Sequences" section, the "Coding Regions" section, the "Genomic sequence of TBC-1" section and the "Oligonucleotide Probes And Primers" section.
Another preferred recombinant cell host according to the present invention is characterized in that its genome or genetic background (including chromosome, plasmids) is modified by the nucleic acid coding for the TBC-1 polypeptide of SEQ ID No
Preferred host cells used as recipients for the expression vectors of the invention are the following:
a) Prokaryotic host cells: Escherichia coli strains (e.g. DH5-α strain) or Bacillus subtilis.
b) Eukaryotic host cells: HeLa cells (ATCC No CCL2; No CCL2.1; No CCL2.2), Cv 1 cells (ATCC No CCL70), COS cells (ATCC No CRL1650; No CRL1651), Sf-9 cells (ATCC No CRL1711).
The constructs in the host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
Following transformation of a suitable host and growth of the host to an appropriate cell density, the selected promoter is induced by appropriate means, such as temperature shift or chemical induction, and cells are cultivated for an additional period.
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to the skilled artisan.
Transgenic animals
The terms "transgenic animals" or "host animals" are used herein to designate animals that have their genome genetically and artificially manipulated so as to include one of the nucleic acids according to the invention. Preferred animals are non-human mammals and include those belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have their genome artificially and genetically altered by the insertion of a nucleic acid according to the invention.
The transgenic animals of the invention all include within a plurality of their cells a cloned recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic acids comprising a TBC-1 coding sequence, a TBC-1 regulatory polynucleotide or a DNA sequence encoding an antisense polynucleotide such as described in the present specification.
More particularly, transgenic animals according to the invention contain in their somatic cells and/or in their germ line cells any of the polynucleotides described in the "TBC-1 cDNA Sequences" section, the "Coding Regions" section, the "Genomic sequence of TBC-1" section, the "Oligonucleotide Probes And Primers" section and the "Vectors for the expression of a regulatory or coding polynucleotide of TBC-1" section.
The transgenic animals of the invention thus contain specific sequences of exogenous genetic material such as the nucleotide sequences described above in detail.
In a first preferred embodiment, these transgenic animals may be good experimental models in order to study the diverse pathologies related to cell differentiation, in particular concerning the transgenic animals within the genome of which has been inserted one or several copies of a polynucleotide encoding a native TBC-1 protein, or alternatively a mutant TBC-1 protein.
In a second preferred embodiment, these transgenic animals may express a desired polypeptide of interest under the control of the regulatory polynucleotides of the TBC-1 gene, leading to good yields in the synthesis of this protein of interest, and eventually a tissue specific expression of this protein of interest.
Since it is possible to produce transgenic animals of the invention using a variety of different sequences, a general description will be given of the production of transgenic animals by referring generally to exogenous genetic material. This general description can be adapted by those skilled in the art in order to incorporate the DNA sequences into animals. For more details regarding the production of transgenic animals, and specifically transgenic mice, reference may be made to Sandou et al. (1994) and also to US Patents Nos 4,873,191, issued Oct. 10, 1989, 5,968,766, issued Dec. 16, 1997 and 5,387,742, issued Feb. 28, 1995, these documents being herein incorporated by reference to disclose methods for producing transgenic mice.
Transgenic animals of the present invention are produced by the application of procedures which result in an animal with a genome that incorporates exogenous genetic material which is integrated into the genome. The procedure involves obtaining the genetic material, or a portion thereof, which encodes either a TBC-1 coding sequence, a TBC-1 regulatory polynucleotide or a DNA sequence encoding an antisense polynucleotide such as described in the present specification.
A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell line. The insertion is made using electroporation. The cells subjected to electroporation are screened (e.g. by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant polynucleotide into their genome. An illustrative positive-negative selection procedure that may be used according to the invention is described by Mansour et al. (1988). Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from mice. The blastocysts are then inserted into a female host animal and allowed to grow to term. The offspring of the female host are tested to determine which animals are transgenic, i.e. include the inserted exogenous DNA sequence, and which are wild-type.
Screening Of Agents Interacting With TBC-1
In a further embodiment, the present invention also concerns a method for the screening of new agents, or candidate substances, interacting with TBC-1. These new agents could be useful against cancer.
In a preferred embodiment, the invention relates to a method for the screening of candidate substances comprising the following steps: providing a cell line, an organ, or a mammal expressing a TBC-1 gene or a fragment thereof, preferably the regulatory region or the promoter region of the TBC-1 gene; obtaining a candidate substance, preferably a candidate substance capable of inhibiting the binding of a transcription factor to the TBC-1 regulatory region; and testing the ability of the candidate substance to decrease the symptoms of prostate cancer and/or to modulate the expression levels of TBC-1.
In some embodiments, the cell line, organ or mammal expresses a heterologous protein, the coding sequence of which is operably linked to the TBC-1 regulatory or promoter sequence. In other embodiments, they express a TBC-1 gene comprising alleles of one or more TBC-1-related biallelic markers.
A candidate substance is a substance which can interact with or modulate, by binding or other intramolecular interactions, expression, stability, and function of TBC-1. Such substances may be potentially interesting for patients who are not responsive to existing drugs or develop side effects to them. Screening may be effected using either in vitro methods or in vivo methods.
Such methods can be carried out in numerous ways such as on transformed cells which express the considered alleles of the TBC-1 gene, on tumors induced by said transformed cells, for example in mice, or on a TBC-1 protein encoded by the considered allelic variant of TBC-1.
Screening assays of the present invention generally involve determining the ability of a candidate substance to present a cytotoxic effect, to change the characteristics of transformed cells such as proliferative and invasive capacity, to affect the tumor growth, or to modify the expression level of TBC-1.
Typically, this method includes preparing transformed cells with different forms of TBC-1 sequences containing particular alleles of one or more biallelic markers and/or trait causing mutations described above. This is followed by testing the cells expressing the TBC-1 with a candidate substance to determine the ability of the substance to present cytotoxic effect, to affect the characteristics of transformed cells, the tumor growth, or to modify the expression level of TBC-1.
Typical examples of such drug screening assays are provided below. It is to be understood that the parameters set forth in these examples can be modified by the skilled person without undue experimentation.
Methods for screening substances interacting with a TBC-1 polypeptide
A method for the screening of a candidate substance according to the invention comprises the following steps: a) providing a polypeptide comprising the amino acid sequence SEQ ID No 5, or a peptide fragment or a variant thereof; b) obtaining a candidate substance; c) bringing into contact said polypeptide with said candidate substance; d) detecting the complexes formed between said polypeptide and said candidate substance.
For the purpose of the present invention, a ligand means a molecule, such as a protein, a peptide, an antibody or any synthetic chemical compound capable of binding to the TBC-1 protein or one of its fragments or variants or to modulate the expression of the polynucleotide coding for TBC-1 or a fragment or variant thereof.
In the ligand screening method according to the present invention, a biological sample or a defined molecule to be tested as a putative ligand of the TBC-1 protein is brought into contact with a purified TBC-1 protein, for example a purified recombinant TBC-1 protein produced by a recombinant cell host as described hereinbefore, in order to form a complex between the TBC-1 protein and the putative ligand molecule to be tested.
A. Candidate ligands obtained from random peptide libraries
In a particular embodiment of the screening method, the putative ligand is the expression product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically, random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to amino acids in length (Oldenburg K.R. et al., 1992; Valadon et al., 1996; Lucas 1994; Westerink 1995; Castagnoli L. et al., 1991). According to this particular embodiment, the recombinant phages expressing a protein that binds to the immobilized TBC-1 protein are retained and the complex formed between the TBC-1 protein and the recombinant phage may be subsequently immunoprecipitated by a polyclonal or a monoclonal antibody directed against the TBC-1 protein.
Once the ligand library in recombinant phages has been constructed, the phage population is brought into contact with the immobilized TBC-1 protein. Then the preparation of complexes is washed in order to remove the non-specifically bound recombinant phages. The phages that bind specifically to the TBC-1 protein are then eluted by a buffer (acid pH) or immunoprecipitated by the anti-TBC-1 monoclonal antibody produced by a hybridoma, and this phage population is subsequently amplified by an over-infection of bacteria (for example E. coli). The selection step may be repeated several times, preferably 2-4 times, in order to select the more specific recombinant phage clones. The last step consists in characterizing the peptide produced by the selected recombinant phage clones either by expression in infected bacteria and isolation, expressing the phage insert in another host-vector system, or sequencing the insert contained in the selected recombinant phages.
B. Candidate ligands obtained through a two-hybrid screening assay.
The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of the yeast Gal4 protein. This technique is also described in US Patent No 5,667,973 and US Patent No 5,283,173 (Fields et al.), the technical teachings of both patents being herein incorporated by reference.
The general procedure of library screening by the two-hybrid assay may be performed as described by Harper et al. (Harper JW et al., 1993) or as described by Cho et al. (1998) or also Fromont-Racine et al. (1997).
The bait protein or polypeptide consists of a TBC-1 polypeptide or a fragment or variant thereof.
More precisely, the nucleotide sequence encoding the TBC-1 polypeptide or a fragment or variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein, the fused nucleotide sequence being inserted in a suitable expression vector, for example pAS2 or pM3.
Then, a human cDNA library is constructed in a specially designed vector, such that the human cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The polypeptides encoded by the nucleotide inserts of the human cDNA library are termed "prey" polypeptides.
A third vector contains a detectable marker gene, such as beta galactosidase gene or CAT gene that is placed under the control of a regulation sequence that is responsive to the binding of a complete Gal4 protein containing both the transcriptional activation domain and the DNA binding domain. For example, the vector pG5EC may be used.
Two different yeast strains are also used. As an illustrative but non-limiting example, the two different yeast strains may be the following: Y190, the phenotype of which is (MATa, Leu2-3, 112 ura3-12, trp1-901, his3-D200, ade2-101, gal4D gal80D URA3 GAL-LacZ, LYS GAL-HIS3, cyhr); Y187, the phenotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3, 112 URA3 GAL-lacZ met-), which is the opposite mating type of Y190.
Briefly, 20 µg of pAS2/TBC-1 and 20 µg of pACT-cDNA library are co-transformed into yeast strain Y190. The transformants are selected for growth on minimal media lacking histidine, leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive colonies are screened for beta galactosidase by filter lift assay. The double positive colonies (His+, beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan and cycloheximide (10 mg/ml), to select for loss of pAS2/TBC-1 plasmids but retention of pACT-cDNA library plasmids. The resulting Y190 strains are mated with Y187 strains expressing TBC-1 or non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as described by Harper et al. (1993) and by Bram et al. (1993), and screened for beta galactosidase by filter lift assay. Yeast clones that are beta gal- after mating with the control Gal4 fusions are considered false positives.
In another embodiment of the two-hybrid method according to the invention, the interaction between TBC-1 or a fragment or variant thereof with cellular proteins may be assessed using the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), the disclosure of which is incorporated herein by reference, nucleic acids encoding the TBC-1 protein or a portion thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. A desired cDNA, preferably human cDNA, is inserted into a second expression vector such that it is in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into the yeast cells and the yeast cells are plated on selection medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4 dependent lacZ expression.
Those cells which are positive in both the histidine selection and the lacZ assay are those in which an interaction between TBC-1 and the protein or peptide encoded by the initially selected cDNA insert has taken place.
Method for screening ligands that modulate the expression of the TBC-1 gene.
Another subject of the present invention is a method for screening molecules that modulate the expression of the TBC-1 protein. Such a screening method comprises the steps of: a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide sequence encoding the TBC-1 protein, operably linked to a TBC-1 5'-regulatory sequence; b) bringing into contact the cultivated cell with a molecule to be tested; c) quantifying the expression of the TBC-1 protein.
Using DNA recombination techniques well known to one skilled in the art, the TBC-1 protein encoding DNA sequence is inserted into an expression vector, downstream from a TBC-1 sequence that contains a TBC-1 promoter sequence.
The quantification of the expression of the TBC-1 protein may be realized either at the mRNA level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be used to quantify the amounts of the TBC-1 protein that have been produced, for example in an ELISA or a RIA assay.
In a preferred embodiment, the quantification of the TBC-1 mRNAs is realized by a quantitative PCR amplification of the cDNAs obtained by a reverse transcription of the total mRNA of the cultivated TBC-1-transfected host cell, using a pair of primers specific for TBC-1.
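As a rough illustration of how quantitative PCR readouts can be converted into a relative expression level, the sketch below applies the widely used comparative Ct (2^-ΔΔCt) calculation. The specification does not prescribe this calculation: the function name, the use of a reference gene for normalization and the Ct values are all hypothetical, and the formula assumes an amplification efficiency close to 100%.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target mRNA (treated vs. control cells).

    Comparative Ct method; assumes the PCR product roughly doubles
    at every cycle (an assumption, not a statement from the source).
    """
    # Normalize the target Ct to the reference gene in each condition
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Difference between conditions, converted to a fold change
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values, for illustration only
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)
```

A value above 1 would indicate that the tested molecule increased the target mRNA level relative to the untreated culture, and a value below 1 a decrease.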
Expression levels and patterns of TBC-1 may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277, the entire contents of which are incorporated herein by reference. Briefly, the TBC-1 cDNA or the TBC-1 genomic DNA described above, or fragments thereof, is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the TBC-1 insert comprises at least 100 consecutive nucleotides of the genomic DNA sequence or the cDNA sequences, particularly those comprising one of the nucleotide sequences of SEQ ID Nos 3, 4 and 6-8 or those encoding a mutated TBC-1. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (e.g. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest.
The hybridizations are performed under standard stringent conditions (40-50°C for 16 hours in an formamide, 0.4 M NaCl buffer, pH ). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (e.g. RNases CL3, T1, Phy M, U2 or ). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.
Quantitative analysis of TBC-1 gene expression may also be performed using arrays. As used herein, the term array means a one dimensional, two dimensional, or multidimensional arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a plurality of nucleic acids derived from genes whose expression levels are to be assessed. The arrays may include the TBC-1 genomic DNA, the TBC-1 cDNA sequences or the sequences complementary thereto or fragments thereof, particularly those comprising at least one of the biallelic markers according to the present invention. Preferably, the fragments are at least nucleotides in length. In other embodiments, the fragments are at least 25 nucleotides in length. In some embodiments, the fragments are at least 50 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. In another preferred embodiment, the fragments are more than 100 nucleotides in length. In some embodiments the fragments may be more than 500 nucleotides in length.
For example, quantitative analysis of TBC-1 gene expression may be performed with a complementary DNA microarray as described by Schena et al. (1995). Full length TBC-1 cDNAs or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C.
Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1 x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
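The averaging step mentioned above can be sketched as follows. This is a minimal illustration only: the function name, the representation of each hybridization as a (sample signal, reference signal) pair and the intensity values are assumptions, and no background correction or normalization scheme from the cited procedure is reproduced here.

```python
def mean_expression_ratio(hyb1, hyb2):
    """Average the sample/reference ratios of two independent hybridizations.

    Each argument is a (sample_signal, reference_signal) pair measured
    for the same array element.
    """
    ratios = []
    for sample_signal, reference_signal in (hyb1, hyb2):
        if reference_signal <= 0:
            raise ValueError("reference signal must be positive")
        ratios.append(sample_signal / reference_signal)
    return sum(ratios) / len(ratios)


# Hypothetical fluorescence intensities for one array element
print(mean_expression_ratio((1200.0, 400.0), (1000.0, 500.0)))
```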
Quantitative analysis of TBC-1 gene expression may also be performed with full length TBC-1 cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et al. (1996). The full length TBC-1 cDNA or fragments thereof is PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially expressed mRNAs is then performed.
Alternatively, expression analysis using the TBC-1 genomic DNA, the TBC-1 cDNAs, or fragments thereof can be done through high density nucleotide arrays or chips as described by Lockhart et al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of the TBC-1 genomic DNA, the TBC-1 cDNA sequences, particularly those comprising at least one of the biallelic markers according to the present invention, preferably at least one of SEQ ID Nos 7-8 or those comprising the trait causing mutation, or the sequences complementary thereto, are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
TBC-1 cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. The said probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of TBC-1 mRNAs.
Thus, also part of the present invention is a method for screening of a candidate substance or molecule that modulates the expression of the TBC-1 gene according to the invention, wherein this method comprises the following steps: a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid comprises the 5' regulatory region sequence or a biologically active fragment or variant thereof, the 5' regulatory region or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein; b) obtaining a candidate substance, and c) determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.
In a preferred embodiment of the above screening method, the nucleic acid comprising the regulatory region sequence or a biologically active fragment or variant thereof also includes a region of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of their biologically active fragments or variants thereof.
A second method for the screening of a candidate substance or molecule that modulates the expression of the TBC-1 gene comprises the following steps: a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid comprises a 5'UTR sequence of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of their biologically active fragments or variants, the 5'UTR sequence or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein; b) obtaining a candidate substance, and c) determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.
In a preferred embodiment of the screening method described above, the nucleic acid that comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4 or one of their biologically active fragments or variants, includes a promoter sequence, wherein said promoter sequence can be either endogenous, or in contrast exogenous with respect to the TBC-1 5'UTR sequences defined therein.
Among the preferred polynucleotides encoding a detectable protein, there may be cited polynucleotides encoding beta galactosidase, green fluorescent protein (GFP) and chloramphenicol acetyl transferase (CAT).
For the design of suitable recombinant vectors useful for performing the screening methods described above, reference is made to the section of the present specification wherein the preferred recombinant vectors of the invention are detailed.
Screening using transgenic animals
In vivo methods can utilize transgenic animals for drug screening. Nucleic acids including at least one of the biallelic polymorphisms of interest can be used to generate genetically modified non-human animals or to generate site specific gene modifications in cell lines. The term "transgenic" is intended to encompass genetically modified animals having a deletion or other knock-out of TBC-1 gene activity, having an exogenous TBC-1 gene that is stably transmitted in the host cells, or having an exogenous TBC-1 promoter operably linked to a reporter gene. Transgenic animals may be made through homologous recombination, where the TBC-1 locus is altered.
Alternatively, a nucleic acid construct is randomly integrated into the genome. Vectors for stable integration include for example plasmids, retroviruses and other animal viruses, and YACs. Of interest are transgenic mammals, e.g. cows, pigs, goats, horses, and particularly rodents such as rats and mice. Transgenic animals allow the study of both efficacy and toxicity of the candidate drug.
Methods for inhibiting the expression of a TBC-1 gene
Other therapeutic compositions according to the present invention advantageously comprise an oligonucleotide fragment of the nucleic sequence of TBC-1 as an antisense tool that inhibits the expression of the corresponding TBC-1 gene. Preferred methods using antisense polynucleotides according to the present invention are the procedures described by Sczakiel et al. (1995).
Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5' end of the TBC-1 mRNA. In another embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired targeted gene is used.
Preferred antisense polynucleotides according to the present invention are complementary to a sequence of the mRNAs of TBC-1 that contains the translation initiation codon ATG.
The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences. They comprise a nucleotide sequence complementary to the targeted sequence of the TBC-1 genomic DNA, the sequence of which can be determined using one of the detection methods of the present invention. The targeted DNA or RNA sequence preferably comprises at least one of the biallelic markers according to the present invention. The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the TBC-1 mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of which are incorporated herein by reference.
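To make the notions of complementarity, length and melting temperature more concrete, the following sketch derives a DNA antisense oligonucleotide spanning an ATG and estimates its melting temperature. Everything in it is illustrative: the input sequence is invented and is not taken from SEQ ID Nos 3 or 4, the helper names are hypothetical, the first ATG found is simply assumed to be the initiation codon, and the Tm formula is the rough Wallace rule, which is only indicative for short oligonucleotides.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def antisense_around_atg(sense_dna, flank=10):
    """Antisense oligo covering the first ATG plus `flank` bases on each side.

    Assumption: the first ATG in the input is the translation initiation
    codon, which is not guaranteed for a real transcript.
    """
    sense = sense_dna.upper()
    start = sense.find("ATG")
    if start < 0:
        raise ValueError("no ATG found in the sequence")
    lo = max(0, start - flank)
    hi = min(len(sense), start + 3 + flank)
    return reverse_complement(sense[lo:hi])


def wallace_tm(oligo):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Invented sense-strand sequence, for illustration only
sense = "GGCACGAGCCACCATGGAACCTTCAGGA"
oligo = antisense_around_atg(sense, flank=5)
print(oligo, len(oligo), wallace_tm(oligo))
```

In practice the flank would be chosen to reach the preferred length range, and a nearest-neighbour model would give a more reliable stability estimate than the Wallace rule.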
In some strategies, antisense molecules are obtained by reversing the orientation of the TBC-1 coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript.
Another approach involves transcription of TBC-1 antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in a suitable expression vector.
Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European Patent Application No. EP 0 572 287 A2.
An alternative to the antisense technology that is used according to the present invention consists in using ribozymes that will bind to a target sequence via their complementary polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site (namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme consists of (1) sequence specific binding to the target RNA via complementary antisense sequences; (2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is advantageous. A preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the specific preparation procedures referred to in said article being herein incorporated by reference.
Throughout this application, various publications, patents and published patent applications are cited. The disclosures of these publications, patents and published patent specifications referenced in this application are hereby incorporated by reference into the present disclosure to more fully describe the state of the art to which this invention pertains.
EXAMPLES
EXAMPLE 1: Analysis of the first mRNA encoding a TBC-1 polypeptide synthesized by the cells.
TBC-1 cDNA was obtained as follows: 4 µl of ethanol suspension containing 1 mg of human prostate total RNA (Clontech Laboratories, Inc., Palo Alto, USA; Catalogue N. 64038-1) was centrifuged, and the resulting pellet was air dried for 30 minutes at room temperature.
First strand cDNA synthesis was performed using the Advantage™ RT-for-PCR kit (Clontech Laboratories Inc., catalogue N. K1402-1). 1 µl of 20 mM solution of a specific oligo dT primer was added to 12.5 µl of RNA solution in water, heated at 74°C for 2.5 min and rapidly quenched in an ice bath. 10 µl of 5 x RT buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2), 2.5 µl of dNTP mix (10 mM each), 1.25 µl of human recombinant placental RNA inhibitor were mixed with 1 ml of MMLV reverse transcriptase (200 units). 6.5 µl of this solution were added to the RNA-primer mix and incubated at 42°C for one hour. 80 µl of water were added and the solution was incubated at 94°C for 5 minutes.
of the resulting solution were used in a Long Range PCR reaction with hot start, in µl final volume, using 2 units of rtTHXL, 20 pmol/µl of each of the primers 5'-TGACCACCATGCCCATGCT-3' (271-289 in SEQ ID No 3) and 5'-GCATTTATTCACGTCCACGCC-3' (3929-3949 in SEQ ID No 3), with 35 cycles of elongation for 6 minutes at 67°C in a thermocycler.
The amplification products corresponding to both cDNA strands were partially sequenced in order to ensure the specificity of the amplification reaction.
Results of Northern blot analysis of prostate mRNAs supported the existence of the first TBC-1 cDNA, about 4 kb in length, which has the nucleotide sequence of SEQ ID No 3.
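The primer coordinates quoted in Example 1 for SEQ ID No 3 can be used to estimate the size of the expected amplification product and to check that each primer length matches its stated position range. The short Python sketch below does only this arithmetic; the helper name `span_length` is an illustrative assumption.

```python
def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1


# Primer sequences and positions as quoted in Example 1 (SEQ ID No 3 coordinates).
forward_seq, forward_pos = "TGACCACCATGCCCATGCT", (271, 289)
reverse_seq, reverse_pos = "GCATTTATTCACGTCCACGCC", (3929, 3949)

# Each primer length should equal the length of its coordinate range.
forward_ok = len(forward_seq) == span_length(*forward_pos)
reverse_ok = len(reverse_seq) == span_length(*reverse_pos)

# Expected product: from the first base of the forward primer
# to the last base covered by the reverse primer.
amplicon_bp = span_length(forward_pos[0], reverse_pos[1])

print(forward_ok, reverse_ok, amplicon_bp)
```

The resulting product of roughly 3.7 kb is of the same order as the transcript of about 4 kb indicated by the Northern blot analysis.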
Example 2: Detection of TBC-1 biallelic markers: DNA extraction
Donors were unrelated and healthy. They were sufficiently diverse to be representative of a heterogeneous French population. The DNA from 100 individuals was extracted and tested for the detection of the biallelic markers.
ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.
The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed of: 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM), NaCl 0.4 M; 200 µl SDS; 500 µl K-proteinase (2 mg K-proteinase in TE 10-2, NaCl 0.4 M).
For the extraction of proteins, 1 ml saturated NaCl (6 M) (1/3.5 v/v) was added. After vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.
For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm.
The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).
To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used in the subsequent examples described below.
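The two spectrophotometric criteria above (1 OD unit at 260 nm corresponding to 50 µg/ml DNA, and an OD 260 / OD 280 ratio between 1.8 and 2) can be sketched as follows. The function names, the dilution factor and the example readings are assumptions introduced only for illustration.

```python
UG_PER_ML_PER_OD260 = 50.0  # conversion stated in the text: 1 OD unit = 50 µg/ml DNA


def dna_concentration_ug_per_ml(od260: float, dilution_factor: float = 1.0) -> float:
    """DNA concentration of the undiluted sample, in µg/ml."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity(od260: float, od280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """True when the OD 260 / OD 280 ratio lies in the accepted range."""
    if od280 <= 0:
        return False
    return low <= od260 / od280 <= high


# Hypothetical readings taken on a 20-fold dilution.
od260, od280 = 0.25, 0.135
print(dna_concentration_ug_per_ml(od260, dilution_factor=20.0))
print(passes_purity(od260, od280))
```

A preparation failing the ratio test would, following the text, simply be excluded from the subsequent examples.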
The pool was constituted by mixing equivalent quantities of DNA from each individual.
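Mixing equivalent quantities of DNA from samples of different concentration amounts to drawing a different volume from each sample. A minimal sketch of that calculation is given below; the donor names, concentrations and the 500 ng target mass are hypothetical values, not data from this example.

```python
from typing import Dict


def pooling_volumes_ul(conc_ng_per_ul: Dict[str, float], mass_ng: float) -> Dict[str, float]:
    """Volume (µl) to take from each sample so that each contributes `mass_ng` of DNA."""
    return {name: mass_ng / conc for name, conc in conc_ng_per_ul.items()}


def pool_concentration_ng_per_ul(volumes_ul: Dict[str, float], mass_ng: float) -> float:
    """Resulting DNA concentration of the pool, in ng/µl."""
    return mass_ng * len(volumes_ul) / sum(volumes_ul.values())


# Hypothetical sample concentrations in ng/µl.
samples = {"donor_1": 100.0, "donor_2": 250.0, "donor_3": 50.0}
volumes = pooling_volumes_ul(samples, mass_ng=500.0)
print(volumes)
print(pool_concentration_ng_per_ul(volumes, mass_ng=500.0))
```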
Example 3: Detection of the biallelic markers: amplification of genomic DNA by PCR
The amplification of specific genomic sequences of the DNA samples of example 2 was carried out on the pool of DNA obtained previously. In addition, 50 individual samples were similarly amplified.
PCR assays were performed using the following protocol:

Final volume                                         25 µl
DNA                                                  2 ng/µl
MgCl2                                                2 mM
dNTP (each)                                          200 µM
primer (each)                                        2.9 ng/µl
Ampli Taq Gold DNA polymerase                        0.05 unit/µl
PCR buffer (10x = 0.1 M TrisHCl pH 8.3, 0.5 M KCl)   1x

Each pair of first primers was designed using the sequence information of the TBC-1 gene disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about nucleotides in length and had the sequences disclosed in Table 1 in the columns labeled PU and RP.
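The protocol above is expressed as final concentrations in a 25 µl reaction. The sketch below converts those figures into per-reaction amounts and, for reagents added from a concentrated stock, into pipetting volumes using C1 x V1 = C2 x V2. The stock concentrations (25 mM MgCl2, 10 mM of each dNTP) are assumptions chosen for illustration; only the final concentrations and the 10x buffer come from the protocol.

```python
FINAL_VOLUME_UL = 25.0

# Final concentrations stated in the protocol.
FINAL = {
    "DNA_ng_per_ul": 2.0,
    "primer_each_ng_per_ul": 2.9,
    "polymerase_unit_per_ul": 0.05,
    "MgCl2_mM": 2.0,
    "dNTP_each_mM": 0.2,  # 200 µM
    "buffer_x": 1.0,
}

# Stock concentrations: the 10x buffer is from the protocol,
# the MgCl2 and dNTP stocks are hypothetical.
STOCK = {"MgCl2_mM": 25.0, "dNTP_each_mM": 10.0, "buffer_x": 10.0}


def amount_per_reaction(conc_per_ul: float, volume_ul: float = FINAL_VOLUME_UL) -> float:
    """Total amount (ng or units) present in one reaction."""
    return conc_per_ul * volume_ul


def stock_volume_ul(final_conc: float, stock_conc: float,
                    volume_ul: float = FINAL_VOLUME_UL) -> float:
    """Volume of stock needed so that the reaction reaches `final_conc`."""
    return final_conc * volume_ul / stock_conc


for key in ("DNA_ng_per_ul", "primer_each_ng_per_ul", "polymerase_unit_per_ul"):
    print(key, amount_per_reaction(FINAL[key]))
for key in STOCK:
    print(key, stock_volume_ul(FINAL[key], STOCK[key]))
```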
Table 1

Amplicon   Position range of     Primer  Position range of     Primer  Complementary position
           the amplicon in       name    amplification primer  name    range of amplification
           SEQ ID No 1                   in SEQ ID No 1                primer in SEQ ID No 1
99-430     9391-9845             B1      9391-9408             C1      9828-9845

Amplicon   Position range of     Primer  Position range of     Primer  Complementary position
           the amplicon in       name    amplification primer  name    range of amplification
           SEQ ID No 2                   in SEQ ID No 2                primer in SEQ ID No 2
99-20508   988-1529              B2      988-1006              C2      1509-1529
99-20469   5039-5554             B3      5039-5056             C3      5534-5554
5-254      5997-6350             B4      5997-6015             C4      6332-6350
5-257      14371-14817           B5      14371-14390           C5      14798-14817
99-20511   18751-19217           B6      18751-18771           C6      19198-19217
99-20510   19605-20005           B7      19605-19625           C7      19986-20005
99-20504   29529-30061           B8      29529-29547           C8      30041-30061
99-20493   42268-42752           B9      42268-42287           C9      42732-42752
99-20499   69026-69543           B10     69026-69046           C10     69525-69543
99-20473   76323-76790           B11     76323-76343           C11     76771-76790
5-249      78292-78721           B12     78292-78309           C12     78704-78721
99-20485   81893-82372           B13     81893-81912           C13     82353-82372
99-20481   84392-84929           B14     84392-84412           C14     84909-84929
99-20480   89746-90198           B15     89746-89765           C15     90179-90198

Preferably, the primers contained a common oligonucleotide tail upstream of the specific bases targeted for amplification which was useful for sequencing.
Primers PU contain the following additional PU 5' sequence: TGTAAAACGACGGCCAGT (SEQ ID No ); primers RP contain the following RP 5' sequence: CAGGAAACAGCTATGACC (SEQ ID No 7).
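The tailing scheme can be sketched as a simple concatenation: the common tail is placed 5' of the target-specific bases. In the Python sketch below, the two tail sequences are those quoted above, while the 18-base specific sequences and the function name are hypothetical placeholders, not primers of Table 1.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # PU 5' tail quoted in the text
RP_TAIL = "CAGGAAACAGCTATGACC"  # RP 5' tail quoted in the text (SEQ ID No 7)


def tailed_primer(tail: str, specific: str) -> str:
    """Full primer, written 5' to 3': common tail followed by the specific bases."""
    return tail + specific


# Hypothetical target-specific sequences, for illustration only.
pu_specific = "ACGTTGCAAGCTTGGATC"
rp_specific = "GGATCCAAGCTTGCAACG"

pu_primer = tailed_primer(PU_TAIL, pu_specific)
rp_primer = tailed_primer(RP_TAIL, rp_specific)
print(pu_primer, len(pu_primer))
print(rp_primer, len(rp_primer))
```

Because every amplicon then carries the same two tails at its ends, a single pair of sequencing primers matching the tails can be used for all amplification products.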
The synthesis of these primers was performed following the phosphoramidite method, on a GENSET UFPS 24.1 synthesizer.
DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for min, 40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and sec at 72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the amplification products obtained were determined on 96-well microtiter plates, using a fluorometer and Picogreen as intercalating agent (Molecular Probes).
Example 4: Detection of the biallelic markers: sequencing of amplified genomic DNA and identification of polymorphisms.
The sequencing of the amplified DNA obtained in example 3 was carried out on ABI 377 sequencers. The sequences of the amplification products were determined using automated dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of the sequencing reactions were run on sequencing gels and the sequences were determined using gel image analysis [ABI Prism DNA Sequencing Analysis software (2.1.2 version)].
The sequence data were further evaluated to detect the presence of biallelic markers among the pooled amplified fragments. The polymorphism search was based on the presence of superimposed peaks in the electrophoresis pattern resulting from different bases occurring at the same position as described previously.
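The idea of searching pooled traces for superimposed peaks can be sketched as follows: at a given position, a second base whose peak reaches a sizeable fraction of the tallest peak suggests that two alleles are present in the pool. The 20% threshold, the function names and the peak heights below are assumptions for illustration and do not describe the analysis software actually used.

```python
from typing import Dict, List


def candidate_alleles(peak_heights: Dict[str, float], min_fraction: float = 0.2) -> List[str]:
    """Bases whose peak height is at least `min_fraction` of the tallest peak."""
    tallest = max(peak_heights.values())
    if tallest <= 0:
        return []
    return sorted(base for base, height in peak_heights.items()
                  if height >= min_fraction * tallest)


def looks_biallelic(peak_heights: Dict[str, float], min_fraction: float = 0.2) -> bool:
    """True when exactly two bases give superimposed peaks at the position."""
    return len(candidate_alleles(peak_heights, min_fraction)) == 2


# Made-up peak heights at one position of a pooled trace.
polymorphic_site = {"A": 820.0, "C": 15.0, "G": 410.0, "T": 9.0}
monomorphic_site = {"A": 12.0, "C": 900.0, "G": 20.0, "T": 8.0}
print(candidate_alleles(polymorphic_site), looks_biallelic(polymorphic_site))
print(candidate_alleles(monomorphic_site), looks_biallelic(monomorphic_site))
```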
fragments of amplification were analyzed. In these fragments, 19 biallelic markers were detected. The localization of the biallelic markers is shown in Table 2.
Table 2

Amplicon   BM   Marker name    Localization in TBC-1 gene  Polymorphism      BM position in
                                                           (all1 / all2)     SEQ ID No 1
99-430     A1   99-430-352     Intron 1                    A / G             9494

Amplicon   BM   Marker name    Localization in TBC-1 gene  Polymorphism      BM position in
                                                           (all1 / all2)     SEQ ID No 2
99-20508   A2   99-20508-456   Intron upstream to Exon A   C / T             1443
99-20469   A3   99-20469-213   Intron A                    C / T             5247
5-254      A4   5-254-227      Intron B                    A / G             6223
5-257      A5   5-257-353      Intron D                    C / T             14723
99-20511   A6   99-20511-32    Intron D                    C / T             19186
99-20511   A7   99-20511-221   Intron D                    A / G             18997
99-20510   A8   99-20510-115   Intron D                    deletion of TCT   19891
99-20504   A9   99-20504-90    Intron D                    A / G             29617
99-20493   A10  99-20493-238   Intron D                    A / C             42519
99-20499   A11  99-20499-221   Intron G                    A / G             69324
99-20499   A12  99-20499-364   Intron G                    A / T             69181
99-20499   A13  99-20499-399   Intron G                    A / G             69146
99-20473   A14  99-20473-138   Intron H                    deletion of TAACA 76458
5-249      A15  5-249-304      Intron I                    A / G             78595
99-20485   A16  99-20485-269   Intron I                    A / G             82159
99-20481   A17  99-20481-131   Intron I                    G / C             84522
99-20481   A18  99-20481-419   Intron I                    A / T             84810
99-20480   A19  99-20480-233   Intron J                    A / G             89967

BM refers to "biallelic marker". All1 and all2 refer respectively to allele 1 and allele 2 of the biallelic marker.

Table 3

BM   Marker name    Position range of probes  Probes
                    in SEQ ID No 1
A1   99-430-352     9482-9506                 P1

BM   Marker name    Position range of probes  Probes
                    in SEQ ID No 2
A2   99-20508-456   1431-1455                 P2
A3   99-20469-213   5235-5259                 P3
A4   5-254-227      6211-6235                 P4
A5   5-257-353      14711-14735               P5
A6   99-20511-32    19174-19198               P6
A7   99-20511-221   18985-19009               P7
A9   99-20504-90    29605-29629               P9
A10  99-20493-238   42507-42531               P10
A11  99-20499-221   69312-69336               P11
A12  99-20499-364   69169-69193               P12
A13  99-20499-399   69134-69158               P13
A15  5-249-304      78583-78607               P15
A16  99-20485-269   82147-82171               P16
A17  99-20481-131   84510-84534               P17
A18  99-20481-419   84798-84822               P18
A19  99-20480-233   89955-89979               P19

Example 5: Validation of the polymorphisms through microsequencing
The biallelic markers identified in example 4 were further confirmed and their respective frequencies were determined through microsequencing. Microsequencing was carried out for each individual DNA sample described in Example 2.
Amplification from genomic DNA of individuals was performed by PCR as described above for the detection of the biallelic markers with the same set of PCR primers (Table 1).
The preferred primers used in microsequencing were about 19 nucleotides in length and hybridized just upstream of the considered polymorphic base. According to the invention, the primers used in microsequencing are detailed in Table 4.
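The rule stated above (a primer of about 19 nucleotides ending just upstream of the polymorphic base) can be sketched as simple coordinate arithmetic. The tabulated values also appear consistent with a complementary primer covering the 19 bases just downstream of the marker and with a 25-base probe centred on the marker; these two regularities are observations from the tables rather than statements of the text, and the function names are illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def microsequencing_primers(bm_position: int, length: int = 19) -> Dict[str, Range]:
    """Coordinate ranges of the primers flanking the polymorphic base."""
    return {
        "mis1": (bm_position - length, bm_position - 1),           # just upstream
        "mis2_complement": (bm_position + 1, bm_position + length),  # just downstream
    }


def probe_range(bm_position: int, flank: int = 12) -> Range:
    """Coordinate range of a probe centred on the polymorphic base."""
    return (bm_position - flank, bm_position + flank)


# Marker positions as listed for A1 (SEQ ID No 1) and A19 (SEQ ID No 2).
for name, position in (("A1", 9494), ("A19", 89967)):
    print(name, microsequencing_primers(position), probe_range(position))
```

For marker A1 at position 9494 this gives 9475-9493 and 9495-9513 for the two primers and 9482-9506 for the probe, in agreement with the tabulated ranges.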
Table 4

Marker name    Biallelic  Mis. 1  Position range of         Mis. 2  Complementary position range
               marker             microsequencing primer            of microsequencing primer
                                  mis. 1 in SEQ ID No 1             mis. 2 in SEQ ID No 1
99-430-352     A1         D1      9475-9493                 E1      9495-9513

Marker name    Biallelic  Mis. 1  Position range of         Mis. 2  Complementary position range
               marker             microsequencing primer            of microsequencing primer
                                  mis. 1 in SEQ ID No 2             mis. 2 in SEQ ID No 2
99-20508-456   A2         D2      1424-1442                 E2      1444-1462
99-20469-213   A3         D3      5228-5246                 E3      5248-5266
5-254-227      A4         D4      6204-6222                 E4      6224-6242
5-257-353      A5         D5      14704-14722               E5      14724-14742
99-20511-32    A6         D6      19167-19185               E6      19187-19205
99-20511-221   A7         D7      18978-18996               E7      18998-19016
99-20510-115   A8         D8      19872-19890               E8      19892-19910
99-20504-90    A9         D9      29598-29616               E9      29618-29636
99-20493-238   A10        D10     42500-42518               E10     42520-42538
99-20499-221   A11        D11     69305-69323               E11     69325-69343
99-20499-364   A12        D12     69162-69180               E12     69182-69200
99-20499-399   A13        D13     69127-69145               E13     69147-69165
99-20473-138   A14        D14     76439-76457               E14     76459-76477
5-249-304      A15        D15     78576-78594               E15     78596-78614
99-20485-269   A16        D16     82140-82158               E16     82160-82178
99-20481-131   A17        D17     84503-84521               E17     84523-84541
99-20481-419   A18        D18     84791-84809               E18     84811-84829
99-20480-233   A19        D19     89948-89966               E19     89968-89986

The microsequencing reaction was performed as follows: After purification of the amplification products, the microsequencing reaction mixture was prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris HCl pH 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested, following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of sec at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ Research).
The unincorporated dye terminators were then removed by ethanol precipitation. Samples were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C before being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).
Following gel analysis, data were automatically processed with software that allows the determination of the alleles of biallelic markers present in each amplified fragment.
The software evaluates such factors as whether the intensities of the signals resulting from the above microsequencing procedures are weak, normal, or saturated, or whether the signals are ambiguous. In addition, the software identifies significant peaks (according to shape and height criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based on their position. When two significant peaks are detected for the same position, each sample is classified as homozygous or heterozygous based on the height ratio.
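A minimal sketch of a height-ratio classification, followed by the allele frequency estimate mentioned at the start of this example, is given below. The thresholds (0.5 and 0.1), the genotype labels and the function names are assumptions made for illustration; the criteria of the software actually used are not given in the text.

```python
from typing import List


def call_genotype(height_allele1: float, height_allele2: float,
                  het_min_ratio: float = 0.5, hom_max_ratio: float = 0.1) -> str:
    """Classify one sample from the two allele peak heights at the targeted site."""
    tall = max(height_allele1, height_allele2)
    small = min(height_allele1, height_allele2)
    if tall <= 0:
        return "no call"
    ratio = small / tall
    if ratio >= het_min_ratio:
        return "1/2"  # heterozygous
    if ratio <= hom_max_ratio:
        return "1/1" if height_allele1 > height_allele2 else "2/2"  # homozygous
    return "ambiguous"


def allele1_frequency(genotypes: List[str]) -> float:
    """Frequency of allele 1 among the samples with an unambiguous genotype."""
    called = [g for g in genotypes if g in ("1/1", "1/2", "2/2")]
    if not called:
        raise ValueError("no called genotypes")
    copies = sum(2 if g == "1/1" else 1 if g == "1/2" else 0 for g in called)
    return copies / (2 * len(called))


# Made-up peak heights for five samples.
heights = [(1000.0, 30.0), (500.0, 450.0), (40.0, 900.0), (480.0, 520.0), (1000.0, 300.0)]
genotypes = [call_genotype(h1, h2) for h1, h2 in heights]
print(genotypes)
print(allele1_frequency(genotypes))
```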
References
Altschul et al., 1990, J. Mol. Biol. 215(3):403-410
Altschul et al., 1993, Nature Genetics 3:266-272
Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402
Ausubel et al. (1989) Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, N.Y.
Beaucage et al., Tetrahedron Lett 1981, 22:1859-1862
Bram RJ et al., 1993, Mol. Cell Biol., 13:4760-4769
Brown EL, Belagaje R, Ryan MJ, Khorana HG, Methods Enzymol 1979;68:109-151
Castagnoli L. et al. (Felici), 1991, J. Mol. Biol., 222:301-310
Chai H. et al., 1993, Biotechnol. Appl. Biochem., 18:259-273
Chee et al. (1996) Science 274:610-614
Chen and Kwok, Nucleic Acids Research 25:347-353, 1997
Chen et al., Proc. Natl. Acad. Sci. USA 94/20:10756-10761, 1997
Cho RJ et al., 1998, Proc. Natl. Acad. Sci. USA, 95(7):3752-3757
Chumakov I. et al., 1995, Nature, 377(6547 Suppl):175-297
Compton J. (1991) Nature 350(6313):91-92
Dib et al., 1996, Nature, 380:III-V
Ellis NA, 1997, Curr. Op. Genet. Dev., 7:354-363
Feldman and Steg, 1996, Medecine/Sciences, synthese, 12:47-
Fields and Song, 1989, Nature, 340:245-246
Fishel R, Wilson T., 1997, Curr. Op. Genet. Dev., 7:105-113
Flotte et al., 1992, Am. J. Respir. Cell Mol. Biol., 7:349-356
Fodor et al. (1991) Science 251:767-777
Fromont-Racine M. et al., 1997, Nature Genetics, 16(3):277-282
Fuller S.A. et al., 1996, Immunology in Current Protocols in Molecular Biology, Ausubel et al. Eds, John Wiley & Sons, Inc., USA
Geysen H. Mario et al., 1984, Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002
Gonnet et al., 1992, Science 256:1443-1445
Green et al., Ann. Rev. Biochem. 55:569-597 (1986)
Grompe, M. et al., Proc. Natl. Acad. Sci. U.S.A. 1989; 86:5855-5892
Grompe, M., Nature Genetics 1993; 5:111-117
Guatelli J C et al., Proc. Natl. Acad. Sci. USA 35:273-286
Haber D, Harlow E, 1997, Nature Genet. 16:320-322
Hacia JG, Brody LC, Chee MS, Fodor SP, Collins FS, Nat Genet 1996;14(4):441-447
Haff L. A. and Smirnov I. P. (1997) Genome Research, 7:378-388
Hames B.D. and Higgins S.J. (1985) Nucleic Acid Hybridization: A Practical Approach. Hames and Higgins Ed., IRL Press, Oxford
Harju L, et al., Clin Chem 1993;39(11 Pt 1):2282-2287
Harper JW et al., 1993, Cell, 75:805-816
Harris H et al., 1969, Nature 223:363-368
Henikoff and Henikoff, 1993, Proteins 17:49-61
Higgins et al., 1996, Methods Enzymol. 266:383-402
Hillier L. and Green P., Methods Appl., 1991, 1:124-8
Huang L. et al. (1996) Cancer Res 56(5):1137-1141
Huygen et al., 1996, Nature Medicine, 2(8):893-898
Izant and Weintraub, Cell 36:1007-1015 (1984)
Julan et al., 1992, J. Gen. Virol., 73:3251-3255
Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268
Koch, 1977, Biochem. Biophys. Res. Commun., 74:488-491
Kohler G. and Milstein, 1975, Nature, 256:495
Kozal MJ, et al., Nat Med 1996;2(7):753-759
Landegren U. et al. (1998) Genome Research, 8:769-776
Leger OJ, et al., 1997, Hum Antibodies, 3-16
Lenhard T. et al., 1996, Gene, 169:187-190
Livak et al., Nature Genetics, 9:341-342, 1995
Livak KJ, and Hainer JW., 1994, Hum Mutat., 379-385
Lockhart et al., Nature Biotechnology 14:1675-1680, 1996
Lucas, 1994, In Development and Clinical Uses of Haemophilus b Conjugate
Mansour SL et al., 1988, Nature, 336:348-352
Marshall R. L. et al. (1994) PCR Methods and Applications 4:80-84
Martineau P, Jones P, Winter G, 1998, J Mol Biol, 280(1):117-127
McWhorter et al., A screening study of prostate cancer in high risk families. J Urol 1992;148:826-828
McLaughlin et al., 1989, J. Virol., 62:1963-1973
Muzyczka et al., 1992, Curr. Topics in Micro. and Immunol., 158:97-129
Narang SA, Hsiung HM, Brousseau R, Methods Enzymol 1979;68:90-98
Neda et al., 1991, J. Biol. Chem., 266:14143-14146
Nickerson D.A. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:8923-8927
Nyren P, Pettersson B, Uhlen M, Anal Biochem 1993;208(1):171-175
O'Reilly et al., 1992, Baculovirus expression vectors: a Laboratory Manual. W.H. Freeman and Co., New York
Ohno et al., 1994, Science, 265:781-784
Oldenburg K.R. et al., 1992, Proc. Natl. Acad. Sci., 89:5393-5397
Orita et al., Proc. Natl. Acad. Sci. U.S.A. 1989;86:2776-2770
Parmley and Smith, Gene, 1988, 73:305-318
Pastinen et al., Genome Research 1997; 7:606-614
"PCR Methods and Applications", 1991, Cold Spring Harbor Laboratory Press
Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448
Pietu et al., Genome Research 6:492-503, 1996
Porath J et al., 1975, Nature, 258(5536):598-599
Reimann KA, et al., 1997, AIDS Res Hum Retroviruses 13(11):933-943
Ridder R, et al., 1995, Biotechnology (N Y), 13(3):255-260
Rossi et al., Pharmacol. Ther. 50:245-254 (1991)
Roth J.A. et al., 1996, Nature Medicine, 2(9):985-991
Rougeot, C. et al., Eur. J. Biochem. 219:765-773, 1994
Roux et al., 1989, Proc. Natl Acad. Sci. USA, 86:9079-9083
Sambrook, et al., 1989, Molecular cloning: a laboratory manual. 2ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
Samson M, et al. (1996) Nature, 382(6593):722-725
Samulski et al., 1989, J. Virol., 63:3822-3828
Sanchez-Pescador, 1988, J. Clin. Microbiol., 26(10):1934-1938
Sandou et al., 1994, Science, 265:1875-1878
Schena et al., Science 270:467-470, 1995
Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation
Sczakiel G. et al., 1995, Trends Microbiol., 3(6):213-217
Sheffield, V.C. et al., Proc. Natl. Acad. Sci. U.S.A. 1991; 49:699-706
Shoemaker DD, et al., Nat Genet 1996;14(4):450-456
Smith et al., 1983, Mol. Cell. Biol., 3:2156-2165
Sosnowski RG, et al., Proc Natl Acad Sci USA 1997;94:1119-1123
Steinberg et al., Family history and the risk of prostate cancer, The Prostate 1990;17:337-347
Stryer, Biochemistry, 4th edition, 1995
Syvanen AC, et al., 1994, Hum Mutat., 172-179
Tacson et al., 1996, Nature Medicine, 2(8):888-892
Thompson et al., 1994, Nucleic Acids Res. 22(2):4673-4680
Tyagi et al. (1998) Nature Biotechnology 16:49-53
Urdea, 1988, Nucleic Acids Research, 11:4937-4957
Urdea MS et al., 1991, Nucleic Acids Symp Ser., 24:197-200
Valadon et al., 1996, J. Mol. Biol., 261:11-22
Vaughan TJ, et al., 1996, Nat Biotechnol. 14(3):309-314
Vlasak R. et al., 1983, Eur. J. Biochem., 135:123-126
Wabiko et al., 1986, DNA, 5(4):305-314
Walker et al. (1996) Clin. Chem. 42:9-13
Westerink, 1995, Proc. Natl. Acad. Sci., 92:4021-4025
White, M.B. et al. (1992) Genomics 12:301-306
White, M.B. et al. (1997) Genomics 12:301-306
Wilson R. et al., 1994, Nature, 368(6466):32-38
Zhang SD et al., 1996, Genes and Development, 10:1108-1119
SEQUENCE LISTING FREE TEXT
The following free text appears in the accompanying Sequence Listing:
regulatory region
polymorphic base
complement
3' regulatory region
deletion of
or
probe
homology with Genset 5' EST in ref
sequencing oligonucleotide PrimerPU
sequencing oligonucleotide PrimerRP

EDITORIAL NOTE APPLICATION NUMBER 51878/99
The following Sequence Listing pages 1 to 75 are part of the description. The claims pages follow on pages "81" to "86".
<110> Genset SA
<120> Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof.
<130> <150> <151> D.18363 US 60/095,653 1998-08-07 <160> 7 <170> Patent.pm <210> 1 <211> 17590 <212> DNA <213> Homo sapiens <220> <221> misc feature <222> 1..2000 <223> 5' regulatory region <220> <221> exon <222> 2001..2077 <223> exon 1 <220> <221> exon <222> 12292..12373 <223> exon Ib <220> <221> exon <222> 12740..13249 <223> exon 2 <220> <221> allele <222> 9494 <223> 99-430-352 polymorphic base A or G WO 00/08209 WO 0008209PCTIIB99/01444 <220> <221> <222> <223> <220> <221> <222> <223> <220> <221> <222> <223> <220> <221> <222> <223> <220> primer-bind 9391. .9408 99-430 rp, primer bind 9828. .9845 99-430.pu complement primer-bind 9475. .9493 99-430-352 .mis primer-bind 9495. .9513 99-430-352 .mis complement <221> primer_bind <222> 9482. .9506 <223> 99-430-352.probe <220> <221> misc-feature <222> 3953,4056,4167,4739,6217,6245,6860,9998..9999,10006,10012,1010 4 10477, 10822, 10825, 11095 ,11256, 11273, 11857..11858,11895..11896 14057,15912. .15913,16217. .16218,16329..16330,17504 <223> n=a, g, c or t <400> 1 aggacagtat ctattgtgag tcccctggat agcctacaag ctggccctct gaggaccatg cctcacaaag cagaggtcca ctagcacaat agttttaagt gtaagcatcc aagctcagtg atttgcctcc tacgtctgtg atcctgtaag gcaccttgcc accccaaatc caagctgtga agggaaatca ggcttcaagg tgggccacag tcactgcact gtgtgtattg taaaccacac gactaactcc ataaaactct gggaatgcca aagacactgc taatattgat ggccacttta gtcccattta ctgctgggat tccgtaaaga tgggtccact taagacagcc tcttggtacg aatagctgct cttacacttt gcaggtaaga ttggattcaa atagctacca taaaaatacc ctaatctaaa atgaggaaac gcttttagtt cctgctttgt caatgaagac gtccaaccgt WO 00/08209 acagctcaaa tccagaaaga ccctgcggac ccaccctgcg Ctgttttttg atc t ccgcgt ggcatgcaca cgcggggcag tggagtttcc ctgattttcc tgccatcctc Cccccggccc agaggagggt cacctgctga gccgccaagg ggccccgcgc atggggcggc CCcgcgggct tgcagcagct acacggcgca gtggggagat gaggaaacca CtcCtcctcc gttgctgccg cgagccgagg gcgatggact cggcccggcc tcctggccac cctgccccgc tctctcgggg gcccttggac cccccaaccc atcctagcgt cacgctttaa caacatttct tctagccgct aaggttttgt gtggcttttc agaaattgaa 
attaatggat tcatttatta tcacacccaa PCT/IB99/01444 *cgctcagcca tgccacctgg tcttctctgg agtgacgttg cacactgcca ctttccttgc ctggggtgtg gaaaaagcgc tggtgtttaa tgcctcgcat caaggacggg ccgcccccca gcagatgagg gagccactca acagggtgtg CCCcggctcc ggcccgagta tgggggcggg gttgctcgcg gtggggagga cc cggc tcc t gagccgcgcg tcctcctctt ccgccgcggg ggagccccct ccccgcccgc ccgggctctg CCtgctggct Ctggccgcgg gaggaagttc gaaagcgaaa ccgccagctc gtttgcaagg aagttcaggg ttctatgggt tcctgatggc ttttgctttt ctgttttgat agtgatggtt atgttttccc ataaaatcag gaaactcagc cttccctaaa gggtctggaa agaggatgtg cgcctctgtg tcccctagtc actctacatc cgtgtgcgtg ctaatccagg tcccgggagg gccagcgccc gagaaggggt tgttgtgtgc cgaggcgcct ggccgagcaa cgactgtaca ccagggaaac Cttcccccct gggtggcaag tgtgactcgc ggaggacacc gtcttcccct cagacacctc cggctgctgc acctgctgtg ccccgtcccc ccaggccgtc agcgcgccgc cctctcgggg aggggaaggc attgccatct ccttaatgtt actggtccgc cgaccagatt tactttgcag acatcctgta tctgtgatta ttttcagaga tttacttaac gttccttcca ttcatagatt actttgaaag tatctgtaca gtccaccccc c tgcc tcct c atgcttctta cctggtggga ttagggagcg accgctggga tggtgtgtgt ctctgcgtca gcacttcgcc catagggcat aaggcggggg agtttccacc tcgggagcgc gcggcgggca tcccgccacg gctgtgccca attcccccca gaggggaggg ccgtccgggc gagtccccct cgcctccagc CtCCttctcc tcctggtgcc tcctcagctg ccgcggcggg cccaggatgc ggcacaggta cgtcgcggcc atctggccgc cgttgccccc gctagcgacc gcatctctcc ggaaagagtg tagtaacttt ccagtcattt tgagaccccc ggtatcattt atagcttatt ccaaacagtt tctcattgtt agcatttaaa agttcaaact agctacatta cgagcacccg cttttctcag tagggatctg agctctgtcc atgtccccag tcctgcgcgt ctcccgcaat ttcgttgttt ccgtgcctca agagcaaggt acgtctgttt ggagagcggg gtgccacctg agggcctgca gatcctgcgc cagacactgg aggc cgcggc cgtgctgccc cccagctccc gcgctcgccc tcctcttctt gccaccgtcc ggtggagaag aagagcgcag ccccaagcac aggcgcttcc gccccctccc ccacggacgc cttacccccc cgagagctcc cctcccccct tggtcagagt ggcagctcca tgaaaccctg ctcaaacttc cgtttgaaat ctctggaagt taattttcag tcccttatga aatgacctgg tctaaacttt agtaaaaaaa gctctcccct atccctctcc ggagcttcgc Cgcttttcac acctgatcgg gtgccgggct tggttagaaa 
cccagagtcc gttcacctct ggcttggtcg cggagggaga caggcagtgc ctataaatag tcacgcgcgg aggggtctgg ctgaggatgg ggacccgcag aggcacagtc cggggaccga aggctgggag cctcctcctc gccggtgcct aggcgggcgc ccagccgggt ctgcgcgtcc tggggct tcg gcagcacgcc gaggccaggg cacccccgcc gccggcttct cccgccaatt gaccccaagc ccagtgcgcg cttcattgtt accaggcatt ccacctagat tgctttaaaa ggtgcctcat tgggatgatt tttaaatagg ttcaatgagc 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 WO 00/08209 taggggtggt cttgagccca ggtgacaaag gaacacaagt gttcaaagca tgtccagcca attacattac tatctacccc ggtggcagtc tgggtattag atccagctgg ctgcaatgac tggtccagat tcaagcagca cctggagcag agggaggaga tgacagaatg ccccagtttt gggac t tcc t ccgtgctgcc agcagttttc gagtccatgg cgggtggatc tctctacaaa ccagctattt tgaaacatga ctgtagtgat actgactttt ttcacgaaag aaggagtctt CCtccgcctc aggcgtgctc tcttggccag gtgctgggat aagttccatt gtttagaaaa atgaagctgt gagtgtgcct tgtatttatc tataggtttg gtacacagtg ttagagtata PCTIIB99/01 444 ggcacccacc ggagttcgag tgagatccca agagcattct ggaaataaaa tttgcaaacc gttttcaaag actgagagtg atgcagtgtt gaggaaacgt tgataatccc aatgagatgt cttctaggat tggctgttac gcaccacagc gcaaggggct gcagctccct tgctcttttc gtatatgtgt cttctcaagt atctcttggt ccaggcatgg acttgaggtc aatatagatc gggaggctga tcgtgccact gacgaaggat cttcctttcc acatgaaggt gctgtgtcac ctgggttcaa taccacggcc gctggtcttg tacaggtgtg tgatgaggtc agatgtgtca gcagaatgta gtctcctttg agcacatttt ttgccagaat agcactcaat attttgatgt tgtagtccca gccatagtga tctcttaaaa aacactattg tcatcaggtg aggagaatag gaaaattttg ctggggcttc gacctggtgt gcaactctga tgtcccccac acccctatgc cttctaggat ctcttctgtg tgcctgctct caagaggatt gaagtccttc agattcccag gggtgcctca ttaatgnacc cattaaactt tggctcatgc aggagttcga tacaaaaact gacaggagaa gagtccattc ttaggttttc aacaccacat tttcatgctt ccaggctgga gtgattctcc ggctaaatt t aactcccgac agccacggcg ttagatgcag tatttgggcc ggaaatacat agatgagcaa atataatcag tagtgaattt attatttatt taggtttgga 
gctacttggg gctatgactg aaaaagagt t agtgcaagga ataattaaaa gaaaaaaaat caaatgcgtc cccttttcac ccc at aaggc agcaacagag t tccc tagaa agagccagat gtaaccctgg ttcacagcag gttggccacc ctgtctttga ctactctctc aggacntgaa ggatcatttg acgttagttt gagaagtaaa ctgtaatccc gaccagcctg agccaggcat tcgcttgaag agcagcagag agtcagaact tccaataaaa caaggttttt gtgcagtggc t gcc tc agc c tgtgttttta cttgtgatcc cccgaccagt gggcaatgtg aactgaaaaa tttagaacca cagctatttt caaatctaaa atacatgcaa gctattatta ttgctgaggc aggctgaggc tgccacctca taggggacat gacctggaag taatttcttt cactagtgta tccttgtcat cacgacagca acagtttgtc cttgcccctt gacagctttg gtgggcgggt caagcagtgg catcttcagt agctttctag acatgctttt cacagcattt aatgtatcac ttttgccctt caatatttta atctgctcat agcactttgg gccagcatgg ggtggcatgt ccaggaggcg tagtgttggg gttaccttac aatatcttta gacttttttt gtgatctcag tcccaagtag gtagaggcga acccgccttg ttttgacatt tcccttttca ctcttgatat acaaagaggc tctcttcaaa cctctgaatt agtgcttaga tgtttattta caagcaaaat aggaggat ca ctggagcctg tttctgaagt ggactaagtg CCtgtggatt gttataaatt agtctattgt tttctggttg aaaacactag cttcctcatt accaggaagg ggcttttttg ggagcctgaa tgtcttggtg agtagatggt aantttgatc ctctgtaggt ggcccatttg ttccagtcta tatatttctc taaaatgact gagtccaagg caaaaccctg gcctgtagtc gaggttgcag gtttgtatcc aatttccttc gaccagattc ttttttttna ctcactgcaa ctgggactac ggtttcacca gcctcccaaa tctaagccaa gatttcagat gtaggttttt atttaatttt agacaatgcg aggtaagccc acagtgcctg ttttatactt ttagatagac 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 WO 00/08209 caacccagct atc t ttgga t aaatgccacc cctacttctc taggtggtca aagggcctgg cctcactttt gttaaatcag gcagaataac ctaggacact agtgccattc tcccagcact gcctnggcaa gtagcacaca cagggggcgg agtgagattc atcattactt tgaaaacatt ccaagggttg ggaatcctgt attagagtac taaagaacag ttcctgtcca tagagctaga accagcattc taaagtcccc agggggttcc ataattccca gagcagcttt agatttacca taagaacaaa gtgttttcat ttgtcccaat ctcatttcaa 
ttttatgatt attattgaaa gactgtgcta tacacagaca ttgctgaggg aagaatggtc aatggtttgt tggcaaaaat PCT/IB99/0I 444 aatccactag atcatctgtg ttcacaattt agctttgtga acagatattg gttcaaatcc ctgtgccact ttaataaaca ttaaaggaac cgcccatttc taaaggacta ttgggaggcc acatgacgaa cctgtaatcc aggatgcagt tgtctcaaat tacatgtcaa tagaaccact ttgctccctg ggaggaagtg ttcaggttgc agaaattaaa ttctggggtn ggccactagc tgaggaggtg atcctgctct caataagcag tcctttcttc ccatccccac gattttagag caatatttgg gaggaagttg tctgtcacta atattaggga ctaagtgtat ctgttctaat ggtgggagt t actctactaa tatgaaacat aaaggttgag tagtattaaa tgcagttact aaagatattt agaaacaaca attagtgagg acgtacagaa ctctagcaaa cgattctgcc attcaatgat cactatgata actgcaggta ccaccctttt gccttgagtt gaggctggca atctcatctc cagcttctgg gagctgagat aataataata attagaaagg ggtttcagga tgggattcct gatgaagagt agtataattc ccattgattc cccaacagcc acccctccat agggctgaag ctgtagtttg cttactaata atgtacctcc acaatgttgg atggaagaaa agaagctgag gggttctctg aatttggaca aatttctaaa gctgccagag aaaggacatc gtgcagaata caagaacgtt aagagttctc tgcaaataac ctggtgcaga tttaaaccaa gagggttatt gaagtttgta gaaccctttg gatcatgaat gtggttaaga acttcttata aatattcctt atgtgttggt ggagggttat tcctgtgcaa ggctctaatt gatcacnttg taccaaaaat ggaggctgag cacgccactg atttatatga cacaccccag gctccatgca gggtgaggaa gtagccaagt tgttcaggtg acagagcaat aatcaatatt tcatcctttc ctgcagaggc ctgtgaagga cctacccttg cccacatttt ggacatttgg acttggggat taacttgctt cagcacttgg agccacttaa tggcttaaaa atatgtagca tttgtgtctt cacaggtttg act agaag ct cttggaatat atggattgag aataattgca atccctaata cccatctaaa gataagacag gtaaaatgag gtaaaatgtc gcaagcaaac gtatggcctt tattgtccaa aactattctt acataatctc tgaagagtat tatatgactc aggtcaggag acaaagatca gcagaagaat cactacagcc gaaagaagtc tactaaagca atggtgaaac cacactgctc cagtgagcct catgctcact atgagtagct ggccggttcc ttctctccct tgttgtactg gtggaggggg cttctctcac tgttctttaa tattatacat gatcttgttc tttcaacttc atgggagtca cttttccaga ggagcttgtt tagcaggaca gggtagctac ctgtagaggg tattggaatc gaggttctat atggctttaa gtttttgcca ttatttgcat gatctatggg atatagattc catgacaaaa catgaagtgg 
tctggagcca gggcaggtga cgttttgtaa tttactttta tgagggccag aagaagtgac gtgcctgtaa ttggagacca gccgggcatg tgtttgaacc tgggtgacag attcaaaagc tccttgatga agcctctact ccgttggggt actgcatggg ccatctggcg gcctggggac taatctgacc acccactccc tcagttactg ctgggaacaa ttcctgatca gggaagaagg tatgaaaata cattctctta acacttgaaa gggacttgga atctagttgc agctttaaaa cattaacaag tatgtttaaa atagggcgtg acagtatttc ttggggctta aaaataatca ttccttttaa agt ttat cLc 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 WO 00/08209 tgttatggaa atcttatatt cttattattt aagaacatgc gaaatggcag tggtactgtg gccactatat taataattct atgaaacaga tgtgagagag gtccgttttc tgaaatgcct taaaaaatta gcatatatat aaaggtttaa tacggtcttc cagaaaagcc gaggtaggac actctccatg ttccaaacta gtcttcctct tccggctccc cctgtgggaa gccctgagta ggataactgt cagtttgtgg gcatgagaaa aaggagggca gagaccttat tagcaggggt gggtggattg agtcagttag tgagtagtga gagggagatg ctatccagat gtgggagcag agtcaagggg actgaagctc ttacaaaggt ctgtcatcag tgcccaaaac ctttgaagta PCTIIB99/0I 444 gtttttattg agcagaatct gatagtttat tacagctggt gactcatttg tacatctata taataactgc agtacttctt ttctgttaca agaataaaga attattagga gagctgtaaa gctatgagtt tatacagaaa aatttgaata ttgagggaca tgctgataca ttaagtacta ggtccttcct ccttttccat ggagcaaagg taaactcagc ttgggggtca ataaattgtc ggcrggctaa cagggattat cggggaacca ttttagggga atgccaggta gtgatatgat aaggatgggg aggtagtaat tgctattcac agaccgagtt agaaatattt aacacttggg atgggattta tccgggcttc taaatgagac cagcaggatc agcaggtgaa tgtggaaaga acaagtaatg gaattgctta gaacagtgc t tgaaaaacaa atgactgatt ctctaagaag atcatctaaa taaacatgtt ggagtagaag gaaagagaga gaagctgtgc taaagtattg atgttcaaat aacacagagt ctgattttgg cctattcaaa tgccctaaaa gttggaaata gtttctacac catgtttcta gcagacacgc tgcctttcct gggaaccaag cttcgtctcc cgtgttagtt attctgagga taactagttc gaatataaag gaagacttca aaattttgtg caccttttct aggatgaaga ctagatacag agctttaaaa aatngcctat caccattagg gactcaagtg 
agtttcctag aacataggca accagaatgt atgaggggag aacaaccaga tagatattca taaataatta aaggtctaat aaacttcagg attatcaact gaaattgaaa ataattgata ttgggattag tcgatccaga tcattattta tttaaaggac ctttattttt tatattataa aaaaagaata aaaccccaaa ctcttaaata caccttggaa gaagacaagg accttgtaag cagcaaacaig ttgcttcctg gtaacccacc agaaatgctg aacccaggag tgcaagtaag gagaggaacc tctatcttca ttggagatat gattgtatgg ttgattaagt ctaggacgaa agggatctga cacatagnng taactaaatt atggatctgg gtgtatgtgg caaattgccc ttcatcatag aagtgcgtgg ggtgcttgac aggatgnaag aaggtaagat cctgatctaa tggctatgtt ctacttttta cat tgaaatg gatttaaatg gtaattctgc gagctcagat cagctgtcaa catttaatgt tgggattatg agtcagggac tatttcttga aaatttgctc gacttcagtt ccttaagaat tggtgattgg aaaagaggtg atggagactg cagggcattg t catggaaga tacttcccac atgatacaga actgtctggc tctcatgttt gtaaaatctc gtatgcacca gagcctttaa agacacagcc gggaattatt tact ccagga aaagaaagag atgacccctt ggaaangaaa caggcctagg aactcaggaa tagatgcatt cccatctcct tgggctctag tactcaatag accaaaagat tcaaacacag aagaaccaga gttaccctga ggatgtagaa cagagaagct ttttgtcaat actgaatttt tatgcttgtt t tat cct ttg cagt tagaac cattttcacc tgaacttcaa tttactttca acatttgaaa ttagcattgt cctgttcaga tcaagaagct gtagaaagtg gtagttgctt ttggtagatg agtgcctgtg tagaaataga tataagatat tgtcacctga tactgtgact tctaccagca agaccctttg tggctcagag aaggtgcacc agattcctga agagaatttt aatatgcgat tagttggtga ggccattcag tnctgggaag agcctatagg ggaggcttcc cttgtgcagc gtgataagtg cggataaatg aagtcagctg taggtgagat gaagaaaagc agagattcaa 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 WO 00/08209 gaaggaaggt gaggcgggcg cgtctgtact gctactcggg gccgagatcg aaaaagaaaa aaacaaggca caactttaac gatggtcaat aaagagccat attggagaga aaatacgtat gtgaatcaga gctaggggag caagtgagta tttgtcatcc aaaagggctg gcagatcact ctactaaaaa ggaaggctga tcgtgccact ggaagaaaga aagcgttttc tggagtgcag ctcctgcctc tttttgtatt tgacctcagg actgcacctg gcttatagca 
actgaagaca agtcttttaa aaactgaggt ttggctttag tgagtgtgac ttctaggcag Cttctgaaat ctctggtttt aaacagaaaa aataacattc gcagctggtg tgtggctgag agtccggctt PCT/1B99/0I 444 gtggccgggc gaacacgagg aaagatacaa aggctgacgc cgccactgca aaaaaaaaaa gtcaggggct agaatttcag aggaagtgga ggccaaggga cctcagttat tgagtcccta cagatgcaac tggggcaggg aagaataggg aataggacaa ggcactgtgg tgaggccagg tacaaaaatt ggcaggagaa gcactccagc acatagacag tgt tnnttgt tggcacaatc agcctcctga tttagtagag tgatccacct gcctaacatt catgaactga tgt taatgga aatgtaagta ttaatgctgt gaggt ttgtg agttatgcca agaagcttat aatagccctt ctcccccagg acagtgataa acagcaagga ggctccctgc gtgcgaagac tgcgtttcac gcggtggctc tcaggagatc aagaattagc gggagaatgg cntcnagcct gtaaggaagg aaatagcctc tggatcccta gacagtaagt agtttgaaat tcttttaaaa ctttgagtca gcctgccttc gcagggagct cttgagatcc gaaggcgtta ctcatgccta agttcgagac agccaggtgt ttgcttgaac ccgggcaaca ggaaatgtag ttgtttgttt ttggcttgct gtagctggga acggggtttc atcttggcct gatatctgtt ataaaacagt aggtggattt atatcttaag cagagtagca tttcatagta taaccaggtt ttgaacctct tgaaggtagc tttctgcata ctgttttgct aacatctgct ctgtgcattc tcagcaggca cctctggact aagcctgtaa gagaccatcc cgggcgcggt cgcgaacccg gggcgacaga tgtggccaag Ct ttaaatt t gggcaaacca gtagacctta caaaggaaat tacttattga ggcacnatgg atggagtttc ggatacagga cacagacaac ggatacatca taatcccagc cagcctggcc ggtggtgcat ccagggggtg gagcgagact t taaggnnag tcagaaagag gcagcctctg ttacagacac accatgttgg ctcaaagtgc gatgagaaga gtttaagaca gtgattcaga tgatattact ctgtatcgtc gtttcccagt t tatatggaa tattatattt tattgctatg tgaagtgtgt gagttcccag ttctaacgag cctgaccacc gtccaccaga gagatgtgaa tcccagcact tggctaacac ggcaggcgcc ggaggcggag gcgaggagcc at tgagaaat tacaaccttg ggccttacaa ccttggaggg atcttttttt gcccctcagt cagacacgag accttagcat gagactgaag tcagctttga aacgtggttg actttgggag aacatggtga gcctgtaatc gaggttgcag ctgtctcaaa tttgggtttg tctcactctg cctcctggat ctaccaccac ccaggctggt tgggattaca agccaggtgt atgtttgcaa acctctagac tgtcccagat ttctatcatg gggctctttg tacaattttg gggtttcagg acttcattaa aaaatagatt acccttccca gtctcggtgg atgcccatgc 
aaggaacctg cctgagccag ttgggaggcc ggtgaaaccc tgtagtccca cttgcagtga gtctcaaaaa tcgtcagagc aggacctcgg accaggaatg aaggnaagag ttttttttcg tattctttta ggngatagca Ctgtccatat atccagggag acaaaagggt ttgaaaacag gccaaggtgg aaccccatct ccagctactt tgagccacga aaaaaaaaaa ggtttggtag ttgtccagac tcaagcaatt accaggctaa ctcaaactcc ggtgtgagcc tggagtgata cataataggc tacctgggcg cagttgttta ggggcctttg ttacctgtaa agaaagttct cttttgagtt attctaatgc gcttgatcca agatggaacc attttggcct tgccctgggt taaccaagca ggagaagtca 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 WO 00/08209 acagtgggat actgattcac ccaccggcag ggagatccaa tattttaagt gctgaatttt gagaaattat aagagcagaa aaactgaaat taggtctggg gcccttttgc ttaagaaaaa cataatgttc atgcgatcag cttgttatta tttcctcgtt catttgaaga gcaaatagca gtcttgccgt acatgtacag aagcttctca ctccattagg tcttggtatc aattgctgtt catcagtggg ggagcctaac ctcaaaacca ttgtccacac ttcctgtact tgttttctta gactaaatga ttcattcaag taaaaagaaa tatttgagga attccagaat tttggactga atttgactct cggatgcctg ttccccaagt t tagtt ct ta ctgctcattt PCTIIB99/0I 444 cccctgatct aacagtcatg agt at ctgc t aagactaagg atactgaaat tgttttatgg cagtttttga gtgttaagat gagcttgtga gtggacccca aggttgcact tatagtcatt acttaggcaa gggttctttg tcacatgata catttgtgct tgcttgnccc actttctcag tgttttgggt ggcgaggatg tattttcaag ttattttatg aagtagatgc gcaggtactg gaataggacc acacacccaa ccatttcact atctcaagtg ctaatcctac tctgttaaat ggtagcactc aaaattttac catgatacct acattttaaa gaaaatgaga atttccttgg ttcccaaaat acaagagaaa agagaattgg agtgattttt aataaaagaa attccagcat acccaagtta atgtgttcaa tgtggctggc gaataaggaa tgatagttta gttcaaaaat gtacattatt gtcacctggg agactgcatt ttgagtggca cgtaaactgt aggtcattga tgtttttttc tggctgaaaa tgctttgtta attttatgag ttgggctcag gccaagcaag cgagtgtcaa gttaaaatct tactctctag ccttttgctt ttggagatga aggtgatcta agggtagaga cacagaatca ggcacggcag ttctgcaccc 
ggggataatt gacctgaaac tagaggcaga agatgagtgc aactacacgt ggactgaaca ggagagtttg cttttacggc gagatttata acttactcta ggcttgcata taatttgtag ctttgagtgc Ctttgcttgt agccgatgat tggtttttat ttaatgctgc tagttttaaa tcaagagaaa tcagatgaat gatcttgtta tgtaacaagc aagttctaaa gtaaaaatgc taagaacgct tttcgcaaac aaaatgtgag ttagcattcg gattagctta ggctccacag ccaagtcaca tccacctgtt ggatagaaat ggtgtaagga gttcatttgt tattagcaga tgtataggac aggctttgtg aactctcata catcaggctt ataaactggg acagtattta ttagtaagca ttatatgcta ctt cagc tt t ataacttaat gcctgtacct tgcgtggaat catctgagaa accaaattct tatgctaaaa ccatttttgc tataggtata aagcctcagc ctgattaagg caaacaaaag tgtatggggg agttataaat gcacatttga tcagtctaaa gttctaaagc aaatgtgaat tgccaagaaa tctccacatt tactggccag ggatatgcat ctcaggtcag acatggtaaa ttgtagctct gataaaattg ctaaccccat tgtgattcaa aactgtcagt gctaaagttt CCttatttag tggttcttct ggcttgtagg ataatggaag aaataaaggc ttataaatca ggagattcag tggcctcagg tccaatagag tttatagcca atttcatttc caaagatgag ggctcctatt cagtccagct Cgttgttcag taggcttctg gtaattggga acccatggt t aaacacaaat tacctcaatc gtgttcacaa aagacgctgt taagtgagat tcaggatatt tgattactta aaacagatac actactaatt catgcctctc cttgattcag tgctgatgct tgtaatccta tttcccaagg ctaagttttg atctgattag agttctgctc gggcaggact aaaatataat ggactgtgga gctgtctgcc gaagccttga tctctctgca aaattaatat agtcattcag aaggcaggag gactgagaag taatatggag tttcatgtta agggactaac caattgagtc ttgctggaaa taaaaacatt acgtcttagg attctggtca atttggacaa ctatatagta cattttacac gccagtcatt cttccagtct gaaatatgaa tgtcattact agtgattttg 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 ttgttggaaa cagaacagta aatcacactg gccatgatgc taacagcgtg atagattttc WO 00/08209 tgttcttggg ttctcttgac taaacagatg gcatcataaa cactgagtaa agattgtttt aaaattgaat tttatcaagt gaactgaggg agaaaaattg agccaaatta tgctcctgtt gaagctttnn gataaacaga catcccaaaa ccacttctgc 
aaaactatgt acccgtttaa gttttggcac aagctgaacc gaacaagatg cttcaatatt ctgttgccca ggctcaagcg cgtgcctggc ggtctcgaac ataaacatga gctgctgatt cattcttcca gattcatgtt catcccagat tggaaatcca ttgtatactt tctcaaacat PCTIIB99/01 444 acaccaatgt tacttataac aggtcagcac cacaaaacta tgtcttggga cnnattgtat taaaaagtaa tcaagacact tcctgggaat gcgctcttta ggcctgnnaa tgatgagagg caggcgattt tgttatcatt aactattgca tctcagtagc ttcatcacct aatttccatt ccatctagta agcagagatg aatattttcc gtctcgtccc ggttggagtg attctcctgc taatttttgt tcctgacctc gccaccacac tctttggaca cccaaacttc tctctgatag cctgttcaga tgcatagttt aaactgctcc cactgtatct atttgctggg taactgcatt cagtttgcga gaagaggcat gttccataga aaaacacata tttgtaagca ttaaccatgt aagatttttt gtggatgcct aatgagcagg tctgctaaag ctttggccct accatttgcc cattgcttaa gttacaattc agaaactctg aaacgtt tgc tctatgatat tcaaacaatt ttctttttct cagtggcccg ctcagcccac atttttagta aagtaatcca ctggcctcat ttgtgcctat accataagtt ggggtctttt catgctacaa ttaaataata acatggaaaa catagcgaag tgaaataatt tgccagagaa ggaaaccctt gtaaaggaac atatgaggaa tatacataag atgttaacag ttatacagtc aaaattgaga aatgatttcc aacattgtca ctttggctaa ccagaaagtc cttgaccagt atttgtcttg tttgaagaaa Ctcttgtctg tcagtgttaa tggctagttg atctggatgg tttccccccc at t tcggct c caagtagctg gagacagggt cctgccttgg cctttcttaa aaactttttg tgatgtttct caaactgatg gttaatacaa tgcatttttc gcttccatga gattatctgc ctccaggtta gacatatgca tgaccagcat aattttataa acttcaaaac ctttatttct ccattgagtc ttttatacat agcaaaagga catggaaact tggtggacaa ctttctcaaa aacaaacaaa ccactttcgc atct t tagga tgcttcagga cagctgatct tttttcatcc gtcctcttca tctgctgctg gcttgagaca actgcaacct ggattacagg ttcaccgtgt cctcccaaag aatgagttat taaagcatca tcttgctttg tcttatcctt gtttatttgg atgnactttt tcaaatgcag tgtaggagca aggcctcttc tttactgcca ctaattaatt gcatgccatg attttgtgga caagataaac ggtctctaaa tat taactgg cgtcagaagg cttgcaaaat ggactctggt acactctcat atgccttggg tttgactgga ttgcgctggt tcttgatccc gagggcaatg aggattgtgt atgagggcat caggcttcat cagtcttgtt ctgcctcccg tacacatgat tggccaggct tgctgggatt acatttgtaa gtgatttcac attttagcag 
cttagagcct tgccaaaaaa tgaagacccc taaggcagca 17590

<210> 2
<211> 99960
<212> DNA
<213> Homo sapiens

<220>
<221> exon
<222> 4661..4789
<223> exon A

<220>
<221> exon
<222> 6116..6202
<223> exon B

<220>
<221> exon
<222> 9919..10199
<223> exon C

<220>
<221> exon
<222> 14521..14660
<223> exon D

<220>
<221> exon
<222> 50257..50442
<223> exon E

<220>
<221> exon
<222> 56256..56417
<223> exon F

<220>
<221> exon
<222> 63326..63484
<223> exon G

<220>
<221> exon
<222> 76036..76280
<223> exon H

<220>
<221> exon
<222> 78364..78523
<223> exon I

<220>
<221> exon
<222> 85295..85464
<223> exon J

<220>
<221> exon
<222> 93417..93590
<223> exon K

<220>
<221> exon
<222> 97476..97960
<223> exon L

<220>
<221> misc_feature
<222> 97961..99960
<223> 3' regulatory region

<220>
<221> allele
<222> 1443
<223> 99-20508-456 polymorphic base C or T

<220>
<221> allele
<222> 5247
<223> 99-20469-213 polymorphic base C or T

<220>
<221> allele
<222> 6223
<223> 5-254-227 polymorphic base A or G

<220>
<221> allele
<222> 14723
<223> 5-257-353 polymorphic base C or T

<220>
<221> allele
<222> 19186
<223> 99-20511-32 polymorphic base C or T

<220>
<221> allele
<222> 18997
<223> 99-20511-221 polymorphic base A or G

<220>
<221> allele
<222> 19891
<223> 99-20510-115 deletion of TCT

<220>
<221> allele
<222> 29617
<223> 99-20504-90 polymorphic base A or G

<220>
<221> allele
<222> 42519
<223> 99-20493-238 polymorphic base A or C

<220>
<221> allele
<222> 69324
<223> 99-20499-221 polymorphic base A or G

<220>
<221> allele
<222> 69181
<223> 99-20499-364 polymorphic base A or T

<220>
<221> allele
<222> 69146
<223> 99-20499-399 polymorphic base A or G

<220>
<221> allele
<222> 76458
<223> 99-20473-138 deletion of TAACA

<220>
<221> allele
<222> 78595
<223> 5-249-304 polymorphic base A or G

<220>
<221> allele
<222> 82159
<223> 99-20485-269 polymorphic base A or G

<220>
<221> allele
<222> 84522
<223> 99-20481-131 polymorphic base G or C

<220>
<221> allele
<222> 84810
<223> 99-20481-419 polymorphic base A or T

<220>
<221> allele
<222> 89967
<223> 99-20480-233 polymorphic base A or G

<220>
<221> primer_bind
<222> 988..1006
<223> 99-20508.pu

<220>
<221> primer_bind
<222> 1509..1529
<223> 99-20508.rp

<220>
<221> primer_bind
<222> 5039..5056
<223> 99-20469.pu complement

<220>
<221> primer_bind
<222> 5534..5554
<223> 99-20469.rp complement

<220>
<221> primer_bind
<222> 5997..6015
<223> 5-254.pu

<220>
<221> primer_bind
<222> 6332..6350
<223> 5-254.rp complement

<220>
<221> primer_bind
<222> 14371..14390
<223> 5-257.pu

<220>
<221> primer_bind
<222> 14798..14817
<223> 5-257.rp complement

<220>
<221> primer_bind
<222> 18751..18771
<223> 99-20511.rp

<220>
<221> primer_bind
<222> 19198..19217
<223> 99-20511.pu complement

<220>
<221> primer_bind
<222> 19605..19625
<223> 99-20510.rp

<220>
<221> primer_bind
<222> 19986..20005
<223> 99-20510.pu complement

<220>
<221> primer_bind
<222> 29529..29547
<223> 99-20504.pu

<220>
<221> primer_bind
<222> 30041..30061
<223> 99-20504.rp complement

<220>
<221> primer_bind
<222> 42268..42287
<223> 99-20493.rp

<220>
<221> primer_bind
<222> 42732..42752
<223> 99-20493.pu complement

<220>
<221> primer_bind
<222> 69026..69046
<223> 99-20499.rp

<220>
<221> primer_bind
<222> 69525..69543
<223> 99-20499.pu complement

<220>
<221> primer_bind
<222> 76323..76343
<223> 99-20473.pu

<220>
<221> primer_bind
<222> 76771..76790
<223> 99-20473.rp complement

<220>
<221> primer_bind
<222> 78292..78309
<223> 5-249.pu

<220>
<221> primer_bind
<222> 78704..78721
<223> 5-249.rp complement

<220>
<221> primer_bind
<222> 81893..81912
<223> 99-20485.pu

<220>
<221> primer_bind
<222> 82353..82372
<223> 99-20485.rp complement

<220>
<221> primer_bind
<222> 84392..84412
<223> 99-20481.pu

<220>
<221> primer_bind
<222> 84909..84929
<223> 99-20481.rp complement

<220>
<221> primer_bind
<222> 89746..89765
<223> 99-20480.rp

<220>
<221> primer_bind
<222> 90179..90198
<223> 99-20480.pu complement

<220>
<221> primer_bind
<222> 9475..9493
<223> 99-430-352.mis

<220>
<221> primer_bind
<222> 9495..9513
<223> 99-430-352.mis complement

<220>
<221> primer_bind
<222> 1431..1455
<223> 99-20508-456.probe

<220>
<221> primer_bind
<222> 5235..5259
<223> 99-20469-213.probe

<220>
<221> primer_bind
<222> 6211..6235
<223> 5-254-227.probe

<220>
<221> primer_bind
<222> 14711..14735
<223> 5-257-353.probe

<220>
<221> primer_bind
<222> 19174..19198
<223> 99-20511-32.probe

<220>
<221> primer_bind
<222> 18985..19009
<223> 99-20511-221.probe

<220>
<221> primer_bind
<222> 29605..29629
<223> 99-20504-90.probe

<220>
<221> primer_bind
<222> 42507..42531
<223> 99-20493-238.probe

<220>
<221> primer_bind
<222> 69312..69336
<223> 99-20499-221.probe

<220>
<221> primer_bind
<222> 69169..69193
<223> 99-20499-364.probe

<220>
<221> primer_bind
<222> 69134..69158
<223> 99-20499-399.probe

<220>
<221> primer_bind
<222> 78583..78607
<223> 5-249-304.probe

<220>
<221> primer_bind
<222> 82147..82171
<223> 99-20485-269.probe

<220>
<221> primer_bind
<222> 84510..84534
<223> 99-20481-131.probe

<220>
<221> primer_bind
<222> 84798..84822
<223> 99-20481-419.probe

<220>
<221> primer_bind
<222> 89955..89979
<223> 99 2
O
4 80-233.probe <220> <221> misc-feature <222> 3698,12593,13035,21712,27644,27655,31143,43084,43129,64585,66950 67301. .67302,67926,75425,98821. .98822 <223> n=a, g, c or t <400> 2 ctcaagcttg gatgtgcaat gtctgtgagg atgggaggat catctctgtt agaaacttct tattttttaa gac tc t ttt t ttcagacctg actgatgctg tgttggagac actgcaacct ggactatagg ttcactgtgt tcctaaagtg atcaaaacaa tctgcatttt tgtgggcaat tcagtgattg tttagcaggc cgtggacctg tccagggtac aatttctaag tatggataat gcyttgaaaa aaggtcttcg aggacatatt aataccgtat agggaagaga gtgaggacat aatacttgaa ataatgactg gccaggcacg tgtttgaggc tttttttttt agatacttgt atgcaatttt tgatagtagg ctttaaaaac cttgctttag agagtctcgc ccgcctcccg cgtgcgccac tggatgggat ctgggattac ttcagcttgc cttaacattg tcccttcact cacagaaccg acttggtcac cctggggatc gctctatagt gtagtagcca tggacaaagc atgtgtagca aacagcagat caaggaacat aaaggtgaac aacttgataa tcctcattat tccaaacttt caatagaatt gtggctcacg caggagtttg aaagaaatta tttaagataa ttaaacattt ggagagcaga acatgcatac t ctt t tagct tctgtcgcca ggttcaagcg cacgcccagc gttctccgtc aggcgtgagc ttcactttta taactggtgt ctggaggctg gttgcaacag tatttgctga attgctcatt gtgaaacaca t tc ttt tgat tgaatgtcgc ttcattagtg gactgaataa tttatgccca tttcctatac tatctggtgt ttcccttgcc catgcttaga cactgtggag CCtgtaatcc agaccagctt ttgtctaaga ataagaaaca tattttaatt agaaacattg aaatgcactt aatattttct ggctagagtg attctcctgc taatttttgt tcttgtcctc cactgcgcct tgaaagcttt tgagttgaag cctcgagcct attctgtgca gtgagtctgt cactcatttt aaggtaaatg gcatattctc ttttatgaga aattaggatt aataatacct ttagattggc tgatactggg agtctgaagg aaacatttca gtttacccca cctccaaatt tagcactttg ggtcaatata accagtgtca agt cat tt ct atggcaatag aattaagtac ctgtctctta ttctttcttt cagcggcaca ctcagcctcc atatttagta gtgatccgcc ggc tcatatt attatgagtt gc agg cccc t ggacaggcac cctccctgtg taccttaggc gaacaagcca atagttcccc attctcatag atccattctt tcattattca aacagcagta agaaattttt aacatgaatt ggcacagtcc catgagtcta tctgttgaag agaaattatt ggaggctgag gcgagacccc tcttccaagg aaatgtgaat acgtgg aaaa acagagattc ggatctacta ctttcttttt atcttggctc tgagtagctg gagacggggt 
tgccttggcc ttctttatat tgaaagcaat gggagccctt ttacacttgg gcgcgtagca gtgtatttcc atattacatg ttctcaaagg agagtccaat tctcttttat aagaagacat gaatgagggg aaaaagtgac tgtaccattc ctgtgaccca taaggagctc 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 WO 00/08209 tatataagag tttaacaata gtgagtgaat gaaagctaat tatcttttat aatatttctg gcagtggttc cagcactttg ggtcaacatg gcgcacactt gaggtagagg gagactccat atatctgatg ctcacgctgt ctcctgggtt caccaccaca caggctggtc gggattacag ttcatttaag gtagaggacc agcagaggaa cccctggcaa agtctctaca agtaatgggg ccaggctgga gcgattctcc ggctgttttt aactcccgac tgagccactg caggtgagaa aaaatgtacc tgaggcaggt ccccgtctct cgagggagaa cctgtactcc aaagtacctc ccattgtatt agagaagaat aagagatctg gagaaattta tgaaaagcta ctgtttgagt PCT/1B99/01 444 aggtcactgc ggaacataga caaaaaaaat ttacagaagg ggacacatac catatggcca ccattaaaag ggaggccaag gcgaaacccc gtagtcccag ttgcagtaag cccaaaaaac ttttttgatt cacccaagct cgagcgattc cctggctaat tcgaactcct gcatcagcca cctcatgaca tgaaagggac catttcacct atgctcactt cttttctgtg gatttgcttg gtgcagtagc t gc ct cagc c tgtattttta ctcaggtggt cgc ctggccg tccacataag ttcggctggg agatcacttg acaaaaacta ggattgcttg agcctgggca cttgtaaata tttattttca gaaaggatga tagaaaggga tgttgtctca atgatagcta ctcacaacac agcctccttt taaattaggt ttgtcaacag atgtgtacag atataaaagc ggtgtgggga tagaaagtag gcaggtggat atctctacaa ctactcggga ccaagatcgt aagcaaacaa gtctggttta ggagtgcggt tcctgcctca ttttatattt gacctcaagt ctgcacctgg gctctgcgag aggaggtaac gggcattgca gaacaaagcc tgcttgaaat tgtgtgtttg atgatcttgg tcctgagtaa gtagagacag ccacccacct tttttttttc aaacaattta tgtggtagct agctcaggag caaaaaatta aacctgggag acaaagtgag agtaacacta gtttttatgg tgggaaaata tgaaggaatt ccttctgaaa cctttctacc agtgttatga gtaagagcaa gaaaaaaaag caaataagtg tatgcagtga gaataaatct tatgacacca agaaaacatg agtagtggtg gctgggcaca catttaaggc aaaaaataca ggctgaggca accactgcac aaaaagctca cattttttat ggtgcgatgt gcctcccaag ttaatagaga gattcatctg ccttggtata gaaagttcac agtctggcca ctccagagct aggtggtgat 
aactgcaaca tatttttgag ctcactgcaa ctgggattac ggtttccccg tggcctccca taacaaaatt attcagagat cactcctgta t ttgaganca gctgtgtgta gtcaagactg accctggcac agacttcatt ctagtagtta aaagtaggag gtataggcag tgcccccaaa acgctgtgtt ttcgatcttg atatttgcac caaaaataat ttcatttagt aattgatagg attaagcttc gtggctcacg caggagttcg aaaattagcc tgagaatcac tccagcctgg tagagtaggt ttttattttt cagctcactg tagctgggat cagggtttca cctcagcctc tgtgttttaa tatacgtctt agaccacaga gggcttctca acaaaggtat aagaatatat atggagtctc c ct ccggct t aggtgtgcgc tgttggccag aagtgctgag attttctaac ttttgttgca atcccagcac gcctggccaa gtcccagcta cagtgagcca cctgtctcaa tagtggttgt agggagagaa agaggaaaac agagaatagg ggtaagttat caatgtttta ccattggtct tctgactaaa attgaaagaa ttcaactaca ataaacacca aaagatgtct cctgtaatcc agaacagcct agatgtggtg ttgagcccag gtgacagagt aatagtcatg tgagacaagt caatctctgc tacaggcgtg ccatgttggc ccaaagttct tttgtattca caggctgcag gccagggaat ctgttctcaa ttgttatatt cagtatttag gctctgtcgc ctgagttcaa cactacaccc gctgatctca attacaggca agaaagcaat tattaaaaaa tttgggaggc catggtgaaa ctctgggggc tgattgtggc aaaaaaaaaa caagcaaact gcttggttgc gcaagaaagc ttctttaatt tgt tt tatt t cacactttac cactttactg 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 WO 00/08209PCIB9014 PCT[IB99/01444 aagaggaagt ttgaggctca gaaaagtaag cagatctctg agcacatttc ctggacttgc tgatggcctg aagggagcga gaatattgga caaagcaaag aatttctcct tttattccat aggcttataa aaatattcct tttgttagcc tccctactat tctaaaaggc tcctattttt tctaaggctt cataagacac aagaacaaag catgagtgta ttcacacctt ttcttttgaa cttcgaagat tacaaaacca cgagtgacat gagtgggctt ctcatgccaa ttttttaaaa ttctcagtag gatgcdtttg taaagccaga tacattaagt caacaaatag ttgcaaattt tttacataaa ttgagaggcg agtgaaaccc ggcacccgta gcggagcttg actctgtgta aaaacatcag ttaagaaaat atccagttgc gtcattttac agccatgtca gtgccaaaaa taaatgactt agtgaattac agatctttaa gatttaagtt caatgttcat cagtattatc tactcttttg agtccctata 
cactatgtat tatatagtag gtaaataaag cttttttgca ttactatctg tacacatgta aatagcctga tctggagtgt acatctgatt gtttctgaga gagt tgggag gcggctggcc gagcacttga attacttagc aaatgatgga ataacatttt caaaagcctc ggcctgcagg aacaccagca gtcttgctca cttggcacag agagcagatg gacgcgggcg cgtctctact gtcccagcta cagtgagcca aaaaaaaaaa aaagtggatt tttattttgg agagtctgag ttggggtaat ctaaggagca gatgtgtcct ttcgtttttc cacccagtgc cagagtcttt aaatcacttt cataaaggta tacattttaa tattatatcc aagagcatgc aacttaaaaa gggtcagcaa t t ttgtygaa gacttcagca gccctttaca catcaatctg tggcacgtct gcagtatctt ttattttatt attttgcaat gggctgcatt gttatgatga ttttccttat caccaggtgt aatgttgatt tttcctgagc cagagtcata aacactccat aagtaagcac tot cc tgt ot gtggaggact gggctgggtg gatcatgagg aaaaatacaa Ctcgggaggc agatcacgcc agcagtagat tgtgaattta atttgggggt aaactggccg caataaacta caaagcaact gatgaggtga agagattttt acttttttag cactcgattt agaaagtatt ttaggagagt aaagtataaa attgttttta aatgggattt attagataca acagttactg actatgccca acaccgccac gttgccacaa gaaaaagtt t ggagttcttt gagaaatcaa attatagttc ttttaattaa atcttctgag tgagagctcc cttgtgaccc agacgaattg tttggaacgt attttaatgt ttatttaaat actcgactgc cagtgtggat atttctcttt acatacctcc ggtcatgcag cagtggctca tcaggagatc aaaattaaaa tgaggcagga actgcactcc tttcctatta gagaagtata acatgtgcat aagaccacgg cttcaactga ctctgaggca gatcacagac cattccttta acatcgcaga aggctagata ttgtcccggg gtaagattga accttttttt atttggccta ttttgccgct ctgaagtgtg tcagctgctg tgggccaaat atccattcat acactatatg gccaaatata aagaaattat atctgatttt tttttgattt ggaaagttaa atcatgaaaa caaagggata aggggaggga tcttgtcttc ttaggttagt acaaatatcc gga ccaat ct cttttcttta ctggatagct atrcgacacc aatcataaaa ttctatcata acgcctgtaa gagacgctcc attagccggg gaatggcatg agcctgtgtg aaaaaataat cagcttaaat gtttattacc ttagtgaaga ttggtttcaa aaattatttc aggatcagaa agaagcagag tggcagcaga tgctgaaaaa taagtagcat gttctatgct atgttttctc ggtttaaaaa ccaaagaata gcttctgttc gtgttagcta tctacccacc tttccagtta cctcacaaag gctctataga CCctccctcc ccctcagagt tatggcacac attttatttt acagttgatt gagtgctgtc gttagttgct ctgcctatca gtcttattta 
ttagtagcat gcttctagct tgtagggtaa ccctgtctag ctgaagaaac cgtttgctgc acataaaagt tct cagcac t tggctagcac cgtggtggcg aacctgggag acagagagag taatattggg ttttcttttt tgggtatatt 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 WO 00/08209 gcatactggt cagtaggtaa aaatttctga atttaaataa atatttaagc tagggaggcc gagtgtttgt cctagacata ggcgtagttc caatatttac aatgctcatg agaaacactt tcaggccatt ggttcagtag aatgctgc tg tgtaaatgtg aatacacact gttgtgtaca agatgacaga acaaataaga tggtcagtgg tcttggtgta agtgtcaata tattgcttgc ttcaggactc caatttctag atatgtttaa ttctatttta ccaaatagcc gggttagctc catgatggag aatcttcaca tttcatagta ccctttggta acctaaaagg ttagatgcat tttcaatagg atattggcca tagttcattc gattctcaac taacttatgt tttgttcttg PCT/IB99/01 444 ggggattggg tttttcaacc aactttatcc gcaagtatct ctggtcagtt agctttccct ggaggccagc tggct t tcag acaaacacac tgtgaacaac gcaaactact ccttagatgg ctcttgcctc gcagggcggg Ctcctggtct agtttctgag gatcccaccc gatgaaccac tacagaaaca gacattggta aaccagggaa ggaattgtgc gacataccat tttttatttt gatagttgtg agt catt ct a gtgtacttta tgtaattgga ctgaaacttg tttgatatga attgtgttct actgttgagt tgttgggaat aacacatttc cactcaactg tcattcttgt agcaagctgc ctagccacgt ctttgaaaaa aaatgtttga ttgtgtaaca aaactcagat cttctagtgt ttcacacccc tgtagctggc atgtccttct caactttata tctgctacca tccttaccac aatttttcta atacctacac atacaataca cagttgtacc atgagtgct t agcctcctga gggagccctg gcagaccgcc ttaaaatcca tgcaattcat acaccgcccc gccccagcat acttcagtgt aaggtgaggg ctatggaagt gaagacattc tttccaaaag aagattcttt agagagatgc catctttctt tcccttctaa acaaatgaaa tttggaggga gtgatgtatt cctgaagaat tttttcctaa tttttttttt tgtaaatgaa ttactctatt tacaattcct cctgggtgca gcgtgcatcg gcactcactc ttttacaaca acatttttac 22 acccatcacc ctttcattct tctatgatta tttaataact actcctgaaa gaggactctc aggcagggtt acctacagta attcacacac tactgatatt acctactaac Ctcaaatttc gtagctgaga aaatgctgca tctggggagt ggggaacata tgcaagtgtg tttcatgtag 
caagcagagt cagaagagca gtgagggggc gaacagagaa atgtcactca gcagatctag taaaaggatt aacttttcag ttattcatct aaaattgatc atggcccttt tattagtaag gtcttagaga gaaaaacttc ttcttataca tttttcaaat caggtagaat Cctgttgatt ctttttgaat gtgttaaaca tgaaggcata catagattta taaaaagtac tttaccctct caaatagtga cccccacttg taatgaaaca tgctttctag agtgggtttg tttggcagta tacagtcctc agaagcacat aaaattaaaa ttgttctatt atgatagagg aggtgctccg ctggttaaag tttctgacaa gaggtcctag gtgtcgtcca ggaaggctat gaagttacct gtggtaggag aggggaaggg agtgatgctt gaaggcattc gtgggctttc gaatatatac taaaagtctg aagctgcttg tgaattgaga acctaggagt cagttgtcca aatttagatc gacttttaaa agttatgaaa attaaatgta taaaaccctc tcagagtctc tat tt t ttct attttgaata tcagittgct caactttaaa ttgcatacct ttttggttgt tacagaagaa acattgtacc tggggaaatt ttactgtttt acatttaatc ggttttgtgc gtgagggagg tgccatccct ttaacattgt gttcacaaaa ttatttttaa gagcagtttg cctcccgggt tgcagattct gctccaaggc acagcagtct gcctccatct ttgcttattt aggaggagag cccagaagtt aagttaggtt ttggactaaa tagacagacc cagtaagcct atattcattc tCtaagattg tatgtattgt aactactata tgcaaagaaa attaagctaa aacaggtttg tccttaaaag gtaatcaata tgtaacttct aatacttgtt cagtccactg cttccaacaa tattaaaaat tgagtggtag atattgtcat aataaaacaa atcatcttgc attgaggtgc 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 WO 00/08209 PCT/IB99/01444 23 tatttgccct gaattgcagc agtaagtgcc tacagagtga ttttccatat agaaagaaat tctaagaata cagattcgga aaagaacttg agtaaacagt agattgtggc ccagcagtgt ttggtggtgg gcccgtgcag ctgtgtgttt agctccttta ccagaagagc agt cact tcc tcgcaaagga taagatttgt gtgcatccca agataaaact gtacttgtat cattactcct agagggcaag agaggcgctg ggtgcttcag taaattctct tctagattct atcgtctggt agtgattcat aactttaaac gtcgctgcct tcatacacgg agaagtgtga tcagccattg agcaggggac ttgggttgta ctgcctcacc ttcctgtctg gataggactt gatgtcacac tggcccattc agacattgaa tcagtttttt gaaaaagaaa tcccagagct L ttgatacagt gagagcctta gttaagcagt cctgttttct 
ttatgtcctt gtccttgagg cttctagatg gaactagaga gctcccagga agctcctcgg cagctccgct ccatcgaatg aacttatgag ttaaatttgt ggtatgttta ggaagcagtg tgtcaggaag caaacaaagt cggatgagtc atccttacag Cttctggtcc gaataacaaa tctaacaaga tatggagaag ttaattctgt aattttaaaa tcttctgacc atcattgaga gcattaacag gaaagagttc atgtcagaag gctgtgtaag actgcctgcc gagagtccta agtatcagta atccgtaatg tgctttcttt attcagggca tttgcttttg atagagaaag ttgggagact tcttaatctc aaatgtgggt ggagacaagt agagagaggg cactcctgca agcgggcatt gcattataat tttataaatc gccatctgtg Ctcctcggag gtcgccccag ccaggaacct gtatcactca tgcataaata ttgagtgaga acatgttcgt caagtgaaag aagagacgtt tccccatctg gcagagcaag tagctctgcc attggagtag tttattcatt agtaatcaga gtgccagaga atctctcacc agcagccgca gtgtgagtta tcctggatga ttggttcttt gtactgctta tttcttgtgt cactcctctg gttctcttgt cttttccttt ggagggatga cccccttgtg ggatattgtt tttatacttt aaatattatt gaggtgggag aaattatgaa ttaacatgag tctccagcac cactattcat cttggccagt tatttaatgg cagaacacat ccacatattc tgtgaaaayg gacctgtcca caggccttca ccacaacctg gtgagcacag gctggggcat gaaatgagtc tcgagctgct tgagcaaaaa gttgactgtg aagccctgga agaggtatgc gtgaactaat atgttttcta gtaatagttg agttcccccc ctataaatgg taatatgaat gaagtcccag gtacagaagt tggagcagag ggaattcagc ataaatacac ttaacaccct ttgttggcgt gaagtctgct ctccatgaag ggagatgcct agctcttttc cattttaaag aattaaaaat gttcctgggc gttgcttgag gtcgaatctc tgaacacatg tcacacccct ggcgttgttc ctccccattt accttcattt acttagatac cccatggtgt aggccttgcc gtgactcgga ggaggcgagc cccgggggtc agacgcctca atctgtgact aggctttact tgtgagtata tggtacctaa gaactttgct gcagggttat tggcctcata tgtgacctgg aaatctctca ggttcctgtg attccttcca acacagttat caaggtcaca gacctatgtg gtttggaatg cctcccagct ggggtagtgg gcttttagag gtctgcacat tttcagtgat gtttctcaaa aatgtagctt gtctgtctgc cgatttatct ggaaatgtat tttttttcct gaagtggctc gccaagagtt aacagtagat tggcaaagat tagaagctgc agaacgttac ctccagcaag tCt t Ctgct t tgcaatgttt gtctgatctg catctctgag gagtcatctc aaa ca ccct g cccgggggtt tgaacgaaag agccaggtat Cttggtttgg caagcaatgg catgcatagt gctgtgagga aatgggaggg gggtgacagg acgaatttgc 
ctgtaagaat accagttaga agtgtccctt ctttaaaaac cctgtgtaca ttcgtgtagt ttctgagtaa ttgttttctg tgatcccaaa acacacatcg tacttctgtg cattgaaaca agccagagtt tataatagat ctctctagca acaggaaata tttttaaagt gccagttcct actcctgtaa caaggtcaca 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 WO 00/08209 PCT/1H( 24 gtcagctgtg atcgtgctac cgcactccag cctgggtgat agagtgagac ctattaaaaa aaaaaagtat ctctgttttt ggaagaagtc ttccacctgt ttctctgatt gtcagtcttc tttttaagta ggtgagcaca tgaggatgtg tgtgtgtgtg cttattttca tattatgaaa ttgagacagg tgcagcttct actacaggca tttcaccatg gtctcccaaa ttatacttat acacacgcgc cacacaggat tcctgattga ggatggattt gtgaaaatga ttcttagcat tttttatgtt aatgccctgc gctcctgtag tagatatatc ttaattacta tatacttttt ttgacgagga gtgtttctgg gtctaatctg gaagggcctt ttgataactg tgactgcata ttgtgggaat ggcagaatgg aagtggcacc tgttacaatc taagaaagca tgttgggagc cactgaaaac ctccactctg ttgctccgtg gctgtttagt agctcctgct cctaaaacat ctcacgtggc ctgaatgggg tgtgtgtgtg gctgtcattt acagatctta gtctcactct gccttgtggg cacgccacca t tgaccaggt gtgctgggat tttcacattt gcgcngtgcg aacatctgtg tttaggggaa tcttttttaa aaattaaatc tagaacaaaa tctttatcct ggaaaaaact atgaaaataa aagcctgggt ttttgccatt tcttactgaa ttagtttaag ttagtgagca taagttgagg ctcaggttgt tattttgtag gttattgcct ccttggagac ggccccttgg aagagaggc c gaaatcactt gttctttagg ataaacacgt tacctttgcc atagtgttac tccctgtcgg gtggatcctt cctccacctc aattccagtg aagtttggtg ggaatgtcca tgtgtgcgcg gaaccaagtt atggcagatt gttccccagg ttcaagcagt tgcccggcta tgttcttgaa tacaggtgtg ccttgcccta cgcacacaca tttgatcatg ttatttttcc aaaattattg tcgttcgtga atgtttcttt aaactctttt t catt ctt ct cacctctgat gttactaagt taaaaaatca tgccattttt tgttgtctta tatgtctgtt ggattgctta cctgtgaata tttaaaaaat tgctggttct atgggcctgt catcaccccc caattggaac aagggcatgt aatgatgacc gggaaatggt agagagcgag tggaacaacg cgccccctct cacctgtggg gtcttaaccc gagaacaaag ttgtctgttt 
tgcaggaagc cgcgcgcgtg aatt t tac ta ggtttgtgtt ctggagtgca tctcctgcct atatttttat ctcctaacct agccactgca gtggacactt cacacacaca tacactgcaa cagtttgaaa atcccattca actactttta tattttgaag ttcaaccaaa tcctctgtgc tttatgaaca gttgaatatc tttcagctaa aaaaatgtgc agaaaagtct tcaaatcagg cttacaggta ggagaaaaca acacacgtta gtgtaattaa gctgagcaga tttccccctt tatgatatgt aatctttctc cactgtgagc caagaacggc cagagatgag agacaaaagc cctgctaacc tgagtctaag ctccgcctcc tgggaatgct tcctggggag agagccactg tgtctttgtt ttgatgactt tgtgtttgtt gtggcgtgat cagcctcccg ttttttttgt caagtgatcc ccctgctgca acatgcatgc cacacacaca tttgtgccat ggaagagtta tttaaaatca atttcttacc cttatatttt ctcttagcat caaattttct caaaaaggta attagatata atctgttgta aaccaacctg ttgccaagtc atgtctgatc cataacttgg tttatgattg aaacaattat attgcaagtt tattcccatg taggcagttt ggaacatgtt ttttcatgaa ttgatataac cgtcaatata gaaaaggagt ggtgtgctcc cccccgtgct caccgcccag tcgccaaact gtgccaaaga ttcacactga tgtgtgtgtg tatattttgt ttnttaagat tttttttttt ctcggctcac agtagctggg agagatgggg gcctgcctca aattgttttt gtatatacac cacacacaca atcagaaact tttggaaaat aattttattg tagttttctt atactttgtg ctcctactgt aaaatttctg gggcttaatt caagggtgtt tcttctttct ttctctagtt tctgagacca tgttcaggac gtataaattg tgtttatata catcatcaag ttttcatttt cacagaagag ctctttatca tcttaatctc aagaattctg ttctgtgatt P9/01444 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 WO 00/08209 PCT/1B99/01444 gattatttgt ttatacaaag atagttgata ctaacaaaat ggactttgaz gaggcattcc ttcttccagc ctgacagagS agtattggca tccatttctt agagctctga ggcaacctca agagcagcag ctttgaaggt tggggtttta cttggggtgg gtgcgcaggg gtctagcatt gcgcatgctt agttttaggt cgatcacttt tgctggtgca ctgtaaccaa ctgtgaacta ttgccacact attcaaaatt acctcccatg ctttctatgt tttgtaccta cagggtttga cccgtggaga tgttagaggg aggaa cgaag aagagtgcct gctcggaaat ccatcataga tccctgtcct gcatctgttt ctcttctcca tccgaagagc 
ggctctggca tcttcataat cttgtccttt :cccgtgaatt x tccaaagcaz ,tggaggcagc :agatatgaac Iaccctgggac Lggtctgaggc agggaaggaa L tgtttagagt *cttcttgcct tgaaagttta ggt taggcgg tatgctgcca gctgtccgca tgtttactgg cctagaggaa gagcccactc gttttctatc agagaaacag tat ctggaaa agcgtgggct gaataaactg gatagagaca catagtactg ctgtggaagt gggggcacca taaagtattc tttctcttcc ggtctgctta tgtggagtgg ctgattcatg cggtgttgcg gctttgtgca acagttccaa ctgattcctg cctgcccttc ggggaattgc ctctgcccct gtgtgttgct ctgtttctgc gggatatgaa :cctccccact i accatcttgg :agatattcct igtaaggccgg Iccycatcata aatcacaaag tttaaagcta *ggtcacagtg cctaggacat *ttaaaaactt gcaacttgag tacttccggg tgtgcattgg agttgtaggc cgtcatgcac gcccagctcc tactgggagc ttgacaactg gggccctctc tcggagtctg tctaaagtta aggcagcatg gaaagaaaat ccttcgggtg tggagctgcc ctgccgtggg tgccagtgtg ccgcaaagaa agcagtggga tctgaaaggg gtggccatac gtggccagga gttttcaaac gcttcttctc agatgacaga cctcgtcact cctggagtcc ccttccgttg tttaagtgat ccattattta *atttagtgat *agtcataaat Itgattctggt ccgagtagcc tacctgaaat ttggtaagaa gtaactaggg atttagggta ttaaccagcg aaggcaaaag cagagcagga agatgaagtg gtcttgcgtt cgtgctagca gtgctcactt caggtaaatt cgagatctta ccgcccttcc cctgaccaac Ctgccgtcct Ct t tca aat c ccacctataa atatcattac cagtgatgcg agcctggccc actcaccagc tggcccctcg agccagatgg gggctcttcc accagagcca gtgccagaac Ctcacagggc tgcgttaggg gagcattcac tggtctctga ggaccatgga gtttgggaac tgagttgaac acctgccact taactcaaac aatttggact ttgtttaaaa caatcatctt gggactcctg accccgcaga gaaacctcaa agcagagcgc agggaattta acctctccat gtgaatccag gagagactga atgaaaggac tgagatttgg cCttcttctc cacgggggtt gaggcgttct ccgccatgtt ttgggaagct ttggtgcccg acctgatggt catgtctgac ccagcttttc cctgggatta tgatacattt aatgtttcca Cttgcttctt accttttttc gtggagctgc ccacatctct tcccaggtcc ccagagggag ccaagt ttcg atggtcgctt gccacagatg agactgagcc agccacacgg agctgctgcc ccctggtccg ttggtgttca gctctgttaa attcttggct ggtttcctgg aaatgttaag at aat tt tag tgaagacccg aggcgtgcga agagagcacg cgtcctcttc gaggttaccc aaacaggagc acaggtctgc ggcaagtttt gtcaagtaca ccttttgacc tgattcttcc gtgggggagc 
tccctgtcca gcctcttaat gcagctcccc ctgtgaccaa cgcctgacat gagctacccg cccttaggag attatgcctg tttttaaagc gggtaatgtc ttgccccagc cctcaagtag tgagcctagc Cttcccctgc tgtagcaccc gccctggagg gtgtttaata ggaaatttct actgcttgct gcatcctgcc aaatgtgttt tcctttagct agtcctgtcc cttggcctct tgcagattga cttattctat 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 ttgaccatgc ctgtaagctc agtgctttgg gaggccaagg caggaggatc cctggaggcc WO 00/08209 aggagttcga ttagctgagc gattacttga gcctgggcaa aagttttttt tttgcgattg agtggacaaa gcctggcata cccaaacaag gtctggagac tgcaggctcc cccacccgag catggctccc taacacatgt ggctgctaaa tctcaactcc ctcagaacca agaaaaacga tgctcttttt taagaacatc ggcttaaatc cgcgggtctg agatgcagga aggaagctcc ctgttggcca cttgccttca agtgggactg tgtcctctgc tggtctgtga ggccaccctt gtggtgttgt gcccaggagt cagagtgaga ccc tcgc at a ttttgagtct catggccttt gatgctgctc tcagctcctc ccctgtcagc cactagcccc ccctggcaca tccctcccct tcacaggttc ggcaggcaag Ctcctcatgc agccgcctga gaataaaagg atactctgct gacattttta actttcccca agctccaaag cgctgcggga Ctataggcat gacaccaaca gtcaatctcc gagcttgaat cactgtctgt ccactttgac ggcaacatag gaagctgtag ttgagtttac cCttgtctcg gt ttgt ac ct tgattctggt gtgcctttaa ctcctgttcc tccctcctgt gctaggcaga tctggcttgt ggctattccc tcccttgtat atcttcgtca tcagtggctc CtCctctgtc atttctgaat attattaagg aagtgcttta tgaagcccat aaccagggag ctcattctct Ccctctggag ctgtgttggg ctgaatggaa tccgccccct ggcaactgat tccccatcct ccacactagc ttagagctct ag cgtgc a ca tgcagccgct gcttgtgaga ctgcctgttt acctctcttc ccacttggaa tagtgataat ttttataggt gtctctaact aggtggat ag tgcctcttgt attaagttgc 26 tgagaccctg tcctagctac agtgagctgt cggggtgggg ataaagtatt gaccagtaag gtaatggcta taggccgtaa gccccaccac gatcaccatc gccatgccag tctgcctgga cctgactctc cccatctccg agattcttgc tctataggca aacattgttt aagaattaat catgaattat tttacaggtg ttagtgggag gaactgtgca gtggtcctgc cgtggactct 
aacatgtttc tccatcctga ttctgtctga taggatgcac aggtctccat gaggtttctt agactctgtt ggactcactg aatagatctt tacatttata caggtgraaa gatactagta gttactagtg gggctaaaaa aaatatagtg tctaatctct ttactcaggg aagaatctga PCTIIB99/0I 444 tctctacaaa aaaataaaaa 16980 ttgggagact gaggcaggag 17040 gatcacaccc ctgcactcca 17100 gagtcatgtc tatacttgag 17160 catcagtttt gagcagtcct 17220 ttgtatatat ttgcctgtca 17280 aaagtaccaa acagaacagg 17340 tcacccctga ttcatcagac 17400 ccaatctcct gcaggaagat 17460 catgtccacc tttcctctga 17520 ccatgaactc accctcatgc 17580 atgctcttcg tcagtatccc 17640 ccatagcagc tctctccctg 17700 gcagctcctg caggcttgat 17760 aaacttagtg attagtgatt 17820 catattattt cttatctctt 17880 aagtgttctg tgtatgcaaa 17940 ataataatag ccacatatta 18000 ttcctttaat tagacaatct 18060 ggtgagtgga ggctgggagt 18120 ccagaggcag gacctgagct 18180 cagcactggg ctgcagccag 18240 ctgtgcttcc ctcttcccac 18300 cagtgtacct gcatgtctcc 18360 tgggcatttt aatgtacgta 18420 ccgcctccct aatagttagc 18480 gaggacaaat caggcatctt 18540 gatgccagag ccctccactg 18600 atgttccttc cagctgagag 18660 tccccagagg tccctgcttt 18720 cctgatttgc ctgggcggct 18780 cattcccact ctgactttgg 18840 aacagtacct tttaacacct 18900 ttctggcagt gcaaacttca 18960 agcgaggcag ggatgtttat 19020 aaacacacca acagtaatac 19080 aaggaaaaat agaaactttc 19140 aaaaayctgg agagaagggt 19200 taaaaagaga agaaaataca 19260 aatagtataa tgggcaaaat 19320 atgtgtgact gttttctatg 19380 actttatatt ttggaaacct 19440 acatcacatc caaagacagt gtgcaaactg gagccatggg catgaaagac at ttcaggtg attggacagg acagttccat aaaaaccatg tgtaatttgg gttaagtgag aagtcaggca tgtctcaaac cacaaaatcc tctccagcca cagctgagcc acacaagtgt ctcaaaatga aaatcttaca ccatgtcatc gtgtttgcaa agattctaat taaggagtgt cagtggatag aaaattagtc ccatgaaata WO 00/08209 PCT/HIQQ/f1 444 27 atctgaggta ggtaggaagt taatttatat ttagaaattt gcttgcatat gtctagtagc tccaggacaa c agt tggt c taatcctgaa ggtactacat cttggcaagg cc tgt t taaa ctgccgcaga gccttgtgtg agagggtggc tcttttaatt ggttaaacgt ccagcatgtg cagtgtttgt agcaaaaaca ttcagcttca tagtccatgg ggttgattcc tctttgtaac tcaaatggta tgaactaat t 
tagcatctgt Ctccttgtgg tttgttggct gctcctctga gaacctcact ctcatctgtc gtttagagcg ggcagactct ggctgatgtt ccagtccaga tagtgatgga t ttgagaaga gattaaaaac tatggtgctt atggagcaca tgtcaagcct ataatctcca gaaacaattt tatgtgctag atattgtgca cctaatatag Latattcccaa tcttgtaaga cctcagaggg tgaagggaaa I caaatgtttt tgtaaagcaa aactatagaa acgctgagta gt tagcgaag tttattttta gtgccatggt gttatttttc tgttcccttt tgtggtgttt tccatgtccc tgtatatgta atgtctttgc agagtggtt t tttctagttc tacattctca tgtttcttga ttttgatttg gtatgaatat gtaaagggta gtggatcgcc agatgaagac gtgttttaca gttgcagtag tcaggtccac tctttgatgc Ctccaggagt gt tgtgctgt atggt aaggt tgcgtctttc ggagacacag acaaaaaaaa taattttctc t taaggc aac ggaacaatat tttcaccaac ataagaataa atcccagact attacagcct tccctgcttc ttgtatgata tgtaataaaa atggctt tag agtgcattct tgtggaagga ctactgcagg gt tcc agggt ggtttgctgt gtaatgctct cctatgtcca ggt tt t Ctgt tgcaaaggac ccacattttc tattgtgaat atattccttt tagatctttg ccaacagtgt ctttttaata catttctcta cttcttttga aggatgctta atcgttggcc ctcatacacc aacttgacgt gtccgggtgg tgtgaggagg atcctaggtt ctctcaagtc ctgaagaagc ttaaagaaag Ctgccctttg tgtgggcgtt aatgctgaaa tctgggtgta tCccaacttt taaatttagt aatagaagct tttattgatt attttttttt taagttagca tcaaatacta aataggaaat atagatcgtg tgatgcttta ctcttggtgc ccattcattc ttgggtgtgt acatgtgcag acctatcaac CCctgctccc tgtgttctca tcctgcgtta atggtctcat tttatctagt agtgctgcag ggttatgtac aggaattgcc aaaagcattc atcaccgttc atgatcagtg gaagtgtctg cgtctgtgtg tgtactgaag tgttgattaa gcttggagtc gttctgaagt Cagggcttag aggcctgtct tCaaataagt aaagagtgag atttgaccat gttttccagg tgttttctca atcaacttct ttatgcaaaa gaaacgggga tttatacttt ttcagacttt tgaaacccat ctttttaaat aagtctaaga agtaggtcac cagcgatttt aaatagaatc agtgtggcag tgtgggttct ttggtaacta ttaagatttg gatgtgcagg CCatcaccta tgccgccccc tgattcagct gtttgctgag tcatttttat ctatcattga tgaacatatg ccaggaatgg acaccatctt ttacttctcc tgactggtgt atgttgagct ttcatgagag acagccttct gtaaagcaga gaggctttct ttctggaaat tctaacaagc aataaacaac gtcaggctgc ctgagtcatc tgtgatgggg atgccaggtg gaggcaaggc gccgtgggct gactagatat 
gataatcctt aaaatcattt tCCtttaagc atatgtcttg tgtataagaa tcaacagtga gggctggttt gtgcacagca tacttggaga ctgaaagctg tcacttctgg tagggtgaat tacactaggc gatttatttt tttgttacat ggtattaagt caacaggctc cccatctatg gataatggct ggctgcatag tgggcatttg catgcatgta gattgctggg ctacaatgtt gcaacctcac gagacagtat ttttttcatg agacatattt ctCtttttca tagaggcagt tcagatcatg cttgttaaaa tccccagtga cgtgggaaat Cctgggtctc agggatattt aaaatgcagt aacccaaatg cttatctctt ctaacctaat ctggtagtac tngttattaa tatttacctc atttcagatt taaaaaaaag atagtccagt 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 WO 00/08209 PCT/1B99/01 444 gaaacttaa tatagtaac taaacagta tattgtgaa atttacatg taagtggaai caccaaaag( tccctcactc cctcaggatl tgacttgagi gtttctttcc ttttgatctt ctactttctc acggaagaaa ttactttctc ttgcaagcca cccatccgcc cttgtcattt atttttgtgc cggcgtttaa cacaagatag ggcttggatg ctgataccag tggtctaact ttgatctgtt agtgaatgac acatgagcag caaaatgtaa accttgagag tccctggtcc ggaagggttg aaaaaacagc caattcaatg aatctaattt gagtgcagtg catgcctcag tttatatgtt gcctcaagtg acccaccgca cacttgccaa ctctatagat tttgtgtctg g ttcaaagtt a ttaagtctai a tgcctaaaci aacttaggai t aaactgcagS :acctggggac 3 tcagtgagtc :ctgattgctc a aactgtcacc :aatcccggae :gctataatat I aaatctagat iaaagttgtct ftataactttg Lagaatctgat Ccctacccca caaaagtcaa ctgttaacaa acagccagtc cacctgagcc ctgagatgct ctgacccatg ctagagtgtg aacctttgaa gagtagtttg cctcaaaatc ggaggcattc tgagtgtctc tctgtgtacc gatcagctct tttattgaga ttgtgaggtt tttttatttt gtgcgatctc cgacctgagt tactagagac atccttctgc gtctaatttt tctactgccg ttgcctattc t tttttgtcti g tggattagal :gaagtatati a tgaaactttc jtggggatggc Iccacgttgac -acttggtggc taagtgtgctt ttatattcct gtttttcttt tcagttggca Cttttctgtq tgttcatgga cccaccctgc ggaagtcaca ttgtgagtta cataacactg agtgctgctt agagtcacat acgatgctac tgcctgtgtg ttttaaccta aaaatatttc agggctggga actttcaggg ctaaggatta 
ctattgcctg gaggagggag tataattcac attttttggt tttttttttg ggctcattgc agctgggatt ggggtttcac ctcagcctcc gaaacatttt tccacctcta tgaacacttc tgtggatgta, tattagatat a atctgaatc i atttacgta( 4 ctggaaaati I acatgggcc I gtctgggcct Iagggttgac< agtgttgctc gttttttttt tgctcatttc *ttctacaggc *gagacgtgga gtgaacaaac tttctatggt gccgccccca gtgaatggca acactgactg tgtgagggct ggtaggaggg cgcctggatt agcacctgac cctgcaggag taccagggag caggtggata Ctagagtgag cctggcaggt caccctgggg cttcagtaaa t ttt t ttaga atcctataca atatccacag agacggagta aacctctgcc acaggcatgc catgttggcc caaagtgttg ttgtccccct accatagaca atctaagtgc g ctatgtcaal c aattgagatc t ttatgatga( cagcaggaaz Scaagcatagt Sgacatgcctt gaattctttc tcaattcgga cccgcatctt ttttccctgc Icaacttattt ttttagggca tttttaagat accaggttca *taaagatact *caccccattg agattttacc agcttttcct ggagc taagg ctgaggggaa tgaatccctg agcagattcc aatgtccaaa ccattgaaga gaatttgtgg gccagcacgt gtggaccctg gcctatttgc caggcagccc aggatagatt gtttgttcat agttgtgtga ttgctctgtc tcctgggttc Cccaccaagc agactggtct ggattacagg agaaaaaaac gcccctaatc aatcatataa atgcctagtt 22020 3 taagcagtaa 22080 -caatttatat 22140 atgaaacatg 22200 ttttagtgga 22260 -gacatctgtg 22320 cctggctgtc 22380 *tttttagact 22440 1 cctcagaggg 22500 *gattgccctt 22560 *gtgtttttgt 22620 gaatggagag 22680 tgggcaaaac 22740 tgttcttaat 22800 agataaaata 22860 tgatcacctc 22920 tgcttcttgg 22980 tcgacttgct 23040 agtgaacctc 23100 ttattggtga 23160 gagggctgag 23220 ccctcctgtt 23280 tccttagggc 23340 gtgggtgatc 23400 gtttaaagca 23460 acatacatga 23520 gtccagggtg 23580 aacttccagg 23640 ctcatcctgg 23700 tgcaagggaa 23760 tgttttgttt 23820 ttaaaatgta 23880 acatgaccac 23940 gcccaggctg 24000 aagtgattct 24060 ctggctaatt 24120 ccaactcatg 24180 cgtgagcccg 24240 ctgtagttgt 24300 tactttctgt 24360 tatgtggtct 24420 gcttctttga tttaacatgt tttcaaaatt cattatgtca taatacatac 24480 WO 00/08209 cagtaatcca ggtttatcct aataccgctg actcctcctc ggatgggaag cttctgccta gggaaggctg ggcagccaga tttttgagac ctgccacctc gactacaggc ttcaccacgt tcccaaaaag ttttttcttt ctgctcactg tagctgggat 
atagagacgg Ctcaagaggc cacccaccca cctggaggac gatgaattat attctgataa aaatgagttc cgcatcagat tttttcactt ttttttgaga tcactgcaac tgggattaca cagggtttcg cctcggcctc catcttttaa tggttcacat cagtaatgaa acttaaatta tggaagtact ttagcaactt gttatttata gagtacagta tcccctgcct aatgtttaca cttgaattaa cactgcgcca PCTIIB99/01444 ttctttttta ttaccagtcg tgaacattga caggggggcc ggtttggtcc ggctgaaacg ggggaagcca gggccacagg aatcttgctc cacctcccag gtgcgctgcc tgaccaggat tgctgggatt gagatggagt caacctctgc tacaggcaac agtttttcca caatccgcct gcccgtagct ttgtagaacc cttgggtgga aaattccaaa tatggcaggg attaataata ttacacaatt cagagtcttg ctctacctcc ggcgcctgcc ccatgttggg ccaaagtgct gagatgagga agccaggatt ccgttgacag gtgcttgtac ctagaattaa tttttttttt tttatttatt agtagcacaa cagcctccta gtttttgtag agcaatcctc ggcctccatg atgacttatt agaggcattt tgtatgtgtt ttacctgtga tgctggccct gaggctgccc agtctccatg ttggtagcat tgtcgcccag gttcaaggaa atgcccggct ggtcttcatc acaggcgtga cttgctctgt ctcctgagtt tgccaccaca tgttgaccag tggcctcccc acatttctgt accagttact gccctcaagc aaagtcctga gatgaaacag gccattgtaa agatgatccc ctctatcccc caggttcaag accagaccca caggctggtc gggattatag cagaaagatt cgtatcaatc ttttacgagt agtaatgggc atgttaactc tttttagcaa tgtttgtttg tcatagctca agtgcctggg agacagggtc ttgcttcaga tatttgaatg aatattccgt ggattgtttg tttgtgtgtt ttctacccac aggctttcct tggtttctgg gtgcctccat tcacacagag gctggagtgc ttctcctgcc aattttttgt tcccgacctc gccaccatgc cacccaggct caagtgattc cctggctaat gctggtctcg aagtgctggg cagctgtttg gggttacgcc cgcagcagct gtgattaata gcaacaaagc ttatctttat cataggtatt caggctggag ctattctcat gctaattttt tcg aactcct gcgtgggtca gagtgacaca tatttagctc aaattatcaa tgtgttagtg ttgctaataa aattagaggc tgacagggtc ctgcagcctc accacaggtg tcaccatgtt ctcccaacat aaagagcaga tgtatagaga cacttttggc gaatgtgagc ggggatggtt gcaggctgcc cactgccctc cagggaccct ctacatttct agtggtgcga tcagcctccc gtttttagta gcgattcacc ccagcctaca ggagtgcagg tcctgcctca ttttttattt aactcctgac attataggtg caaactgtgc cccaaatgtc gataagcatg aacagcacat ctattttctt catgtattaa accgcctttt tgcagtggca gcttcaccct tgtatctttt 
gacctcaggt ccacaactgg gttatgtctc taaatctagt gagttttgat tgaaggaatg agcatacatt ttcctagttg ttgctctgtc gacctcttgg cgcatcacca gcccaggctg gctgggatta catctcctgg catcacatat tgttacggat tggtgtggaa aagccagcag atgtgccttt gtgagtgtgt gcagctggga tttttttttt tctccgctca aagtagctgg gagacggggt tgcctcggcc tttctttttt ggcaccatct gcctccggag ttatttttta ctcaagtggc tgagccactg cccagaatcc tgatgctgga gggacctcct tgaaaattag tgcaatgaag gcattttgtg tttttttttt cgatcttggc ccttagtagc ttagtagaga gatccgccca acttactgcc ctgcagctct ctcttaatca aggtttgctc tatcttatgt tggggcatta agtggtttat acccaggctg gctcaagcag cgccctgcta gtcttgaact caggt tgtgc aggtggcaaa 24540 24600 24660 24720 24780 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700 26760 26820 26880 26940 27000 WO 00/08209 PCT/IB99/01444 gctatgcatg ccccccctgg aggggagctg ggggctctgg ggttacagtg atggcacatt cagggagctc ctaaatgaac cagcatgagt ccgggctgga gcgattctcc agctaatttt atctcttgac agccacgctc ggtcttatat aatgatgggt ttttgctgga tggtgtggtt aggctttcct gttgggagac ttatagattg cactgcttgc cattctgata cagtgtacag agtttccatt ctctgtcctg atttagagat tgacgagagg tcgagcttgg ccctccagt atatctggag tctgaaaccg ctgcctgtgt ctttgcatag caaacccaca tcaccatttc ggctttgtgt ctcgcatctg tacaaactaa agagccacat gagtggctac tggacatgct agaaccacaa actaaagaag agtgttactg gtggggactg ggaaaggggt tccgctttgt tttttccaag actaacaaga gtgcagtggt tgcctcagcc tgtattttta tttgtgatct ccggccaaga aattctgaaa ttttcacaga gaatgttcga tttaggattt tcttcttact aaaacagaag gaggccgtca cagagaagtt gttttcttct caggctgcat cacttgtggg ccgttttcac tgtgtctgcc tctggatgat ctgcttctgt tccaagcaga agcaaaactg agttacctgg ggaatgttct atcatttcct gtgcattaac ttctctttac atgccccgat tgctatcctg ttatacttaa gtggctaccg catattgaga gctcaaggtg atgtggcacc taaacttcag gatatgtaaa tcatgtatcc ccactcaagg gagatcctga tgaatttgtt tttttttttt gcaatcttgg tcccgagtag gtaaagatgg gcccgcctcg ttttaaacat atgatttcta aagttgaagt tgaacagcag tattttgcag 
ggtaccagcc ctgttggttt gtggggatac tcagttcttc gtttccctaa acagt tagac tggcaggtat attttccagg tgccaatacg gagagagcag tcttctgctt acaaaacagg tcgcggaaat ctggtcacgt agcattccct tttgcccagg agcgacagac ccctcaggcg ggagagcgtc tacagtaacc atgtaataaa tattgtaagc gcagagctct gtgaactctg atcagccttc gcagtcaaga agctattgag aggttccttg tcatacaggc gataaagcca atatcacttc cttttacccc ctcactgcaa ctgggattac ggtttcacca gcctcccaaa tatttaccaa gtaccaaact tattatggtt ttctggtgat cagcatcttc tttctcttgc cttcagcctg ct t tccggac attctccgtc caccgaaggc cagatgttct ggc cc tcctt ctttcacctc cggatgtacc agctggccct cctctgcttg agatatcaag ccctagtgta gcagccacca ggt agct tt t acactcctgc ttctcctcat tagtcagcct ttactgtgtc aacagccaca aatgtagccc agagctcaag agaacattcc agatctccag agggtggtgc ttttctacaa cccagtgctt ttttgcagaa aaggtcatac aaggatgcat tatataaatg gatggagtct cctccgcctc aggtgcgcac tgttaaccag gtgctgggat agtaggaacg atgaatttta tgtnttcctg aagttatgga ctcaaacagt agacaaggca gcaaggattc aaagtggtgt agagaaaccc tcagcccctg tgtagtacga acctcccatg caggtaccaa agtgagggat ggggctcagt gattccttcg gaggaaaggg cttccatttg gaggcaggaa gtttcttcag tcgttttctc cctctcaggc Ctctgtgcct ttcgggttat tatcactatt ctcacattag aacattcagc cttatcccag tccccccagc ttgtgttgta cccacactgc tcaaggaatc gaagaaatag agtcagtggt taaactgctt aaaatatttg Cgctctgtcg ctgggttcaa caccacgccc gatggtctct tacaggtgtg tggtaattat tacttgaaag ttcanggtgt tgtacacagc tgccagggga gtatgggagg agattgcagg t tc tgc Ctgg atatggacca gtgcaggtcc aaagtcaccg gcccaggttt aattcacatc tgttctcgcc ggtgacaccc cctttggctt tga ccc ct ct tgtctcatta ggtagtgatt gcagccatga ccctcctcac cacttggatg gatgttttat ttatctcaac taaaattaaa cctcatttca aatattgtga aaagttctct tccgtcactc ctgtctcctg tcctaaaact ctaaaagcaa aggctttagg agaatggact 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 28380 28440 28500 28560 28620 28680 28740 28800 28860 28920 28980 29040 29100 29160 29220 29280 29340 29400 29460 29520 WO 00/08209 ggaatcttga caacctgttt cttgtgagag acgctcagga 
acaaaaaaat taaaggctgg aggccatgta tgcttgttac ggcacccaca attggcccag gtatcaggga ctgttttttg ttatgtgtgg cacccctgtc gggggagggc gtcaaccgaa tagtggttat ctaaagtgct ttgattaata tgtggctgac gctgttgcca gctcctgcat cctagtttaa tggtttcttt ttctataaaa gaaagaatac tgatattttt aantttctaa ttctcataga ttcagatttg gcaaatgatt accagtgatt tacagataca tcttactgtg ctgcctgctc cctaatgctt ttatatttat tcatttttat ccattactca ttaggtcacc ctgtcattgt tactcttcct PCT/1B99/0I 444 ttgttcatct gttttccctg gaagttgttt agcagatgca gactcaaact gaactaggaa attttttttt tatatgcagt tagctggctg ctcagggtca acccctgggc ttttgttttg cccaagacaa ctaggccatt atagcctcca ggcagaatca ctaaaagtag cagatagctg tatttatcct tacacaacca ggttcccacg cgcccacatc gcttggtgag tgtataaagg attattgtat aaagagagca gccttttgtt ttaagggaga agtttaagat ttggaaatgg gtttggggat tgagc tgaga gtggt cat tg gcagctgcag cccacccacc gttgagagtc agtttcagga atctcttttt tcctatacaa ccatatttat gttacccact gaagtggtgt cactcccctt attatgttcc tcagcttctt tgtgagcctg agcttaatca gttgtcagga cctctcggtc catgccaacc ccaatctgtg cttacccagg caagcttgtc ttttgttttt ttcttcttct atctgggctg gccaccataa ggacaatgaa gtgaaattgt gctaactttc gtaataagaa aagatggtcc tggaccttcg ttgccaagcc gacagtggcc ggtatatgat ctactgagat ttgatattct tcaattttgt aacatttgaa ggaga cata c actcagtttt ttttcccctt acatggaacc tctgaagggt ctgcaggtgc cctgggaaag tacatttctc atagt ttggg cgatatttga ttaaaacctg gtaaactcca gtagggcaga ggttctgtct tccattaagc cttctcrcct agccccctac tctcagccaa aaaggaggct accaaggctg tctttattct cccagatctg ttttcttcca gagaggtggg caacccgcgg ttacagctca gctgtggctc tgggaggtgt gttggtgtgt agtaatgaga acatgagtga tgtcaaagat tttgggattc aggtgtcgct gcatgacccg atggaaaaca tggtgcagag tttggaatat tatagtatgt acagaagtgg attgtagtga gtacatttag ttccattgtg aaaattgctt ttcttctatg cttgatttgc ttcttttgtt ctctgaagcc agcccccaag tagatctagc aatggattta tggtttaaaa ttttttgaag gcccactgcc aatggt tcc t ctgtagtcct cataactgat gtttaatcag ctgtaggcgg tgattgctta tttaaacagc tctctctgga gcacaccagc tatgacctgt ggtttcgaga ggaagtatgg cctgctttct tcagctattg agggaagcca ggagcagggt tcttggtaat 
atccctagct gtggcatgaa ccctCtgcta ttaaagcaaa ggaagaggag gccatgggga cttgggattc tttgggtcat ttaccaaatg aaaaaaaaca caagaagatg cttttttggt tctttctaga aatcatgatg ctctcttgtg gagttattca agaaatcaag cttggccctc ttgccatcca tgtccaaaag agaagtaaaa ataaaaaatt gacatcaaag ttgtaatagg acagctactt gcctcatgcc tggcacactg tgagtatcac gtttcagctc tgctctggga gt tgcaagaa aagataaggg tctctctttg tcacccagct tagcctcagg gggagaaatg aggcaccgtg ttcatttttt ttagtgtatt aaagattggc cagagc tgga tatattgctt ttgtaacagt tttcttatta ggatcaacat atagttgtca agactgaaga ggcctcacac atatctaaat agatggtgct tgggcatttt t acac atgga tggtgatagg agaaaaaact aaacctcatc ct aa ccagt a ggctggaact acttggcatg cccccaaagg ctgtccctgc tggtcacttc cactgtgttg tttcagtttg taaaagccca ttatcttctc taagttagcc tgattgtgat gttgctgctt taggttctca 29580 29640 29700 29760 29820 29880 29940 30000 30060 30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560 31620 31680 31740 31800 31860 31920 31980 32040 WO 00/08209 gatggcaggg gagctcaata aagggattct acccccagga gttggccgac ggtgccacag ctttgtgaga aatgagagct agagagagag ttcatgtttt tacatcatga cccgtgagct catgaggcaa tctgtgaatg ctgattggct tttattgtgg ttcagtagtg gttaaactga aaccaccatt gagtcatatg ggttcattca agttatattc gcttccacct atgtctcttc ggatcgatca cattttacat cttgtaatct tgtgttgggc tttattataa tttccttcat catatttggg tttgattttc gcccttccat gtaatgccgg gaagcactca aggggtctga cagccattct caaccaaaag ccaaggatct gctttttggc agatagcagc tctgaattga PCT/1B99/01 444 tgaaaagttc aattctactt taatccaatg gccatgattt tgcagagcag cagtctccct tagggctagc ggggcctaaa agagacacac gaattggtgt Ctttcctgcc tttctctgtt aataaaatga gctttttagc ggagagtcag taaaaaacat ttaagtatat aactctatgc ctactttctg gtatttgtct cgttgtagca tgttgtatgt tccacctttt aagatcctgc tagtgtagtt tcccaccaac tctggattgc ctggatgttt aataatggcc agttaatgtg gaaatgtctt tactgttgag tttgaccaa cataatgata ctccacagga aggaaggaag tgaaatccac gatgttcttg ctctcaacct aaagtacatg ttgatgctcc gggacagcaa ttctgtttgc ggtattattg 
ctccgtttta gcactgtatt cctggttagg cccctgggac agtaactgtc ttaggcacaa gtctgtatga gacctttggg cacccacatt gcctcatttt aactcagtgc catttgagag tgtagtgtca aaacataata tcacattgtt ccaataaaca tttctctgag tcttatttct tatgacagga atatttaaca ggctattgtg tttcagttct ctgtgagtaa agtgcccaag agattttctg agaacagtat atcctagtga gttgggcatc ttcaagtctc tatggattat aagcagctgt gccccacctt gcttgttacg aagggctgaa cttggctgcc gatgaccaga cttggat agt taaaatcctg tgccccatcc aaacctctaa 32 ttaaatctct atagattatt cagatgagaa taggaatagt agccctgggg tttgggcctg ttgtttcatc gtgcaagccc cagagaggca aggatatcct ttctgagaag ggcctgtggc tggttaataa gaaaaagggt cagttaagag ctaccatcta gtgcaatgga actcctattc tttgactact ggtttatttc tttctctctt ttttcttcat aagactgcag ttcggatatg ccctcatact ggctccagtt gatcaatctt ccctcctttg gcgtgaggta atttcatgtc aagtcctttg taaatcagac gtgtcccatt gtagttaaga taaggagaag ggaagtaaaa ttcattttta gaCtgtggca gtgctgctgg aagtcactgc caggtgcaca ccaaccatac cataataccc tggagccttt gactgaggct gtctagggtc ttgggcgggt ctacccaccc agaggcagta tcagaaaact gggtttggga tggaatcgca ccagagtttt Cttttgtttt ctcccatcat catgtaaatt tatagattta aaccatattt tctgcagaat cccctctccc gtagataact gcttagcata ttttttccgc tcatctgttg ctatgaacat tacccagaag gttttctgca cctctacacc ctggattaca gagtggtaaa atatctcatt ctggtcggcc cccatttttt tggcctgaac tgtgccttgg gtgttggggc gcagccggtc agagcctcct atgtcagtgg gagggaggat tagtttgcac cagaggaaac cctcactggg tgaaaagcag tgagtctgtg tatgttagaa caaagaccat agcacctggt c tgggc tgc t ctgttccttc ttgcataatg atgtacacct atgttctgat gagcttcgtt aaatgtggac cttggtatgt aatgtatatt tcagaaaggc aaaaaaattt aagtataaag ttttcatctt agcccctggc catttaagta atgtcctcaa Cttttttttg acattcatct gggtgtgcaa tgggtttgct gctgctgtac ctcacccaca cttgattttc tatgtaagtt gtggttttga atttatgttt aattgagtta ttaaatcatg Cttcttcggt agtcagtgag cattcctaat ccatgaatgg acttttaaga ggtcacattg aattgcttca Ctggttcctg cagctctggc gcattggggg 32100 32160 32220 32280 32340 32400 32460 32520 32580 32640 32700 32760 32820 32880 32940 33000 33060 33120 33180 33240 33300 33360 33420 33480 33540 33600 33660 
33720 33780 33840 33900 33960 34020 34080 34140 34200 34260 34320 34380 34440 34500 34560 WO 00/08209 ctttagggga tgtaaagaag aacatgcagt tctgagcaca ttaaggcaca tttagaaaag atttaatcaa tggtcccatg atttgccata tttttgagac cgctgcaacc ggattgcagg tttgccatgt tcccaaagtg tttgaaagga ttaagaaatt agtcataaaa ctgactttca cattgaagcc cctaaaaagt agcagtgagg cattgtgttg ggacttcccc gggcttcctt tgggtggggt aagtttataa cacttcagga tcacctgtac ggtcacatgt tgatggcacc agataagtcc aatttataaa ttacaatcgt tgccgagcaa actatcatga cctcccatga gccaaaccac gggtcatgat acttcagcat cgacacactg aaaatccaaa ggtaggatta PCTIIB99/01444 aggttctttt ttagtcatag tttcacgatg caatgaccag acttttaagc tatctctcct agcagccgac taacaactct ttcagttatg aaattcttgc tctgcctccg cacgcgccac tgtccagact ctgggattac aatattacct tgcttaccac tggattaaag agagatcttg tgaaacttaa atgctacttc ctggggtttt ggttgggaga atcccctctc taatggattc tagatatttt ctggagccta cagctgtgca atgcctctgg ttcattctac ttgcttgttg tgtattagtc ggaaagaggt ggtggaaggg aaggggggaa gaacaggatg catggggatt attaaatccc cctcccgcac ctgcatccca ctgatgaaaa aatagaccag taggtaactt caaaactcat atgcttcact tcccagaaag gcttggaagg tgagtgtgca tcagagtggt ttctcctgcc tgattcttaa gt cagtagag tctgtcaccc ggttcaaatg catacttggc agtcttgaac aggcgtgagc ctttaatgat aaaaacagt t tgtttgaaat atgatcatgt cacaaacccg ccaagttgat ccaggtggaa agaattggtt ctccgcaaag tgaactcaga agtcacagat tagtagttta agcagaaccc gacctttcct taaatgtcta attatttctg cgttctcaca tgactcacag gaagcaaaca agccctttat ggggaaactg atgagaacta aagtgcgcat ccgtcttctg ttgatgtctt cctagaatat agagaagata tttttttcct gatggggaga ctttacaatc cgacgagtgc atcgatttcc gcactcgatt aaatatgcaa tcccctgttt tgatgccaca actttcttag aggctggagt attctcctgc taatt t ttgt tcctgacctc caccgcgcct gtttttagtc t tcaaggagc ctattgggat cgtctgtttt agttccccac t tct t tcagg aggtcacatt cactatttta cacaaaagta tcacgtccag gcatgagagg tctcttgtca ccatcacggt cctcctttct accagctctt gttgaatagt ctgctaataa ttcagcatgg tgtccttcac aaaaccatca cccccatgat caattcaaga gtctggccct agccctattc aagggtggtt agaatggaag tgaaaatatc tagataaata gaccaaagac atcccaacac 
agtgaggtga cctgtgctgg ttctatgttg attttttact ctgtcttggg tggaagctgt tctctctctc gcagtggccc ctcagcctcc atttttagta gtcatccgcc ggccacagtc caagtaaatt atttgaactt tgtaaattta cattttctac ttgcctaaga atatgggccc tccacatata aactttttgt tttcctaatt ataagcattg agggagggtg tcggccaggt tttcttgatg ctttttgttt ctctgtaaat ttccaatggg aaacatacct ctggggagac atggcagcag gatctcatga taaattatct tgagaattgg gatcccttta ctacttgggc ccagaccttg ttacatttat aagaatgctt tatagataga tgggaatcat aaggttaaac aacgtggcca Ccctcagaat ggcctggatg gaattacttc gttgaatatt gtgtgctggg tttttttttt aatcttggct cgagtagctg gagacagggt tgccttggtc agtagagact gtggtaatgt gtccacttta tgtcagtgta tacatgagaa gtcatggata ttcaaaggaa actcagcgaa ttcatcttga tttaagtcat tgtaatggga gaggacagca cacagagtct cctttgacag tttttccctt tacagagctg acttctctgg gagactgggt ctcaggaaac gagagagaag gaactaactc ctgcctgttc tggtgacata tgtgagactg atgcttaggc gaggt aca ca tcatagagtg atcttaggga tatattagtg 34620 34680 34740 34800 34860 34920 34980 35040 35100 35160 35220 35280 35340 35400 35460 35520 35580 35640 35700 35760 35820 35880 35940 36000 36060 36120 36180 36240 36300 36360 36420 36480 36540 36600 36660 36720 36780 36840 36900 36960 37020 37080 WO 00/08209 tttacagttt agccaacata tgaaattgcc ctgtatctga cactcagatt tcttaggatg t taatgagga ttgcatgcta taggtattgg aagacagctg tctgaattgg ttgttatctt tgaaactctt agtacaaatt tataccttga aatgaatttt agccgtcgtc tagatgttgg gaagcacatg gtattctaaa aagtctgggt gatcccagga acaggtcggt cagctctgag ggcettcect aggccatccc Cttttttctt acgttaaatc gctatggact gttatggtgt tgaagccctc Ctetctette gagggctctc ctgtgagaaa Cccgagctcc gggtatagtt tttgaccctc cttttcacca ttctcagcag aggeteccac ctaaatgatg tgttaggtga PCTIIB99/0I 444 ctctgccaec cctaaaaacc atttgtgcca gttctctctc taggctccgt gtt ct tt ctg tctaaaattt tttaacatac gttgaaagtg Ccaagagaga aaagctgcag ttattgatgc atcaggataa aaatatatga cttttggttc gtctgatttt atcctttttt gcttttatgt ttatgcgtca tctgcttcct gaggaagcgg cctggcttaa ggccacgtta ccagagtact cccgcttcaa atgcctttcc aggtacagat aaacctgctt gaacatttgt ttgaagggag atgaatggga 
accctgtgag actagaatac caaatgtgtg attctccact taatagctgt ataacacttt tgtctcagta ggtgtgacag ggtggagggg tgggt tgaga gtcatcagtc aaccaaaata atctcctggg ctgttgtatt cttccacagt ttttcagacc taatagcctt gcaggtaaga aaetatactg gatcaaatgc agaaaggatg ctattgagga aaatgctatt aatgatcctt aatctgccca tttcacaaaa acattaaaag cccctctccc caagtgctgg Ccctgggttg gcgatggggt gatgaaagca agttaaagaa atgacataga tggtgacttt aaaaattctg atcgtaataa tgaacttttt tgttttaggg gtccccccaa gcccttggga ttactgtcat ga ta cagcaa acttgtactg ttgtttaagc ccctggcttc ttctctgtaa gttaggggag gaaacagaag cccggagtac ttgaaagctg aatcgtgatc tgtctctgac tttttttcag cccgagaggg attttaccag ggagctcata tcagtggctg tgtcatcaca tatccctgec agtgtgcagt tagacaaagg gggaaatgct gaagagtctt ttgtggcata ttctatccea tctatattat ccttaatttt cctgtaattt ttgtcttctg cattgcactc gtgcagtgga aggtcaggtt gaccagacgc cagcaaagat aagcaagtgc gctcaaacaa aattgtgcca agccttgttt taaaagggaa atggggtagc aattcctatg ggtgattaag cccaaaaaga gaaggaagct ccaccctgat cacccagtcc ctgcatggac cgtagccact att tgagggt cagaaaggcc cctgggctga ggttgtaatg ttagetttta tatgttcaga gaggaaaaaa aaaaattggg taactccaga Ccttcctgtt taatagctgt tcatcagagg tctgacatga tgtatgtaaa age gtac aag gcgtctacta ttaaaattcc aac ct taat a agcttaataa aaatgtcata tttttttttt ctcaagtctt gatgttcaag catgataatc actggggtgg gtcctgtgtt tagagtccac gaaaggtgcc tgtgaattca atccctttct atccattgag Cetgggcttt gttgtcagag ttggaatcag ttgaagccct tttagatgag ggtagagacc ctctgcaagt cttggacttc ctatgatttt tttgcaacca tttctctttc gaggaagttg ctgagatact ggaggccagg agctgctttt gtagtatatt tctggaagtt aaccceagcc ctccttttct ttccaggctc tcctggctgc tccttctacc atgatagctc gatagatgta agcattgttc tcttgtaagg agttcaaggt taaagggttt at tt tggggt atattgttta tggcagaaat ttttgccttc gagtctgggg cgattttaat cagggactcg gttggaagta gacaaggaag tttcaagtcc gcacagcagc aaagaaagga ggc acecccca gctcagctca aaaeatatte getetgtaaa atttgctcet aatgcaeagt attgtgtgag ceagagettc caggaagaga ceeteeagaa attagageag gagetteaeg caggtctagt gcttgetttt gagccacct gctggagggg ctgtagatgc tttctgttta ttctggaagg 37140 37200 37260 
37320 37380 37440 37500 37560 37620 37680 37740 37800 37860 37920 37980 38040 38100 38160 38220 38280 38340 38400 38460 38520 38580 38640 38700 38760 38820 38880 38940 39000 39060 39120 39180 39240 39300 39360 39420 39480 39540 39600 WO 00/08209 aaatttgtta taatacattc caaagatgaa tggctcatgc ggagttcgac aaattagcca gggaatcact tccagcctgg ttttaaaact ttaatttaaa tatccaatta gccagagatt ggatctaccc taacaagctc tctcccgtag gggagtggtc accagcagct cttttaaaaa aactccatgt ggtcaggatg acccacctct cttgcagtca actgccccgc Ccccactccc ttggacctgc ggc a ca caga ctttcatcac gataaacacc gtcagatatc ataaactctt ggcctgcttg acctctttga ctcctgcttc t aggc ca cag ccgagggcat ccaaaacccc gacggtgacc ttcatagaac ccaaccccag tcttttttag agtactgttt gggtggatgc PCT/1B99/0I 444 ttgctgtaat ttaagaaaat agtgcgactt ctgtatccca acctgacttg ggtgtggtgg tgaaccctgg gcaacaagag ctatttaatg acaatgtatt aggtaactgt caaccttttg cagagttttt ccaggtggtg actgaactca agatgtcctg gtaatcacag aaatccacgt ggttactgcc gtggccttgc tagcatctgc tcaaattcat atctccttcc acccctcctc acagccgccc cctggaagtt tgttctgcga ttatgatatg tatatatgaa taaaggagct tttccttcca cttctctcct tcggagcttc aggaggcgtc gaccttggaa acccctctcc ccgggcctcc aggtttctct tctgattgcg aacaaaatag ttaagtgctc tattattatc agtgtaggtt ggtatttagt gactggtgtt gcacgttggg gtcaacgtgg tgtgcaccca aaggcggagg cgagactcca gtcaggtcat taaaaggtac aatgtaaagt tgtgcattag gatccagtag ttgaggctga tggtctaggt acctctgtgc agtgaacaat ggatgagatc catgcactct cc ag ca cagg aggcactccc catgctgctt cagtcgtgtg accccgacga tctgcactct ggcctgctca tgaattgaat tttcttataa gtatatgtga gagtcccaat ctctgcgtgt ttctgcccag tcttgtgttt tggggtccct ctgagcattg caccctattt tattgttgcg tctccagtat aagaagtcta atcatgatat tcaaggattt ggtgatttat gttgatctgg tcagtctttg tgaaaaacat aggccgaggc tgaaacctca taattgcagc ttgcaatgag tctcaaaaaa ccatccataa Ctttgttccc aagtggctaa aatttcccaa gtttggggtg agctcgtgtg tctgtcagct ccacagtgag gtaaacgacc acagggttaa ctgctgtttt aggccctttt tgtcccttcg tcttttaatt tcagaactca acgtctcact cgccacctta gctgtctcct gcatgattgg ggttttatat agttttatga CCCttgggtc tcgttgctgg 
tcttccctga gttttctgtg cggggcaggt acagaggaga ctacgtgacc aggagcccct tcttcagtaa agcaacagaa taataggaat ttcatttaat gaatgaggaa attagcaggg gctttgaact ggtgatatgg gggcagatca tctctactta tacttgggag ccaagattgt aaaaagcaag tgggtagagt tagtgtcaca aaaagtgctg ttgttcaaat ggaccaagaa gggaccacat gtgacccctg gtccaagctg aatgttgggt gtgtgggcag tcacctcttc ccttctgacc ctgggccccg cccacacttg gcactggacc tgggatcatc tgggctgccc t aggggtgga t cacaggaag gtagaaagtt tagttttgca gagagttgcg cccctcatag gacgctccag ctcagggcgc gcagcaggag gtcagccaga atgggccctg gggaaaatgt atcaactttc agattttgcc tcagcactta ccccacaaca actgacacag agcggcccct ttgcctttga ccaggtgtgg cctgagatca aaatacaaaa gctgaggcaa gccattgcag ttatattaca cattgcttaa taacgtgaaa aacgccaaag ccaggttgct tttgcatttc tttgagaact tgctgctgga agtaggtttg ggtctgacat cagtcagggt ttcagagtgt acctgacctg tggggaacta ccaaggtggg tttccccttt tcttctgagg ttgacccctt gcttggtttt gt aggggagg atatgaaagt taatttaaga tggctcccgg gctgtcccag gctccctggc catggtgcta gaagccgtct caaagaaagg gacacagcaa tggcattttc ttttttatcc aaatagatta ctcttgtcta aagctgtggg aggggtggtc 39660 39720 39780 39840 39900 39960 40020 40080 40140 40.200 40260 40320 40380 40440 40500 40560 40620 40680 40740 40800 40860 40920 40980 41040 41100 41160 41220 41280 41340 41400 41460 41520 41580 41640 41700 41760 41820 41880 41940 42000 42060 42120 WO 00/08209 gaggagcttg cttttcctct tagtcctgga accaattcct cactgcccag tttccacatc ctccatgttc gggcggggtg caaggcccag ccatggtggg tcagtgttac ttctggctga gagccctgac aaacgccact ttcttgatcc tggggctgcg ctgngcc tc t ctctgagctg ctaagtcatc gggataaaga tttttaagtg ctcaggcaga actgtttggt ccaggatatg cccaatcttc gcaaccaaga acaaaatggg ttctttacct aaccacccca ttttattaaa ctcacaataa gtggcaagtg tcaatcccag atcttttttt gtacttctct acaacttaac tgagacagag gcaagctccg ctataggcgc cgtgttagcc aagtgctggg ttctgttttc PCTJIB99/01 444 cccatttcct ttgcttttgt ttcaccacct agcttatccg cacatagtaa tgggaagaga ctgtgagcac gcgaaggtcg cttcccagac cttttccacc agaaatggta ggcggaatat tttcagctcc tctttccaag ttctccagct cagagctctt t cctggt tc t ttgtcatgac 
cttacacagc tctgcaggcc tgtgtgcacc gttaggtgct agtggccttt gttaaagtgc tggaattccc gtcagccagc tcattatcag aaagtggctc ctttccacaa tgtttactgt t cc t atgaa c atcagataac atacctggct gctgaacttc gctaccctga agtat tt ctt tctcgctctg CCtcccgggt ctgccatggc aggatggtct attacaggtg ctctagctag ggtagttagt gtccattacc aattagctct t tggtggtga gtgcccagaa gggggagttg tgacagtttc ccactgtgac acagccctca ccaccatgtc ataggatagg t tgtt tct ct tcaaaaaata cataattttc cctgtagacc ggtgctctgt cttccctgga ctctaaccag cttggaagtt tcttgctcct tctttcctca ctgttctgtg catgtgtgtc taaagaatgt aatttctaac Cttgtcttct tgatgagtta ccatcaatta aaactgacaa gtgcaggctt tagtcagttt ctgtttgagg ccagggccca cagaacactt ttcatacttg tataccaaat tcgcccaggc tcatgccatt gcccggctaa cgatctcctg tgagccacca actgtcatat accagggctg ccaaggcatt gtgtcccatg agatgaaatc aatgtgacgt agctaagtca cccacaatmc gtggctgctg ggtgctcatc tcataaaatt gcaaactgtt ttagttttgt cagcttcctt tcccatgcgt tcccatttag gccctgggct cttcccactg actgagtcag taccctaaat ggtccttgtt tcacaccctt tccatagctc tgatccacta atatatgaga caattaatat atattcaggc ataattacct aacctgtatg caatgatggt gtttttttcc tatgcagatt ttgagtagct tgctcttgac tctttgtatt gatttctagc gaatgttgtc tagagtgcag Ctcctgcctc ttttttgtat acctcatgat tgcccggcca aatgcaactg gcatcatcag aggatgagcc tcttgccgtg agtggggtac cggacctctt t t ttccagtg tgaagaaaga gtgggaagtc ctggtggcac acaagaacca acaaagatca tgtctttaat ccccttgcag tatctcctgt agccaccagc cgcccaccca ccgtgtggnc gacttttttc ggc tat t ttg tctgcttatc CCCctccgta tttttcgagc ggctgtgcac tcacttttgc gtggattgac gcatactatc gcacatcttg gattttacct aagaagaatg acacatttac tcgcagatta agatcatggt cttataaacg tcccttattt agcatgcctg tttttttttt tggcactatc agcctcccaa tagtagagac ctgcccgcct tgaatgttgt taggaaataa ttgcctgctc agccaagttc gagggataaa ttgtaaagca taagcttcag tCcCtttcag aggaaaataa cctggggagg tgaccagggg cagttgaaaa gcacttaaga caagaactga atgcaaaaac ctacagcttt cgcccatcac ggcctgttct ttcagtgctc ttcctcatct ggagggagtg ttggcttctg tggctcccat ccttcttctc tccctgcctg ttaaaaaacc tagaccttaa tggtcgttag tttatgctgg gttcttccag gtagttgaca ctacttaatg 
ggaaactaag agagc caggt gctgaaattc tggtagtctt gcatgaggca ttttttttct ttggctcact gtagctggga ggggtttcac cggcctccca ctttaaaaaa tcaggttctc 42180 42240 42300 42360 42420 42480 42540 42600 42660 42720 42780 42840 42900 42960 43020 43080 43140 43200 43260 43320 43380 43440 43500 43560 43620 43680 43740 43800 43860 43920 43980 44040 44100 44160 44220 44280 44340 44400 44460 44520 44580 44640 WO 00/08209 tttggagtat gcaaatgaat gatatggcag gtaaagctgc tttgtttttt ccgttgttgt tttatgttgt tttcgcatgg gggtaacaca c tagagc tcc gaggagaact ctgtattctg tttttgtctt cttccccaca aatttggcat tgtcacagca ttttaaaatt gctccacatc gttatcgcag tgtggtctct gactggaacc tagagcagcc ctggcctctg tgggattgtt gcatcgctgc gtgaaaacag ctttgatttt cacgcctgaa attgttgaga gatttttaac tatgaacaaa ttgttaccac taggagagac ttgttcctcc atgctactat aatcatccag gggtttattt ttaatactag atttgggaca ttgaactgtg agggttgcat gcttggagaa PCTIIB99/0I 444 tttccataaa ctgttcattc ggaacacaga tcatactttg tctaagaaat taacttttct tcaaacagag agttcagaat cagaaaaagc tctttcttac gcagttcaga cacttgagag tctcgatcat catttgtgtg ctaaaatatc atgatttaat cttagaaccc cgtcattata Ccctttggtt acagttgggc cagaaggcat CCtccttcgg tgcagctgt c ctagagagta atacatgtaa tgccccggtt gttctccttg ggaggactga aagcagctta tatctccct t aaatattata gtgaaatgaa gaaatggtga tgacccctag gccctctgct ccccaaaatg gctgaccttt tgctcaaata agctcctgcc ttttcagtgg gaggaaatag acagcagtgg agatccacag attgaatatt aatctctgcc taaacaatat tttgagctgt ccagagattt gtttgagaat tttcaagagg cagaaaattg agggcactta acttaatgtc agccagatac ggcattgatt cagatagaaa gacataccag aggaggcagt attggaggct ggagatgtca tggtggcttt agtcggcggt atgttttccc ctcagcctcc tctcagcaag gaaacctcag gggactcaat tatcattaga acctgccact ttacaaacca Ctttctttgg tataaattac caaatttgtt tttggataaa gattttagct gggatattta aaacattcta tcagtagtgc acagtggatc tgtctttgta atgtgt tgag tgggagtgaa agagtaaagc aggcattact aagtcatggc agggttgaga gtggacttgg Ccatgcatat tcctgggctc gacaacatta tcgtaacaac aatgttcaat aactggaatt gatgaagaga gagaataagg acatgtgatt agtgctttgt tgggcagata ctgttcataa gaatgcaaca catagatcat 
tgtctccaag attgtttctg gaatagtaaa gccgacttta gaatatcatt aagaggcacc gctgcactga aatgcaagtc agtagccctc gctggtagga acaaggttct ccactcgagt aagccttgtg actgattcag tgagctattt gttaatcgta agagatacgt ctgaatcaga gcaatgccta cagtgtataa tgaggttgag agcttttatt ttctggactt gacctgaatt ggagatgagc tgcagcgtgg ggggagcttg ctgctgtttc tgctttctgt agtctacatg agacgctgca tttctccttt tttttaactt gttataaaat atctgtctac taatgtcgtg gaaagtgcaa ggaggtggtg caatgatgca gcacagagtt atttatgact gcctcctgaa aaaggctaca atctaatcct taaatatgcc tctcacattt aacacagttg gccaagccaa ggggagagaa cttagaccac ttggcttagg tagctgacag ccacatctct cccagtctgg caggccaaat tgtagccagg aactataaaa ttttgcccct ggttcttatt gaggcattga aactgttcct aaaccctcct tagttcatgt ggcctggat c caggcagcta cgacgtgcta agc cc tgc ta atggaaatgc ccaggcatgg tgtagtagag gtcattttac gatgttaatc ccagaatcga cttttttttt gctctatggt tcgtttcctt tctttaaaag gaaagaagcc tgcacgttgc atgtcatcct gttggggaat ctgttgggag ttatgactgg taatttaagt ggactaagtt tgtcagtgcc tacactgggg gcccatgagg gcttcctaca gccggatccc ctaacgcatt aatgatgcaa ttgcctcaga caagactgca atctttttgc ttagaacaaa ctactcttaa aaatcagtaa tcccagggtt agaggtggtt tggtgggcag cctgacaaag ttaaactctt agaggtgaaa ccccgaccaa acaacagtat gcaagcgcat ttcagagtgt tcccctcaga 44700 44760 44820 44880 44940 45000 45060 45120 45180 45240 45300 45360 45420 45480 45540 45600 45660 45720 45780 45840 45900 45960 46020 46080 46140 46200 46260 46320 46380 46440 46500 46560 46620 46680 46740 46800 46860 46920 46980 47040 47100 47160 WO 00/08209 cttgctgaat ttagagaagc cttattatgg gcaccttatt ttgtaagaat tgtcactagt ggtcttaaat ctctgaatac gtagcccaag gttccatcct cagtaggaat tagaagaagg aagtcagctg gcagatcacc caactgaaaa actcgggagt aagattgggc aaaaaaaaga tcccttctgt tatggaaaca aggaattgca agaactttaa taaaaaaaaa taaaaggatt ttgggggtga aattagaata aaatttagaa ttcttttgtt tggtgaccaa tgagtagctg taagagctat tagcctatta gacagatcgg ataatgacag tgcagtgcaa tcaattcttg ccccaaaaga actctataaa ggaacaaatg actcactgta gcc t tctgga gc cc agc acg PCTIIB99/01 444 38 caaaatcttt acggggtaaa ctctgaccct tgttttagct 
caaacaagct gataataaga ctcacagttg taaaataaaa tggcagagag cctggcctga tttcaagacc agtt taggtg gacatggtgg tgaggtcagg tacagaaatt ttgaggcagg caccacactc aaaaaaaaaa tgcctgggtg ctgccctcgc aaactctttt tcacagctgc gaaaggaaag caattgaact aaggatttgt gatttgcatt aatacaggga caacagctat aacccttgtc aaattttgat tttttttatt act cagtctc gaagtggaag agttttgatt ctgtactgcc cttttctatt tgtagaattt gctgtgtaat gtcttctgaa tgaccttggg cttgtcatca tggtagaagg aatttagcaa ctctictttt tactagctgt ggaaaatgga tatttatgcc aaaagatcac tcgtccccaa ccggagtttc tacatggatc gttacacctg ctgggaagag aggtgagccc ctcacacctg agttcgagac agcagggcat agaatcgctt ccacctgggc aagaatatcc ggtcctgggt tatcaggaca gcaaaccagt agtgttccac ggctcaaatt gcatcatatt aagtgttttg attttttctt aaacataaat ttctcaggca aacaaggcag ttcttttctg gtgttcttac tttaccacca catcaggagg agaatcccag tttcataata gaacagtaca taatttattc ttagatatag tcaggaagac caagtttctg ggattaaatg tctgctagtg gatccccagt ttttactttg gtgaccttgg gatcataata aagaacccat aaaagtagaa acaatacttg ataaacttct tggggacaaa ctcattgtga gaaggctgtg ttgttttact taattctagc cagtctggcc ggtggcgcat gaacttggga aacagaatga aggtcaaccc tctcttgacg gcgcctgcca gagagatatg agaattccaa aaagacttca cagtaatgac caggaaaata agtttttatt aacagtacat cctgctgggt caaggttcta tgcccctagt tctgtgttgg ctctgagggg ttaagcaact cagcctgtct tgtatcaaat cagtaacatc atttgtctga gaaacatttg atgagttaga aactttagtt agtctaccta gttactgtta gaggcttgtg gaggaaaata ccaagttata tcacctgtcc atggtaaaag aacattaggg tatttttgca atagacagtg cagcctctac ttccgaattt gtaaacagga agtagggttt actttgggag aacatggtga gcctgtaatt ggtagaggtt gattccgtct cacctaaccc cacacqagat tgccagccag cttccaatgt gagccaagat agctgcagaa taatcctaag ttttttccat tttaaaatat gtaaaccaat gtcagcagct acctggttag aaagatatga gccctgttct aggctctgtc tgttaaagat ccagaacctg tgagatgata ctcctataat taggctcata gattatagtg gtatgccggt tcctttccag tataaaatgc ctgctggcta catgtagaag cacctttttt aaacctcact tatgagattg ctcaacaaac agacagctta gatccagttt gtccttgtca tgttaggaat gaaaggaaca aggatgagat aagaatatcc gccgaggtgg aaccccgtct ccaactactc gcagtgagcc ccaaaaaaaa 
tcagcggggc tgtgagagtg aacacatcat gaggtaaagc ggtaaaagaa taagattaaa tatacagggt ctttcatttt ttattgccac at tttgt ccc gtgctcagtg ggcttacagt tagcaaacaa cagtggttta atacccactt cacaaaatca ccctattaag ctttataatt gcatataaac atgaaataag gtatgtagtg gtacctcctt gctaatatct ccagcgcagt ttaaatacat 47220 47280 47340 47400 47460 47520 47580 47640 47700 47760 47820 47880 47940 48000 48060 48120 48180 48240 48300 48360 48420 48480 48540 48600 48660 48720 48780 48840 48900 48960 49020 49080 49140 49200 49260 49320 49380 49440 49500 49560 49620 49680 WO 00/08209 tttaatcttc ccagactgtt ttacataaac atgtcttaga cacagtaggc ttttcagaag agctactgaa gtggcttatg tatctgaatg aaaatgacaa ccccacgatc aaaagaaaag tactgctgct tattgaacag tgtcttaaaa ttttcagagt atattcatta ctttaaatct tgtctctctc gaacttttct tttgcaagta gtcaaaatat acactggaac ttggaaagag ttttgagcag accctgccat tgattgcaaa taagacacca gcatagctag tgagttatga atagaatgta ggaataagac aacatgcagt tagcacactg gccactttct tagtgatggc tgttcgtgtg aggtcaaact agggcatagc tataattggt aatgaccaga gccaggtgtt PCT/1B99/0I 444 cttcagaata gggttccagt tccttgttcc gttttgtgag actgtatgaa gagataaaaa gttttgctta ccaatgcata aacagaattt ttctggaatg tcctttagaa gacatctcgt tagaatggag gcctggtctt atacagcagc aatattaaca ttcctctatg aagctctaac tctctctcta tgtgtaagtc atctgaaaaa gacatgaact caattaatgt catgagtctt gaagtctgca tgccccagaa ttttgatctt actctgagga tgtcatagcg caacactgtg aagagatagt ctttcgacat tttgaagctg gggggcgt tt tatgtttttg attactgcat tatcttggga ttaatgccaa tataaaatgg gaggcaaaac cagactagaa gcattaagtt cctggccaga cctggctcga tcagtttgca tgtaaattat cattttctgt tattacacct ctatttgaca ttaacacagg gctgatttga ccgtctccct ccagtttgtg gagctccgag aaggaaaatc atcttggctc acgatagtct ctgtttactt gtcactgtta acgctgaaag ctttatcctc attaacatac aaagaat tag ttgaaagttt tatcttcaaa agggtatcta taatctcttt caattttgga ttaaaatgac ataagctcct agtggataga tacgagcaac gtctatatct gaaagtaacc agaaagacgg agagcaggtg tgaaggtaat gtgtcatcag tttttgttga tactggccaa gaacggtgct cagtgtgcca tgaaaaggca tactggccct tagcacagtg gttaagaatg cacatgaaag ctccttccta tttctttaaa 
tgtatataaa aattatcaat taaaaagcag tatcatttgt aattttaaat cactgtgttt ttccagatta aagatgggcc agctgtggca agaagctcca tgaagttaat aatgtataca tctggtgatc gtacagtgtt gtgcttttca agccatggtc gtaacttcac ctgagttcta agattttgtt gtagcttaag gagaactgcc caaagggaag attacagtac cttgatgctt gagaatgtgt cgttctctgt atagtttctg tttgactgaa ccacagttgg gtcctctcat agtgctgttt tttaaaagca ttaaatgaca agaggaaaaa tattcttgca cacagctggc caaaagcaga agagtttcct tgccaacatt gctatgtgac actgcatagt gctctgagaa aatataatta gtatctttaa ttcacgtttg ttggtgatat gaatgtgcat ttcagagctg Ctttggcccc aaaggctatt aggttggttt cacatcagac tctatctata taatgatagt tagaacttct ttttgttttg tgtgcctgtg tctgtgtgct cctgtactga catttcctgt atgcaaagtt cggtgataaa atgtagcaga atttcattca gtatagagct tgcatctgtg gcatgtccct cagttgaaaa aacagaaaat aataggctag cagggtgctg tcttccaacc gatgtctaaa gctcgggagc ggcagttatg aatgacagcc ttctttgtgg atacattctg aaggttacct cttctaatgc attaggcaac tatcataccc cagtttggta ttaaataaca attcttcctc tggctcagac tattatattt tttttgttga ggagagct tc ccaccagagg cttcaacaga gccatcttga ataagcatgc tctgtttact ttcaccaaca gagatccaag ttttcccctc tgttaggtat ttttcagtga tatcaatagt ttccatgctg tacatactct gtagtgaaga tggttcagtc gcatcattct aaaaagtcat agtttcagtt acaatgcttg gtacgaattc gagatataaa taagctttcc tggaagatga cagtttttct agatgtttgg acagcagtta ttcatcatgt atgtaaaatc tgaggacagc ctgtgcaagc ggaacccctt ttcctcattt 49740 49800 49860 49920 49980 50040 50100 50160 50220 50280 50340 50400 50460 50520 50580 50640 50700 50760 50820 50880 50940 51000 51060 51120 51180 51240 51300 51360 51420 51480 51540 51600 51660 51720 51780 51840 51900 51960 52020 52080 52140 52200 WO 00/08209 catctggctt tttctcttct aacttgctta attgctgata tttgtcgggg ataagaatca aactgcgaat aaaaacaaga aaataaagac atttaagcaa taatcccagc cagcctggcc gtgtggttgt gaacccagga aacacagtga cttattccac caggaaccag actgggacca aactggagca aatggccccg gcataaagac cttgagttcc ttgaaattga tgtgtatgag attttttact tacctcactc ctaattttca accgtcctcc tgctcttgtt tcctgggttc c aca cc cggc tgttggccag aagttctggg gaagatgtga tgttgacatg agattaaaat 
aacatttttc ttctaaaatc ttccattctg cattgttaac caaatcagac taataaatat PCT/1B99/0 1444 cttggcagtg tagctctgta atataatgtt aaaaaataac tggattgggg gccagagtgt ttaagcaaaa gt taagt tcc ccaagacact gattaataca act ttgggag aatgtggtga gcacacctgt gacgaaggtt gactccatct ttcagggtct ccctggaccg tgtagacata ct tggagaaa ggctaagaat attatttgag ttttttttct tcaggggttc gaagttctta attctggaag cattgtaggc ttggtgttag tctgcaaggc gcttaggcta aagcgatt ct taatttttat aatggtctcg attacaggca gcagcctaat ttattaccag gatgtgataa ttagttaaat tctgcaagtg agctttcaag agagcagaat ctgaacgtta tttagagaac ttcagttttt aagttccaca ggaagtatta tagcaataac cagggcagaa acagtaagta tgatgtataa tatggcatat tctaatatta aatatgataa gctgagacag aaccctgtct aatcccagct gcagtgagcc taaaaaaaaa cagggggcca aatgccattc ctgattaacc acccacacag ccattttttt gacctgctgt ccc ttct tga aagctgactt tggttaagcc catcgcattc acttggaagc aacaaaaagc cgttttcccc cagtacagtg cctgtcttag tattagtagt aactcctgac tgagccactg gtaagatcac ttgagctaat cattaaattt aatacatgat tgggggtcat agatggtggc tggggatgga tcacaaagtc ttggttgcaa gtggtctttt tgtgtttatc atccattgta agcctgattg Cttttacatt ttcacttaat tgaaacaaat ttctgggcac aatattgatg ttattggctt ggagatcacc ctactataat acttggaaga aagatggtgc aaaaaaaaga gaacctatcc cat c ttgggg taatgtgcac acatgaagag cttgtcaaca actatgtact aggaaggtta gcatactctt tgtttcctga tggaaagaac tgaagttgtg gctgcctctc cccctttttt gcacaatctc cctccagagt agtagtagta ctcaggtaat tgtccagcca aacatgtgat ccatgtaact tgaattacag ggtttaaaaa ttaattgctg agctggcaag gcagccatag caagttggct atttacattt atttttactg tttgtggtga ttagtgtgta cttaaaaata aaatatagat gttgccaata tttactaagg aaaaacatca taaacgtgag ggcacagtgg tgaggtcagg tacaaaaaaa gtgaggcagg cactgcactc aagaagtaat ctacagcttg tgactcacac atctttgaga aacacaaact ttataagaaa tagagagata aattgcatct tgggaaagaa cttgaataga catactatgt atttctccaa tttgaagaca ttttttttga ggctcactgc agattacagg gtagagatgg cctcccacct atttttctgt tcaatacagc cagcatttta ttgatgtttt tcaaatattc agcctcccag gcagttttgt cccacccacc cagacatttg gatctcagtc tttgacttca aaacacaata caggacctgg tttagtaagt gcaagatttg 
ggttcatgga gtttattgat ccaaacttct atatgcaaac ctcactcctg ggttcgagac aattagccag agaat cgct t cagcctgggc tatttttcca ggatgcaagg acacactcag tgtgggagga ccacacagat gcgacattga ggcattctat gagatggctc tttagaagga tgaatcaaat catctcagtc aattagatag ccagtcctcc gacagagttt aacctccgcc cacccaccac ggtttcacca tggcctccca atttttaaat cgtggcttgg tgctttacta ttatttaaaa agtgcaattc cctattagct ctgggaaagc agagtaggca tgttaaatca agt cc tctt c 52260 52320 52380 52440 52500 52560 52620 52680 52740 52800 52860 52920 52980 53040 53100 53160 53220 53280 53340 53400 53460 53520 53580 53640 53700 53760 53820 53880 53940 54000 54060 54120 54180 54240 54300 54360 54420 54480 54540 54600 54660 54720 WO 00/08209 ccctatctct acagctggtt caccatctgc ttaggatacg cagttctttc ctgggctctg gttggcatcc tgatttgctg aatcacagcc aaaaacattt ttctaaaatc aaaaaatcta aagatggtgg ttggcttccc gtttctgtgg catttccctt gcagctttag tgtcatttga ggtagaagct tcctagttta attaacaaca agaattgggc aaggggtaag gggtagaggc cgtctcagcg cctttactaa gcgcctgaag ggaaaagatg ctcggctgtt ggtgattggg ccttcttttc accacctttc cagaaaaacc aattaaatta acatagtgtg cacttacttt caagtgtaag tatactgtaa cccaaactgt atgtttccag tctgtttttt atcagcctgt PCTIIB99/01 444 acaagcttac cctgtatctg cttggttgat gaaacagcag tggctgtgtg ttagggtggc agagtccagt caaggattag aaaaaaaaaa ttcttggtta tctgcaagtg gggttgacat at gga aa acc tctagctagt ccatgaaata tattgatatt gttgagtaag aaggggtaga accaaatgga gatgaaaaaa cgtagctcct agatgtggat agaaatcaca atggacagtg ggataacaag tgcgctctgg ctcgattatg cttagcactc gggcaaggta gagaagtgtg atcacatctt atgattcctc caagggtttt taaactagaa gggagcacag gcagtgtgat atgtgggtgc ggccttagag tactggtcca caatgtgaca tttaaatttt agtagattgg aaaccgcatg aagttcttgc aaggcatttt gatgtttgtg aagttgtgtc cagacgtgct gctgccgcca ctcaactcta aaaaaaaaat aataatacat tgggggtcat ttttaaaaat ataaaagctg aaataaccag agaggt tggg actcccccct tttggtggga tttattcagt ggtttggctc attgcatcaa gtcctgggga tatgtcttag tacctcctag acttatattt taggcagcca aatggtctat aagaaattac caggaagatc agcttcattg gaattaaaaa aatctatttc ttaattaaaa gttgcctata actaattgta 
acttcagaca cttagacaaa tgacagtgga agcgggcagg agaagagata cagtgatcct gtttttctac aaaattttta ggtgtgtggg cctggagcct ggaggccact gattgagcct gactacagag tgtagcagcc gttgcagtgc gtgacattta ctagggttga gatggtttaa ttaattgctg gtattcaaca agaggaaggc caccaaatca tccaggaatc gggcttgata aacagtagct ttaactccaa taaataagaa gttgtaacca ggctcatagt cagtagagcc gtttttagca agacgcgtta tgttgtgtaa tccagcctct tccctgtctt aaaaattaag ggaagcatct aaagtcaagt catatacctt ttcccagaag aactagataa tggcttgtct ggtggtct tg tgcctcaccc tgttgtgagg attcactgtt agtacagaac aattaaaaat taatttattt aaaagaaaaa ggtcttattt gggtgtttgt gattttaggc tttcagctga c agga tgct c ttactgccag agcaaggcta ttgtgttttc catttttaaa aaatcaaata agcctcccag gagtacgagg agcactgggc cctgatcctc aatgtaaatt atttagttat ctcttcatat gaagcagaaa aacgatcttt tgctagtcat ttgatagggg aacagagtat ttttccaaaa agccagttgt tggaaattcc gaaaatgatt aaagaagtaa tttgacatgg agtcaacctc ctaattttag acttaataga gccgggaaat tgatttgatg ctgggtactc ggcttaaatc tctctgagct agcacacagc ttttcagtga ttgaaactaa gtggacttat ttactgtatc aaattgatgc aatattgcga tgtagttctg ag cagtgttg atcttctggc atgttgcctg aggaacgtac gccccaaacc tcatagccca aattttttta ttcattgcag cc tc ttagc t gaaaagatta ttagagtcac ctgaacttca gtcaatttaa aattcttcat atttgagaga tgggacccat ggagtgcctc tgggaatttt taagatggaa gttgggggtg tgaggaaaat aactgcttga atagctgtag tgctgaacaa ctacagtgtg aaaaaatgca acccctcatt tggccatctc catgagtttc aggaagaaga atatactttg tagggagaca tcaggcctgc ccagtttcta atgtgtctgc gatctgccag gc ttt tggaa attttgtcca gtataaaaat ttcacagata 54780 54840 54900 54960 55020 55080 55140 55200 55260 55320 55380 55440 55500 55560 55620 55680 55740 55800 55860 55920 55980 56040 56100 56160 56220 56280 56340 56400 56460 56520 56580 56640 56700 56760 56820 56880 56940 57000 57060 57120 57180 57240 WO 00/08209 gtttgagaac tgatttcttt tgattctatg gggtttcaag cgttgtatat gtgcaatggc tgcctcagcc ttgtgggttt cagtggcgtg ctcagcctcc ttgtattttt cctcatgatc gcccggccaa tcttgaactc agatgtgagc aacaaagcag catcaatttt aaggaaactt ccaccgtact atgcagtggt taccctcttt ggatcttgga 
tacctttaca atgagtctgt tctgcgaaag aagcctggca actctaggaa tactggagtg cactttttac acattaagat aatgtgaatc gccaactcac tcatagtgac gttaaaatta tatgacatta gatagaacct ggctgctgac tataaaaact gtgacattgt tattccttga ttgttaaatg acatggaaaa PCT/1B99/01444 cgctattttg aaaaatatat ctactgagtt tatgtataga gcactttttc gcgatctcag tcctgactag tttttttttt atctcagctc tgagtagctg agtagagatg cacccacctt ttttttgtgt tggacctcag cactgcaccc tgttttttac agtaatcatg gggcttagag gcactgcctc gaagaccgaa ctgggctctt gctgtcttgg ccaaatgacc tcttccccca gcaaattgca cctggatatt tctggaccct agctaaggct tgcagaagct ttaacaactc aaaaaatttg ctcagagacc ataaaagacc gatgaggggc tagataattc tataagctgg agtaggggcc tactctagaa ttaatagtca aaatgacacc aaaagagtac acacccacga aagcttacct gttaatgtct ctggttgagc atgggttttt ttttttgaga ctcactgcaa ctgggattac ttttaagaca actgcaagct ggactacagg gggtttcact ggcctcccaa ttttactaga gtgatctgcc ggcctgcata tatagttttt ggaagttatt cagttgaata ctgttgaaca gttctggatg gcccccttac agttctaata ccaaattgct ttcacagttg tgggtctgtg tgtttttcac gggtagtgaa gacctggaat ctaaccataa caaacaaatg agcctaaatt ttgtaagaga ctgaagtgat caggattagg agtttagtac aatgtctttt agatgcaaac actaatcagc aaacaaaaca ataagtagat agcatggtct tgcagatgtc 42 tcagtcatta tctggcaaga tcatcatgaa cctgagttta tgtagtttca cccctgcctc aggcgcccgg gagtcttgct ctgcctcccg tgcccgccac atgttagcca agtgctggga gacgggtttt tzgcctcggcc tgcatttttc taggcatttt gtttgttacg gtggcttagg ggatctccag gacaccagct tgatagaagg ttccttgcac gttttgaaaa cctagatgat acagctattc tgggcatatt agttgggcac ttccttatgt agggggcttt agggcggtct tgaatcatat ggacagttgt ggaaataaag gtatcaattt acatcaaaat ttaggaatgg tctgcttgcc caaaaatgta aaaaacccaa ctgtatttac cctgtactgt caggttatag ttagtgttct gtaaaagcct taaccaggtg tcagttgtgc ctcttgttgc ccaggtttaa caccatgcct ctgtcgccca ggttcacacc catgcccggg ggatggtctc ttacaggtgt cactgtgttg tcccaaagtg atctctagga taaccttttc cattttccct gccacagagc gtgcttatct ttcagtgtga agagacttgc ctgtactttt gggagaaagc caccttcagg caaatatttg ttgtgggggc agatgattga gttgcctgac gtcagtcagg attttgtggt ctttgacctt gtggattaag gaatttataa aggagaagat 
gatttctcta gat tgcagag tttgacccgg ctgcagtaag catgtgacta tgacgtaaaa tgatatttcc acaggatgac agtcaaacaa gagtctaatc ttctgaataa agtgggaaaa ccaggctgga gctattctcc ggctaatttt ggctggaatg attctcctgc taaatttttt gatctcctga gagccaccgt gcaaggctgg ctgggattac gcataaatgg tgaattttga ttctatggat tgggttcaca cagaacacgt ctttagcagg actgagtaga tcttgaggtt agagaaaaga tgtctttgct agcttcttag taatagaaat gcattctgta tttgccacat tggttttaac tcagaataaa tgaagtagag aggc cc t tcc aattttccca aacataatcc aagatatcta gggctgcctg caatgccatt aatgcttatt tcccatgtca gatgtctaag atgtgcgtat catagggccc 57300 57360 57420 57480 57540 57600 57660 57720 57780 57840 57900 57960 58020 58080 58140 58200 58260 58320 58380 58440 58500 58560 58620 58680 58740 58800 58860 58920 58980 59040 59100 59160 59220 59280 59340 59400 59460 59520 59580 59640 59700 59760 WO 00/08209 aacctggcat ctcttcactg gtggtttttc taaacagtga ggaatggcct gtttctcttg cttagcaaaa cttactcctg ttcaaaaaat ttcataactt tataaatcac aagtatttgt gaatagcaga gcttcctgtt ttctctcagt gcctcttcca t ctgc ctc ct gaatgcttgt aagtccttac agtgaacaca atgctgagat ttcttctaca tacacagtaa attactattt caaaatattt gagga ca cc a cctagggcaa ttcccttaga atggtgcata cttttcccaa tgagtggcag tgcctaactt cagtcttctt aaagaatttt tgactttaag aatcacatgc ggatgctcac cagcttatct actccacctc tcttgccgtc tccacatgca cttttctctt PCTIIB99/01 444 agccctggtt tcaaaagctt actttcttct gtgtatatta cctgccctcc ggttaggagt acattgggtc aataccacaa aaaggtctga gggaaaat tt ttatcatgct ggggagggtg gggagacaag gctaccataa gttctggagg gcttctggtg tggtcacatt cactggatat ttaattatat tctcttttga aaaattacag ggccaactta gtatttccaa ggt tcaaat c tttctctagt agagaaaatg gggacctgtt gggttatgag gtggcctcgc aggtggccag ggcagggtac ggaggcttcc gctctccccc ggttcaaaac agaattttct aaagcattag atggcaatgt tgcagaaaga agctcagaaa aattttgcct cccttgtgcc CCttttccct tatgactcct cctgctttgg ttatacctct ttttttaagt cagaccctgc gaaggcagca caacctgcat tatctggtac taagacaatg cccagtttga gactgttttc tgcgtgtgtg aaatagaata caagttacta ccatagctcc tctgtcagct gattcccctt agggcccatc ctgcaaagaa ggggacacca aaggtatagg 
aaaaaaatga gaggtggccc cacagatggt ccccatttga gcagtcagca ctcactgcct tatggtatgg aatcagtgct agacagaaac aaagcggcca tttctcccaa tttccacctt tgtcaccttt taagttttag ttagaggctc ctgtgcagtc agcctcatca catgcatgat ctcacaataa tgaggc aaa c catccttaag gtcctggcaa atggtaaata tgcatttcag gagtaagaag ctgcaagccg ccatggtggg gatcctgtgc tgtccagtga ggaatgattt gaaaaagtat tagcccacat tgtatgtacc atagtagaaa caaaaatagt aaaaccatgc tcctagactt ctcatctcct tggataatcc ggtaacattc ttcaactcac atgtggtcat tacgtgaaag agtgagactt tat t ttat ca gtagcagcct tccctgggct ctcttttctt gccaatacac atctgctgct c agaga aa cc ctttttttcc aagacaagaa tttggccttg taaaattagg acatttttaa ttggacattt agttcctacc gaattccaga tccctggagc cccttccagc tgaatcactc gctcggctca aattattaat tatgcttatt aggttttttt ctatttacat taggtgggct gaagggcatt tttaaatcac cacagccaat agtaatagga tttgtgaaaa ttacttctca caggatatat gcagagatca gactaaagca cgttagcttg gtggtcacgg cctctgtatg aggatgctct acaagctcca tctacagggt ggtttacagg ggaaagaaga ttgaatctgt ttaattgcaa tgtttgacat gtcactcacc taccatgagg ataacgtgca gctacctgcc atccttctgg cggatggaaa agacttggca taatagctga tttgccctaa tcactgtgag tctgttttta Cagcctctgg at ct cagc ta taccaaacgt cttcttgcca tcggttccct aatgaaggat agccttcctc ctagttatgg gcaccacttt gggggatgga ccactgccag ccaggccatt agaatctaag attcttttct aat tggacat aaagccccac tcagcatttg ctatgagctg gggtatattt acagaaattc tctgtacttg cactccagcc tctattataa cctcctccca gggattagga cattatatta ggccctgtat aagtacttac taataaaatg gataggaaca ttctgacatg ggcctaatga a taatc atgt tggaatggcg agagcagaaa acaggctgtc gaaagatcaa tcttattctt gtaatgagct ataacatcct tattcaaatt gagctttgtt gctcttcttg tgattagctt ggggcaggt t gctgctctct ctctcttgta tctgtggaac 59820 59880 59940 60000 60060 60120 60180 60240 60300 60360 60420 60480 60540 60600 60660 60720 60780 60840 60900 60960 61020 61080 61140 61200 61260 61320 61380 61440 61500 61560 61620 61680 61740 61800 61860 61920 61980 62040 62100 62160 62220 62280 WO 00/08209 cttgattgct gttctcactc caaacactgg cctaatgtta gtaacaaacc ataaaaagaa gctttctgag gttgttaatt cagggtcaga 
agtgagggga gcctcccttc tcatccctgt tttgagacaa agccaggcat tcattggagc tcgggtgaca aatggaacat tccatgtctt tttctagctg gtgccataca cttggtaagt ctgaagagac tactgagact aaaaatctct ttcattaagg tttttacagt ggtttaatta aattgtagaa accttgtctc catatctatg gaaaaatcaa gaatttacat atgtggccgg tggatcatga ctaaaaatac ggaggccgag catgccacta aaagtgtatg ccttgagcac gccaagggat ggggtgtgat aggtggcaat PCT/IB99/01444 cagttagaaa ataggtggga ggcctatcgt aatgacgagt tgcacgttgt atgagccagc atcacatctt ggcaccaggt acatgataac gaataggggg tgttcttatg aatcccagga gcctggacag ggtggtgggc ccacaaggtt aagccagact gaaacacaat ctctcccttt agcaattcca aagaactctt ctgtgccatc tgtccaagtt tttaatttaa gtattttcac ctgcagtcat caaccgaagg caggagccac agcagtttgt ttgaatagtt ggttctgaat tgaaaactaa tgtattaggt gcacagtggc agtcaggaga aaaaaaatta gcaggagaat cgctccagcc ggaggatgtg cgtggatttt gactgtattg tatgatgcac gcactcccgt tgagcaaact attgaacaat 9999t9999g taatgggtgc gcacatgtac ttctctttca acatgacaat tggagctcag acatttgaga tggaagtagg ctgttaatta ctttggaggc catagtgaga acctataatt gaggctgcag ctgtctcaaa tcatttttat taaaggtgtg ccttaaacac aaagcagctg gattggagat atgtattgac aaagccctag tggggagctt aattgtggtt aacatcctgg tccctcgttt gtgtgtgcct tttaaaataa ttggggactc acaataatat gttataggta tcacgcctgt tcgagaccat gccaggcatg ggcgtgaacc tgggtgacag tgtaggttat ggtgntctgt gatagatttg agagggccct ttgcctgccg gtcgcaagga gagaacactt gagcggggag agc ac acc aa cctaaaattt tctgagctct ttttcatact gtcgtatact agtgagaaga ggaagaagca ataatggaac tgaggcagga Ccctgtctct tcagctactt tgagatgtga aaaaaaaaaa tattaagttg ccacgtcatc cagtttccca acttcccagc gacaatggaa ctgcctttag ggtaatcaca gttaactttg cagtccagta aaaacgtata ttactgctca tgaatgattt gaagatggga aaccaacctc agattttaaa atctagagat aatctcagca cctggctgac gtggtgggcg caggaggcgg aatgagactc gtgcaaacat ggggactcct cagttgccac cctgacttgt cctatcaccc cagaaaacca ggacacagga ggatagcatt catggcacac aaagtataat acttcctttt tggctttatt ttattccttg agggaggaag aatagggcaa cagtggccag gtatcgcttg acaaaaataa gggaggccga ttgtgcctct aaggaacaag tattctgtgc accgaggtga gcaaacagca agcatgcgat gtttcactca 
gtttagcaat aatgtcatct cttggcatgg actcaaatat gatgtt caga caaacagaat tattttggaa acaatataca agatggaaag atatagtaac gatttaaggt ctttgggagg acggtgaaac cctatagtcc agcttgcagt tgtctcaaaa agcaccatct gcaacctatc tgtgaaggac cagtggccat aagctgctgt aacatcgcat aggggaacat aggagatata ttatacatat aataataata gattctctct tccctagaat cagagtCtga gggccaggga ggttttagtt gcatgatggc agcccaggag aaaaaaaatt ggtgggagga gcactgcagc aatttggata ataaattatt aatctggaaa gccaaaggat tcttattgac catgaaaaat caaaatttac tcaagcatat agggagggtg tgataggagg accgaggctt tcatcagaaa actgggtggc gtcagccctc tatttgggaa tatctatgta gtgtgggagg ccaaggctgg cctgtctcta cagctactca gagccaagat aaaaaaaaaa tatagaaggg ccccgaggat ttgttgaact gcacagggcc ctctactggt 62340 62400 62460 62520 62580 62640 62700 62760 62820 62880 62940 63000 63060 63120 63180 63240 63300 63360 63420 63480 63540 63600 63660 63720 63780 63840 63900 63960 64020 64080 64140 64200 64260 64320 64380 64440 64500 64560 64620 64680 64740 64800 WO 00/08209PCIB9044 PCT/IB99/01444 ggtgagctgS cttccagaat tgctttgggs gcagtgtgag taaaggttaa ggtgtctcgc tccgtctcct gtgcccgcca gttagccagg tgctgggatt gaacatcatc tgaggtgtct ttctcatcat gccaaatgtt agatgttgtt gctgaaagtg tttcaacttt tgtgatgctg aataggtagt gtgtctgttc gaacatgtga tgcatccatg atggtgtcca ttccatgtct tggtaggatg ggtaattcag tatacagcag ggctggggct atgctcaaag atgggccctc tgaggtcaca CCactgtcac agtggcaggc gaaagcagtt gccatcatgt agggaggaat ggaaggtgtc cctcattctt tctgcagagg cgaaggcact tccttgagct agtcctctcc I ctcgatgtgg acctgttctg ccctgcggtg ccgcatcagg tttcataacc tctgtcgccc gggttcacag ccatgcctag atggtctcaa acaggcgtga tctgagaaac tgctaagccc caccgttatc gtgcaaaggg ctgtttctct acacagccga tattttagat aggattggag ttttcaaccc tcatctttat tgcttggttt ttgctgcaaa tatatcacat ttgcaattgt atttattttg ctcttagcag tggctcttta gtttggccat aatatcttta acaacagccc tgttgaccat tctgtgtttc caaggcaatg ttgtgaattg tcctatcatg gcagacagcc tcccctcctg ccacggacac gaaaagaaga gtgtgtgcct ggcatgaggt atccctgcct taggagatyg gttgcagctg gcccggttct cagtatgaag cttaaggtta aggctagagt cattctcctg ctaatttttt 
tctcctgacc gccaccacac tttccctgac cctgtagcac tgcttattat acttaaactc ttagagttga tttgccgcta tccaggattg tgtgattgaa ttgccttcct gtccatgtgt ctgtttctga ggacatgatt tttctttatc gaatagtgct ttttgagtat aacctgtatt ttgccttttt tataccctcg agattcagag tgggaaggtg ggtcacaaag cctgggaccc ccttggtgga gcagtagctg tgaaacaaag ccgctcccca gattcagtca gatcagcccc gaccagatca gagccccacc ctgtgggagc ctccccaccc gccctgctgc ctgctgctga ctgattgttc tccttt~ctc tttttttttt gcagtggtgt cc tcagt ct c gtatttttag tcgtgatccg ccggccccac tgtggtctcc cccacactct catcactgct ctttctttaa gaaaatagaa atcagtgtga cgtatgcagg cttgtcaccc ccctccctct acccagtgtt attagtttac ttgtcccttt cggttcactg gtgatgaacg atactcagta tcttactcca CCcttatagg gcttctagga tgtgagacac gcctgcaccc tcacctgagg tctgagcgca gctggggccc catttgctga tgagaaatag cacttgctcc CCttCttctc tgcttcttgt gaacaagggc acggcctccc ccggtccact tctccctctg ttttagagca aggctccaca ctgcagccac tcaagccacg ttaatttttt gatctcagct ccaagtagct tagagacggg cccgccttgg ttaaggttat tctcccacct ccccatggtg gctgcctaac tccttacaac acagacaggt cttcggaagc tttcttacaa aggaaccaag ccacccccca cagctctcat ttagggtaat ctatggctgc ttactgggca tgtgagtaca atgggattgc cctcccccgc atacagccct cagtggctgt tgcactagca tctctaagaa ggaggtgaca ggaggcccgt atttggccca catggtgagt gttcagggtg aaggctgggn ttcattcccc tgctcagatg Ctcggcgtgg ctgcagggct ggcagggctg nngccccctt tgtggccctg gaacacacag gacagaggat tagctagcct tttttgagac cactgcaagc gggac tacag gtttcaccgt CCtctcaaag tctttagctt caagactgga cgtatcacat ttcaccttgg atgatcaggt tacgtaactt tgcacttttt aggtgtattg cat ggt acc c ggagtccctg ttctaagtga gacctgcagc agagtattgc cctgggttgg tgtgtctttt agggtcgaat ctgtccttag ctgcggactg gacacagcag ccgccatctc atgaagaaac ggaactgaac gttgctgtgc ctgacctgag tacaggaaat ggaggctgaa aggaggaacg tgcagtatcc tcatcacttt ctgtgcactc caggcagcct gctgcattca ctctgacagt 64860 64920 64980 65040 65100 65160 65220 65280 65340 65400 65460 65520 65580 65640 65700 65760 65820 65880 65940 66000 66060 66120 66180 66240 66300 66360 66420 66480 66540 66600 66660 66720 66780 66840 66900 66960 67020 67080 
67140 67200 67260 67320 WO 00/08209 PCTJIB99/01 444 46 gctgaccccc ctgtccacac gtcacctctc cccactggaa gcctgcccgs gttacctccc acactttctg2 aagcaaaatc aagaacagaa cttgttatac t t tttnggca gttctacagt gtgaaattgg ttgagttctc ctcctcagtc tctgcccact ttggggtaga cattattctt ccctcagaca acatggactg tgggggctgc tgctccctcc gaggcgtgca cagaaaccca cacatcaatt gtttataatc gggcttctag gagacgctaa catattgatt aactataatc cacaaaagat waggagaacc aggcagaggg ggacacacag gggtgctggc atgagaaacc gttatctcac atcacagttg gtgagccatt tctgaatata ataaagtggg agtcggtggt ctctctcttc acttcccgac Itgtctctctc Lcggggaaact ctgactccgc ccactcctct Itttagtctac aaatccatat tggttggttg tttctgaaat atgtgttttc ttgtgaatat gcacgtgccg tgcccacatt tccataccac ccatgccagc tcacaccagc gaagccttgg attgccactg cccagcatcc cagtgacact ctagcatccc gcctctccca ggagcaccat gctatagttt cctgcaagag agatcctcca gcgtgtgtcc gtttgcttct caatttttta gagaagtagg tccaaaagga gagcgcttcc agcagcatcg ggggaccaca ttcatggcaa gcacctgtct aggagcctaa gtctctgatc CCctttaagt gtatgatttc gctggcaaat :cccactcttt i ggcctgagag caggcttttt gggattggcc ccagggaggc cacccaaggc accatggcta ttcagttttc gatatatttt tggcttttag ctccaaagag tttctcaacc ttatttccaa gcgagttact atcccttcta acaactctcc cagaggcaac gtcaggagcc ttttcatgtc tgtcttctgt gggaaactcc agcctagaac tgataggagg tagatcagaa tattaatctg tctgatgatc tcagggatac agactacacg actaaatgta gaatccataa agaacrgttg aggaatcagc ccttctcctc tgcrctttga gtgttctctt aagacagaaa gtccagttgg tgaattcttg ttatcaggat ccaaagtgaa ctttcacaga gtttaataac cccatcctcg cctggcctcc ggtttggatg gacctccgtg ccagggactc tagacccggc ctcccacaga tgtgctgtgg cct ca aggc c cttaaaaagt gagttttcat t ctaaac agg taagaataat tttgtcaaat cttagagggg cagtgccaga gggcaccatg CCacccctgc tgctctcagc tgccccaccc tattctttga ctggggctcc cgagggaccc actttccagc gcttcagccg agcagaagca catattatag ttttggtgac cagacatgtt tgtgggtcat taaagcctgc aaggtaagaa gattttttag tgagagtatt tcccaggcgg ggggcaggtg ccatctttga gggacctaga ggacgttgct caccaccagc cacatcgtgg ataactaaat ggtctggaat cagctcctct tgaggcaatg cccggggtca agtggagtcc agctcctcca ccaagtcagt 
cagtgaaggt gaccttcata gggtttttgt tttttttttt aggcctcatg ttcatcttta atgaatgagc aggagttctg ttgcttctgt tttggtggaa ttagcagatg atctgcatcc ctctctatcc cactgtcgtc ccttcaggct catcagctgc aaagaacact agaatgcatc gtcagtaagg cagaagtgcc tggcatgcct gggtccagca ctggtgtcca gtaggagaac aatccataaa gaagatgacc tgggctgcct gagctgctca gttgaagtcc atgtaacatt gtatggaggt cacacacatt gatcatattt gtcgttgata cttcctgcct cacccctcag catttcccag gtctcctctc caggtgccct gactccacct tgtttagtct gtgtagtata ttctggccag tttcctgcct tttttttttt gctgggtcgt cacatcctgt ccttaagagc gaacccaggt gtttcttggc tcatgttcct gtactcatca atttgtccag tgggtagaca ctgaccacgc tcacatcatc attccccagt tcaacaggcc taatctcccc ggatggcaca at ttt ttgat gtgctgccgc gcagagctgt gaagaaaaga atttagaatc agatgagaag aagtacaaac cgctcggcca tcactagcag tgtgtgagaa cagcagtctt cagttgaaca attctgaaga atttggtcat aaaggaaaga ttttcaagtc aggaagccct 67380 67440 67500 67560 67620 67680 67740 67800 67860 67920 67980 68040 68100 68160 68220 68280 68340 68400 68460 68520 68580 68640 68700 68760 68820 68880 68940 69000 69060 69120 69180 69240 69300 69360 69420 69480 69540 69600 69660 69720 69780 69840 WO 00/08209 PCTIIB99/0 1444 tggtgttcag tgtttgcaga tttccattgt ctacccactt gagcagccac ccttaatctc tagtgggaag aaagggtgtt tgtgagaaga ttgtagggtt gtaccactcc accgtgagac ct tgtagcag agt cagatgg acagcaccgt tgaggtcacc taacaagccc ttccccctcc gcaaaatttt aat tgtggat atcgaatgtc agtcccattc gcacttctct tgatagattt ttagctttcc Ccttctctcc aatgcaacaa ggggctgagt gcatcatatg gagctggagg cttttaatcc gtttactcag gatggaagga tggataattc atttgattcg aaaaatgact aaaatctctt tgttttgctt ggaaaagcag Ctcaggctga gtctccccca gttgtcctgc Cctggtagtg agtctcatag gatgtcactg tggactcatt ctaaccctgc actgatactg tgtatcctca tgttagtatc tctcttctgc ctggagaaaa tgaatagctt ctgccacatg gctctgaata acagggacca attcttgacc ccagatgatc aagtgataaa gtggagcact catcttgtag tttttataaa tcattttata gctttgtggt gtaggaagga ctgagctggt tgtctttttg a cagt aaa ca gtgttctcag ggtggtcttt gctacagtgc ccctcatcca gaatgtgggt tgccattcac ctatcaatat tatttatttt gttttccaaa taatgtctga 
gttatttgaa ggggagggcc ccaaacattg atagaatttc ttaaccgcta aacccagagg aacacactgg gtcatggagt caagtgtttc cattcaggat ggcaactgat gtgtctcatt ccagatcttc agaggagata gagcaaaaga tgtgggtggc gtaaacagag gaggaaggc a cctgcaagat tgctgcttat ctaataggat gaaggaaaag tcatgaagag gcctttctgg ggggtttttt tgtgaaggta aagttgtaaa ttgcacagtt gtcttcccga aggcagagct gtcttttgct gggtgctctg gtggaatgcc tcaacactgg ctctctaccc gagttgtcgt tttgaaaatt ttactatatt tacatagcta ataaaattag tttaatagaa gcatataaag tcatggaccc agaactattg gtgtaaaacg atcaggagct CCtgtcttgc aagatgtgtt gcaactagtc tgggctcaca ccaaaacaat gattccctgc aaccaagtga tcagattctg atccaggaag gagggtcggt aagaaacttg Cccgtgtttg ccgagatatc ggacactctt Ctcttgaggg tgagtttctg tCCtgcctga attgatcctg gaattactag catatttctc aatgcccatg atctgcttta atgcagttta ttacccagct gctagcatgc tcaatctaga catagttaag cccacggctc atcagcacta attatacgaa acgtgacctg gttagcctag atcgagtagc accaattaat taataaactc ttagaaatgt tCCgctgaat Caaaagctga ctaaaaggat acctggaaga tggactgtgt gaaggccaga tgtgcagtgg ttagcttggc ctcccacacc gagccagtgg c tgc tcct ag actctgggaa Cttaaacttc Ctcagagcta gaatcttaga gtcagattgc ttagtcaact ctgcaagaga aggagtctcg gagatgggat gcttgtgaaa atgcctggga aaattgctgg gaagaacatc gtcatttaga cacttagatc tttgaccctc cagaaaaatg aagaggcagg tCCCtcgagt ttctcctatg atctgttcac gtgcct tgag cggccaactg gcttagtacc tgtggattgt CCtctccctc aac ag CCat t ccaaagagct taaaactgag atacatataa gacattgttt tttatttgct ccttgcacag ctcagcgacc atgtaaaata tacaaagtca gactcacagc gac aggaagt caggttcatg Ccattttaaa gagtcaactg aaggactctc gcacacgttc tgagggttac aatgcaacaa gatcattagg tggtttgcca gtgcagagcc cctctggcct Ctgaaaatgc ggggtttctc aacacatccc tgtgcaggtt gtgaaggctc cgt tagatga aatatgtttg cacaagacac tccacttaac taaaggacct cctcatatcc ttggcagaaa gggtgtgtgg aaactgaaca ttgggtgtga ctgcaggtgc tcctaaatgg ccaggaaatg tttacttggt ccaaacttt tctgtttcca attagtattt aaaattatta ctacattgaa tcattcattc atctatagta tccaggggtc ggaaaacagt gatgggtgca tgttccccag tgcatttggg caggacagat 69900 69960 70020 70080 70140 70200 70260 70320 70380 70440 70500 70560 
70620 70680 70740 70800 70860 70920 70980 71040 71100 71160 71220 71280 71340 71400 71460 71520 71580 71640 71700 71760 71820 71880 71940 72000 72060 72120 72180 72240 72300 72360 WO 00/08209 tttctgcata tttgggtaat tggtggtggt caaactttgt gaaagcgaga agttgttcct tgtgcccttt agattattgc gtactctcca cagagaagc t caaatgccat gtaacttttt cagaccctgg cccacactcc ggtgccctcc cctgtgagaa ttggcaaaat gaggaaggtg gtacataatt ttaggaattt cagagccagt gaccactctt taggaatgtt gggctctggt cgataacact catgactcct gggatgggac agctgagctg aaaccagtac tgtggctact ttggcatgaa tagtattttg ttttctaatg ttgcatgtgc cctagagggg tggaagataa agtggataat acttgtgcag tgaatacaga agttcagata tgaaaatatg cctagatcca PCTIIB99/OI 444 aagaaaatca aacaagagat gattattatt gaaaattagt tttcccagct ttactttaaa tgtcctcttg ccattattga ctctgcaaac aggtagtcca tgaaataaat attagtgaaa ctccagggac atctgtgtga tctggttgaa gctctatgtg ttggcccaag gctatagcta tgcagggcct caagacagag gtgaccgcac gagtggccaa gtgaggctta ttctgtgtga gacagattaa gcatattctg aggagcatca gcaagggtaa actggtgttg ccaaattgag aaaagaatgc tgtatgttgg tgggtgct ag tgc ta ccc at t cagaga ccc agctgggagg ttgttctttg ggtatatctg acgagaaatc acacatctgg ttcctccctg Ccttttgcct atgacagttt ttgaaagtgt atttggaatt tgcttggacg tatgtgcagt tgtaatttat tttggaaact acaccattca tatgtgttct ttctgagctt gcacatcaga tggatgatgt tatggagcag cctgactgtg cctttctcga gttcctattg gtttgccttc tttctggaag attgcaagat acaacagagc agaacacaca ttagcatagg aataagaaat ggtagtttgc attcagaata c ta cc agca c gcagaaatgc tggaatgaaa ccc agt acag gcgtgctgtc ctaatgtctc gtgtaacaaa aaactgttac tgttctacac caggagcccc gtgggtaagg agcactgggg ggggaggttt aggacagtgg actgatgaaa tgctgagacc tcgctgatcc 48 ctgaaactgc ggggtgtaca cagctttcag aaacttttct gttatagtta tttgcatttg gggtttttat tagcaaccat gtcccttttt cacgtgccag gaattgttct tcaagggctg aactcgaggc gatggcctgg gaagtgcttg cctgtcagct tcataacata ctgcttaggg gaaaatgcag atgaagcctt cccgtgaagc tcactcccca agccactcta taacatgaga actaaacctt ctgtttgata agaagtgggg gagattgtga aagccacatg agtataaaga agtattttta atatctaatt atcctgcatg acacacacac tgcttctggt 
aggtgagtgc agctatggat actgggggaa ttaggagacc ttctttttca gaataattgc aagcaggttc atcctggaag ggtgttttgc ttctacctgc ttgcctctgg atagagtaat tgctacagaa aatgtgtgtg ttgcattagg aaa-aagagga aggccatttt tagcataagg tggttgatta cagtgcctgt ctctgccgtt ttggaggctt tgctgataaa ccactcggta ggctgcctcc aaccctttct gtgcaaggtc cagctctgcc ccctgctagg caagcggtgt gggtatctga ccctgtgttc ccagacggag aagtgctcca atatttttga tggctgttaa acacactgga tattgattat aaaattaact ggggtcacat acacacacag gcccaggcta acggagctcc tgcttactag gagatagagg gtagctttcc ggaagctgag agtgaacaat ataattcttg ccttgaccag tgaatctagg ttgtgagttc aaggctgtca ggctctgcaa cggtcataag gtctatccga cattgtacgt agctaaggtt gtacttactt ggcgctacat gaaaggcgtc cgagcgggtc agattgccac gagtgcagag ggtcattggt gcaaggctgg ccctaaattg tgaaagat ta cttctaagca cccaccatct cccaccctct caattagcat ttagctaaaa ctttatgcca gggtccattt tcttcttgga gactatgagg gcacttgaga tttcaaagac gttgaagtga tcacctgtta tccagttcag ctgcacacaa agcgctggag aggctaacag cagcaaggtg aggcaagaag tcttgagtca gaagagccca taacgtgtgg cctgggccca 72420 72480 72540 72600 72660 72720 72780 72840 72900 72960 73020 73080 73140 73200 73260 73320 73380 73440 73500 73560 73620 73680 73740 73800 73860 73920 73980 74040 74100 74160 74220 74280 74340 74400 74460 74520 74580 74640 74700 74760 74820 74880 WO 00/08209 WO 0008209PCTIIB99/OI 444 49 agcttggccc ctctgctccc cttgacaagt ataacagttt tctttttgga aacactgcct taatatgtgt tagctgcatt gaactgtgtg tttcnttttt cagtggcatg ctcagcctcc tattcttagt caggcgatcc cc tggc caa c tctgattact tgatttttaa tggcaaagac cacattggat gtattctttt caggacagct gatattgcca aggcgtttaa cagacatgat aagaatgttt ttgtaatagc tctgcacttc aactgagtct tggactggaa cagatgatct cccttgtcat cctgatgttg ttcttttgca ctattttctg tggagccgag tttgttgaat cgtgctgctc aaatgttgca gctctagaga aaattatgtg atatcataga ctgtgaggtt tggctgccac tggaaacagS *atcagctaca *tttgtgaaaa aggtaacttt atagtttgaa agtctgggaa tagatgatgc tgttcctctt tttttttttt atctcagctc cgagtagctg agcgatgggg tcccgcctag agttttcttt ttcacctgta ttgaaagcat tgattgatat ttctggttca cacagggcga atcgctttac 
aggtctcagc aatgctcaag tattttacag cttatccctc tgtcactgct cctctaataa ccaagagatt a ccc aggc ct cagacctacc ctcaggttca tgttccccac caacacagcc atttttcata ttcctagttt gaacaattct ttgctgcctg gcacatgttg gtatcctggg attcaaaatc aggaactcat cttcctacat Ictgcctggct Icactcccatc Laaagcctcct taatcaggat tgtgaaaaga agtacggaga aagcagatcc tgggat tggg gagaataggg ttggagttgg actgccatct ggattacagg tttcgccatg gcctctgaaa tttcgattga cttccaccaa tattccagga tatgatccag ataggctttc acctttccta aacattttga tttgtagcag tttctgatgt gtatagagtg tccagatgtg gataaaggac caggaagaca ataaattggg ctctgactct tcccagcctc gctcaaatat ttccaccttg actgccagaa gcctgtgaac ctgaaacagt gtcagagaaa tcagagggaa atatgttcat agggaataca ttttcacctt tcactgaaag tcCctttatt ccagatgttt agtcacattc gaacaaaaga gttgagagct aaacacctgc tatgcatgtg agagagtgct gtgggtgcag gttatgtcta agtttttctc ctgcctccca cacctgccac t tgggcaggc gtgctgggat agttcagcta aaaataaata ataactggtg cttctaaaga tttttttgtt cacacccata aggc ctact c gcattttgct ttgacatggg ttccttatgt cctcaggagc tctgtgctag ctgttcatcg cccagtcaca agggccttcc tgcatctgct cacatcctgg ggcacactgc atgatcttgt tttaggagag gcgtgggttg accaaacaca aactctggat ttactaccag ttagagccaa gacctgtgaa attttgacca aactttaaag cttaatcgtt cagaggagga aatcctttaa tttttttttt tgctcctcag gtatgaagca tgtagtaagg ggtgcagcag gaggat taac ttgtctccca ggttcaagca cacgcctgac tggtctcgaa tacaggcatg tttgcaggac aaacaaccat gacttcgttt ttttgctgct tttattatta cttctctgcc acttctagac tcttcatatg gctgcggaaa ctttaataca tttttcaccg gcattattcc tctcaatttg cagctagcaa ctcttgcccc cttcctctgc gagaagctca gtcacgttat ttccataatc ggaagggatc aagtaggcac gtagcgtatt cctgcttcag taagatacta ggacttgctt catggaccac cataacactt acagtggtca tcaagtactt ggaagaggaa gcctatttga tcttttaaac gctgtttcaa tttgcaggca cgaggcc tt t tggggaggaa agtt t tc tt t ggctggagtg attctcctgc taattttttc ctcctgacct agccaccaca cgaaggtagt gagtaattgc gcagaggaag taatctgaag caactaatat cagcttggag caggaagtgg agtgaggaag cagtatcggc acaaaatgct tcaggtaaca aagcgcttca tagattggaa gtgtcagagc catcagccat ctcaccccca ttctgactac cctggctgat atctccctgt 
ttacgggtct ccc at aagt a gcaaatacca gaatattcct tgccttcaga tgagagcacc gtgaatgcaa tccacatgta ccaggcagtg 74940 75000 75060 75120 75180 75240 75300 75360 75420 75480 75540 75600 75660 75720 75780 75840 75900 75960 76020 76080 76140 76200 76260 76320 76380 76440 76500 76560 76620 76680 76740 76800 76860 76920 76980 77040 77100 77160 77220 77280 77340 77400 WO 00/08209 gaatttttga ttgattagac tgagcctctg ggtggatctt gcactgctca tactaggaac actttagttg cttgccaggt ctattgagta ctatggaatc aattcctcca ttgcacaaga aagtggcagt ccagcaggca agccaaatcg aggcataact cagatccaga cacctggagg tttgcctcac tgtttgccag attcttattt tagtcagggg tcccttagat ccatgagggt gtttctcatg gtgggcctct Ccctgccctt ttgagtaaca PCTIIB99/0I 444 gttttctata agtcttcgat aaaacggttt agcggattct gatctgggct tggcatgaga gaaaaagttt atctggatat atttcagggg ataagatgcc ggcttgagtt cagattttta agaagaccca cttgggaagc aatgtttaaa ttattgctgt tgtaccagct agcacgagat agttcccgct aaccagcctt ttatacctgt tgggttttgt gccttcatct ctccctagag aattgccttc gggccctttc ggctctacac cctccgtgtg atttatgtaa atgggagagc tgttttcttt gaccctctgt ggctctcggg tttctgccag ttatttcatc tccatatatt tccagaggga cctaattcag gtgcaaagag gatct t ct ta gcgttagcgt acttgttgga gttctagtaa ggcagatccc ctcgaggttg cggccccagc gggattcgta Ctcttattag agctcttacc gactttggag tgtgggatgt cagacatttg ctcagccact tagattccct tttctcccaa gatggctcgc acatgtcaca aaaacctgcc ttcatacaca tcaggaaccg tgcagaataa gctactaaaa aaaaacatgt ttcaaagaca ctagaaatcc cctttcceccc gtttcccatt aaaatcacta agagttcatg aattgtgtgc cacacaactc cacagctggt ctctcagttt aggttgttgt cctctgtgag aatcatgtca ttaagatgca atacaccaaa acttgctgag tcttagtaaa tatgtatttg gtggaggaat cctgggccta tgaattagta gtcttctctt taattctcag cttcatgatt Ctctacgctg gccagagtct aggggaaaca agaacagggt tcctccttaa ggatccgatc gaggacttgg ctgggttgtg tggcatctct gtcactgtct tctccccgct gccactgcca tttatattct aagc ct tcaa agtggaatgt aagtgatatt aaatgatagt gatctttaaa ttgatactac taaaatttgc caaatactag ttgctttact aattaaaata aaggtttata ttagtttatt ttttggggtg gcttatggga agataggaca ttctttaggc cctgtaactc ttctgtgaag 
cacttgtctt agaaattatg tgaacatgta gagactggct taagagaatt acaagggaac cacccagcca gctgagctca acacccaccc cccttgtggc accacagaga CCccctggtt ttggtgagca t tt Cctgt Ct attgtttgat cttctgataa cgtgtagatc ctgaggagcc agtattgact tcctcccctt tggagaccag cagccttgac Ctcccgtgcc tgatttactt CCtgccttca tgtgagcatg tttgacaatt tattatatat aaatttgttt ttcagacatg tttttaatat atttttatta ataagaattc gatacaaaaa C catggtt aa aacaaaagag gtgcctttgg ttatattatc tatccaactt tcaggccgtg ttggtggcac t tggagt tcc cttgttttaa cttctcctgc caatggattc tcttatttct tatgaaatgt aataaaaagg gtgcctggca gtggatcgca tgtgagcagt tgtcttcctg cctctacaat cctcaccatg ttagtaaatc ctccrtggtg agtctaagat tcacggggct cgatcgctca acaggtgtat gatgctgact tctcttcttg tgtcaggacc acttcatgaa cccgctgtta tttgagaaca tatgcaggga gtcagttcgg caggttcttt caaatgtttt ttatgattag ctttgggtaa aagtgagcat ttcactttta atattccttc gctatctcct aaaaaaaaaa ttacagaaac 77460 77520 77580 77640 77700 77760 77820 77880 77940 78000 78060 78120 78180 78240 78300 78360 78420 78480 78540 78600 78660 78720 78780 78840 78900 78960 79020 79080 79140 79200 79260 79320 79380 79440 79500 79560 79620 79680 79740 79800 79860 79920 ggcctcttgc ccctgagccc accttgggtg ttgtcaaagt tagggctgtc gcacagtttg tttttattgt taagcatcac agagcatctt acatcttaaa ttacccttct tctacaagaa ttttctgtcc ggttgagcat agattaacta gttcacatgt taggtgagtg cacgtgcgca ttttccctac aaaggaggag ttgacagctt aagggaaatg tatccaaatt tctctctttt cctttaaaga ctgaaaaaat atctttagtg aaagcctcaa WO 00/08209 PCT/1B99/0J 444 taaaatctca tctaacactq ttagctatga agcgctgttt aaattctacc tcatctctgc tggaagagaa actccacttt actctaatga ctttgtaggc ctacatcatt ccaagggctc gtcactttgc cagaaaagat tgatggagat tggcctaggg aactttgttg taggttacct gcgccttaac cacataatgt agagacgctt aaaacaggac ttttctatag tttggcaaaa tgaaaataga gacatgatgc actgtcatca actggaggac ggtggctgga gaagtttagt gtcctcaaag agtcagctga tggcttgaaa tgggctacag actgtctcct ttgttttttt aagataccgg acatttttct aaaacggtac taggaggccc gcacattcct gaagcttctg *agctctaggc gaaattaaaa *ttttcataca gttaatggtt aaggcaacca Cctggggccc atttttcctt 
ggaaatgata c ttcataaat agtggtgaga cctggatccc ctgggattag aagaactgct aggtttttac agcttagttt cccatttcat actaagtaaa ggcattaagt cttcacatga caatagcccc gctctgttca ctggcttgct gacatgctgc aacacagaag ctgaaaacaa cgcgtgatga gccttcagcg cgcagccttc aaaagagatg ttcaaataga aaaagcagtt atttaagatt aaggagagga tgtagtgagc agggaggttt gttttgtttt aagacaggca ttactattrt aaatgcttta ttcctcccaa gcattgtgca gggCtgaact tttaagcttt gaaaattatt gggtcatgaa tgtaaacagc ggctgaccac caggcactgg cggctcataa gcccttctat caaagctgca ctcggagttt acagactcct aggcgggaag tttttttcca aaaccatttc tatatttcaa tataagcctt atatttcagt atgtatctgc tactcacacc ttcctgtatt ctttctcatc cctgatggag caaatagatg ccacctgcaa gattgtttct tgagtgtgtc tcccctttac cagaagaaag cAgtgcaatc gttccaatac cctttgtgtt cctatttcct gagaaggaag taggctactg tcaaatgcag gttttgtttt aaataaaaat tttaaaatta aaggatgaag gcaaagcttc cccaaatgat ttctccggcc cttgccaata aatctacctt gaggagtgag tcaggcatta agactggagg agctgctgct atgggtaaaa tgcagagtaa gcttgtaaag cataaacatt gctgtgctaa tgggatctca cagtccatcc tat tt ttagc agcctgccat taagtctgga ttgcaccaec tccttggagg ttgctgaatg tttctagctt acatctacct gaggaggctg agggaggagg tatagtgaag gtggccagga cagtctgtct ccgttactca ttgagaaggc tcacatagga acagtaacgc gttcccagcg gccggataaa aggc aggaga cctcactgcg gacatttgct gttttaaaaa aattggtttg gatgtgattt atgttgtccc ctgcagtcct ctcccgattt ttggagggtt acttctatgt ccttacattt gatggaaatg aattacttgg gctgaggggt tgcagaaagt agacgttaac tttgaagctc gtaagatatt atgcatagag gtgggtcgtt aggccgcact catctttcag actgatgact tcagtcacta taaactctaa ttagctcata ggcggctgcc gcagttcttc gagtacagca ttgggggaaa cagtgttcag aggagtataa gcttcagaga aaatctccag gtgctgttgt tagaatgtag tcagccttga agattgcact aataagaggg aatacagtgc acgtcttgcc aaagtcccac o tgggcggc t cacttttcca attccagaat gggcagtggg aaaaaaattt caagtgtcat tccttcaact aagaccccct ggacgctttg ttttgacttc tctccacatt gggaggaggg ttagtgaaga catcactgag tctggggctc aaacaagcag tctgaagctc tttctgtaga atgccagtgt gtccagctgg ggcttgtgat tacttaaaaa tagagaatgg tagtcttttt aaacatgtag tattagttaa agtgatgtgt tacctggtgt gggccctggg 
aaaaatctaa cctctgatgt aaactaaggg gactttagga ctattcaggt tctgcacagc cggagccacg caaagacaaa ttgagatcat ttgctgaaat aaagtaatag tgtttctagg tgaaaggacg ccaacagttc aggagagtta gtaaatgtat tttataggta ccaaagccaa cagacaaatt ctgaattcaa gtgtctcaca aatgggagga 79980 80040 80100 80160 80220 80280 80340 80400 80460 80520 80580 80640 80700 80760 80820 80880 80940 81000 81060 81120 81180 81240 81300 81360 81420 81480 81540 81600 81660 81720 81780 81840 81900 81960 82020 82080 82140 82200 82260 82320 82380 82440 WO 00/08209 gtggtggtga cagcgtagt c ccataacctg gaacaacaag gggcagaggc acaaccagtg tattgacacc tttcattggt taacaccctg ttgtgttttg gctcaataga tgttgtgaat ataataaaat gactttaaaa ttttggaatc taaatctcag ttgtgctgac acctaattta gttgtaggga agatgccctc tctcactgtg catttctgtt agtggggaat tcaattggcc cagttagttc gattcatctt gaaagagtgt ttagggctgc gatctcgagg ttcagaaact tgtttcttcg ttattcctct cttccattat tatatttcta gtaacctgcc gacatcagac ggggttgctg tagaagttgc aaccttgaca ttaaaaatca gtttttaaag ttcacttact PCTJIB99/0I 444 gtggagcatc tccaaaggtg catCtcatta cattcagcga ccctggtgat CCcctgcctt ctattttcct tcttcaagca ggcataactc ctcatggctt tgctcattaa agcattggga aaatcatatg cacataacac ataggaagca aagatctcct aaaagtttcc aggcacttgt aagcctttcc attttttgtt taaaccactg taaaaataca tatacaggta tgccttggtg tctagtatat tcccactttc ggaccctgag acccatctca gctatcccag acagggattc tggcttttaa cacctgcttt ctttcagcca ctctgatttt tcagtttagt tctttttttc ctcggccgct atccatttgc gagttcttga gtaaacttgt atctttgaat ctatcttaat tctggcagca gcctttctct acatcatctc ctcctgcggt ctggacctgc gccagccaga tttattcaga gtcacctttc taccgaacca aatctccaga gttaaagtag aagaacattt cccagtgcta caccctctcc cttactaagt gttaacctaa cagtaattgt ccctctgaga ccaggatccc cagctataaa aaacttcaaa tct t tgt tat gactacattt tatgtaacac cttcccagcc ctgcggccac ctgggtggga gctcacatgg ccagcgggct agtttcccat atgttatcat tgctttcctt aaagaaacag taaaattgt t caaccaataa cccagcagct tgccttggcc cagccactct actcctcaga atttaacaaw tctattcctt ttctgatgct ggcatttggg gacactaact Cttaccagtg gctccagggg gtgcccccat ctgtttttca gtattaatcc 
tgtgggcctt gaactccttg gcctaataca aagacacctc attttttaat tgtcttaatt CCtccaaatt tgat ttat tg agagaccact ttattttaat gtagagacca tgcaaaaaag agttcatata tttactttag agtattattt tataaaccag ttaccctgaa tcaacaacca cctgagccat gaagcaggct ttaattaagg cctgggtctg ttgcacagca attaaccatc gatttgatga agaaaagaaa tttttcttat ttagttatcg tcaactctat ggtgccctcc aagaacaaaa gggaaaaatg gtacttctgc gtttcaaaac ttatctataa agtctctggc agcccttgca cactgaccta aagttagaat ttgcccacct ggctcctgca tgaggtctga ttctttcctc gtg tctctgc gtgcctgatg tcagcagagt tacattaaat ttttaacata tcctttccgg taaaaaaacc gatgtggatt tggcgtagat agctatagaa gtcttgattt ttgaaaggag agttttgttt tagatctttt atatttcaga aagctctgat gacttaagaa cagtagttgt gatctcagcg gttttgtggt tcatccctgc gcacccagtc tagagaggca aatttacagc tactgacact attattattc tsgctctgct gaggaaggtg ctcttattct tatggccaga ttctttattc agttgtacag agaggaaaca aaatctttta aggaatcaat ggggtcatac gtgagaaaag tgcttggctg tctgccctgc cacctccctg cctaggaaat tttgttctcg agcgtgttct tgtattagat tctcttaagg acaaacagat tcaataaaga gaaagtctcc aagatcctaa ctgtatttgg gtggtactgt aatcactggt ttattctgaa gtctaggaag ctggaaatgt tattttctgt ggaatattct ttcaaagaca ggaagtgaag gatgtttgtg ctggcatggc ggttacagag ctgtgctttg tttgcttttc ccctgcaagg ttctttctct tgcctccaat tagttattag ttaaccccag agacagggct gcagtctgta actaggaagt cattatcatg ctgttgtaca gagacacttt gtgtgaccca 82500 82560 82620 82680 82740 82800 82860 82920 82980 83040 83100 83160 83220 83280 83340 83400 83460 83520 83580 83640 83700 83760 83820 83880 83940 84000 84060 842.20 84180 84240 84300 84360 84420 84480 84540 84600 84660 84720 84780 84840 84900 84960 WO 00/08209 taaaaacatg taaaaccttt gtaaaacata tctgagatat cactcctgct tggtttaact gtcatattta gaaaacctag cagatggaaa atcctagata gttctgaacg atgatcgtga caatgtttag cagctctagt gttttgtaca tttctccaaa gtttcatgat cattacctct aggtgagttt ggagcagatt tggaggacta agctacgaaa ctctcataag tttgagaact acccaagtgg tatgacatgt tatctaaaca ggaccaccat gtgtaacgaa aaaatcctta atccacatgc aaagatgcaa tctgtttgca ctttaattga agctttggct tgatcccatt cacactcact aaagtgcttt 
tttttcccta tctggcattt tacttactta agggctatat PCT/1B99/0I 444 ttttagtatc ttCCCtttcc atttttaatg caggetctta ctgtgatgtg ttctgcattt aagtggcttt aaaccatagt agaccatcaa tgtagaaact gcattctgca gtgtgtgtgt ctgtagaaac agcttgttct ttgagtgctt taattctttg gtgtgtgtgt ggtttcagta tactgggaat ttgatcctgt tgctccttag gcactggaag cctagaatct gcatcattag tatagcctac gactatactg taactaaaca catatatgca agcatagagc ctctggagat agtgtttgtg atttggcctg cattgcatat gagcctctcc tgttggggcc cagtgcaggc ctgctgcaaa agcagctctc ttttaccagg tcattcattc catttacagc ccactacctg tcctttaaaa cagtttaact tttatgtact aaaatgaaat tggaaaaggc ctgtgttttc aagtctgttg tgactttata tcaggtatga taaatctctc tgatgcctgg tgtgagcgtc cagcacaggt tctgaaacat tgatttgtgt ataacaaagt ggtttttttt tacaatagaa tacaactaaa acagtttaag tagagaaagg ggatttcatc gctccagata gtgatttcat tacacaccta aatactgttg tagaaaaggt gttcgccatt aatcaggcaa ttctgtagtc attttcattt caaaagaaag gcccttatga atctcttttc ttgctttttg cgcactcctg gtggaaaggt Ctacactgcc attctaatac tttctttcat ttctagggca acatgtcacc Cccaggagca ttcctggaaa aatagactaa tttgaagcat tttatttgca aaagtttgca aatggaatgg tcagatatga ggaagccata aaaagcacgc gtcagtccaa t tgagcagga catggaggag atggtgaaat tattagtagt atatatattt gtgtgtatgt tatgataggg taactgaatt ccattatcct ctagatggtg gaagaacctg ggcatcattg attgagtcac gtcatgtgtc cattgtgcaa cacatgttgt gtagttgtaa acaataaaaa gaccaaaatg aaacaaatgg caaaaggaat gcagccacac aggtttcata aagaacagtt cttccctgga ccaggttggg agaggatata aggggttcaa aagagcct Ct tgacttctcc tcattctctt gacccccgag ttgctcctgt gtttgaattt atactgtctt atgtggggct tattgcgtga tttttcttca agcccttgat tacccaacct accttgcaaa actgtttcct gcatatcaca gccacatgga ttcttacatt atgcaatgat tgaatggttt aacatatatt cagttcaata ctgaagtgtt aatgccctgg ccagtaatgg gaagtgtcta tgcagagtca acttaataat acttcataaa tcctaggccc cacaaggtgg t atggt att c ttgtaatgca tgaaataaag ccatgattcc cttaggtgtt ccagttgtta cttgtctgtt acactcttct tgggcatcga aagtggttca actcaagtcc ggaggtcatt acccttttga cttacagctt agccttggtt ccctcaggcc tattttcaga gacaacactg atgtctccca gaaactggtc gggaacagag tctgcagcat 
tggcttggta tgcttaagcc accactttgt aacgtgtgga aacatggtgg taagagactt gcaatgtagg taattggaat ctatgaattt t Ctgtggc tt agaggctgag gttgggcatg ctgctgatga gaggctgggt gcagtcaagc agcaatacgt gtgtacttaa tacaaatctg agtatttgtg taatcttatg acacacaact ctatttttga agtggattgc aagcacagaa actttagatt ctgcactcat tgatgtggat aatatacgca ctgactgacc ctcccacctc taatttagag ttctttgatt ttgttgcatg acctagactg atcccagctg 85020 85080 85140 85200 85260 85320 85380 85440 85500 85560 85620 85680 85740 85800 85860 85920 85980 86040 86100 86160 86220 86280 86340 86400 86460 86520 86580 86640 86700 86760 86820 86880 86940 87000 87060 87120 87180 87240 87300 87360 87420 87480 WO 00/08209 PCT/1R99/01444 acattgttta gtaaaaaaca tgctggaaga tgtttcaagc tttttttttt gatcttggct tgagtacctg gaggtggggt cctgcctcat ctgacttaat cagagagtaa aggttggagc cagacatttt acaactactc acagcatgta cgtgatgggc tgtgagaagg tttgggggta ttatagaaag cttgacatct agggtgctgt tgcagcctta tgcttccatt tgagggcat c tgtatagata gccaaggaca ggcaattcca gcagtgtact aagaatggaa gtcacagtgg agggtggctg gagcatggcc atctggggcc ggattcacat aatgaacaaa ttcagaagac cacttcttgg cataccagga tgctcacatg aattcactac tccagccctt aagctctgtg cctcctaagt aaaaataaaa gtggaaatgg cgctatctgg tttttgagac cactgcaacc ggactatagg ttcgccatgt cctcccaaag aactctagga catgcccggc tgaggtgttt ccaaagaagg acctctgcga aggggatgat ttggtccaca tacaggaatg tcgggagcac t aga tc ta cc gagtgttctc agtcttaagt gagat cCCC C gcagatggtg tgggtggtcc gtctcagcct gggctctgga aaggtgacag t a ccaaagga actttgggaa gtgctctgtg tgggaggcag tcagcccagc acacccgaga acacacaaac ggagcccctg acacctcaaa gcagtttgca gggtggggcg cagtggtaat agatagtgtg ggtatgacaa gcgagttctg attgagcctc cgtattgaga acacagcatg C tat ttggaa agagtcttgc tccaccttct catgtgccac tggccaggct tgctgggatt cataggtatt cccctgtaga gagtccagct ccaggcagta cagctactca gggtcatgtt ggtggtggtt agagcagaga aaaaatttgc cgcatctctc tgtcaggctt tctacacaga aggcccaaat gagcatactg tctgtgatag gagcaactgg gctctggctc agcccaggac cctgtccatt taattagtaa gctgtgaagt ggcaaatagc attgtcatca cctgctgaat agctgagaaa ttccagttga tcaggggcaa 
ccgtggaaaa tggcggggag aacaagcaga tgcccccttg tgggaccagc caaaccrtca agaaaaaaat atacttagga ggaagaaaat gttgccattc tctgctaccc gggctcaagc cacgctcagc ggtcttgaac acatgcatga ataattccta aaggcagggt ggacttaaaa tatattttag cctcagcctt gcaaaaaact tgctgatccc gcaattctca atttagggcc ccctctttcc tctgctttcc aagcagacat cttgggggtc ccctgtgtac ctgtggttcc gacacagctg aatgtggaca agaagaccaa gaataaacca caaaggaaag Ctcagctcag ctattttcag tatcatgtca cagaatctgc ccctgctgtg gcttgtaggt aggtgcctct ggagtagttt aagtggttta gggactttta ttgctgatac agattggagg gggttcatga cc cat tgtct tacatcaggt agctgtgcgt atttttccac aggctggaat attctcgtgc taatttttgt tcctggcctc gccactgtgc tttttataga ctgtgggagc gatgacctaa gcttcacggg tgcagcacaa ttattttaaa tgaactaaag gaacctgagg cgggttttta ctctggtgtt cactgccccc ggggtccagc ttcagagtag agatggggtg atttcattaa gatcctgggt ggctgaaagc gggtgtc tga tctacatctt gaggtcagtg tttaaggaag catcctttag gagcttgtta atttcagtga ggcaactctg tagaaaccag tctgcctgtg tgtacgagga ccactggcgt gtgggtttga caggccgact gcagggggtt ctttattaat cctattttct gctgtttcag gtacctggtt tgactttttt gcggtggtgt ctcagcttcc atttttagta aagtgatcca ccgacctcca tgaagctgag cagggctgtg gatcggctgg tcataacgtc acaatccatg aactgtgcag gatcacagca tgttcaacat tcagtggtcc tctatctgaa tcccatccag atgattcctc atgtaatgga gggcaggaag atgccctccc ggaagaggtg agccagagag agcctttgtg ctgacccaag ttgtattctt aaaaaaagga gctccactca gaaactcagc gatcaccagg ttagaaacac ggttcctgta ggggagccgt caactggtgc tgttgaaaat tgttttttgt gttcccactc aggaaggcgg cagtgtccat 87540 87600 87660 87720 87780 87840 87900 87960 88020 88080 88140 88200 88260 88320 88380 88440 88500 88560 88620 88680 88740 88800 88860 88920 88980 89040 89100 89160 89220 89280 89340 89400 89460 89520 89580 89640 89700 89760 89820 89880 89940 90000 WO 00/08209 ggactgtgaa attcaccgaa taatttaaac tgatgccagt gcagtgcaca taaaatattc aatccaaaat ataacagaaa aggttgagct tttgatacaa gaactagaaa ggtgattttt aaaatagttt gcaaacagaa agaacctaag gaattgctct gccaggtacc ctgggggagg acaccttaaa ttttattcat Ctttctctgg gacaggcgcc tcgtccagtg aattcagggg aggcaattaa 
gaagagttta gtaggaaaag tgtgaaagaa acatgagtgc gaagttccat ttgtggcagg gcgtgtgctg aggcgctgtc gctcaagtca tgagccagtt ctgaaggaca ccctcgcaca atgacttttg atct caagga aaattgattc agttttacac tttctgccct PCT/1B99/0I 444 gagaaatgct gaggagatat ttatgtaaca acccagcacc ttgattagtg ataccctttg atgagggaag cttggaacta ggtagaatat tgttaagtaa aacatgcttt cttttatttt tatcttttcg gagaaatttt agccaggatg gagatgggcc ttgcagcttt tgggcactgc gtgtatatat ctgagacaac ctgagaactg cctcccaatt accagggtt c cataactgca gcagagatga tggctttggg ccagaggttt aaaaagtaat caaacaatct gacgtggcat tgtgtggaag ggaacttcca tgccccaggt ttcatagagt acctcggctg aaagagaatc catgcacaca tgccttatta agagaaaata cagggaagat tgatttttaa tttaaagcct gagtctacaa ctagtaaaca tataatgctt aagggtatac gctgttggga actcagtcat ccatatgtat gatgtctaac catgaagcca aggaaaagtg taggaaatag tcaagcttcc gcaaaacatc cttcagtacc acaggaaggc agcaagaaag gtgtagtgac acttgtccag cttgttttcc ataccaaagg ctcctggcag cttttctttc tcctttgacc gaatgaaggg ttgtgaccca tactgctgag tgttcccagg agtgtacatg cagttttaac ccattgatga cagatccctt ggagttgcct gggaggccag cgccagtgga tcaaatggag agttcaatcc caccccacag atattatcca agattgttgg aggctagttt caatattgga gacttcactt tagcaaatga aatatctggg tact ctact a tgtatgcaga gacaatttgg cccgtttctt aaggatattc acttgatgac gttatatata ggataggaaa gaaatatagc aataatgtag gaaagtatgg aaaattctgg tctagatccc aagatgagag cagtgctcag cctcaggagt tatcaacacc at tgggt tt t tggatcactt cccaagtaat accagcctca cctaggagtc gggtggttcc agccattaac ctttccaggt acatttattc atttgtggtt tggttttgct agctaagaga cgtttaattc gaactgtggc gccgcaattt ataataatcc agtaaacaat atataatgga ctgaatcaaa gaatggtgag gaatgccagt gttgcttaag tctgaatgtg gccaagaaca aaagtatttg gataatagaa acattagcat cgaaacatat ggaatgtatc tcctagactt tggattaata gcgacatgaa ttttatgttg tagatataaa ctctattgct aaatagtcat aacttgactg cagtaattac ccagtccccc ggaacggctt gactcagacc tagtttttaa taatgttagg gtgctgtcta tag ccc aagg tattgccatg ttggcagtca tagggattaa ttaacacaga taggagatca agcaccatat tttactgttc gaggttgaaa gcgcctgctc tcacagccat tgagagaggt taaggctgac ctgctgacct tctctctctc 
ttttagtttt aacagcaagc aaaggaaaca agggagccat gcaatgcaat tgttctgatc taaacagaca gcttcatgtg agacatttct gttgctgatg cccaagccag ctcaggaaat gtcacttata tgatgatggt aaagctctta gttatgttta agttgtattt tcagtaactt tcctactttg aaaactatga aactctagtg ttgcagaggg aggcaagacc agaaatgaaa tattcgtctg ccttcctgct agtgtgcaag gctgaagccc gtttggggta ggagatcatc tggaggcctg acatcaatcc cttaaatctt ttataattat agactattca tgtgagggt t aacctgccag cctgggaggt taagtaccga tcaaagcctc cacggtcgct cctcttactc taggcatcaa tgaaaaattc tggtt t ttga cagaagaagt agagaggcag tagcagggtt 90060 90120 90180 90240 90300 90360 90420 90480 90540 90600 90660 90720 90780 90840 90900 90960 91020 91080 91140 91200 91260 91320 91380 91440 91500 91560 91620.
91680 91740 91800 91860 91920 91980 92040 92100 92160 92220 92280 92340 92400 92460 92520 WO 00/08209 tttttttttt acgt tagggc a act ggaaag ttccgccctg cacttaggag aagacaactc ctcatttctg ctgttcttgt gcaaacaggc acctaacaga tttaaaaatg gtacttaatg aaattattac agctaagcgt gtacttgtgg ttgaaatgga aagaacttat ccaacagcag tttataaagc cagcaggctg ggatcaaagg tgctaaggtc ggggacaggc gccgttgtca aggtccacag agagatgata tcagttctcc gaagaaagac agcacggaga cactgcccaa agcaaccata gtccctgtta taaaaataca tccattctga cttggcagca ccagcgggca gaatgctatt tcccctaaga ctcagccatc acacagtgta acatggtgga aggcagttct PCT/1B99/0I 444 ttcttttaag ttcactgctg aaaggaaatg ccttccttac ttgtttttca aggcatcgtt catatttgtg aactctgaaa tatgattctt attttcagag agaggttttt ttaccttgga cttctcagtg agaggagacg ggt t tt tctc catcgctaaa cgattcctct cttacgcaaa agcttcctga ggctttccca tatcccggga aggcagtccc ccagagatga ttactcaaag agggatggta attctttgtg cctctagaaa cagttgagac ccagagctgg aagaacacct tgttctactt aggagttgtt tatttttgcc tctaagcctc aggtgagaaa ttccatccca cctgcccagc tagagatgtc actcaaggga ggtggtcat t aacttctaat caaataaaag atggtcccag tgctgagagg ccttgggcag caagagaaag cacgtgtggt tcgtattcac tgcaaaggag acccaactga ttgtttcttt aaaagaagtg tttttttttt attatttcta cgtagttttt cttcacaggt agttttagta cagttacaag cctctcagtg cagaaccttg atcacaaata tgaccagagg gaatgggtcc agagctttct acttggattc agaatctaag aaggaggaaa agtccttggt cttccatttc tgtgcacgca ccaggtccag aactgcggag ttttctactt ttgagggtgt gaaattttta tgatttttct ggccagccta gcatacccta cctaataacc ctaactgaaa agtctccaga t taaatggct a aagc tc aca gtttgtttta 56 cttgactgca ccccagcccc cagcagcagc tacagacacg gttttcgtca tcatctgtgt agt t ttt tag actataatta tctcctttta aaataagaac tggcttttgg gatgtttctt cttatttatt gcgcattgtc cgtgatgact cttatgaagt acaaccaaag acctccttga tatggtagtt acctttccca tttttattat gagaggctgt aggatgccgt agcttttaac gtaacacaga gcatatacaa aacacggata ggagggtgtg catcaccccc tgcagctctt tttttaatgt ttttaaaagt tggtgttcct tcatagaaag gtgagtcaag tcagatatgt actcacattc atatgcctgt gggtgaagag tccaagccaa tttgcattga ttgggtaaat tt Ct cagat c 
tggggttctc agctgtcttc gacggcttga ccattactat gggtgacatg taaacagtcc aactttgact acccatagtt taaaaataaa aaggtgagta atat cct tt t acttctagta gtgattgcag tttctttcta tgagtaccac aatggataaa acagttgcag cattaactca ccctgatctg ggagcagaca ttctgcactt ggcctgttag ttctatgagc aataaatata agatttgatt tactcaggtg cagagcaagc acccccacat ttgtcaatct cacaagtgtg tgtttttgag gggctgtcct atgagctttg ctatctgaaa gaaagagagg tgaaatttaa atacaattta cc tgt ctggc tgataggtcc agtgtttagc gaccttgtag catcagataa tcatagaaac tgattctgct gtcacttagg tgtgggaaag tgggttttgg cattacttag tggtgactct gatgtatcta tttttatgtc tcaaaaacct gtcccaagta ccaagtgtag acgcctgcct taacaggtat gtccttcaag ttagagaaaa gtagagcata ccaaaggcaa tttatagttg gattgtcctt aactctttta ctgaatgcca aaaaccagct acaaaccaga aatgaaggtc aggacataca actgaggtgc C ac ccaggc a gatggcatga tagcagtgct tggctgtgga gagaataagt cagacacaag tgcattcctc aaccaagacc cttctttttt ccctggaagt ctgtaggggt ctgaaatata ttgttaagat ttttttggtg 92580 92640 92700 92760 92820 92880 92940 93000 93060 93120 93180 93240 93300 93360 93420 93480 93540 93600 93660 93720 93780 93840 93900 93960 94020 94080 94140 94200 94260 94320 94380 94440 94500 94560 94620 94680 94740 94800 94860 94920 94980 95040 WO 00/08209 acagagcata gctgtgaaaa gattacagat aaggatcatt gctgagtgct aggattgtat aatatgcctc aggaagggtg ccaaaatcaa cccatttcat accacctctt ataataatgc ggtgctgtgg tcggtgttat tcactactaa ggtaggccct agagagatac gtcccttttt aattttctgg aaatatatat gtgtatatat gtgtatatat gtgagataga gaaaggcctc tttaatgtca agagagacga attatgtgct tggaagagag cataaacctc atgcctgtgt gtagctttag ttgagtatca caggaggaaa gtgctcatca ggcacaacgg ctttggccct tgttgtgatc ctgggggctc ccttcctgtg ccagagtgct gcaattgtat ttgaggccac PCT/1B99/0I 444 gaaagtaatt actgcttagc gttctccctg cagaactgtg gggggccacc tgtatcgatt tcatatgtaa ttcaatacag cccatttcct tccctgttaa ttgttcagca acatggattt gagacaggac tgaggagacc Ctaagctgca gagccaggcc cccaagtggg catattttgg cacaagtgtt atatgtgtgt acgtgtgtgt ataccatttt ccaagtcaca agggacatca Ctagagctaa gggacagttg aggtgctgag gatattgctt ttagcagaac ttcttgacgt 
atggcgtgga ttgtgtttgt c tgt t ttgag aaacaggcca aagtctttgt atccgagctg caccatctca tcagggtgct tggccagaac cccaaggctg tctcatttct cattgagaag tcatgctgct tacctacatt gaatggtcgt gtctatgcca aagttgagta tgtggtgttt gaatcatgcc tactgaattt acctttattc atacatgtga gtacatcaga gtcataatcc catgtgtgaa tgacattaat ggttgcacct ctgatgatgg accacccctt gtgttctagt gtcacattca gtgtatatat gtatatgtgt tcccacctaa gagcactcca gcatacatgt cacacttgtc ctagaaagac gatgtagact gtctccatgg acttggcctt agtgatgtct cgtgaataaa aaagaatttc t tt ttgt cag ggctctgctg gtcactgacg tccctctggt gcatcggctt tttaagggcc tgatgataaa gacagcgtgt tcttttattc ctcctgagca CCtgtgCtat cctcaataaa tctcttgacc acccaccagt agacactgca ggatatagtt tcctccgtgt tcatatagct tgtccataaa acgttgtcta cgattgcata gtacaagtca agagaagaca gcagaatagc cggaatgcag gtggattttg gcccagtagg gg ccc age ea aaaaagtatt atatgtgtgt gtgtgtgtgt aatggageat ggatgcagct tggagtttct acctgggaag acacctggaa gagtgagatc ctcgtagtga tctaaggact tgttcctcta tgcaacttag agattagagg cccgaaatcg cagtaacaaa cccacttcag ggtggtgcct ecacagcagc tggc cace ga ctgtagactc gggcactgga tccaggtggc gtgagagcaa tgtttttgtg ggcatcagac aagtagtcct agttcctgag gctctcaaag tttccatgat acacttttca tttctggggg attgttagaa gacgctggag gacgtgccag ttgacgccca tgcttgcttc aaatgaccat aggggcttca aggatcagag ctgacaaact gagctagact t t tttgtt t gtgtgtgtat gtgtgtgtgt ggcaaatctg gtgagctggg gcagttttct caagcctgce gttctattta ctcattcctc acagtcagtg ccatatgtgt gacatcacta gttttcttgt attgttacca atttgtgcgt cttacaagtc ctttgtgttg gggggtttgg catagcagga cctgcaaggg atccctgctg tcccacctgt aaatggtagg gctgaagcag aagacaggga agtaattggt acacttctgg tccctgcagt agttggatct ccctacgaa gacactgaca ggccaaaata atatcaaaat agcaaattct atggaaccaa cactgagcca tataaaagca gcaaattaat aagtgatgag agtacagctt aaggctcttg tcgagtcatg gaaaaatgaa atatgtgtgt gtgtatatat gactggatta gaacaggtca tagggaaccc agagcaaatt actageatta ctctgtaggg aga cc aggc a tccggggtaa actttacaca tggt tt t t t cgtgggcctt ttaagtatat tccgaggctt ctgaagcatt gttccctctg gaagaaaatg gtgcgagttg aaactcggct gttageactg atccaaagcc gccatgctta 
95100 95160 95220 95280 95340 95400 95460 95520 95580 95640 95700 95760 95820 95880 95940 96000 96060 96120 96180 96240 96300 96360 96420 96480 96540 96600 96660 96720 96780 96840 96900 96960 97020 97080 97140 97200 97260 97320 97380 97440 97500 97560 WO 00/08209 ccttagaact cagagcccag tctgcaggag agacgctgga cagcaagtag t ttgt tgt tt tttagatggc tgtcactgtg cgagatcatg gactacaccc aacctaagat atagcggttt tagatgtaac tgaatgttta ttaatttgga aagtttgtct cagttgatca taccaggtat gtttcatacc atcatgttcc catcaagata nnaactgcag cctctctcag ctgcttgcag tctcacaaag tgatctttat t tgcgggggg gtgagtctcc tggctggcat aaccaagatg gttaacattt ttttgggcat aaagcaataa ttaactttgg gcctacgtgt ccagtgttgt tttagcactg gggttgggca tgtgactgac ttaagccatg PCTIIB99/0I 444 ggagcggt cg cgaccgggag agattgcaac aggagagaag attcttacga ttagatacta gtggacgtga tttgtaaaga cctcaggcaa tcagcgtccc gatcattggg ccatcataaa ttaaaggagt agatcacttt tatcctgatc ttaaaaaaaa aagtcatagt tgctagcatg catactatag tttgagtctc ttcaggggtg tcagatttat gatgacgagg ccagtcactg atggggttct cactgtaacc catctcttaa ccggccattt tcgagacttc atgagttcag gaggactttg cttattactg aacaagaaat atttttgctt ggtcatgacc tatctgagtg atgagaccag gaaaggactg taggcaagga ttttgggtta gccctgctgc cctgagtgca accatcccac gaagcgggaa actccaactt aatcgtccct ataaatgcaa gcattcacaa aggcgtgggt tggcaaggtg aagatcagtg ccaagatgat taacatttga attgaatttg actgtcaagt aatagagtgt aggtaaatgc tgagctgcag agtcatgtat catcccttgg ccccaagagt gacagctgac acctgtgcct tgtaaagcct gtgcagtcac acgtcttcta tcctggggta cacgaggaga ctcctgttcc cctttatccc ttctgcatca ttactcaaaa aattcatgct ttcagcccag caaccatcag tttattacgt ctccatcatt ttgacatgag gcggagaggc tatttccccc 58 agacggtgga cgcagcccga actgtccagg gtgtgcttct gcaattcagg tctccagtcc cttatgtttt tacggtggaa ccatcgttct cagttggctc atcttgggtc gagttcagcc ggactttgtt aagatcatca gaaatggatc tttcatacat tttatgggac ttgtggggtc ttatttttgc aaatctgact ctgggacttt agtttttcag tcaacaagca ctctgatgtg aggtcacttc ttccatagga aaaggagaga ccacagtgct ctgggt caga tcgtggttcc gatcttacta acattgactc cacattttta gagtaaagga tgagattatt 
aagttgtaac gtatgtggca cctgtggatg aactgtgtga aacactcatt ggagctgcgg gcccacgggc ccttaactga cagggaggaa gggcatgtcc tgattactgt cttgttggtt tttcaaaagc t ccgagaggg tcgcccattc attgatccct tttatccctc ctacatcaga aattaaataa tctctctttg ttttgcttat agctgacacc tgagatattt ctgttgtgtg tcttgcagaa caaaaaaaaa aggtcgcaca aaatgctgct cacttaagag cttgacaaca gtttcttttg ttgccatact gccaccagtg ggatagcggt gctagatgta tttgaatgtt tgcatcaaga tggtggtttt atgccttatg tgagatattg acctctacac gtgagtcctg taggttggac ggattctcag tgtgcacttg cggcggagcg gactgacagc gagggacaga accggcttgc cagtgttttt acacagtagc cctttttgag tggaagagct tttgtgtggc ttgttatgga ggc tcagagg gtggttccac ttttactatt aatgatttat gtatttaagg cccataagta ttttagaccc ctttgtggta atgtaatgca ggagtaggca agatcaggct cagtgactct cacggttgtc tgggttgctt caatcatttc attctctcag tagactcact cctaaacagg ttccatcata acttatagga tactgttgga aagaaacaag tttttttttt aacacctgtg gtgtctgcat agggtgtgag ttacgagatt agtctcagcc agccaaattt gtggtgtcaa 97620 97680 97740 97800 97860 97920 97980 98040 98100 98160 98220 98280 98340 98400 98460 98520 98580 98640 98700 98760 98820 98880 98940 99000 99060 99120 99180 99240 99300 99360 99420 99480 99540 99600 99660 99720 99780 99840 99900 99960 <210> 3 WO 00/08209PCIB/O44 PCT/IB99/01444 <211> <212> <213> <220> <221> <222> <220> <221> <222> <223> <220> <221> <222> <223> 3983
DNA
Homo sapiens
CDS
171..3725 polyA_signal 3942..3947
AATAAA
misc-feature 36 n=a, g, c or t <400> 3 ccaggccgtc agcgcgccgc acagaaaaac cccaggatgc ggcacaggtt agtgataact ccccaagcac ctgcgngtcc tctgcatatg aagtgtgtaa gttttgctga gttcccagac cggcccggcc ccgggctctg aatagattgc ttgatccaaa CCttcccaag atg gaa Met Glu 1 cca ata aca ttc aca gca agg Pro Ile Thr Phe Thr Ala Arg gtg gat Val Asp ttt Phe aaa Lys 10 gtg Val ggc ctg cag ctg Gly Leu Gin Leu 25 cat ctg ctt cct aac gag gtc tcg His Leu Leu Pro Asn Glu Val Ser ggc tcc ctg cct gtg cat tcc ctg Gly Ser Leu Pro Val His Ser Leu gtt gtg gct gag gtg cga aga ctc Val Val Ala Glu Val Arg Arg Leu acc Thr agc Ser acc atg ccc atg Thr Met Pro Met ctg Leu 40 ccc tgg Pro Trp 45 agg cag tcc Arg Gin Ser aga aag gaa cct gta Arg Lys Glu Pro Val ctt Leu acc aag caa gtc Thr Lys Gin Val cgg Arg tgc gtt tca ccc Cys Val Ser Pro tct gga ctg aga tgt Ser Gly Leu Arg Cys 75 gaa cct gag cca ggg aga agt Glu Pro Glu Pro Gly Arg Ser caa cag tgg Gin Gin Trp, gat ccc ctg atc Asp Pro Leu Ile tat Tyr 90 tcc agc atc ttt Ser Ser Ile Phe gag tgc aag cct Giu Cys Lys Pro WO 00/08209 cag ogt gtt Gin Arg Vai 100 got tgt ctg Ala Cys Leu PCT/1B99/0I 444 cac aaa otg His Lys Leu att aag Ile Lys att cac aac agt oat Ile His Asn Ser His 105 gac got gtc cac ogg Asp Ala Val His Arg 125 gac cca agt tao ttt Asp Pro Ser Tyr Phe 110 cag agt ato tgc Gin Ser Ile Cys 115 gtg ttc aaa gcc Vai Phe Lys Ala atc cgt cag Ile Arg Gin tcc gag ttc Ser Giu Phe 165 ggC cgc gtg Gly Arg Val gcg Ala 150 gac Asp gat gat caa aca aaa Asp Asp Gin Thr Lys 135 ggg aag atc gcc cgg Gly Lys Ile Ala Arg 155 gao acg ttt too aag Asp Thr Phe Ser Lys 170 140 cag Gin gtg cct Val Pro gag ato ato Glu Ile Ile ago too Ser Ser 145 gag gag Glu Glu aag tto gag Lys Phe Glu ctg cac tgc cog Leu His Cys Pro 160 gtg otc tto tgo Val Leu Phe Cys 175 goo ctg ato gao Ala Leu Ile Asp acg gtg gcg Thr Val Ala gag Glu 195 180 tgo ato Cys Ile cac His 185 aat Asn aag aag got ccg Lys Lys Ala Pro ccg Pro 190 gag aag Glu Lys cac gto ago His Val Ser ago ogg ggg too Ser Arg Gly Ser gag Glu 210 ago cc 
cgc ccc Ser Pro Arg Pro aac Asn 215 cc cat goc gcg Pro His Ala Ala 220 aca ggg ago Thr Gly Ser cag gag Gin Glu 225 cct gtg cgo Pro Val Arg tcg otg gc Ser Leu Ala 245 ccc atg cgc aag Pro Met Arg Lys agg aag gag otg Arg Lys Giu Leu 250 too tto gag gag Ser Phe Giu Glu too ttC Ser Phe 235 cag gat Gin Asp too cag ccc ggc ctg cgc Ser Gin Pro Gly Leu Arg 240 ggg ggc otc oga ago ago Gly Gly Leu Arg Ser Ser 255 848 896 944 992 1040 1088 ggc tto Gly Phe 260 tto ago Phe Ser ago gao att Ser Asp Ile gag Glu 270 gag Glu aac cac otc att Asn His Leu Ile gaa aat oga act Glu Asn Arg Thr 290 ago Ser 275 gga cac aat att Gly His Asn Ile gtg Val 280 ggo Gly oag ccc aca gat ato Gin Pro Thr Asp Ile atg otc tto acg att Met Leu Phe Thr Ile 295 cag tot gaa gtt Gin Ser Giu Val 300 tac otc atc agt Tyr Leu Ile Ser cot gao Pro Asp 305 ac aaa aaa Thr Lys Lys ata Ile 310 gca ttg gag aaa Ala Leu Giu Lys aat Asn 315 ttt aag gag ata Phe Lys Glu Ile too ttt tgo Ser Phe Cys 320 1136 WO 00/08209 WO 0008209PCTIIB99/01 444 61 ict cag ggc atc aga cac gig gac Ser Gin Giy 325 Ile Arg His Vai Asp 330 cat His cac tit ggg His Phe Gly itt gic tgt Phe Val Cys tct icc Ser Ser 340 aca aai Thr Asn gga ggt ggc ggc iii Gly Gly Gly Giy Phe 345 itt atc tgi cgg gag Phe Ile Cys Arg Giu 335 tac gig tit cag igc Tyr Val Phe Gin Cys 350 acc cig aaa cag gcc Thr Leu Lys Gin Aia 370 gcg cca gcc cag ctg Ala Pro Aia Gin Leu gag gci ctg Giu Aia Leu 355 ttc Phe igi Cys gi i Vali 360 gat gaa ati aig Asp Giu Ile Met acg gig gcc gca Thr Val Ala Aia 375 gag ggc tgc ccc Giu Gly Cys Pro gig cag cag aca Vai Gin Gin Thr cig caa agc cig Leu Gin Ser Leu 395 tcc aaa aca aaa Ser Lys Thr Lys gci Aila 380 cac His cia Leu aag Lys 390 aai Asn aag cic igi gag Lys Leu Cys Giu 400 gaa cig caa aag Giu Leu Gin Lys 385 agg ata Arg Ile cac cig His Leu 1184 1232 1280 1328 1376 1424 1472 1520 1568 1616 gag gga Giu Giy acg aca Thr Thr 420 aig Met 405 ita Leu ici Ser 410 acc aai cag Thr Asn Gin cag gcg act at Gin Aia Thr Ile 415 gaa Giu iii Phe 430 gag git 
cag Giu Vai Gin aaa Lys 435 iii Phe tig aga ccg aga aai Leu Arg Pro Arg Asn 440 cig aga. igi tia tat Leu Arg Cys Leu Tyr 455 cag cga gag Gin Arg Giu gaa tig ati at Giu Leu Ile Ile gaa gag aaa Giu Giu Lys cag Gin 460 gaa cac aic Giu His Ile cat att His Ile 465 gga agi Gly Ser ggg gag aig aag Gly Giu Met Lys 470 cag aca tcg cag Gin Thr Ser Gin gca gca gag aai at Ala Ala Giu Asn Ile 480 gaa Giu aaa Lys ggt Gly 515 gat Asp tia cca Leu Pro 485 gca aag Ala Lys 500 ccc agi gcc aci Pro Ser Ala Thr aga tct tia aca Arg Ser Leu Thr 505 cga Arg 490 iii agg cia gai aig cig aaa aac Phe Arg Leu Asp Met Leu Lys Asn 495 1664 1712 gag ici tia gaa agi ati tig icc cgg Giu Ser Leu Glu Ser Ile Leu Ser Arg 510 aai aaa gcc aga ggc Asn Lys Aia Arg Gly 520 agc icc ctg ict agi Ser Ser Leu Ser Ser 535 cig cag gaa cac Leu Gin Giu His icc Ser 525 acc Thr atc agi gig gat cig Ile Ser Vai Asp Leu 530 agc aaa gag cca ict Ser Lys Giu Pro Ser 545 1760 1808 aca ita agt Thr Leu Ser aac Asn 540 WO 00/08209 PCT/IB99/01444 gtg tgt gaa aag Val Cys Giu Lys 550 ctc ggc tc tcg Leu Gly Ser Ser gag gcc ttg ccc Glu Ala Leu Pro atc Ile 555 tct gag agc tc Ser Giu Ser Ser ttt aag ctc Phe Lys Leu 560 gag gac ctg Glu Asp Leu agt gac tog gag agt cat ctc cca Ser Asp Ser Giu Ser His Leu Pro 575 cag cag gcc ttc agg agg cga gca Gin Gin Aia Phe Arg Arg Arg Ala 590 gaa gag Glu Glu 580 aac acc Asn Thr 595 oca gct cog otg Pro Ala Pro Leu tcg Ser 585 ctg agt cac Leu Ser His ccc ato gaa tgo Pro Ile Giu Cys cag Gin 605 aaa Lys gaa cot oca caa Glu Pro Pro Gin Oct Pro 610 goc cgg ggg too Ala Arg Gly Ser toa gtg ago aca Ser Val Ser Thr 630 gca aac cat Ott Ala Asn His Leu 645 oat too tgg agg His Ser Trp Arg ccg Pro 615 gag Glu gtt tcg caa Vai Ser Gin agg Arg 620 cga Arg ott atg agg Leu Met Arg tat cac Tyr His 625 acg cot cat gaa Thr Pro His Glu 635 aag gao ttt gaa too aaa Lys Asp Phe Giu Ser Lys 640 oct gtg aag aco cgg agg Pro Vai Lys Thr Arg Arg ggt gat tot ggt Gly Asp Ser Gly 650 ggg act Gly Thr cag cag ata Gin Gin Ile tto Phe 655 
otc cga gta gc acc Leu Arg Val Ala Thr 1856 1904 1952 2000 2048 2096 2144 2192 2240 2288 2336 2384 2432 2480 ccg cag aag Pro Gin Lys 660 gcg tgc Ala Cys 665 gat tot too ago aga Asp Ser Ser Ser Arg 680 oca oga tot oct tta Pro Arg Ser Pro Leu tat gaa gat Tyr Giu Asp gaa oca gtt Glu Pro Val 700 670 tca Ser gag otg gga Glu Leu Gly ccc Pro gaa gat ggg Glu Asp Gly ccc Pro 705 ggc ccc cca oca Gly Pro Pro Pro 710 otg tgg oaa aag Leu Trp Gin Lys 725 gag gaa aag aaa Glu Giu Lys Lys got att ott caa Ala Ile Leu Gin 730 aag otc caa gc Lys Leu Gin Ala agg aca Arg Thr 715 cag ata Gin Ile tct cgt gag otc oga gag Ser Arg Glu Leu Arg Glu 720 ctg otg ott aga atg gag Leu Leu Leu Arg Met Glu aag gaa Lys Glu 740 cgo ctg Arg Leu 755 aat Asn cag Gin tct gaa aat Ser Giu Asn gat Asp 750 tgt Cys 735 ttg Leu ctg aac aag Leu Asn Lys aag otc gat Lys Leu Asp tat Tyr 760 gaa gaa att act Glu Glu Ile Thr ott aaa gaa gta Leu Lys Glu Vai 770 WO 00/08209 PCT/IB99/0I 444 act aca gtg tgg Thr Thr Val Trp aag ttt gac atg Lys Phe Asp Met 790 aag atg ctt agc Lys Met Leu Ser act Thr 780 gct Ala cca gga aga tca Pro Giy Arg Ser aaa att Lys Ile 785 gaa aaa atg cac Glu Lys Met His gtt ggg caa ggt gtg cca Vai Gly Gin Gly Val Pro 800 cgt cat cac Arg His His 805 cga ggt Arg Gly gaa atc Glu Ile tgg aaa Trp Lys 810 aaa cag Lys Gin ctt aaa Leu Lys 820 aaa gaa Lvs Glu cac cag ttt ccc His Gin Phe Pro ttt cta gct gag Phe Leu Ala Glu 815 cag cca aag gat Gin Pro Lys Asp 830 caa ttc cac Gin Phe His gtg cca tac Val Pro Tyr agc Ser 825 ctc tta aag Leu Leu Lys 835 gac Asp gga Gly cag Gin 840 ttt Phe ctg act tcc Leu Thr Ser cag cag Gin Gin ctt ggg cga Leu Gly Arg gca gga cag Ala Giv Gin acc Thr 855 845 cct aca cac cca tac Pro Thr His Pro Tyr cat gcg att ctt att His Ala Ile Leu Ile 850 ttc tct gcc cag ctt Phe Ser Ala Gin Leu 865 aag gcc tac tca ctt Lys Ala Tyr Ser Leu 2528 2576 2624 2672 2720 2768 2816 2864 2912 2960 3008 870 gaa Glu cta tcg ctt tac aac Leu Ser Leu Tyr Asn 875 gtg gga tat tgc caa Val Gly Tyr Cys Gin 860 att Ile 
ggt Gly ttg Leu cta gac Leu Asp att ttg Ile Leu 900 cag Gin 885 ctt Leu 880 ctc agc ttt gta Leu Ser Phe Val gca ggc Ala Gly ctt cat atg Leu His Met ttt Phe 915 att Ile ctg atg ttt gac atg Leu Met Phe Asp Met 920 att tta cag atc cag Ile Leu Gin Ile Gin agt gag gaa gag gcg Ser Giu Giu Giu Ala 905 ggg ctg cgg aaa cag Gly Leu Arg Lys Gin 925 atg tac cag ctc tcg Met Tyr Gin Leu Ser 940 ttt aaa atg ctc aag Phe Lys Met Leu Lys 910 tat cgg cca gac atg Tyr Arg Pro Asp Met 930 agg ttg ctt cat gat Arg Leu Leu His Asp 945 tac cac aga gac ctc Tyr His Arg Asp Leu 950 agc ctc tac gct gcc Ser Leu Tyr Ala Ala 965 tac aat cac ctg Tyr Asn His Leu 955 ccc tgg ttc ctc Pro Trp Phe Leu 970 gag gag cac gag atc ggc ccc Glu Giu His Giu Ile Gly Pro acc atg ttt Thr Met Phe 960 tca cag ttc Ser Gin Phe ctt cag gga Leu Gin Gly 3056 3104 3152 ccg ctg Pro Leu 980 gga ttc gta gcc aga Gly Phe Val Ala Arg 985 gtc ttt gat atg Val Phe Asp Met att Ile 990 WO 00/08209 WO 0008209PCTIIB99/OJ 444 aca Thr 995 gag gtc ata ttt aaa gtg Giu Val Ile Phe Lys Val ccc ttg Pro Leu 1000 att ctg cag cat Ile Leu Gin His 1015 get tta agt etg ttg Ala Leu Ser Leu Leu 1005 aae cta gaa ace ata Asn Leu Giu Thr Ile 1020 ggc ttg gta eag atg Gly Leu Val Gin Met 1035 gga agc cat aag Gly Ser His Lys 1010 gtt gac t tt ata Val Asp Phe Ile 1025 gaa aag acc atc Glu Lys Thr Ile 1040 3200 3248 3296 aaa agc acg cta ccc Lys Ser Thr Leu Pro 1030 aac ctt.
Asn Leu aat cag Asn Gin gtt gag Vai Giu 106C gta ttt Val Phe 1045 gaa atg gac Glu Met Asp atc gct Ile Ala 1050 gaa gaa Giu Glu aaa cag tta caa gct tat gaa Lys Gin Leu Gin Ala Tyr Giu 1055 tac Tyr eac gte ctt caa His Val Leu Gin 1065 caa aga atg gat Gin Arg Met Asp ctt Leu atc gat tcc Ile Asp Ser 1070 tct. cct. ctc Ser Pro Leu ag t Ser 1075 gac aac Asp Asn 1080 aaa tta gag aaa Lys Leu Giu Lys 1085 ctt gaa cag ttg Ueu Glu Gin Leu 1100 acc aae agc agc Thr Asn Ser Ser cag gtg gca aat Gin Val Ala Asn cgc aaa. cag aac ctt Arg Lys Gin Asn Leu 1095 gac etc Asp Leu tta Leu 1090 ggt Giy gag Giu 3344 3392 3440 3488 3536 3584 agg atc caa agc ctt, Arg Ile Gin Ser Leu 1110 agc aag ctg aag cag Ser Lys Leu Lys Gin 1125 ctg etg cag acg gtg Leu Leu Gin Thr Val 1140 gag gee ace Giu Ala Thr att. gag Ile Glu 1115 aag ctc ctg agc Lys Leu Leu Ser 1120 110 agt Ser gcc atg ctt aec tta Ala Met Leu Thr Leu 1130 gag gag ctg cgg egg Glu Glu Leu Arg Arg 1145 gaa ctg gag egg tcg gcc Glu Leu Glu Arg Ser Ala 1135 cgg Arg ccc Pro gac Asp cgg gag cct gag tgc Arg Giu Pro Giu Cys aeg cag ccc gag Thr Gin Pro Giu 1155 1160 1165 cagctctgca ggagagattg caacaceatc ccacactgtc cagaagaegc tggaaggaga gaaggaageg ggaagtgtgc ttgeeageaa gtagattctt acgaactcca acttgcaatt tttttttgtt gtttttagat actaaatcgt cccttctcca tagctttaga tggcgtggac gtgaataaat. gcaacttatg aaaaaa.
age gca. gag ccc agc Ser Ala Glu Pro Ser 1150 aeg ggc gac tga Thr Gly Asp caggeettaa ctgagaggga ttetcaggga ggaaaecggc cagggggcat gtceagtgt gtcctgatta etgtacaeag ttttaaaaaa aaaaaaaaaa 3632 3677 3737 3797 3857 3917 3977 3983 <210> 4 <211> 3988 WO 00/08209 <212> DNA PCT/1B99/01444 <213> <220> <221> <222> <220> <221> <222> <223> <220> <221> <222> <223> Homo sapiens
CDS
176..3730 polyA_signal 3947..3952
AATAAA
misc-feature 1. .458 homology with Genset 51 EST in ref :A35235 <400> 4 ataataggca acctgggcga ccaaaacaga ctgaagacat gttaatggaa ggtggatttg gtcttttaaa atgtttctgc atatgaagtg aaaacagtga taactgtttt gctgagttcc tgattcagaa tgtaaaatag cagacccttc cctctagact attgcttgat ccaag atg gaa cca ata aca ttc aca gca agg Giu Pro Ile Thr Phe Thr Ala Arg aaa Lys 10 cat ctg ctt cct His Leu Leu Pro aac gag gtc Asn Glu Val tcg gtg gat Ser Val Asp ctg acc acc Leu Thr Thr ttt ggc ctg cag ctg Phe Gly Leu Gin Leu 25 atg ccc atg ctg ccc Met Pro Met Leu Pro 40 gtg ggc tcc ctg Val Gly Ser Leu tgg gtt gtg gct Trp Val Val Ala gaa cct gta acc Glu Pro Val Thr 60 cct gtg cat tcc Pro Val His Ser gag gtg cga aga Giu Val Arg Arg aag caa gtc cgg Lys Gin Val Arg ctc Leu s0 ctt Leu agc agg cag tcc Ser Arg Gin Ser tgc gtt tca ccc Cys Val Ser Pro acc Thr 55 tct Ser aga aag Arg Lys gga ctg aga tgt Gly Leu Arg Cys 75 gaa cct gag cca Glu Pro Glu Pro ggg aga Gly Arg agt caa cag Ser Gin Gin tgg Trp gat ccc ctg atc Asp Pro Leu Ile tat Tyr 90 cac tcc agc atc ttt Ser Ser Ile Phe aac agt cat gac gag tgc aag Glu Cys Lys cca agt tac 418 466 514 cct cag cgt gtt cac aaa ctg att WO 00/08209 WO 0008209PCT/1B99/0I 444 Pro Gin Arg Val His Lys Leu Ile 100 105 ttt gct tgt ctg att aag gaa gac Phe Ala Cys Leu Ile Lys Giu Asp 115 His Asn Ser His Asp Pro Ser Tyr get gtc cac Ala Val His egg Arg 125 cet Pro agt atc tgc Ser Ile Cys tat Tyr 130 tcc Ser ccg Pro gtg ttc aaa gcc gat Val Phe Lys Ala Asp 135 atc cgt cag gcg ggg Ile Arg Gin Ala Gly 150 tee gag ttc gac gac Ser Giu Phe Asp Asp caa aca aaa.
Gin Thr Lys gtg Val1 140 gag atc ate agc Gu Ile Ile Ser 145 gag etg cac tgc Giu Leu His Cys aag atc gce cgg Lys Ile Ala Arg 155 aeg ttt tcc aag Thr Phe Ser Lys 170 cag gag Gin Glu 165 aag ttc gag gtg Lys Phe Giu Val 175 160 cte ttc Leu Phe etg ate Leu Ile tgc ggc cge Cys Gly Arg 180 gtg acg gtg Val Thr Val gac gag Asp Giu 195 gag agc Giu Ser tgc atc gag aag Cys Ile Giu Lys gcg cac Ala His 185 ttc aat Phe Asn 200 ccg ccc Pro Pro aag aag get ccg ceg Lys Lys Ala Pro Pro 190 eac gte agc ggc agc His Val Ser Gly Ser 205 eat gcc gcg eec aca His Ala Ala Pro Thr gce Ala cgg ggg tce Arg Gly Ser ggg agc cag ccc egc ccc Pro Arg Pro 210 gag Giu egc Arg aac Asn 215 ccc Pro cct gtg egc agg Pro Val Arg Arg 230 tcg ctg gcc ttt Ser Leu Ala Phe atg cgc aag tcc Met Arg Lys Ser 235 225 tcc cag ccc ggc ctg Ser Gin Pro Giy Leu 240 agc ggc Ser Giy att agc Ile Ser 275 245 ttc ttc Phe Phe agg aag gag ctg Arg Lys Glu Leu 250 tcc ttc gag gag Ser Phe Giu Giu cag gat ggg Gin Asp Gly age gac att Ser Asp Ile age Ser ggc ctc cga agc Gly Leu Arg Ser 255 gag aac cac ctc Giu Asn His Leu 270 gag gaa aat cga Glu Glu Asn Arg 754 802 850 898 946 994 1042 1090 1138 260 gga.
Giy cac aat att gtg His Asn Ile Val 280 eag eec aca gat Gin Pro Thr Asp atc Ile 285 act Thr 290 gac Asp atg ctc ttc acg att Met Leu Phe Thr Ile 295 ace aaa aaa ata gea Thr Lys Lys Ile Ala 310 ggc Gly eag tet gaa gtt tac etc ate agt Gin Ser Giu Vai Tyr Leu Ile Ser 300 eet Pro 305 ttg gag aaa Leu Giu Lys aat Asn 315 ttt aag gag ata tee ttt Phe Lys Giu Ile Ser Phe 320 ttt ggg ttt ate tgt egg tge tct cag gge ate aga cae gtg gac cac 1186 WO 00/08209 WO 0008209PCT/1B99/01 444 Cys Ser Gin Gly 325 gag tct tcc gga Glu Ser Ser Gly 340 tgc aca aat gag Cys Thr Asn Giu 355 Ile Arg His Val ggt ggc ggc ttt Gly Gly Gly Phe 345 gct ctg gtt gat Ala Leu Vai Asp Asp 330 cat His gaa Giu His Phe Giy Phe Ile Cys Arg 335 ttt gto tgt tao gtg ttt cag Phe Val Cys Tyr Vai Phe Gin 350 att atg atg aco ctg aaa cag Ile Met Met Thr Leu Lys Gin 365 1234 1282 360 9CC Aila 370 ctg Leu ttc acg gtg goc gca Phe Thr Vai Aia Aia 375 tgt gag ggc tgc coo Cys Giu Gly Cys Pro gtg cag cag aca Val Gin Gin Thr ctg caa agc otg Leu Gin Ser Leu 395 tcc aaa aca aaa Ser Lys Thr Lys got Ala 380 aag gcg cca gc Lys Ala Pro Ala cag Gin 385 cac aag ctc tgt His Lys Leu Cys ata gag gga Ile Giu Giy atg Met 405 tta Leu 390 aat Asn acc Thr gag agg Giu Arg 400 aag cao Lys His tot Ser ota gaa otg Leu Giu Leu oaa Gin 415 otg acg Leu Thr cag aaa Gin Lys 435 tot ttt Ser Phe aca Thr 420 ttg Leu 410 aat cag gag cag Asn Gin Giu Gin 425 909 aot att Ala Thr Ile ttt Phe 430 gaa gag gtt Giu Giu Val aga cog aga Arg Pro Arg aat Asn 440 gag cag oga gag Giu Gin Arg Giu aat As n 445 gaa ttg att att Giu Leu Ile Ile Otg aga tgt tta Leu Arg Cys Leu 455 gag atg aag cag Glu Met Lys Gin 999 Gly tat gaa gag aaa Tyr Giu Giu Lys aca tog oag atg Thr Ser Gin Met 475 gco act cga ttt Ala Thr Arg Phe 490 oag Gin 460 aaa gaa oao ato Lys Giu His Ile 1330 1378 1426 1474 1522 1570 1618 1666 1714 1762 1810 1858 470 goa goa gag aat Ala Aia Giu Asn agg ota gat atg Arg Leu Asp Met 495 att gga Ile Gly 480 ctg aaa Leu Lys agt gaa tta Ser Giu Leu aac aaa goa Asn Lys 
Ala 500 coa Pro 485 aag Lys 000 agt Pro Ser aga tot tta aca Arg Ser Leu Thr 505 900 aga 990 otg Ala Arg Gly Leu 520 gag tot tta gaa agt att ttg too Giu Ser Leu Giu Ser Ile Leu Ser 099 ggt Arg Gly 515 ctg gat Leu Asp aat aaa Asn Lys cag gaa cac toc Gin Giu His Ser 525 510 atc agt gtg gat Ile Ser Val Asp ago too ctg tot Ser Ser Leu Ser 535 agt aca tta agt Ser Thr Leu Ser aao Asn 540 aoo ago aaa gag Thr Ser Lys Giu 530 tot cca Pro 545 aag gtg tgt gaa aag gag goc ttg 000 atc tot gag agc too ttt WO 00/08209 PCT/IB99/01444 Ser Val Cys Glu ctc ctc ggc tcc Leu Leu Gly Ser 565 cca gaa gag cca Pro Giu Giu Pro 580 Lys 550 tcg Ser Glu Ala Leu Pro gag gac ctg Glu Asp Leu tee agt Ser Ser 570 Ser Glu Ser Ser gac tcg gag agt Asp Ser Giu Ser 575 Phe Lys 560 cat ctc His Leu gct ccg ctg Ala Pro Leu ccc cag cag gcc Pro Gin Gin Ala ttc agg agg cga Phe Arg Arg Arg 590 gaa cct cca caa Glu Pro Pro Gin gca aac Ala Asn 595 cct gcc Pro Ala acc ctg agt eac Thr Leu Ser His cgg ggg tcc ccg Arg Gly Ser Pro 615 gtg age aca gag Val Ser Thr Glu ttc Phe 600 ccc atc gaa tgc Pro Ile Giu Cys cag Gin 605 ggg gtt tcg caa agg Gly Val Ser Gin Arg 620 610 cac His aaa ctt atg agg tat Lys Leu Met Arg Tyr 625 aag gac ttt gaa tcc Lys Asp Phe Giu Ser 640 tca Ser acg cct cat Thr Pro His 630 aaa gca aac cat Lys Ala Asn His 645 agg cat tcc tgg Arg His Ser Trp 660 ctt Leu agg Arq ggt gat tct ggt Gly Asp Ser Gly 650 cag eag ata ttc Gin Gin Ile Phe gaa Glu 635 ggg Gly act cet gtg Thr Pro Val ace egg Thr Arg ccg cag Pro Gin 1906 1954 2002 2050 2098 2146 2194 2242 2290 2338 2386 2434 etc ega gta gee Leu Arg Val Ala 670 665 aag gcg Lys Ala 675 gag ctt Glu Leu tgc gat tct tee age Cys Asp Ser Ser Ser 680 ccc eca ega tct cct Pro Pro Arg Ser Pro 695 ccc cca eca gag gaa Pro Pro Pro Giu Glu aga Arg tat gaa gat Tyr Giu Asp tat Tyr 685 tea gag etg gga Ser Glu Leu Gly 690 ttt Phe gge Gly tta gaa cca gtt Leu Giu Pro Val 700 aag aaa agg aca Lys Lys Arg Thr 715 ett eaa eag ata Leu Gin Gin Ile tgt gaa gat ggg Cys Giu Asp Gly 710 tct 
cgt gag etc ega Ser Arg Giu Leu Arg 720 etg etg ctt aga atg Leu Leu Leu Arg Met 735 gag etg egg Glu Leu Trp eaa Gin 725 aat Asn aag get ate Lys Ala Ile gag aag Glu Lys gaa Glu 740 730 eag aag etc caa gee Gin Lys Leu Gin Ala tct gaa aat gat ttg ctg aac Ser Glu Asn Asp Leu Leu Asn 750 aag cgc ctg Lys Arg Leu 755 aag etc gat Lys Leu Asp tat Tyr 760 gaa gaa att act Glu Giu Ile Thr ccc tgt ctt aaa gaa Pro Cys Leu Lys Glu 765 2482 gta act aca geg tgg gaa aag aeg ctt age act eca gga aga tea aaa 2530 WO 00/08209 PCT/B99/01444 Thr Thr Val Trp att aag ttt gac atg Ile Lys Phe Asp Met 790 Glu Lys Met Leu Ser 775 gaa aaa atg eac tcg Glu Lys Met His Ser 795 ggt gaa atc tgg aaa Gly Giu Ile Trp Lys 810 Thr 780 gct Ala Pro Gly Arg Ser Lys 785 gtt ggg Val Gly caa ggt gtg Gin Gly Val 800 cca cgt cat Pro Arg His cga Arg ttt eta get gag eaa tte Phe Leu Ala Giu Gin Phe 815 eag cea aag gat gtg eca Gin Pro Lys Asp Val Pro cac ett His Leu tac aaa Tyr Lys 835 aaa Lys 820 gaa Glu eac eag ttt ccc His Gin Phe Pro etc tta aag cag Leu Leu Lys Gin 840 age aaa Ser Lys 825 etg act Leu Thr eag Gin tee eag cag Ser Gin Gin 845 830 eat His gcg att ctt Ala Ile Leu 2578 2626 2674 2722 2770 2818 2866 2914 att Ile 850 ctt Leu gae ctt Asp Leu ggg ega ace Gly Arg Thr 855 gga gea gga cag eta Gly Ala Gly Gin Leu ttt ect aea eac eca Phe Pro Thr His Pro 860 tcg ctt tac aac att Ser Leu Tyr Asn Ile 875 gga tat tgc caa ggt Gly Tyr Cys Gin Gly tac Tyr tte tct gee Phe Ser Ala ttg aag gee tac tea Leu Lys Aia Tyr Ser 880 etc age ttt gta gca Leu Ser Phe Val Ala 895 ctt eta gac cag Leu Leu Asp Gin 885 gge att ttg ctt Gly Ile Leu Leu 900 gaa gtg Glu Val 890 ett eat atg agt gag Leu His Met Ser Glu 905 gaa gag gcg Glu Giu Ala ttt Phe 910 aaa atg etc Lys Met Leu aag ttt Lys Phe 915 atg att Met Ile etg atg ttt gac atg Leu Met Phe Asp Met 920 ggg etg egg aaa Gly Leu Arg Lys cag Gin 925 tcg Ser att tta cag Ile Leu Gin eag atg tac cag Gin Met Tyr Gin etc Leu 940 930 gat Asp tat egg eca gac Tyr Arg Pro Asp agg ttg ctt cat Arg Leu 
Leu His 945 cac gag ate ggc His Glu Ile Gly 960 tac cac aga Tyr His Arg gac Asp 950 get Ala etc tac aat cac Leu Tyr Asn His gee ccc tgg ttc Ala Pro Trp Phe ctg Leu 955 gag gag Glu Glu 2962 3010 3058 3106 3154 ccc age etc Pro Ser Leu ttc ceg ctg Phe Pro Leu 980 gga aca gag 970 gte Val etc ace atg ttt gee tea cag Leu Thr Met Phe Ala Ser Gin 975 ttt gat atg att ttt ett cag Phe Asp Met Ile Phe Leu Gin 990 tte gta gee Phe Val Ala aga Arg 985 gte ata ttt aaa gtg get tta agt ctg ttg gga age cat 3202 WO 00/08209 WO 8008209PCTIIB99/0 1444 Gly Thr 995 aag ccc Lys Pro 1010 ata aaa Ilie Lys Giu Vai Ile Phe ttg att ctg cag Leu Ile Leu Gin 1015 age aeg cta ec Ser Thr Leu Pro 1030 Lys Val Aia Leu Ser Leu Leu 1000 1005 cat gaa aac cta gaa acc ata His Giu Asn Leu Glu Thr Ile 1020 aac ctt ggc ttg gta cag atg Asn Leu Gly Leu Val Gin Met 1035 Gly Ser His gtt gac ttt Val Asp Phe 1025 gaa aag acc Giu Lys Thr 1040 caa gct tat Gin Ala Tyr 1055 3250 3298 atc aat cag gta ttt Ile Asn Gin Val Phe 1045 gaa gtt gag tac cac Giu Val Giu Tyr His 1060 gaa atg gac ate gct Giu Met Asp Ile Ala 1050 gte ett caa gaa gaa Vai Leu Gin Giu Giu 1065 aaa cag tta Lys Gin Leu ctt ate gat tcc tet ect Leu Ile Asp Ser Ser Pro 1070 gag aaa ace aae age age Giu Lys Thr Asn Ser Ser 1085 3346 3394 3442 etc Leu agt gac Ser Asp 1075 aac eaa aga Asn Gin Arg atg Met 1080 gat aaa tta Asp Lys Leu tta ege Leu Arg 1090 ggt agg Gly Arg gag age Ghi Ser aaa cag aac Lys Gin Asn ate caa age Ile Gin Ser 1l10 aag etg aag Lys Leu Lys 1125 ctt Leu 1095 gac etc ctt gaa cag Asp Leu Leu Giu Gin 1100 ttg cag gtg gca aat Leu Gin Val Ala Asn ctt gag Leu Giu gee ace Ala Thr att gag aag etc etg Ile Giu Lys Leu Leu i115 1105 age agt Ser Ser 1120 3490 3538 3586 cag gee atg ctt ace Gin Ala Met Leu Thr 1130 gtg gag gag ctg egg Val Giu Giu Leu Arg tta gaa ctg gag egg teg Leu Glu Leu Giu Arg Ser 1135 gee ctg ctg eag aeg Ala Leu Leu Gin Thr 1140 egg age Arg Ser 1150 3ca gag ccc kla Giu Pro 3634 1145 age gac Ser Asp egg gag ect gag tge Arg Giu Pro Glu 
Cys aeg cag eec gag Thr Gin Pro Giu 1155 1160 eagctetgca ggagagattg caaeaecate eeacaetgtc cagaagacgc tggaaggaga gaaggaageg ggaagtgtgc ttgceagcaa gtagattctt acgaacteea acttgeaatt tttttttgtt gtttttagat actaaatcgt ceetteteca tagctttaga tggegtggac gtgaataaat geaacttatg aaaaaa ccc aeg gge gac tga Pro Thr Gly Asp* 1165 caggccttaa etgagaggga tteteaggga ggaaaccgge eagggggeat gtcceagtgt gtectgatta etgtacaeag ttttaaaaaa aaaaaaaaaa 3682 3742 3802 3862 3922 3982 3988 <~210> '<21i> 1168 <21.2> PRT WO 00/08209 <213> Homo sapiens PCTIIB99/0I 444 <400> Met Val Ser Arg Arg Arg Lys Tyr Cys Ser 145 Cys Phe Ile Ser Gin 225 Leu Ser Leu Arg Pro Giu Ser Leu Leu Leu Ser Pro Phe Tyr 130 Ser Pro Cys Asp Giu 210 Giu Arg Ser Ile Thr 290 Asp Pro Ile Val Asp Thr Thr Ser Arg Cys Vai Gin Gin Gin Arg 100 Ala Cys 115 Val Phe Ile Arg Ser Giu Gly Arg 180 Glu Cys 195 Ser Pro Pro Val Ser Leu Gly Phe 260 Ser Gly 275 Met Leu Thr Lys Thr Phe Phe Gly Met Pro Gin Ser Ser Pro 70 Trp Asp Val His Leu Ile Lys Ala Gin Ala 150 Phe Asp 165 Val Thr Ile Giu Arg Pro Arg Arg 230 Ala Phe 245 Phe Ser His Asn Phe Thr Lys Ile Thr Leu Met Thr 55 Ser Pro Lys Lys Asp 135 Gly Asp Vai Lys Asn 215 Pro Arg Ser Ile Ile 295 Al a Ala Arg Gin Leu 25 Leu Pro 40 Arg Lys Gly Leu Leu Ile Leu Ile 105 Giu Asp 120 Asp Gin Lys Ile Thr Phe Ala His 185 Phe Asn 200 Pro Pro Met Arg Lys Giu Phe Giu 265 Val Gin 280 Gly Gin Leu Giu Lys 10 Val1 Trp Giu Arg Tyr 90 His Al a Thr Ala Ser 170 Lys His His Lys Leu 250 Giu Pro Ser Lys His Gly Vai Pro Cys 75 Ser Asn Val1 Lys Arg 155 Lys Lys Val1 Al a Ser 235 Gin Ser Thr Giu Asn Leu Ser Val Val Glu Ser Ser His Val 140 Gin Lys Ala Ser Al a 220 Phe Asp Asp Asp Val1 300 Phe Leu Leu Ala Thr Pro Ile His Arg 125 Pro Giu Phe Pro Gly 205 Pro Ser Gly Ile Ile 285 Tyr Lys Pro Asn Pro Val Glu Val Lys Gin Giu Pro Phe Giu Asp Pro 110 Gin Ser Giu Ile Glu Leu Giu Val 175 Pro Ala 190 Ser Arg Thr Gly Gin Pro Gly Leu 255 Giu Asn 270 Giu Giu Leu Ile Giu Ile Giu His Arg Val Gly Cys Ser Ile Ile His 160 Leu Leu Gly Ser Gly 240 Arg 
His Asn Ser Ser WO 00/08209 WO 0008209PCT/1B99/01 444 305 Phe Cys Arg Glu Gin Cys Gin Ala 370 Gin Leu 385 Arg Ile His Leu Val Gin Ile Ser 450 His Ile 465 Gly Ser Lys Asn Ser Arg Asp Leu 530 Pro Ser 545 Lys Leu Leu Pro Arg Ala Gin Pro 610 Tyr His 625 Ser Lys Ser Ser Thr 355 Phe Cys Giu Thr Lys 435 Phe Giy Giu Lys Gly 515 Asp Val Leu Giu Asn 595 Al a Ser Ala Gin Ser 340 Asn Thr Glu Gly Thr 420 Leu Leu Glu Leu Ala 500 Asn Ser Cys Gly Giu 580 Thr Arg Val Asn Gly 325 Gly Giu Val Gly Met 405 Leu Arg Arg Met Pro 485 Lys Lys Ser Giu Ser 565 Pro Leu Gly Ser His 310 Ile Gly Ala Ala Cys 390 Asn Thr Pro Cys Lys 470 Pro Arg Al a Leu Lys 550 Ser Ala Ser Ser Thr 630 Leu Arg Gly Leu Ala 375 Pro Ser Asn Arg Leu 455 Gin Ser Ser Arg Ser 535 Giu Glu Pro His Pro 615 Giu Gly His Giy Val 360 Val1 Leu Ser Gin Asn 440 Tyr Thr Al a Leu Gly 520 Ser Ala Asp Leu Phe 600 Giy Thr Asp Val1 Phe 345 Asp Gin Gin Lys Giu 425 Glu Giu Ser Thr Thr 505 Leu Thr Leu Leu Ser 585 Pro Val Pro Ser Asp 330 His Glu Gin Ser Thr 410 Gin Gin Giu Gin Arg 490 Glu Gin Leu Pro Ser 570 Pro Ile Ser His Gly His Phe Ile Thr Leu 395 Lys Ala Arg Lys Met 475 Phe Ser Giu Ser Ile 555 Ser Gin Glu Gin Giu 635 Gly Phe Val1 Met Al a 380 His Leu Thr Giu Gin 460 Ala Arg Leu His Asn 540 Ser Asp Gin Cys Arg 620 Arg Thr Gly Cys Met 365 Lys Lys Glu Ile Asn 445 Lys Ala Leu Giu Ser 525 Thr Giu Ser Ala Gin 605 Lys Lys Pro Phe Tyr 350 Thr Ala Leu Leu Phe 430 Giu Giu Giu Asp Ser 510 Ile Ser Ser Glu Phe 590 Glu Leu Asp Val Ile 335 Val Leu Pro Cys Gin 415 Glu Leu His Asn Met 495 Ile Ser Lys Ser Ser 575 Arg Pro Met Phe Lys Cys Phe Lys Al a Glu 400 Lys Giu Ile Ile Ile 480 Leu Leu Val Giu Phe 560 His Arg Pro Arg Giu 640 Thr WO 00/08209 PTI9/14 PCT/IB99/01444 Arg Gin Gly Pro 705 Arg Met Asn Giu Lys 785 Val1 Phe Pro Leu Gin 865 Ser Al a Leu Asp His 945 Gly Gin Arg Lys Giu 690 Phe Giu Glu Lys Val1 770 Ile Pro His Tyr Ile 850 Leu Leu Gly Lys Met 930 Asp Pro Phe His Al a 675 Leu Gly Leu Lys Arg 755 Thr Lys Arg Leu Lys 835 Asp Giy Leu Ile Phe 915 Ile 
Tyr Ser Pro Ser 660 Cys Pro Pro Trp Glu 740 Leu Thr Phe His Lys 820 Giu Leu Al a Asp Leu 900 Leu Ile His Leu Leu Trp Asp Pro Pro Gin 725 Asn Lys Val1 Asp His 805 His Leu Gly Gly Gin 885 Leu Met Leu Arg Tyr 965 Gly Arg Se r Arg Pro 710 Lys Gin Leu Trp Met 790 Arg Gin Leu Arg Gin 870 Giu Leu Phe Gin Asp 950 Al a Phe Gin Ser Ser 695 Giu Ala Lys Asp Giu 775 Glu Gly Phe Lys Thr 855 Leu Val His Asp Ile 935 Leu Ala Val Ile 665 Arg Leu Lys Leu Gin 745 Glu Met Met Ile Ser 825 Leu Pro Leu Tyr Ser 905 Gly Met Asn Trp Arg Phe Tyr Giu Lys Gin 730 Ala Giu Leu His Trp 810 Lys Thr Thr Tyr Cys 890 Giu Leu Tyr His Phe 970 Val Leu Glu Pro Arg 715 Gin Ser Ile Ser Ser 795 Lys Gin Ser His Asn 875 Gin Glu Arg Gin Leu 955 Leu Phe Arg Asp Val1 700 Thr Ile Glu Thr Thr 780 Al a Phe Gin Gin Pro 860 Ile Gly Giu Lys Leu 940 Giu Thr Asp Val Tyr 685 Cys Ser Leu Asn Pro 765 Pro Val1 Leu Pro Gin 845 Tyr Leu Leu Ala Gin 925 Ser Giu Met Met Al a 670 Ser Glu Arg Leu Asp 750 Cys Gly Gly Ala Lys 830 His Phe Lys Ser Phe 910 Tyr Arg His Phe Ile 655 Thr Glu Asp Giu Leu 735 Leu Leu Arg Gin Glu 815 Asp Aila Ser Ala Phe 895 Lys Arg Leu Giu Ala 975 Phe Pro Leu Gly Leu 720 Arg Leu Lys Ser Gly 800 Gin Val Ile Ala Tyr 880 Val Met Pro Leu Ile 960 Ser Leu WO 00/08209 PCTJB99/0444 74 980 985 990 Gin Gly Thr Glu Val Ile Phe Lys Val Ala Leu Ser Leu Leu Gly Ser 995 1000 1005 His Lys Pro Leu Ile Leu Gin His Glu Asn Leu Glu Thr Ile Val Asp 1010 1015 1020 Phe Ile Lys Ser Thr Leu Pro Asn Leu Gly Leu Val Gin Met Glu Lys 1025 1030 1035 1040 Thr Ile Asn Gin Val Phe Glu Met Asp Ile Ala Lys Gin Leu Gin Ala 1045 1050 1055 Tyr Glu Val Glu Tyr His Val Leu Gin Glu Glu Leu Ile Asp Ser Ser 1060 1065 1070 Pro Leu Ser Asp Asn Gin Arg Met Asp Lys Leu Glu Lys Thr Asn Ser 1075 1080 1085 Ser Leu Arg Lys Gin Asn Leu Asp Leu Leu Glu Gin Leu Gin Val Ala 1090 1095 1100 Asn Gly Arg Ile Gin Ser Leu Glu Ala Thr Ile Glu Lys Leu Leu Ser 1105 1110 1115 1120 Ser Glu Ser Lys Leu Lys Gin Ala Met Leu Thr Leu Glu Leu Glu Arg 1125 1130 1135 Ser Ala Leu 
Leu Gln Thr Val Glu Glu Leu Arg Arg Arg Ser Ala Glu 1140 1145 1150 Pro Ser Asp Arg Glu Pro Glu Cys Thr Gln Pro Glu Pro Thr Gly Asp 1155 1160 1165
<210> 6 <211> 18 <212> DNA <213> Artificial Sequence <220> <221> misc_binding <222> 1..18 <223> sequencing oligonucleotide PrimerPU <400> 6 tgtaaaacga cggccagt
<210> 7 <211> 18 <212> DNA <213> Artificial Sequence <220> <221> misc_binding <222> 1..18 <223> sequencing oligonucleotide PrimerRP <400> 7 caggaaacag ctatgacc 18

Claims (9)

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No:1 or the complements thereof.
2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No:2 or the complements thereof.
3. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No:3 or the complements thereof.
4. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No:4 or the complements thereof.
5. An isolated, purified, or recombinant polynucleotide consisting essentially of a contiguous span of 12 to 50 nucleotides of any one of SEQ ID NOs:1 and 2 or the complement thereof, wherein said span includes a TBC-1-related biallelic marker in said sequence.
6. A polynucleotide according to claim 5, wherein said TBC-1-related biallelic marker is selected from the group consisting of the biallelic markers in positions 9494 of the SEQ ID NO:1, and 1443, 5247, 6223, 14723, 19186, 18997, 19891, 29617, 42519, 69324, 69181, 69146, 76458, 78595, 82159, 84522, 84810, and 89967 of the SEQ ID NO:2.
7. A polynucleotide according to any one of claims 5 or 6, wherein said contiguous span is 18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said polynucleotide.
8. A polynucleotide according to claim 7, wherein said polynucleotide consists of said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at the center of said polynucleotide.
9. A polynucleotide according to claim 8, wherein said polynucleotide consists essentially of a sequence selected from the sequences with the position range 9482-9506 in SEQ ID NO:1 and with the following position ranges in SEQ ID NO:2: 1431-1455, 5235-5259, 6211-6235, 14711-14735, 19174-19198, 18985-19009, 29605-29629, 42507-42531, 69312-69336, 69169-69193, 69134-69158, 78583-78607, 82147-82171, 84510-84534, 84798-84822, and 89955-89979, and the complementary sequences thereto.
10. A polynucleotide according to any one of claims 1 to 6, wherein the 3' end of said contiguous span is present at the 3' end of said polynucleotide.
11. A polynucleotide according to any one of claims 5 or 6, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said polynucleotide.
12. An isolated, purified, or recombinant polynucleotide consisting essentially of a contiguous span of 12 to 50 nucleotides of any one of SEQ ID NOs:1 and 2 or the complement thereof, wherein the 3' end of said contiguous span is located within nucleotides upstream of a TBC-1-related biallelic marker in said sequence.
13. A polynucleotide according to claim 12, wherein the 3' end of said polynucleotide is located 1 nucleotide upstream of said TBC-1-related biallelic marker in said sequence.
14. A polynucleotide according to claim 13, wherein said polynucleotide consists essentially of a sequence selected from the sequences with the position range 9475-9493 in SEQ ID NO:1 and with the following position ranges in SEQ ID NO:2: 1424-1442, 5228-5246, 6204-6222, 14704-14722, 19167-19185, 18978-18996, 19872-19890, 29598-29616, 42500-42518, 69305-69323, 69162-69180, 69127-69145, 76439-76457, 78576-78594, 82140-82158, 84503-84521, 84791-84809, and 89948-89966, and the complementary position range 9495-9513 in SEQ ID NO:1 and the following complementary position ranges in SEQ ID NO:2: 1444-1462, 5248-5266, 6224-6242, 14724-14742, 19187-19205, 18998-19016, 19892-19910, 29618-29636, 42520-42538, 69325-69343, 69182-69200, 69147-69165, 76459-76477, 78596-78614, 82160-82178, 84523-84541, 84811-84829, and 89968-89986.
15. An isolated, purified, or recombinant polynucleotide consisting essentially of a sequence selected from the sequences with the position range 9391-9408 in SEQ ID NO:1 and with the following position ranges in SEQ ID NO:2: 988-1006, 5039-5056, 5997-6015, 14371-14390, 18751-18771, 19605-19625, 29529-29547, 42268-42287, 69026-69046, 76323-76343, 78292-78309, 81893-81912, 84392-84412, and 89746-89765, and the complementary position range 9828-9845 in SEQ ID NO:1 and the following complementary position ranges in SEQ ID NO:2: 1509-1529, 5534-5554, 6332-6350, 14798-14817, 19198-19217, 19986-20005, 30041-30061, 42732-42752, 69525-69543, 76771-76790, 78704-78721, 82353-82372, 84909-84929, and 90179-90198.
16. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide comprising a contiguous span of at least 6 amino acids of SEQ ID wherein the polypeptide is not a murine TBC-1 protein (Mur. tbc1) as provided in Figure 1.
17. Use of a polynucleotide comprising a contiguous span of at least 12 nucleotides of the SEQ ID NO:1 or 2 or the complementary sequence thereto for determining the identity of the nucleotide at a TBC-1-related biallelic marker.
18. Use according to claim 17 in a microsequencing assay, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide and wherein the 3' end of said polynucleotide is located 1 nucleotide upstream of said TBC-1-related biallelic marker.
19. Use according to claim 17 in a hybridization assay, wherein said contiguous span includes said TBC-1-related biallelic marker.
20. Use according to claim 17 in a specific amplification assay, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said polynucleotide.
21. Use according to claim 17 in a sequencing assay, wherein the 3' end of said contiguous span is located at the 3' end of said polynucleotide.
22. Use according to any one of claims 17 to 21, wherein said TBC-1-related biallelic marker is a biallelic marker selected from the group consisting of the biallelic markers in positions 9494 of the SEQ ID NO:1, and 1443, 5247, 6223, 14723, 19186, 18997, 19891, 29617, 42519, 69324, 69181, 69146, 76458, 78595, 82159, 84522, 84810, and 89967 of the SEQ ID NO:2.
23. A polynucleotide according to any one of claims 1 to 16 attached to a solid support.
24. An array of polynucleotides comprising at least one polynucleotide according to claim 23.
25. An array according to claim 24, wherein said array is addressable.
26. A polynucleotide according to any one of claims 1 to 16 further comprising a label.
27. A recombinant vector comprising a polynucleotide according to any one of claims 1 to 4 and 16.
28. A host cell comprising a recombinant vector according to claim 27.
29. A non-human host animal or mammal comprising a recombinant vector according to claim 27.
30. A method of genotyping a human comprising determining the identity of a nucleotide at a human TBC-1-related biallelic marker or the complement thereof in a biological sample.
31. A method according to claim 30, wherein said biological sample is derived from a single subject.
32. A method according to claim 31, wherein the identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome.
33. A method according to claim 30, wherein said biological sample is derived from multiple subjects.
34. A method according to claim 30, further comprising amplifying a portion of said sequence comprising the biallelic marker prior to said determining step.
35. A method according to claim 34, wherein said amplifying is performed by PCR.
36. A method according to claim 30, wherein said determining is performed by a hybridization assay.
37. A method according to claim 30, wherein said determining is performed by a sequencing assay.
38. A method according to claim 30, wherein said determining is performed by a microsequencing assay.
39. A method according to claim 30, wherein said determining is performed by an enzyme-based mismatch detection assay.
40. A method according to any one of claims 30 to 39, wherein said TBC-1-related biallelic marker is selected from the group consisting of the biallelic markers in positions 9494 of the SEQ ID NO:1, and 1443, 5247, 6223, 14723, 19186, 18997, 19891, 29617, 42519, 69324, 69181, 69146, 76458, 78595, 82159, 84522, 84810, and 89967 of the SEQ ID NO:2.
41. An isolated, purified, or recombinant polypeptide comprising a continuous span of at least 8 amino acids of SEQ ID NO:5, wherein the polypeptide is not a murine TBC-1 protein (Mur. tbc1) as provided in Figure 1.
42. An isolated or purified antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide according to claim 41, wherein the antibody does not bind a murine TBC-1 protein (Mur. tbc1) as provided in Figure 1.
43. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No. 1 or the complementary sequence thereof, substantially as hereinbefore described with reference to any one of the Examples.
44. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No. 2 or the complementary sequence thereof, substantially as hereinbefore described with reference to any one of the Examples.
45. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No. 3 or the complementary sequence thereof, substantially as hereinbefore described with reference to any one of the Examples.
46. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at least 60 nucleotides of SEQ ID No. 4 or the complementary sequence thereof, substantially as hereinbefore described with reference to any one of the Examples.
47. An isolated, purified or recombinant polynucleotide comprising a contiguous span of 12 to 50 nucleotides of any one of SEQ ID Nos 1 and 2 or the complementary sequence thereof, wherein the span includes a TBC-1-related biallelic marker, substantially as hereinbefore described with reference to any one of the Examples.
48. A method of genotyping a human comprising determining the identity of a nucleotide at a human TBC-1-related biallelic marker or the complement thereof, substantially as hereinbefore described with reference to any one of the Examples.
Dated this thirteenth day of May 2004
Genset S.A.
Patent Attorneys for the Applicant: F B RICE CO
COMS ID No: SMBI-00748252 Received by IP Australia: Time 15:43 Date 2004-05-13
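The position ranges recited in the claims follow simple arithmetic from the marker positions, which the Python sketch below illustrates. It assumes a 25-nucleotide probe with the marker at its center (claim 8) and a 19-nucleotide microsequencing primer whose 3' end lies 1 nucleotide upstream of the marker (claim 13). The 19-nucleotide length is inferred from the listed ranges (for example, 9475-9493 covers 19 positions) and is not stated in the claims. The function names are hypothetical and introduced only for illustration.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive (start, end) positions, 1-based


def probe_range(marker_pos: int, length: int = 25) -> Range:
    """Span of an odd-length probe with the marker at its center."""
    if length % 2 == 0:
        raise ValueError("length must be odd to center the marker")
    half = length // 2
    return marker_pos - half, marker_pos + half


def microsequencing_ranges(marker_pos: int, length: int = 19) -> Tuple[Range, Range]:
    """Primer span ending 1 nt upstream of the marker, and the span on the
    complementary strand starting 1 nt past the marker.

    The default length of 19 is an assumption inferred from the claimed ranges.
    """
    upstream = (marker_pos - length, marker_pos - 1)
    complementary = (marker_pos + 1, marker_pos + length)
    return upstream, complementary


if __name__ == "__main__":
    # Marker at position 9494 of SEQ ID NO:1, as recited in claim 6.
    print(probe_range(9494))
    print(microsequencing_ranges(9494))
```

For the marker at position 9494 of SEQ ID NO:1, these functions reproduce the ranges 9482-9506 (claim 9) and 9475-9493 with its complementary range 9495-9513 (claim 14).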
AU51878/99A 1998-08-07 1999-08-06 Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof Ceased AU774440B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US9565398P 1998-08-07 1998-08-07
US60/095653 1998-08-07
PCT/IB1999/001444 WO2000008209A2 (en) 1998-08-07 1999-08-06 Nucleic acids encoding human tbc-1 protein and polymorphic markers thereof

Publications (2)

Publication Number Publication Date
AU5187899A AU5187899A (en) 2000-02-28
AU774440B2 true AU774440B2 (en) 2004-06-24

Family

ID=22252989

Family Applications (1)

Application Number Title Priority Date Filing Date
AU51878/99A Ceased AU774440B2 (en) 1998-08-07 1999-08-06 Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof

Country Status (5)

Country Link
EP (1) EP1108059A2 (en)
JP (1) JP2002532057A (en)
AU (1) AU774440B2 (en)
CA (1) CA2337694A1 (en)
WO (1) WO2000008209A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5700927A (en) * 1994-12-23 1997-12-23 The Children's Medical Center Corporation Tbc1 gene and uses thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0941366A2 (en) * 1996-11-06 1999-09-15 Whitehead Institute For Biomedical Research Biallelic markers
WO1999032644A2 (en) * 1997-12-22 1999-07-01 Genset Prostate cancer gene

Also Published As

Publication number Publication date
JP2002532057A (en) 2002-10-02
AU5187899A (en) 2000-02-28
WO2000008209A3 (en) 2000-11-09
EP1108059A2 (en) 2001-06-20
CA2337694A1 (en) 2000-02-17
WO2000008209A2 (en) 2000-02-17

Similar Documents

Publication Publication Date Title
US6605432B1 (en) High-throughput methods for detecting DNA methylation
AU781437B2 (en) A novel BAP28 gene and protein
CN101874120B (en) Genetic variants on chr2 and chr16 as markers for use in breast cancer risk assessment, diagnosis, prognosis and treatment
AU750183B2 (en) Prostate cancer gene
CN107223159A (en) The detection of DNA from particular cell types and correlation technique
CA3119065A1 (en) Use of adeno-associated viral vectors to correct gene defects/ express proteins in hair cells and supporting cells in the inner ear
KR20180093902A (en) Detection of fetal chromosomal anomalies using differentially methylated diene regions between fetuses and pregnant women
GB2424886A (en) Polynucleotide primers against epidermal growth factor receptor and method of detecting gene mutations
CN109476698B (en) Gene-based diagnosis of inflammatory bowel disease
CN1704478A (en) Methods for assessing patients with acute myeloid leukemia
KR20180049093A (en) New biomarkers and methods of treatment of cancer
KR20130123357A (en) Methods and kits for diagnosing conditions related to hypoxia
WO2015114146A1 (en) Method for predicting the response to an anti-her2 containing therapy and/or chemotherapy in patients with breast cancer
WO2006022629A1 (en) Methods of identifying risk of type ii diabetes and treatments thereof
KR20090087486A (en) Genetic susceptibility variants of type 2 diabetes mellitus
CA2497597A1 (en) Methods for identifying subjects at risk of melanoma and treatments
DK2951317T3 (en) PROCEDURE FOR PREDICTING THE BENEFIT OF INCLUSING TAXAN IN A CHEMOTHERAPY PLAN FOR BREAST CANCER PATIENTS
AU784761B2 (en) Biallelic markers related to genes involved in drug metabolism
CN107223162A (en) New RNA biomarkers label for diagnosis of prostate cancer
WO2006022636A1 (en) Methods for identifying risk of type ii diabetes and treatments thereof
US6825004B1 (en) Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof
AU774440B2 (en) Nucleic acids encoding human TBC-1 protein and polymorphic markers thereof
WO2006022634A1 (en) Methods for identifying risk of type ii diabetes and treatments thereof
US20020155119A1 (en) Isolation and use of fetal urogenital sinus expressed sequences
US20030124536A1 (en) Diagnosis and treatment of vascular disease

Legal Events

Date Code Title Description
TC Change of applicant's name (sec. 104)

Owner name: GENSET S.A.

Free format text: FORMER NAME: GENSET

FGA Letters patent sealed or granted (standard patent)