AU5178901A - Coding sequences of the human brca1 gene - Google Patents

Coding sequences of the human brca1 gene

Info

Publication number
AU5178901A
Authority
AU
Australia
Prior art keywords
ser
leu
seq
occur
frequencies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU51789/01A
Other versions
AU777341B2 (en)
Inventor
Antonette C. Allen
Christopher P. Alvares
Brenda S. Critz
Patricia D. Murphy
Sheri J. Olson
Denise B. Schelter
Bin Zeng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ore Pharmaceuticals Inc
Original Assignee
OncorMed Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU19778/97A (AU1977897A)
Application filed by OncorMed Inc
Publication of AU5178901A
Assigned to GENE LOGIC ACQUISITION CORPORATION. Alteration of Name(s) of Applicant(s) under S113. Assignors: ONCORMED, INC.
Application granted
Publication of AU777341B2
Assigned to GENE LOGIC, INC. Alteration of Name(s) of Applicant(s) under S113. Assignors: GENE LOGIC ACQUISITION CORPORATION
Anticipated expiration
Legal status: Ceased

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
Name of Applicant/s: OncorMed, Inc.
Actual Inventor/s: Patricia D. Murphy, Antonette C. Allen, Christopher P. Alvares, Brenda S. Critz, Sheri J. Olson, Denise B. Schelter and Bin Zeng.
Address for Service: BALDWIN SHELSTON WATERS MARGARET STREET SYDNEY NSW 2000 CCN: 3710000352
Invention Title: 'CODING SEQUENCES OF THE HUMAN BRCA1 GENE'
Details of Original Application No. 19778/97 dated 12 Feb 1997
The following statement is a full description of this invention, including the best method of performing it known to us:-
File: 31831AUP00
IP Australia Documents received on: 07 JUN 2001 Batch No:
CODING SEQUENCES OF THE HUMAN BRCA1 GENE
FIELD OF THE INVENTION
This invention relates to a gene which has been associated with breast and ovarian cancer where the gene is found to be mutated. More specifically, this invention relates to the three coding sequences of the BRCA1 gene (BRCA1(omi1), BRCA1(omi2), and BRCA1(omi3)) isolated from human subjects.
BACKGROUND OF THE INVENTION
It has been estimated that about 5-10% of breast cancer is inherited. Rowell, et al., American Journal of Human Genetics 55:861-865 (1994). Located on chromosome 17, BRCA1 is the first gene identified as conferring increased risk for breast and ovarian cancer. Miki et al., Science 266:66-71 (1994). Mutations in this "tumor suppressor" gene are thought to account for roughly 45% of inherited breast cancer and 80-90% of families with increased risk of early onset breast and ovarian cancer. Easton et al., American Journal of Human Genetics 52:678-701 (1993).
Locating one or more mutations in the BRCA1 region of chromosome 17 provides a promising approach to reducing the high incidence and mortality associated with breast and ovarian cancer through the early detection of women at high risk. These women, once identified, can be targeted for more aggressive prevention programs. Screening is carried out by a variety of methods which include karyotyping, probe binding and DNA sequencing.
In DNA sequencing technology, genomic DNA is extracted from whole blood and the coding sequences of the BRCA1 gene are amplified. The coding sequences might be sequenced completely and the results are compared to the DNA sequence of the gene. Alternatively, the coding sequence of the sample gene may be compared to a panel of known mutations before completely sequencing the gene and comparing it to a normal sequence of the gene.
If a mutation in the BRCA1 coding sequence is found, it may be possible to provide the individual with increased expression of the gene through gene transfer therapy. It has been demonstrated that the gene transfer of the BRCA1 coding sequence into cancer cells inhibits their growth and reduces tumorigenesis of human cancer cells in nude mice. Jeffrey Holt and his colleagues conclude that the product of BRCA1 expression is a secreted tumor growth inhibitor, making BRCA1 an ideal gene for gene therapy studies.
Transduction of only a moderate percentage of tumor cells apparently produces enough growth inhibitor to inhibit all tumor cells. Arteaga, CL, and JT Holt, Cancer Research 56:1098-1103 (1996); Holt, JT et al., Nature Genetics 12:298-302 (1996).
The observation of Holt et al. that the BRCA1 growth inhibitor is a secreted protein leads to the possible use of injection of the growth inhibitor into the area of the tumor for tumor suppression.
The BRCA1 gene is divided into 24 separate exons. Exons 1 and 4 are noncoding, in that they are not part of the final functional BRCA1 protein product. The BRCA1 coding sequence spans roughly 5600 base pairs (bp). Each exon consists of 200-400 bp, except for exon 11 which contains about 3600 bp. To sequence the coding sequence of the BRCA1 gene, each exon is amplified separately and the resulting PCR products are sequenced in the forward and reverse directions. Because exon 11 is so large, we have divided it into twelve overlapping PCR fragments of roughly 350 bp each (segments through of BRCA1 exon 11).
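The division of exon 11 into overlapping fragments can be illustrated numerically. The following is a minimal sketch, assuming an exon of exactly 3600 bp, fragments of exactly 350 bp, and evenly spaced start positions. The function name `tile_exon` and the even-spacing rule are assumptions made for illustration; the specification does not give the actual fragment coordinates.

```python
def tile_exon(exon_length, fragment_length, n_fragments):
    """Return (start, end) pairs (0-based, end exclusive) for evenly spaced,
    overlapping fragments that together cover the whole exon."""
    if n_fragments < 2:
        raise ValueError("need at least two fragments")
    if fragment_length > exon_length:
        raise ValueError("fragment longer than exon")
    if fragment_length * n_fragments < exon_length:
        raise ValueError("fragments too short to cover the exon")
    last_start = exon_length - fragment_length
    fragments = []
    for i in range(n_fragments):
        # Integer spacing pins the first start at 0 and the last end at exon_length.
        start = (last_start * i) // (n_fragments - 1)
        fragments.append((start, start + fragment_length))
    return fragments


# Assumed round numbers taken from the approximate figures in the text.
fragments = tile_exon(3600, 350, 12)

# Overlap (in bp) between each fragment and the next one.
overlaps = [
    prev_end - next_start
    for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:])
]
```

Under these assumed numbers, neighbouring fragments share roughly 55 bp, so each base of the exon is covered by at least one fragment.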
Many mutations and polymorphisms have already been reported in the BRCA1 gene. A world wide web site has been built to facilitate the detection and characterization of alterations in breast cancer susceptibility genes. Such mutations in BRCA1 can be accessed through the Breast Cancer Information Core at: http://www.nchgr.nih.gov/dir/labtransfer/bic. This data site became publicly available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238 (1995).
The genetics of Breast/Ovarian Cancer Syndrome is autosomal dominant with reduced penetrance. In simple terms, this means that the syndrome runs through families such that both sexes can be carriers (only women get the disease but men can pass it on), all generations will likely have breast cancer, ovarian cancer, or both (sometimes in the same individual), and occasionally women carriers either die young before they have the time to manifest disease (and yet offspring get it) or they never develop breast or ovarian cancer and die of old age (the latter people are said to have "reduced penetrance" because they never develop cancer).
Pedigree analysis and genetic counseling is absolutely essential to the proper workup of a family prior to any lab work.
Until now, only a single coding sequence for the BRCA1 gene has been available for comparison to patient samples. That sequence is available as GenBank Accession Number U14680. There is a need in the art, therefore, to have available a coding sequence which is the BRCA1 coding sequence found in the majority of the population, a "consensus coding sequence", BRCA1(omi1) SEQ. ID. NO.: 1. A consensus coding sequence will make it possible for true mutations to be easily identified or differentiated from polymorphisms. Identification of mutations of the BRCA1 gene and protein would allow more widespread diagnostic screening for hereditary breast and ovarian cancer than is currently possible. Two additional coding sequences have been isolated and characterized. The BRCA1(omi2) SEQ. ID. NO.: 3, and BRCA1(omi3) SEQ. ID. NO.: 5 coding sequences also have utility in diagnosis, gene therapy and in making therapeutic BRCA1 protein.
A coding sequence of the BRCA1 gene which occurs most commonly in the human gene pool is provided. The most commonly occurring coding sequence more accurately reflects the most likely sequence to be found in a subject. Use of the coding sequence BRCA1(omi1) SEQ. ID. NO.: 1, rather than the previously published BRCA1 sequence, will reduce the likelihood of misinterpreting a "sequence variation" found in the population (i.e., a polymorphism) as a pathologic "mutation" (i.e., one that causes disease in the individual or puts the individual at a high risk of developing the disease). With large interest in breast cancer predisposition testing, misinterpretation is particularly worrisome. People who already have breast cancer are asking the clinical question: "Is my disease caused by a heritable genetic mutation?" The relatives of those with breast cancer are asking the question: "Am I also a carrier of the mutation my relative has? Thus, is my risk increased, and should I undergo a more aggressive surveillance program?"
SUMMARY OF THE INVENTION
The present invention is based on the isolation of three coding sequences of the BRCA1 gene found in human individuals.
It is an object of the invention to provide the most commonly occurring coding sequence of the BRCA1 gene.
It is another object of this invention to provide two other coding sequences of BRCA1 gene.
It is another object of the invention to provide three protein sequences coded for by three of the coding sequences of the BRCA1 gene.
It is another object of the invention to provide a list of the codon pairs which occur at each of seven polymorphic points on the BRCA1 gene.
It is another object of the invention to provide the rates of occurrence for the codons.
It is another object of the invention to provide a method wherein BRCA1, or parts thereof, is amplified with one or more oligonucleotide primers.
It is another object of this invention to provide a method of identifying individuals who carry no mutation(s) of the BRCA1 coding sequence and therefore have no increased genetic susceptibility to breast or ovarian cancer based on their BRCA1 genes.
It is another object of this invention to provide a method of identifying a mutation leading to an increased genetic susceptibility to breast or ovarian cancer.
There is a need in the art for a sequence of the BRCA1 gene and for the protein sequence of BRCA1 as well as for an accurate list of codons which occur at polymorphic points on a sequence.
A person skilled in the art of genetic susceptibility testing will find the present invention useful for:
a) identifying individuals having a BRCA1 gene with no coding mutations, who therefore cannot be said to have an increased genetic susceptibility to breast or ovarian cancer from their BRCA1 genes;
b) avoiding misinterpretation of polymorphisms found in the BRCA1 gene;
c) determining the presence of a previously unknown mutation in the BRCA1 gene;
d) identifying a mutation which increases the genetic susceptibility to breast or ovarian cancer;
e) probing a human sample of the BRCA1 gene;
f) performing gene therapy; and
g) making a functioning tumor growth inhibitor protein coded for by one of the BRCA1(omi) genes.
BRIEF DESCRIPTION OF THE FIGURE
As shown in FIGURE 1, the alternative alleles at polymorphic (non-mutation causing variations) sites along a chromosome can be represented as a "haplotype" within a gene such as BRCA1. The BRCA1(omi1) haplotype is shown in Figure 1 with dark shading (encompassing the alternative alleles found at nucleotide sites 2201, 2430, 2731, 3232, 3667, 4427, and 4956). For comparison, the haplotype that is in GenBank is shown with no shading. As can be seen from the figure, the common "consensus" haplotype is found intact in five separate chromosomes labeled with the OMI symbol (numbers 1-5 from left to right). Two additional haplotypes (BRCA1(omi2) and BRCA1(omi3)) are represented with mixed dark and light shading (numbers 7 and 9 from left to right). In total, 7 of 10 haplotypes along the BRCA1 gene are unique.
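Identifying a "consensus" haplotype amounts to counting how often each haplotype occurs among the sequenced chromosomes. The sketch below illustrates that count under stated assumptions: the helper name `summarize_haplotypes` is hypothetical, each haplotype is written as a seven-character string with "A" and "B" standing for the two alternative alleles at a site, and the ten example chromosomes are invented toy data that do not reproduce Figure 1.

```python
from collections import Counter


def summarize_haplotypes(haplotypes):
    """Return (consensus, consensus_count, n_distinct) for a list of haplotypes.

    The consensus is the haplotype seen most often. If several haplotypes tie,
    the one encountered first in the input is returned.
    """
    if not haplotypes:
        raise ValueError("no haplotypes supplied")
    counts = Counter(haplotypes)
    consensus, consensus_count = counts.most_common(1)[0]
    return consensus, consensus_count, len(counts)


# Invented toy data (NOT the haplotypes of Figure 1): ten chromosomes, seven
# sites each, "A"/"B" marking which alternative allele is present at a site.
chromosomes = ["AAAAAAA"] * 5 + [
    "BBBBBBB", "ABABABA", "BBBBBBB", "AABBAAB", "BBBBBBB",
]

consensus, count, distinct = summarize_haplotypes(chromosomes)
frequency = count / len(chromosomes)
```

In this toy set the haplotype seen on five of the ten chromosomes is reported as the consensus, at a frequency of 0.5.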
DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS
The following definitions are provided for the purpose of understanding this invention.
"Breast and Ovarian cancer" is understood by those skilled in the art to include breast and ovarian cancer in women and also breast and prostate cancer in men. BRCA1 is associated with genetic susceptibility to inherited breast and ovarian cancer in women and also breast and prostate cancer in men. Therefore, claims in this document which recite breast and/or ovarian cancer refer to breast, ovarian and prostate cancers in men and women.
"Coding sequence" or "DNA coding sequence" refers to those portions of a gene which, taken together, code for a peptide (protein), or which nucleic acid itself has function.
"Protein" or "peptide" refers to a sequence of amino acids which has function.
"BRCA1(omi)" refers collectively to the "BRCA1(omi1)", "BRCA1(omi2)" and "BRCA1(omi3)" coding sequences.
"BRCA1(omi1)" refers to SEQ. ID. NO.: 1, a coding sequence for the BRCA1 gene. The coding sequence was found by end to end sequencing of BRCA1 alleles from individuals randomly drawn from a Caucasian population found to have no family history of breast or ovarian cancer. The sequenced gene was found not to contain any mutations. BRCA1(omi1) was determined to be a consensus sequence by calculating the frequency with which the coding sequence occurred among the sample alleles sequenced.
"BRCA1(omi2)" and "BRCA1(omi3)" refer to SEQ. ID. NO.: 3 and SEQ. ID. NO.: 5, respectively. They are two additional coding sequences for the BRCA1 gene which were also isolated from individuals randomly drawn from a Caucasian population found to have no family history of breast or ovarian cancer.
"Primer" as used herein refers to a sequence comprising about 20 or more nucleotides of the BRCA1 gene.
"Genetic susceptibility" refers to the susceptibility to breast or ovarian cancer due to the presence of a mutation in the BRCA1 gene.
A "target polynucleotide" refers to the nucleic acid sequence of interest, e.g., the BRCA1 encoding polynucleotide. Other primers which can be used for primer hybridization will be known or readily ascertainable to those of skill in the art.
"Consensus" means the most commonly occurring in the population.
"Consensus genomic sequence" means the allele of the target gene which occurs with the greatest frequency in a population of individuals having no family history of disease associated with the target gene.
"Substantially complementary to" refers to probe or primer sequences which hybridize to the sequences provided under stringent conditions and/or sequences having sufficient homology with BRCA1 sequences, such that the allele specific oligonucleotide probes or primers hybridize to the BRCA1 sequences to which they are complementary.
"Haplotype" refers to a series of alleles within a gene on a chromosome.
"Isolated" as used herein refers to substantially free of other nucleic acids, proteins, lipids, carbohydrates or other materials with which they may be associated. Such association is typically either in cellular material or in a synthesis medium.
"Mutation" refers to a base change or a gain or loss of base pair(s) in a DNA sequence, which results in a DNA sequence which codes for a non-functioning protein or a protein with substantially reduced or altered function.
"Polymorphism" refers to a base change which is not associated with known pathology.
"Tumor growth inhibitor protein" refers to the protein coded for by the BRCA1 gene. The functional protein is thought to suppress breast and ovarian tumor growth.
The invention in several of its embodiments includes:
1. An isolated consensus DNA sequence of the BRCA1 coding sequence as set forth in SEQ. ID. NO.: 1.
2. A consensus protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 2.
3. An isolated coding sequence of the BRCA1 gene as set forth in SEQ. ID. NO.: 3.
4. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 4.
5. An isolated coding sequence of the BRCA1 gene as set forth in SEQ. ID. NO.: 5.
6. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 6.
7. A BRCA1 gene with a BRCA1 coding sequence not associated with breast or ovarian cancer which comprises an alternative pair of codons, AGC and AGT, which occur at position 2201 at frequencies of about 35-45%, and from about 55-65%, respectively.
8. A BRCA1 gene according to Claim 7 wherein AGC occurs at a frequency of about
9. A set of at least two alternative codon pairs which occur at polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence not associated with breast or ovarian cancer, wherein codon pairs are selected from the group consisting of:
AGC and AGT at position 2201;
TTG and CTG at position 2430;
CCG and CTG at position 2731;
GAA and GGA at position 3232;
AAA and AGA at position 3667;
TCT and TCC at position 4427; and
AGT and GGT at position 4956.
10. A set of at least two alternative codon pairs according to claim 9, wherein the codon pairs occur in the following frequencies, respectively, in a population of individuals free of disease:
at position 2201, AGC and AGT occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively;
at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and
at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
11. A set according to Claim 10 which is at least three codon pairs.
12. A set according to Claim 10 which is at least four codon pairs.
13. A set according to Claim 10 which is at least five codon pairs.
14. A set according to Claim 10 which is at least six codon pairs.
15. A set according to Claim 10 which is at least seven codon pairs.
16. A method of identifying individuals having a BRCA1 gene with a BRCA1 coding sequence not associated with disease, comprising:
a) amplifying a DNA fragment of an individual's BRCA1 coding sequence using an oligonucleotide primer which specifically hybridizes to sequences within the gene;
b) sequencing said amplified DNA fragment by dideoxy sequencing;
c) repeating steps a) and b) until said individual's BRCA1 coding sequence is completely sequenced;
d) comparing the sequence of said amplified DNA fragment to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5;
e) determining the presence or absence of each of the following polymorphic variations in said individual's BRCA1 coding sequence:
AGC and AGT at position 2201,
TTG and CTG at position 2430,
CCG and CTG at position 2731,
GAA and GGA at position 3232,
AAA and AGA at position 3667,
TCT and TCC at position 4427, and
AGT and GGT at position 4956;
f) determining any sequence differences between said individual's BRCA1 coding sequences and SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, wherein the presence of said polymorphic variations and the absence of a variation outside of positions 2201, 2430, 2731, 3232, 3667, 4427, and 4956 is correlated with an absence of increased genetic susceptibility to breast or ovarian cancer resulting from a BRCA1 mutation in the BRCA1 coding sequence.
17. A method of claim 16 wherein codon variations occur at the following frequencies, respectively, in a population of individuals free of disease:
at position 2201, AGC and AGT occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively;
at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and
at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
18. A method according to claim 16 wherein said oligonucleotide primer is labeled with a radiolabel, a fluorescent label, a bioluminescent label, a chemiluminescent label, or an enzyme label.
19. A method of detecting an increased genetic susceptibility to breast and ovarian cancer in an individual resulting from the presence of a mutation in the BRCA1 coding sequence, comprising:
a) amplifying a DNA fragment of an individual's BRCA1 coding sequence using an oligonucleotide primer which specifically hybridizes to sequences within the gene;
b) sequencing said amplified DNA fragment by dideoxy sequencing;
c) repeating steps a) and b) until said individual's BRCA1 coding sequence is completely sequenced;
d) comparing the sequence of said amplified DNA fragment to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5;
e) determining any sequence differences between said individual's BRCA1 coding sequences and SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, to determine the presence or absence of base changes in said individual's BRCA1 coding sequence, wherein a base change which is not any one of the following:
AGC and AGT at position 2201,
TTG and CTG at position 2430,
CCG and CTG at position 2731,
GAA and GGA at position 3232,
AAA and AGA at position 3667,
TCT and TCC at position 4427, and
AGT and GGT at position 4956
is correlated with the potential of increased genetic susceptibility to breast or ovarian cancer resulting from a BRCA1 mutation in the BRCA1 coding sequence.
20. A method of claim 19 wherein codon variations occur at the following frequencies, respectively, in a population free of disease:
at position 2201, AGC and AGT occur at frequencies from about and from about 55-65%, respectively;
at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively;
at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and
at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
21. A method according to claim 19 wherein said oligonucleotide primer is labeled with a radiolabel, a fluorescent label, a bioluminescent label, a chemiluminescent label, or an enzyme label.
22. A set of codon pairs, which occur at polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence according to Claim 1, wherein said set of codon pairs is:
AGC and AGT at position 2201;
TTG and CTG at position 2430;
CCG and CTG at position 2731;
GAA and GGA at position 3232;
AAA and AGA at position 3667;
TCT and TCC at position 4427; and
AGT and GGT at position 4956.
23. A set of at least two alternative codon pairs according to claim 22 wherein said set of at least two alternative codon pairs occurs at the following frequencies:
at position 2201, AGC and AGT occur at frequencies of about and from about 55-65%, respectively;
at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively;
at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and
at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
24. A BRCA1 coding sequence according to claim 1 wherein the codon pairs occur at the following frequencies:
at position 2201, AGC and AGT occur at frequencies of about 40%, and from about 55-65%, respectively;
at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively;
at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively;
at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and
at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
25. A method of determining the consensus genomic sequence or consensus coding sequence for a target gene, comprising:
a) screening a number of individuals in a population for a family history which indicates inheritance of normal alleles for a target gene;
b) isolating at least one allele of the target gene from individuals found to have a family history which indicates inheritance of normal alleles for a target gene;
c) sequencing each allele;
d) comparing the nucleic acid sequence of the genomic sequence or of the coding sequence of each allele of the target gene to determine similarities and differences in the nucleic acid sequence; and
e) determining which allele of the target gene occurs with the greatest frequency.
26. A method of performing gene therapy, comprising:
a) transfecting cancer cells in vivo with an effective amount of a vector transformed with a BRCA1 coding sequence of SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5;
b) allowing the cells to take up the vector; and
c) measuring a reduction in tumor growth.
27. A method of performing protein therapy, comprising:
a) injecting into a patient an effective amount of BRCA1 tumor growth inhibiting protein of SEQ. ID. NO.: 2, SEQ. ID. NO.: 4, or SEQ. ID. NO.: 6;
b) allowing the cells to take up the protein; and
c) measuring a reduction in tumor growth.
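The comparison logic recited in the screening methods above (a difference at one of the seven listed positions that matches a listed codon is a polymorphism; any other difference is a potential mutation) can be sketched as a table lookup. This is an illustrative sketch only, not the applicants' procedure. The names `POLYMORPHIC_CODONS`, `classify_difference`, `only_listed_polymorphisms` and `residues_at` are hypothetical, the position numbers are treated as opaque keys, and the input is assumed to be a list of (position, codon) differences already obtained by comparing a sample against one of the reference coding sequences. The amino acid assignments come from the standard genetic code.

```python
# Codon pairs at the seven polymorphic positions, as listed in the embodiments above.
POLYMORPHIC_CODONS = {
    2201: {"AGC", "AGT"},
    2430: {"TTG", "CTG"},
    2731: {"CCG", "CTG"},
    3232: {"GAA", "GGA"},
    3667: {"AAA", "AGA"},
    4427: {"TCT", "TCC"},
    4956: {"AGT", "GGT"},
}

# Standard genetic code entries for the codons above (three-letter residue names).
AMINO_ACID = {
    "AGC": "Ser", "AGT": "Ser", "TTG": "Leu", "CTG": "Leu",
    "CCG": "Pro", "GAA": "Glu", "GGA": "Gly", "AAA": "Lys",
    "AGA": "Arg", "TCT": "Ser", "TCC": "Ser", "GGT": "Gly",
}


def classify_difference(position, codon):
    """Label one observed codon as a listed polymorphism or a potential mutation."""
    allowed = POLYMORPHIC_CODONS.get(position)
    if allowed is not None and codon.upper() in allowed:
        return "polymorphism"
    return "potential mutation"


def only_listed_polymorphisms(differences):
    """True when every (position, codon) difference is a listed polymorphism."""
    return all(
        classify_difference(position, codon) == "polymorphism"
        for position, codon in differences
    )


def residues_at(position):
    """Amino acids encoded by the two alternative codons at a listed position."""
    return sorted({AMINO_ACID[codon] for codon in POLYMORPHIC_CODONS[position]})
```

For example, `classify_difference(2201, "AGT")` reports a polymorphism, while a codon at an unlisted position is flagged as a potential mutation that would need further evaluation. `residues_at` shows that some listed pairs encode the same residue (both codons at position 2201 encode Ser) and others do not (Pro or Leu at position 2731).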
SEQUENCING
Any nucleic acid specimen, in purified or non-purified form, can be utilized as the starting nucleic acid or acids, providing it contains, or is suspected of containing, the specific nucleic acid sequence containing a polymorphic locus. Thus, the process may amplify, for example, DNA or RNA, including messenger RNA, wherein DNA or RNA may be single stranded or double stranded. In the event that RNA is to be used as a template, enzymes and/or conditions optimal for reverse transcribing the template to DNA would be utilized. In addition, a DNA-RNA hybrid which contains one strand of each may be utilized. A mixture of nucleic acids may also be employed, or the nucleic acids produced in a previous amplification reaction herein, using the same or different primers, may be so utilized. See TABLE II.
The specific nucleic acid sequence to be amplified, the polymorphic locus, may be a fraction of a larger molecule or can be present initially as a discrete molecule, so that the specific sequence constitutes the entire nucleic acid. It is not necessary that the sequence to be amplified be present initially in a pure form; it may be a minor fraction of a complex mixture, such as contained in whole human DNA.
DNA utilized herein may be extracted from a body sample, such as blood, tissue material and the like by a variety of techniques such as that described by Maniatis, et al. in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY, p 280-281 (1982). If the extracted sample is impure, it may be treated before amplification with an amount of a reagent effective to open the cells, or animal cell membranes of the sample, and to expose and/or separate the strand(s) of the nucleic acid(s). This lysing and nucleic acid denaturing step to expose and separate the strands will allow amplification to occur much more readily.
The deoxyribonucleotide triphosphates dATP, dCTP, dGTP, and dTTP are added to the synthesis mixture, either separately or together with the primers, in adequate amounts and the resulting solution is heated to about 90-100°C for about 1 to 10 minutes, preferably from 1 to 4 minutes. After this heating period the solution is allowed to cool, which is preferable for the primer hybridization. To the cooled mixture is added an appropriate agent for effecting the primer extension reaction (called herein "agent for polymerization"), and the reaction is allowed to occur under conditions known in the art. The agent for polymerization may also be added together with the other reagents if it is heat stable. This synthesis (or amplification) reaction may occur at room temperature up to a temperature above which the agent for polymerization no longer functions. Thus, for example, if DNA polymerase is used as the agent, the temperature is generally no greater than about 40°C. Most conveniently the reaction occurs at room temperature.
The primers used to carry out this invention embrace oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization. Environmental conditions conducive to synthesis include the presence of nucleoside triphosphates and an agent for polymerization, such as DNA polymerase, and a suitable temperature and pH. The primer is preferably single stranded for maximum efficiency in amplification, but may be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent for polymerization. The exact length of primer will depend on many factors, including temperature, buffer, and nucleotide composition. The oligonucleotide primer typically contains 12-20 or more nucleotides, although it may contain fewer nucleotides.
Primers used to carry out this invention are designed to be substantially complementary to each strand of the genomic locus to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands under conditions which allow the agent for polymerization to perform. In other words, the primers should have sufficient complementarity with the 5' and 3' sequences flanking the mutation to hybridize therewith and permit amplification of the genomic locus.
Oligonucleotide primers of the invention are employed in the amplification process, which is an enzymatic chain reaction that produces exponential quantities of polymorphic locus relative to the number of reaction steps involved. Typically, one primer is complementary to the negative strand of the polymorphic locus and the other is complementary to the positive strand. Annealing the primers to denatured nucleic acid followed by extension with an enzyme, such as the large fragment of DNA polymerase I (Klenow), and nucleotides results in newly synthesized positive and negative strands containing the target polymorphic locus sequence. Because these newly synthesized sequences are also templates, repeated cycles of denaturing, primer annealing, and extension result in exponential production of the region (i.e., the target polymorphic locus sequence) defined by the primers. The product of the chain reaction is a discrete nucleic acid duplex with termini corresponding to the ends of the specific primers employed.
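The relationship between the two primers and the two strands can be made concrete with a small sketch. It assumes the target region is supplied as its positive-strand sequence written 5' to 3'. The forward primer then copies the 5' end of that sequence (so it is complementary to the negative strand), and the reverse primer is the reverse complement of the 3' end (so it is complementary to the positive strand). The function names and the toy sequence are hypothetical, and real primer design would also weigh melting temperature, specificity and secondary structure, which this sketch ignores.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanking_primers(positive_strand, primer_length=20):
    """Return (forward, reverse) primers, both 5' to 3', bounding the given region."""
    if len(positive_strand) < 2 * primer_length:
        raise ValueError("region too short for two non-overlapping primers")
    # Forward primer: identical to the start of the positive strand,
    # hence complementary to the negative strand.
    forward = positive_strand[:primer_length].upper()
    # Reverse primer: reverse complement of the end of the positive strand,
    # hence complementary to the positive strand.
    reverse = reverse_complement(positive_strand[-primer_length:])
    return forward, reverse


# Invented 12-base toy region with 3-base primers, for illustration only.
forward, reverse = flanking_primers("ATGCCCGGGTAC", primer_length=3)
```

Extension from these two primers proceeds toward each other across the region, so the amplified duplex ends exactly at the primer termini, as described in the text.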
The oligonucleotide primers of the invention may be prepared using any suitable method, such as conventional phosphotriester and phosphodiester methods or automated embodiments thereof. In one such automated embodiment, diethylphosphoramidites are used as starting materials and may be synthesized as described by Beaucage, et al., Tetrahedron Letters, 22:1859-1862, 1981. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Patent No. 4,458,066.
The agent for polymerization may be any compound or system which will function to accomplish the synthesis of primer extension products, including enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase, polymerase muteins, reverse transcriptase, and other enzymes, including heat-stable enzymes (i.e., those enzymes which perform primer extension after being subjected to temperatures sufficiently elevated to cause denaturation), such as Taq polymerase. A suitable enzyme will facilitate combination of the nucleotides in the proper manner to form the primer extension products which are complementary to each polymorphic locus nucleic acid strand. Generally, the synthesis will be initiated at the 3' end of each primer and proceed in the 5' direction along the template strand, until synthesis terminates, producing molecules of different lengths.
The newly synthesized strand and its complementary nucleic acid strand will form a double-stranded molecule under hybridizing conditions described above and this hybrid is used in subsequent steps of the process. In the next step, the newly synthesized double-stranded molecule is subjected to denaturing conditions using any of the procedures described above to provide single-stranded molecules.
The steps of denaturing, annealing, and extension product synthesis can be repeated as often as needed to amplify the target polymorphic locus nucleic acid sequence to the extent necessary for detection. The amount of the specific nucleic acid sequence produced will accumulate in an exponential fashion.
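The exponential accumulation described above can be sketched numerically. This is an idealized model, not a description of any measured reaction: it assumes a constant per-cycle efficiency (1.0 meaning every template is copied in every cycle), and the function name and parameters are illustrative.

```python
def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a number of amplification cycles.

    Each cycle multiplies the copy number by (1 + efficiency), where
    efficiency is the assumed fraction of templates copied per cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# With perfect doubling, one template gives 2**35 copies after 35 cycles.
print(copies_after_cycles(1, 35))
# A lower assumed efficiency shows how strongly the yield depends on it.
print(copies_after_cycles(1, 35, efficiency=0.8))
```

In practice amplification plateaus as primers and nucleotides are consumed, so the model only applies to the early, unsaturated cycles.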
Amplification is described in PCR: A Practical Approach, IRL Press, Eds. M. J. McPherson, P. Quirke, and G. R. Taylor, 1992.
The amplification products may be detected by Southern blot analysis, without using radioactive probes. In such a process, for example, a small sample of DNA containing a very low level of the nucleic acid sequence of the polymorphic locus is amplified, and analyzed via a Southern blotting technique or, similarly, using dot blot analysis. The use of non-radioactive probes or labels is facilitated by the high level of the amplified signal. Alternatively, probes used to detect the amplified products can be directly or indirectly detectably labeled, for example, with a radioisotope, a fluorescent compound, a bioluminescent compound, a chemiluminescent compound, a metal chelator or an enzyme.
Those of ordinary skill in the art will know of other suitable labels for binding to the probe, or will be able to ascertain such, using routine experimentation.
Sequences amplified by the methods of the invention can be further evaluated, detected, cloned, sequenced, and the like, either in solution or after binding to a solid support, by any method usually applied to the detection of a specific DNA sequence such as PCR, oligomer restriction (Saiki, et al., Bio/Technology, 3:1008-1012, 1985), allele-specific oligonucleotide (ASO) probe analysis (Conner, et al., Proc. Natl. Acad. Sci. 80:278, 1983), oligonucleotide ligation assays (OLAs) (Landgren, et al., Science, 241:1007, 1988), and the like. Molecular techniques for DNA analysis have been reviewed (Landgren, et al., Science, 242:229-237, 1988).
Preferably, the method of amplifying is by PCR, as described herein and as is commonly used by those of ordinary skill in the art. Alternative methods of amplification have been described and can also be employed as long as the BRCA1 locus amplified by PCR using primers of the invention is similarly amplified by the alternative means. Such alternative amplification systems include but are not limited to self-sustained sequence replication, which begins with a short sequence of RNA of interest and a T7 promoter. Reverse transcriptase copies the RNA into cDNA and degrades the RNA, followed by reverse transcriptase polymerizing a second strand of DNA. Another nucleic acid amplification technique is nucleic acid sequence-based amplification (NASBA), which uses reverse transcription and T7 RNA polymerase and incorporates two primers to target its cycling scheme. NASBA can begin with either DNA or RNA and finish with either, and amplifies to 10^8 copies within to 90 minutes. Alternatively, nucleic acid can be amplified by ligation activated transcription (LAT). LAT works from a single-stranded template with a single primer that is partially single-stranded and partially double-stranded. Amplification is initiated by ligating a cDNA to the promoter oligonucleotide and, within a few hours, amplification is 10^8- to 10^9-fold. Another amplification system useful in the method of the invention is the Qβ replicase system. The Qβ replicase system can be utilized by attaching an RNA sequence called MDV-1 to RNA complementary to a DNA sequence of interest. Upon mixing with a sample, the hybrid RNA finds its complement among the specimen's mRNAs and binds, activating the replicase to copy the tag-along sequence of interest.
Another nucleic acid amplification technique, ligase chain reaction (LCR), works by using two differently labeled halves of a sequence of interest which are covalently bonded by ligase in the presence of the contiguous sequence in a sample, forming a new target. The repair chain reaction (RCR) nucleic acid amplification technique uses two complementary and target-specific oligonucleotide probe pairs, thermostable polymerase and ligase, and DNA nucleotides to geometrically amplify targeted sequences. A 2-base gap separates the oligonucleotide probe pairs, and the RCR fills and joins the gap, mimicking DNA repair. Nucleic acid amplification by strand displacement activation (SDA) utilizes a short primer containing a recognition site for HincII with a short overhang on the 5' end which binds to target DNA. A DNA polymerase fills in the part of the primer opposite the overhang with sulfur-containing adenine analogs. HincII is added but only cuts the unmodified DNA strand. A DNA polymerase that lacks 5' exonuclease activity enters at the site of the nick and begins to polymerize, displacing the initial primer strand downstream and building a new one which serves as more primer. SDA produces greater than 7 -fold amplification in 2 hours at 37°C. Unlike PCR and LCR, SDA does not require instrumented temperature cycling.
Another method is a process for amplifying nucleic acid sequences from a DNA or RNA template which may be purified or may exist in a mixture of nucleic acids. The resulting nucleic acid sequences may be exact copies of the template, or may be modified. The process has advantages over PCR in that it increases the fidelity of copying a specific nucleic acid sequence, and it allows one to more efficiently detect a particular point mutation in a single assay. A target nucleic acid is amplified enzymatically while avoiding strand displacement. Three primers are used. A first primer is complementary to the first end of the target. A second primer is complementary to the second end of the target. A third primer is similar to the first end of the target and is substantially complementary to at least a portion of the first primer, such that when the third primer is hybridized to the first primer, the position of the third primer complementary to the base at the 5' end of the first primer contains a modification which substantially avoids strand displacement. This method is detailed in U.S. Patent 5,593,840 to Bhatnagar et al., 1997. Although PCR is the preferred method of amplification of the invention, these other methods can also be used to amplify the BRCA1 locus as described in the method of the invention.
The BRCA1(omi) DNA coding sequences were obtained by end to end sequencing of the BRCA1 alleles of five subjects in the manner described above, followed by analysis of the data obtained. The data obtained provided us with the opportunity to evaluate seven previously published polymorphisms and to affirm or correct, where necessary, the frequency of occurrence of alternative codons.
GENE THERAPY
The coding sequences can be used for gene therapy.
A variety of methods are known for gene transfer, any of which might be available for use.
Direct injection of Recombinant DNA in vivo
1. Direct injection of "naked" DNA directly with a syringe and needle into a specific tissue, infused through a vascular bed, or transferred through a catheter into endothelial cells.
2. Direct injection of DNA that is contained in artificially generated lipid vesicles.
3. Direct injection of DNA conjugated to a targeting structure, such as an antibody.
4. Direct injection by particle bombardment, where the DNA is coated onto gold particles and shot into the cells.
Human Artificial Chromosomes
This novel gene delivery approach involves the use of human chromosomes that have been stripped down to contain only the essential components for replication and the genes desired for transfer.
Receptor-Mediated Gene Transfer
DNA is linked to a targeting molecule that will bind to specific cell-surface receptors, inducing endocytosis and transfer of the DNA into mammalian cells. One such technique uses poly-L-lysine to link asialoglycoprotein to DNA. An adenovirus is also added to the complex to disrupt the lysosomes and thus allow the DNA to avoid degradation and move to the nucleus. Infusion of these particles intravenously has resulted in gene transfer into hepatocytes.
RECOMBINANT VIRUS VECTORS
Several vectors are used in gene therapy. Among them are the Moloney Murine Leukemia Virus (MoMLV) vectors, the adenovirus vectors, the adeno-associated virus (AAV) vectors, the herpes simplex virus (HSV) vectors, the poxvirus vectors, and human immunodeficiency virus (HIV) vectors.
GENE REPLACEMENT AND REPAIR
The ideal genetic manipulation for treatment of a genetic disease would be the actual replacement of the defective gene with a normal copy of the gene.
Homologous recombination is the term used for switching out a section of DNA and replacing it with a new piece. By this technique, the defective gene can be replaced with a normal gene which expresses a functioning BRCA1 tumor growth inhibitor protein.
A complete description of gene therapy can also be found in "Gene Therapy: A Primer For Physicians," 2d Ed., by Kenneth W. Culver, M.D., Publ. Mary Ann Liebert Inc. (1996). Two Gene Therapy Protocols for BRCA1 are approved by the Recombinant DNA Advisory Committee for Jeffrey T. Holt et al. They are listed as 9602-148 and 9603-149 and are available from the NIH. The isolated BRCA1 gene can be synthesized or constructed from amplification products and inserted into a vector such as the LXSN vector.
The BRCA1 amino acid and nucleic acid sequence may be used to make diagnostic probes and antibodies. Labeled diagnostic probes may be used by any hybridization method to determine the level of BRCA1 protein in serum or lysed cell suspension of a patient, or solid surface cell sample.
The BRCA1 amino acid sequence may be used to provide a level of protection for patients against risk of breast or ovarian cancer or to reduce the size of a tumor. Methods of making and extracting proteins are well known: Itakura et al., U.S. Patents 4,704,362, 5,221,619, and 5,583,013. BRCA1 has been shown to be secreted. Jensen, R.A. et al. Nature Genetics 2: 303-308 (1996).
EXAMPLE 1
Determination Of The Coding Sequence Of A BRCA1(omi) Gene From Five Individuals
MATERIALS AND METHODS
Approximately 150 volunteers were screened in order to identify individuals with no cancer history in their immediate family (i.e., first and second degree relatives). Each person was asked to fill out a hereditary cancer prescreening questionnaire (see TABLE I below). Five of these were randomly chosen for end-to-end sequencing of their BRCA1 gene. A first degree relative is a parent, sibling, or offspring. A second degree relative is an aunt, uncle, grandparent, grandchild, niece, nephew, or half-sibling.
TABLE I
Hereditary Cancer Pre-Screening Questionnaire

Part A: Answer the following questions about your family
1. To your knowledge, has anyone in your family been diagnosed with a very specific hereditary colon disease called Familial Adenomatous Polyposis (FAP)?
2. To your knowledge, have you or any aunt had breast cancer diagnosed before the age
3. Have you had Inflammatory Bowel Disease, also called Crohn's Disease or Ulcerative Colitis, for more than 7 years?

Part B: Refer to the list of cancers below for your responses only to questions in Part B.
Bladder Cancer        Lung Cancer          Pancreatic Cancer
Breast Cancer         Gastric Cancer       Prostate Cancer
Colon Cancer          Malignant Melanoma   Renal Cancer
Endometrial Cancer    Ovarian Cancer       Thyroid Cancer
4. Have your mother or father, your sisters or brothers or your children had any of the listed cancers?
5. Have there been diagnosed in your mother's brothers or sisters, or your mother's parents more than one of the cancers in the above list?
6. Have there been diagnosed in your father's brothers or sisters, or your father's parents more than one of the cancers in the above list?

Part C: Refer to the list of relatives below for responses only to questions in Part C.
You
Your mother
Your sisters or brothers
Your mother's sisters or brothers (maternal aunts and uncles)
Your children
Your mother's parents (maternal grandparents)
7. Have there been diagnosed in these relatives 2 or more identical types of cancer? Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer.
8. Is there a total of 4 or more of any cancers in the list of relatives above other than "simple" skin cancers?

Part D: Refer to the list of relatives below for responses only to questions in Part D.
You
Your father
Your sisters or brothers
Your father's sisters or brothers (paternal aunts and uncles)
Your children
Your father's parents (paternal grandparents)
9. Have there been diagnosed in these relatives 2 or more identical types of cancer? Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer.
10. Is there a total of 4 or more of any cancers in the list of relatives above other than "simple" skin cancers?

Copyright 19%, OncorMed, Inc.
Genomic DNA was isolated from white blood cells of five subjects selected from analysis of their answers to the questions above. Dideoxy sequence analysis was performed following polymerase chain reaction amplification.
All exons of the BRCA1 gene were subjected to direct dideoxy sequence analysis by asymmetric amplification using the polymerase chain reaction (PCR) to generate a single stranded product amplified from this DNA sample.
Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, eds., Academic Press, Inc., 1993. Fluorescent dye was attached for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing was performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated Model 377® sequencer. The software used for analysis of the resulting data was Sequence Navigator® software purchased through ABI.
1. Polymerase Chain Reaction (PCR) Amplification
Genomic DNA (100 nanograms) was extracted from white blood cells of five subjects. Each of the five samples was sequenced end to end. Each sample was amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters forward primer, 2.5 microliters reverse primer, 1 microliter Taq polymerase (5 units), and 13 microliters of water.
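As a quick arithmetic check on the reaction mix described above, the following sketch sums the listed component volumes and computes final concentrations by simple dilution. The dictionary and function names are illustrative; only the volumes and stock concentrations are taken from the text.

```python
# Component volumes in microliters, as listed for one 25 microliter reaction.
REACTION_UL = {
    "genomic DNA (100 ng)": 1.0,
    "10X PCR buffer": 2.5,
    "10X dNTP mix": 2.5,
    "forward primer": 2.5,
    "reverse primer": 2.5,
    "Taq polymerase (5 units)": 1.0,
    "water": 13.0,
}


def total_volume(components: dict) -> float:
    """Sum of all component volumes in microliters."""
    return sum(components.values())


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * added_ul / total_ul


total = total_volume(REACTION_UL)
print(total)  # the components should add up to the stated final volume

# 10X stocks diluted tenfold: 100 mM Tris and 2 mM of each dNTP.
print(final_concentration(100.0, REACTION_UL["10X PCR buffer"], total))
print(final_concentration(2.0, REACTION_UL["10X dNTP mix"], total))
```

The listed volumes add up to the stated 25 microliters, and each 10X stock ends up at one tenth of its stock concentration.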
The primers in Table II, below, were used to carry out amplification of the various sections of the BRCA1 gene samples. The primers were synthesized on a DNA/RNA Model 394® Synthesizer.
TABLE II
BRCA1 PRIMERS AND SEQUENCING DATA
SECLIDD
EXON 2 2F 5 -GAA GiT GTC ATT TTA
S. 0 *055 0 0555 0 0050 2R EXON 3 3F 3V 5F
SR
EXON 6 6/7F 6R EXON 7 7F 6/7R EXON 8 8F1 8R1 EXON 9 9F 9R EXOMl 10lOF 1 OR 30 EXON ilAllAF 11AR EXON 11B11BF1 1IBR1 EXON lid 1CF1 ll CR1 EX(ON11D11DF1 40 11DRi .EXON 11 E11 EF 11ER 45 EXONiFiiFF 11FR EXON11G 11 GF TGT CTT r7C TTC CCT TAA -ACC 777-3, AGT ATG T-3- 7CC TGA CAC AGC AGA CAT TTA-3' TTG GAT TT7 CGT TOT CAC TTA-3' CTC -TTA AGG GCA GTT GTG AG-3' TTC CTA CMTG TTGGr CTr CC CTT ATT TTA GTG 7CC TTA AAA GG-3iT CAT GSA QAG CAC iTT -AGT G-3' CAC AAC AAA GAG CAT ACA TAG GG-3' 7CG GGT TCA CiT TGT AGA*AG-3' lTdC TCT TCA GSA GSA AAA GCA-3' GC C TAC CAC AAA TAd AAA-3' CCA CAG TAG ATG CTC AGT AAATA-3' TAG GAA AAT Add AGC TTC ATA GA-3' TOG TCA OCT TTC TOT AAT CG-3' GTA TCT Add CAC TCT CTT CTT CAG-3' CCA CCT CCA AGG TGT ATC A-3' TGT TAT 017 GO TCC 1TG CT-3' CAC. TAA AGA CAG AAT GAA TCT A-3; GMA GAA CCA GMA TAT TCA TCT A-3- TGA TGG GGA GTC TGA ATC AA-3' TCT GCT TTC 176 ATA AAA TCC T-3T AGC. GTC C TCA CMA ATA AA-3' TCA AGC GCA TGA ATA TGC CT-3' GTA TAA OCA ATA iM MAC TCG A-3- TTA AGT TCA C TG GTA TTT GAA CA-3' SAC AGC GAT ACT TTC CCA GA-3' T1G AAC MAC CAT GMA TTA GTC-3' GSA AGT TAG CAd TCT AGO GA-3' 8 9 10 11 12 13 14 is 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 22 21 21 20 201 23 22 23 21 21 23 23 20 24 19.
21 21 20 22 20 22 23 20 21 -37S -275 1.6 -250 1.6 -275 1.2 -2.70 1.2 -250 1.6 -250 7 24. 1.6 -275 1.2 -0 1.2 -0 1.2 -400 1.2 388 382 423 liOGR S' GCA GTG ATA TTA ACT GM TGT A-3- M13 tailed 36 22 EXONllHllHF 5' TGG GTC CTT AAA GMA ACA AAGT-' 11 HR EXON11 IiF ii EXON 11.11 1JF 11JR EXON 11 Kl1 KF 11 KR-1 EXON 11Ll 11LF-1 1511R EXON 12 12F 1 2R EXON 13 13F 13R EXCN 14 14F 1 4R EXON15 15F EXONi16 16F 16R EX0N 17 17F 17'R EXON 18 18F 35 18R EXON 19 19F 19R 40 EXIN 20 20F 20R I EXON 21 21 F 5 21 R 5 EXON 22 22F 5 22R 5 EXON 23 23F-1 5 23R-1 5 EXON 24 24F 5 24R 5 5' TCA 5' CCA 5' TCA 5' CAA 5' GCA 5' TAT 5' GTA 5' TAA 5' GTC 5' TGT 5' AAT 5 ATG 5' CTA 5' AAC 5' AAT 5' AAA 5' G1TG
G
5' 1 5' CAT 5ATA 5GTA TGA A CAT1 ATG A
GTAC
GGT GAC AT" CiT TTT C GSA TGC TTV AAT TGA ATG GTA ACC CTG AAAGCG Tcc TTG CAG TCA ATA TTG GCA AAT GTG rCT CTG CCA ATG CAG CAA ACC GSA MAG CT TTG GAG CTA ACC TGA ATT TAT. AAMTGC CTG C AGG CAG AAT ATC TCT TAA CAG ACT CiT TC TAG AAC GrM CCT CAT GTG TCT TTA GCT ACC ATT TTC rCA TTC TTC' rGT TAA GSA MG COT GTC k.AT CCA AAT ,TC TTC CiT 3AG AAA TAG :AT TGA SAG kAG ACT TCT 'GT SAC AGT -TT AGC CAT ~AT TGA CAC M AGG ACA rGMA TCT TCC-3' ATC MAG TCA-3' CAA TTA CTT C-3' CTA 7GC, TTA GA-3 AGC CAA AT-3' AGA MAG GA-3' AGT C1T CCA A-3' MAG GCA TCT-3' O AAA AGC A-3' AGA AGAAA-3' TAA GAA-1GT-3' CTC AAAGTA-3' cG3T CCT TAC-3' ATC ACT ATC A-3' CTG TAT GCA-3' MAG TAT G-3' 1-Fr ATG TAG GA-3' AGA CCA GMA C-3' AGA ATG TTG T-3' CAG GAT TG-3' GTT TTA-3' TCT TAG GAC-3' CCA GCA TC-3' CTG TGC TC-3' MAG .rci TGC-3' 7GC TICC AC-3' TAC ACA GC-3' Trr GAA AGT C-3' AAT AGC CTC T-3 GTC 'TTG CT-3' SAG G3ZT AC-3' TCC AGT AGT-3' TCA TTC, AAC MA-3' TAA TCT CTG C-3' GTA GMA GGA-3' 37 22 38 21 39 21 40 21 41 23 42 43 20 44 2-2 45 22 46 22 47 20 48 21 49 21 5o 21 51 22 52 21 53 19 54 23 55 22 56 22 57 20 58 18 59 21 60 61 20 62 21 63 20 64 65 22 66 22 67 20 68 69 21 70 23 71 22 .72 21 396 360 1.2 -300 1.2 -325 1.2 -310 1.2 375 1.6 -550 1.2 -275 1.2 -350 1.2 -250 1.2 -425 1.2 377 1.2 377 -300 -300 1.2 -250 1.4 -285
Thirty-five cycles were performed, each consisting of denaturing (95°C; seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first cycle, in which the denaturing time was increased to 5 minutes, and during the last cycle, in which the extension time was increased to 5 minutes.
PCR products were purified using Qia-quick® PCR purification kits (Qiagen cat# 28104; Chatsworth, CA). Yield and purity of the PCR product were determined spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.
2. Dideoxy Sequence Analysis
Fluorescent dye was attached to PCR products for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing was performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, CA, automated Model 377® sequencer. The software used for analysis of the resulting data was "Sequence Navigator® software" purchased through ABI.
3.
RESULTS
Differences in the nucleic acids of the ten alleles from five individuals were found in seven locations on the gene. The changes and their positions are found in TABLE III, below.
TABLE III
PANEL TYPING

AMINO ACID          NUCLEOTIDE CHANGE, BY SAMPLE          FREQUENCY
CHANGE              1      2      3      4      5
SER(SER) (694)             C/T    C/T    T/T    T/T       0.4 C   0.6 T
LEU(LEU) (771)      T/T    C/T    C/T    C/C    C/C       0.4 T   0.6 C
PRO(LEU) (871)      C/T    C/T    C/T    T/T    T/T       0.3 C   0.7 T
GLU(GLY) (1038)     A/A    A/G    A/G    G/G    G/G       0.4 A   0.6 G
LYS(ARG) (1183)     A/A    A/G    A/G    G/G    G/G       0.4 A   0.6 G
SER(SER) (1436)     T/T    T/T    T/C    C/C    C/C       0.5 T   0.5 C
SER(GLY) (1613)     A/A    A/G    A/G    G/G    G/G       0.4 A   0.6 G

Tables III and IV depict one aspect of the invention, sets of at least two alternative codon pairs wherein the codon pairs occur in the following frequencies, respectively, in a population of individuals free of disease: at position 2201, AGC and AGT occur at frequencies from about 45%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 45%, and from about 55-65%, respectively.
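The frequencies in the panel typing table follow from counting alleles across the ten alleles of the five subjects. The sketch below shows that calculation for three sites whose five genotype calls are fully legible in Table III (codons 771, 871 and 1436); the function name and data layout are illustrative.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from genotype calls such as "C/T".

    Each genotype contributes two alleles, so five subjects give ten alleles.
    """
    counts = Counter()
    for genotype in genotypes:
        first, second = genotype.split("/")
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Genotype calls for samples 1 to 5, as read from Table III.
PANEL = {
    771: ["T/T", "C/T", "C/T", "C/C", "C/C"],
    871: ["C/T", "C/T", "C/T", "T/T", "T/T"],
    1436: ["T/T", "T/T", "T/C", "C/C", "C/C"],
}

for codon, calls in PANEL.items():
    print(codon, allele_frequencies(calls))
```

With only ten alleles, each observed allele shifts a frequency by 0.1, which is why the text states the population frequencies as ranges rather than point values.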
The data show that for each of the samples, the BRCA1 gene is identical except in the region of seven polymorphisms. These polymorphic regions, together with their locations, the amino acid groups of each codon, the frequency of their occurrence and the amino acid coded for by each codon are found in TABLE IV below.
TABLE IV
CODON AND BASE CHANGES IN SEVEN POLYMORPHIC SITES OF BRCA1 GENE

SAMPLE       BASE     POSITION     EXON   CODON      AA        PUBLISHED     FREQUENCY
NAME         CHANGE   nt/aa               CHANGE     CHANGE    FREQUENCY     IN THIS STUDY
2,3,4,5      C-T      2201/694     11E    AGC(AGT)   SER-SER   UNPUBLISHED
2,3,4,5      T-C      2430/771     11F    TTG(CTG)   LEU-LEU   T=67% (13)
1,2,3,4,5    C-T      2731/871     11G    CCG(CTG)   PRO-LEU   C=34% (12)
2,3,4,5      A-G      3232/1038    11I    GAA(GGA)   GLU-GLY   A=67% (13)    A=40%
2,3,5        A-G      3667/1183    11J    AAA(AGA)   LYS-ARG   A=68% (12)    A=40%
3,4,5        T-C      4427/1436    13     TCT(TCC)   SER-SER   T=67% (12)
4,5          A-G      4956/1613    16     AGT(GGT)   SER-GLY   A=67% (12)    A=40%

Reference numbers (in parentheses) correspond to the Table of References below.

EXAMPLE 2
Determination Of A Normal Individual Using BRCA1(omi) And The Seven Polymorphisms For Reference
A person skilled in the art of genetic susceptibility testing will find the present invention useful for: a) identifying individuals having a normal BRCA1 gene, who therefore have no elevated genetic susceptibility to breast or ovarian cancer from a BRCA1 mutation; b) avoiding misinterpretation of polymorphisms found in the BRCA1 gene.
Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in question. However, a BRCA1(omi) sequence is used for reference and the polymorphic sites are compared to the nucleic acid sequences listed above for codons at each polymorphic site. A normal sample is one which compares to a BRCA1(omi) sequence and contains one of the base variations which occur at each of the polymorphic sites. The codons which occur at each of the polymorphic sites are paired here for reference.
AGC and AGT at position 2201, TTG and CTG at position 2430, CCG and CTG at position 2731, GAA and GGA at position 3232, AAA and AGA at position 3667, TCT and TCC at position 4427, and AGT and GGT at position 4956.
The availability of these polymorphic pairs provides added assurance that one skilled in the art can correctly interpret the polymorphic variations without mistaking a variation for a mutation.
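The codon pairs listed above can be checked against the amino acid changes in Table IV with a short sketch. The codon table is deliberately limited to the twelve codons that occur in the pairs. The coding-sequence start at nucleotide 120 is an assumption inferred from the nt/aa pairs in Table IV; it is not stated in the text. All names are illustrative.

```python
# Standard genetic code, restricted to the codons that appear in the pairs.
CODON_TO_AA = {
    "AGC": "SER", "AGT": "SER", "TCT": "SER", "TCC": "SER",
    "TTG": "LEU", "CTG": "LEU", "CCG": "PRO",
    "GAA": "GLU", "GGA": "GLY", "GGT": "GLY",
    "AAA": "LYS", "AGA": "ARG",
}

# (nucleotide position, codon number, first codon, alternative codon)
POLYMORPHIC_PAIRS = [
    (2201, 694, "AGC", "AGT"),
    (2430, 771, "TTG", "CTG"),
    (2731, 871, "CCG", "CTG"),
    (3232, 1038, "GAA", "GGA"),
    (3667, 1183, "AAA", "AGA"),
    (4427, 1436, "TCT", "TCC"),
    (4956, 1613, "AGT", "GGT"),
]

# Assumption: first base of the start codon, inferred from Table IV.
ASSUMED_CDS_START = 120


def classify(codon_a: str, codon_b: str) -> str:
    """Return "silent" if both codons encode the same amino acid, else "missense"."""
    return "silent" if CODON_TO_AA[codon_a] == CODON_TO_AA[codon_b] else "missense"


def codon_number(nt_position: int, cds_start: int = ASSUMED_CDS_START) -> int:
    """Codon (amino acid) number containing a given nucleotide position."""
    return (nt_position - cds_start) // 3 + 1


for nt, aa, first, alternative in POLYMORPHIC_PAIRS:
    change = f"{CODON_TO_AA[first]}-{CODON_TO_AA[alternative]}"
    print(nt, aa, codon_number(nt), change, classify(first, alternative))
```

Under that assumed start position, every nucleotide position maps to the codon number given in Table IV, and three of the seven pairs (codons 694, 771 and 1436) are silent.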
Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by asymmetric amplification using the polymerase chain reaction (PCR) to generate a single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, eds., Academic Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI.
1. Polymerase Chain Reaction (PCR) Amplification
Genomic DNA (100 nanograms) extracted from white blood cells of the subject is amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer (BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and 13 microliters of water.
The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II.
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five cycles of amplification are performed, each consisting of denaturing (95°C; seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first cycle, in which the denaturing time is increased to 5 minutes, and during the last cycle, in which the extension time is increased to 5 minutes.
PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104; Chatsworth, CA). Yield and purity of the PCR product are determined spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.
2. Dideoxy Sequence Analysis
Fluorescent dye is attached to PCR products for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, CA, automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi1) SEQ. ID. NO.:1 sequence is entered into the Sequence Navigator® software as the Standard for comparison. The Sequence Navigator® software compares the sample sequence to the BRCA1(omi1) SEQ. ID. NO.:1 standard, base by base. The Sequence Navigator® software highlights all differences between the BRCA1(omi1) SEQ. ID. NO.:1 DNA sequence and the patient's sample sequence.
A first technologist checks the computerized results by comparing visually the BRCA1(omi1) SEQ. ID. NO.:1 standard against the patient's sample, and again highlights any differences between the standard and the sample. The first primary technologist then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence Navigator® software and printed on a color printer. The peaks are interpreted by the first primary technologist and a second primary technologist. A secondary technologist then reviews the chromatograms. The results are finally interpreted by a geneticist. In each instance, a variation is compared to known polymorphisms for position and base change. If the sample BRCA1 sequence matches the BRCA1(omi1) SEQ. ID. NO.:1 standard, with only variations within the known list of polymorphisms, it is interpreted as a normal gene sequence.
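The base-by-base comparison and the check against the list of known polymorphisms can be sketched as follows. This is not the Sequence Navigator® software or its interface; it is a minimal illustration that assumes equal-length, already aligned sequences and a single base call per position (heterozygous calls are not modeled). The toy sequences and all names are hypothetical; only the seven positions and base pairs come from Table IV.

```python
# Known polymorphic positions and the two bases seen at each (from Table IV).
KNOWN_POLYMORPHISMS = {
    2201: {"C", "T"},
    2430: {"T", "C"},
    2731: {"C", "T"},
    3232: {"A", "G"},
    3667: {"A", "G"},
    4427: {"T", "C"},
    4956: {"A", "G"},
}


def compare_to_standard(standard, sample, first_position=1, known=KNOWN_POLYMORPHISMS):
    """List every difference between a sample and the standard sequence.

    Returns tuples of (position, standard base, sample base, label), where the
    label says whether the difference matches a known polymorphism in both
    position and base change.
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    findings = []
    for offset, (ref, obs) in enumerate(zip(standard.upper(), sample.upper())):
        if ref == obs:
            continue
        position = first_position + offset
        if {ref, obs} <= known.get(position, set()):
            label = "known polymorphism"
        else:
            label = "unclassified variation"
        findings.append((position, ref, obs, label))
    return findings


# Hypothetical 8-base window starting at nucleotide 2199, for illustration only.
toy_standard = "AGCTTACG"
toy_sample = "AGTTTACA"
for finding in compare_to_standard(toy_standard, toy_sample, first_position=2199):
    print(finding)
```

In this sketch a sample would be read as matching the standard only if every reported difference carries the "known polymorphism" label; anything else is left for the technologists and the geneticist to interpret, as described above.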
EXAMPLE 3
DETERMINING THE ABSENCE OF A MUTATION IN THE BRCA1 GENE USING BRCA1(omi) AND SEVEN POLYMORPHISMS FOR REFERENCE
A person skilled in the art of genetic susceptibility testing will find the present invention useful for determining the presence of a known or previously unknown mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the Breast Cancer Information Core at http://www.nchgr.nih.gov/dir/lab-transfer/bic. This data site became publicly available on November 1, 1995. Friend, S. et al. Nature Genetics 11:238, (1995).
Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in question. However, a BRCA1(omi) sequence is used for reference and polymorphic sites are compared to the nucleic acid sequences listed above for codons at each polymorphic site. A normal sample is one which compares to the BRCA1(omi2) SEQ. ID. NO.: 3 sequence and contains one of the base variations which occur at each of the polymorphic sites. The codons which occur at each of the polymorphic sites are paired here for reference.
AGC and AGT at position 2201, TTG and CTG at position 2430, CCG and CTG at position 2731, GAA and GGA at position 3232, AAA and AGA at position 3667, TCT and TCC at position 4427, and AGT and GGT at position 4956.
The availability of these polymorphic pairs provides added assurance that one skilled in the art can correctly interpret the polymorphic variations without mistaking a variation for a mutation.
Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by asymmetric amplification using the polymerase chain reaction (PCR) to generate a single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, eds., Academic Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI.
1. Polymerase Chain Reaction (PCR) Amplification
Genomic DNA (100 nanograms) extracted from white blood cells of the subject is amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer (BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and 13 microliters of water.
The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II.
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five cycles of amplification are performed, each consisting of denaturing (95°C; seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first cycle, in which the denaturing time is increased to 5 minutes, and during the last cycle, in which the extension time is increased to 5 minutes.
PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104; Chatsworth, CA). Yield and purity of the PCR product are determined spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.
2. Dideoxy Sequence Analysis
Fluorescent dye is attached to PCR products for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, CA, automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ. ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard for comparison. The Sequence Navigator® software compares the sample sequence to the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator® software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA sequence and the patient's sample sequence.
A first technologist checks the computerized results by comparing visually the BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights any differences between the standard and the sample. The first primary technologist then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence Navigator® software and printed on a color printer. The peaks are interpreted by the first primary technologist and also by a second primary technologist. A secondary technologist then reviews the chromatograms. The results are finally interpreted by a geneticist. In each instance, a variation is compared to known polymorphisms for position and base change. If the sample BRCA1 sequence matches the BRCA1(omi2) SEQ. ID. NO.: 3 standard, with only variations within the known list of polymorphisms, it is interpreted as a gene sequence.
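The comparison logic described above (a base-by-base comparison against a standard, followed by a check of each variation against a list of known polymorphisms) can be sketched in a few lines. This is only an illustration under stated assumptions: it is not the algorithm used by the Sequence Navigator® software, the sequences are short toy strings, and the entries in `known_polymorphisms` are hypothetical placeholders rather than real BRCA1 polymorphisms.

```python
def find_differences(standard, sample):
    """Return (position, standard_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences must have the same length,
    i.e. this sketch handles substitutions only.
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, s, t)
        for i, (s, t) in enumerate(zip(standard, sample))
        if s != t
    ]


def interpret(standard, sample, known_polymorphisms):
    """Split differences into all variations and those not on the known list."""
    differences = find_differences(standard, sample)
    unexplained = [d for d in differences if d not in known_polymorphisms]
    return differences, unexplained


# Toy data (hypothetical): one listed polymorphism at position 6, T to C.
standard = "ATGGATTTATCTGCT"
known_polymorphisms = {(6, "T", "C")}

sample_a = "ATGGACTTATCTGCT"   # carries only the listed polymorphism
sample_b = "ATGGATTTAGCTGCT"   # carries an unlisted change at position 10

for name, sample in (("A", sample_a), ("B", sample_b)):
    differences, unexplained = interpret(standard, sample, known_polymorphisms)
    print(name, differences, "unexplained:", unexplained)
```

A sample whose `unexplained` list is empty differs from the standard only at listed polymorphic sites. Any remaining entry would go on to the manual review steps described in the text.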
EXAMPLE 4: DETERMINING THE PRESENCE OF A MUTATION IN THE BRCA1 GENE USING BRCA1(omi1) AND SEVEN POLYMORPHISMS FOR REFERENCE

A person skilled in the art of genetic susceptibility testing will find the present invention useful for determining the presence of a known or previously unknown mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the Breast Cancer Information Core at http://www.nchgr.nih.gov/dir/labtransfer/bic. This data site became publicly available on November 1, 1995. Friend, S. et al. Nature Genetics 11:238, (1995). In this example, a mutation in exon 11 is characterized by amplifying the region of the mutation with a primer which matches the region of the mutation.
Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by asymmetric amplification using the polymerase chain reaction (PCR) to generate a single-stranded product amplified from this DNA sample. Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, eds., Academic Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI.
1. Polymerase Chain Reaction (PCR) Amplification

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer (BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and 13 microliters of water.
The PCR primers used to amplify segment K of exon 11 (where the mutation is found) are as follows:
BRCA1-11K-F: 5'-GCA AAA GCG TCC AGA AAG GA-3' SEQ ID NO:69
BRCA1-11K-R: 5'-AGT CTT CCA ATT CAC TGC AC-3' SEQ ID
The primers are synthesized on a DNA/RNA Model 394® Synthesizer.
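As a quick plausibility check on the two primers listed above, the sketch below computes their length, GC fraction, and a rough melting temperature. The Wallace rule used here (2°C per A/T, 4°C per G/C) is a common rule of thumb for short oligonucleotides and is an assumption of this example; the patent does not state how the annealing temperature was chosen. The function names are illustrative.

```python
FORWARD = "GCAAAAGCGTCCAGAAAGGA"  # BRCA1-11K-F, written 5' to 3'
REVERSE = "AGTCTTCCAATTCACTGCAC"  # BRCA1-11K-R, written 5' to 3'

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for name, primer in (("BRCA1-11K-F", FORWARD), ("BRCA1-11K-R", REVERSE)):
    print(name, len(primer), "nt",
          f"GC={gc_fraction(primer):.2f}",
          f"Tm~{wallace_tm(primer)}C")

# The reverse primer anneals to the sense strand; this is the sense-strand
# text one would look for in the coding sequence.
print(reverse_complement(REVERSE))
```

Under this rule of thumb both primers come out a few degrees above the 55°C annealing step used in the protocol.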
Thirty-five cycles are performed, each consisting of denaturing (95°C; 30 seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first cycle in which the denaturing time is increased to 5 minutes, and during the last cycle in which the extension time is increased to 5 minutes.
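The cycling schedule just described can be written out step by step. The sketch below is an illustration only: it expands the 35 cycles, applies the two stated exceptions (5-minute denaturation in the first cycle, 5-minute extension in the last), and totals the hold times. Ramp times between temperatures are instrument-dependent and are not modelled; the variable names are assumptions of this example.

```python
CYCLES = 35
DENATURE_S, ANNEAL_S, EXTEND_S = 30, 60, 90   # standard hold times, seconds
FIRST_DENATURE_S = 5 * 60                     # first cycle only
LAST_EXTEND_S = 5 * 60                        # last cycle only


def build_program():
    """Return a list of (cycle, temperature_C, hold_seconds) steps."""
    steps = []
    for cycle in range(1, CYCLES + 1):
        denature = FIRST_DENATURE_S if cycle == 1 else DENATURE_S
        extend = LAST_EXTEND_S if cycle == CYCLES else EXTEND_S
        steps.append((cycle, 95, denature))   # denaturing
        steps.append((cycle, 55, ANNEAL_S))   # annealing
        steps.append((cycle, 72, extend))     # extension
    return steps


steps = build_program()
total_seconds = sum(hold for _, _, hold in steps)
print(f"{len(steps)} steps, {total_seconds} s total hold time "
      f"({total_seconds / 60:.0f} min, excluding ramping)")
```

The total hold time works out to 6780 seconds, just under two hours before ramping is added.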
PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104; Chatsworth, CA). Yield and purity of the PCR product are determined spectrophotometrically at OD260 on a Beckman DU 650 spectrophotometer.
2. Dideoxy Sequence Analysis

Fluorescent dye is attached to PCR products for automated sequencing using the Taq Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in both forward and reverse directions on an Applied Biosystems, Inc. (ABI), Foster City, CA, automated Model 377® sequencer. The software used for analysis of the resulting data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ. ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard for comparison. The Sequence Navigator® software compares the sample sequence to the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator® software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA sequence and the patient's sample sequence.
A first technologist checks the computerized results by comparing visually the BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights any differences between the standard and the sample. The first primary technologist then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence Navigator® software and printed on a color printer. The peaks are interpreted by the first primary technologist and a second primary technologist. A secondary technologist then reviews the chromatograms. The results are finally interpreted by a geneticist. In each instance, a variation is compared to known polymorphisms for position and base change. Mutations are noted by the length of non-matching variation. Such a lengthy mismatch pattern occurs with deletions and substitutions.
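To illustrate why a deletion shows up as a lengthy mismatch pattern in a base-by-base comparison, the sketch below compares a toy standard against two toy samples: one with a single-base substitution and one with a two-base deletion. Everything downstream of the deletion is shifted, so nearly every later position fails to match. The sequences and function names are hypothetical and chosen only for this illustration.

```python
def mismatch_positions(standard, sample):
    """1-based positions where the two sequences differ, over their shared length."""
    shared = min(len(standard), len(sample))
    return [i + 1 for i in range(shared) if standard[i] != sample[i]]


def longest_run(positions):
    """Length of the longest run of consecutive positions."""
    best = current = 0
    previous = None
    for position in positions:
        if previous is not None and position == previous + 1:
            current += 1
        else:
            current = 1
        best = max(best, current)
        previous = position
    return best


standard = "ACGTTGCAGATCCTAGGCAT"             # toy reference, 20 bases
substituted = "ACGTAGCAGATCCTAGGCAT"          # position 5 changed from T to A
deleted = standard[:7] + standard[9:]         # bases 8 and 9 ("AG") removed

for name, sample in (("substitution", substituted), ("deletion", deleted)):
    positions = mismatch_positions(standard, sample)
    print(name, positions, "longest run:", longest_run(positions))
```

In this toy case the substitution yields one isolated mismatch, while the deletion yields an unbroken run of mismatches from just after the deleted bases to the end of the read.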
3. Result

Using the above PCR amplification and standard fluorescent sequencing technology, the 3888delGA mutation may be found. The 3888delGA mutation of the BRCA1 gene lies in segment K of exon 11. The DNA sequence results demonstrate the presence of a two base pair deletion at nucleotides 3888 and 3889 of the published BRCA1(omi) sequence. This mutation interrupts the reading frame of the BRCA1 transcript, resulting in the appearance of an in-frame terminator (TAG) at codon position 1265. This mutation is, therefore, predicted to result in a truncated and, most likely, non-functional protein. The formal name of the mutation will be 3888delGA.
This mutation is named in accordance with the suggested nomenclature for naming mutations, Baudet, A et al., Human Mutation 2:245-248, (1993).
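The mechanism described in the result (a two-base deletion shifts the reading frame and brings a TAG codon into frame) can be demonstrated on a toy open reading frame. The sketch below is not the BRCA1 sequence and does not reproduce the 3888delGA coordinates; the 30-base sequence, the deletion position, and the helper names are hypothetical and chosen so that the effect is easy to follow. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate from the first base; return (protein, stop_codon_number).

    stop_codon_number is the 1-based codon index of the first in-frame
    stop codon, or None if no stop codon is reached.
    """
    protein = []
    for index in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[index:index + 3]]
        if residue == "*":
            return "".join(protein), index // 3 + 1
        protein.append(residue)
    return "".join(protein), None


def delete(seq, start, length):
    """Remove `length` bases beginning at 1-based position `start`."""
    return seq[:start - 1] + seq[start - 1 + length:]


# Toy reading frame: ATG GCT GAG AAT AGC CTG GAA TGG AAA TAA
wild_type = "ATGGCTGAGAATAGCCTGGAATGGAAATAA"
mutant = delete(wild_type, 7, 2)   # removes "GA", by analogy with a delGA event

print("wild type:", translate(wild_type))
print("mutant:   ", translate(mutant))
```

In the wild-type toy frame the TAG spanning the AAT and AGC codons is out of frame and has no effect. After the two-base deletion it is read in frame, and translation stops after three residues instead of nine.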
EXAMPLE 5: USE OF THE BRCA1(omi1) GENE THERAPY

The growth of ovarian, breast or prostate cancer can be arrested by increasing the expression of the BRCA1 gene where inadequate expression of that gene is responsible for hereditary ovarian, breast and prostate cancer. It has been demonstrated that transfection of BRCA1 into cancer cells inhibits their growth and reduces tumorigenesis. Gene therapy is performed on a patient to reduce the size of a tumor.
The LXSN vector is transformed with any of the BRCA1(omi1) SEQ. ID. NO.:1, BRCA1(omi2) SEQ. ID. NO.:3, or BRCA1(omi3) SEQ. ID. NO.:5 coding region.
Vector

The LXSN vector is transformed with wildtype BRCA1(omi1) SEQ. ID. NO.:1 coding sequence. The LXSN-BRCA1(omi1) retroviral expression vector is constructed by cloning a SalI-linkered BRCA1(omi1) cDNA (nucleotides 1-5711) into the XhoI site of the vector LXSN. Constructs are confirmed by DNA sequencing. Holt et al. Nature Genetics 12: 298-302 (1996).
Retroviral vectors are manufactured from viral producer cells using serum-free and phenol-red-free conditions and tested for sterility, absence of specific pathogens, and absence of replication-competent retrovirus by standard assays. Retrovirus is stored frozen in aliquots which have been tested.
Patients receive a complete physical exam, blood, and urine tests to determine overall health. They may also have a chest X-ray, electrocardiogram, and appropriate radiologic procedures to assess tumor stage.
Patients with metastatic ovarian cancer are treated with retroviral gene therapy by infusion of recombinant LXSN-BRCA1(omi1) retroviral vectors into peritoneal sites containing tumor, between 10^9 and 10^10 viral particles per dose. Blood samples are drawn each day and tested for the presence of retroviral vector by sensitive polymerase chain reaction (PCR)-based assays. The fluid which is removed is analyzed to determine:
1. The percentage of cancer cells which are taking up the recombinant LXSN-BRCA1(omi1) retroviral vector combination. Successful transfer of BRCA1 gene into cancer cells is shown by both RT-PCR analysis and in situ hybridization. RT-PCR is performed by the method of Thompson et al. Nature Genetics 9: 444-450 (1995), using primers derived from BRCA1(omi1) SEQ. ID. NO.:1. Cell lysates are prepared and immunoblotting is performed by the method of Jensen et al. Nature Genetics 12: 303-308 (1996) and Jensen et al. Biochemistry 31: 10887-10892 (1992).
2. Presence of programmed cell death using ApoTAG® in situ apoptosis detection kit (Oncor, Inc., Gaithersburg, Maryland) and DNA analysis.
3. Measurement of BRCA1 gene expression by slide immunofluorescence or western blot.
Patients with measurable disease are also evaluated for a clinical response to LXSN-BRCA1(omi1), and local symptoms are followed.
For other sites of disease, conventional response criteria are used as follows:
1. Complete Response: complete disappearance of all measurable lesions and of all signs and symptoms of disease for at least 4 weeks.
2. Partial Response: decrease of at least 50% of the sum of the products of the 2 largest perpendicular diameters of all measurable lesions as determined by 2 observations not less than 4 weeks apart. To be considered a PR, no new lesions should have appeared during this period and none should have increased in size.
3. Stable Disease: less than 25% change in tumor volume from previous evaluations.
4. Progressive Disease: greater than 25% increase in tumor measurements from prior evaluations.
The number of doses depends upon the response to treatment.
For further information related to this gene therapy approach, see "BRCA1 Retroviral Gene Therapy for Ovarian Cancer," a Human Gene Transfer Protocol: NIH ORDA Registration 9603-149, Jeffrey T. Holt, M.D. and Carlos L. Arteaga, M.D.
TABLE OF REFERENCES

1. Sanger, et al., J. Mol. Biol. 42:1617, (1980).
2. Beaucage, et al., Tetrahedron Letters 22:1859-1862, (1981).
3. Maniatis, et al., in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY, p. 280-281, (1982).
4. Conner, et al., Proc. Natl. Acad. Sci. U.S.A. 80:278, (1983).
5. Saiki, et al., Bio/Technology 3:1008-1012, (1985).
6. Landgren, et al., Science 241:1007, (1988).
7. Landgren, et al., Science 242:229-237, (1988).
8. PCR. A Practical Approach, IRL Press, Eds. M. J. McPherson, P. Quirke, and G. R. Taylor, (1992).
9. Easton et al., American Journal of Human Genetics 52:678-701, (1993).
10. U.S. Patent No. 4,458,066.
11. Rowell, et al., American Journal of Human Genetics 55:861-865, (1994).
12. Miki, Y. et al., Science 266:66-71, (1994).
13. Friedman, L. et al., Nature Genetics 8:399-404, (1994).
14. Baudet, A. et al., Human Mutation 2:245-248, (1993).
15. Friend, S. et al., Nature Genetics 11:238, (1995).
16. Arteaga, CL and JT Holt, Cancer Research 56:1098-1103 (1996).
17. Holt, JT et al., Nature Genetics 12:298-302 (1996).
18. Jensen, RA et al., Nature Genetics 12:303-308 (1996).
19. Steeg, P., Nature Genetics 12:223-225 (1996).
20. Thompson, ME et al., Nature Genetics 9:444-450 (1995).
21. Holt, JT, and C. Arteaga, Gene Therapy Protocol ORDA 9603-149, ORDA approved Protocol for BRCA1 Gene Therapy.

"Breast and Ovarian cancer" is understood by those skilled in the art to include breast and ovarian cancer in women and also breast and prostate cancer in men.
BRCA1 is associated with genetic susceptibility to inherited breast and ovarian cancer in women and also breast and prostate cancer in men. Therefore, claims in this document which recite breast and/or ovarian cancer refer to breast, ovarian and prostate cancers in men and women. Although the invention has been described with reference to the presently preferred embodiments, it should be understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims.
The claims pages appear after the sequence listing.

SEQUENCE LISTING

(1) GENERAL INFORMATION:
(i) APPLICANT: OncorMed, Inc.
(ii) TITLE OF INVENTION: Coding Sequences of the Human BRCA1 Gene
(iii) NUMBER OF SEQUENCES: 72
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: ONCORMED
STREET: 200 Perry Parkway
CITY: Gaithersburg
STATE: MD
COUNTRY: USA
ZIP: 20877
(v) COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER: US 9703038
FILING DATE:
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Thomas Gallegos
REGISTRATION NUMBER: 32,692
REFERENCE/DOCKET NUMBER: PA-0054 CIP PCT
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: 301-527-2051
TELEFAX: 301-208-6997

(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 5711 base pairs
TYPE: nucleic acid
STRANDEDNESS: not relevant
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(vi) ORIGINAL SOURCE:
ORGANISM: Homo sapiens
STRAIN: BRCA-1
(viii) POSITION IN GENOME:
CHROMOSOME/SEGMENT: 17
MAP POSITION: 17q21
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT
TCTTAGAGTG
ACATATTTTC
GTCCTTTATG.
AACTTGTTGA
ATGCAAACAG
AAGTTITCTAT
AACCCGAAAA
CTGTGAGAAC
AATTGGGATc
ATCAAGAATT
CAAAAAAGGC
CCAGTAATAA
ATCAGGGTAG
GCTCAI'ACA
AGGCTGAATT
GGGCTGG-AAG
ATCTGAATGC
CAGAGAATCC
TCCCATCTGT
CAAATTTTGC
TAAGAATGAT
,AGAGCTATTG
CTATAATTTT
CATCCAAAGT
TCCTTCCTTG
TCTGAGGACA
TGATTCTTCT
GTTACAAATC
TGCTTGTGAA.
TGATTTGAAC
TTCTGTTTCA
GCATGAGAAC
CTGTAATAAA
TAAGGAAACA
CTGGAGTTGA
ATGCTGAAAC
ATAACCAAAA
AAAATCATTT
GCAAAAAAGG
ATGGGCTACA
CAGGAAACCA
AAGCAGCGGA
GAAGATACCG
ACCCCTCAAG
TTrTTCTGAGA.
A.CCACTGAGA
PACTTGCATG
AGCAGTTTAT
k.GCAAACAGC
I'GTAATGATA
TCAAGGAACC TGTCTCCACA ?ITCTCAACCA GAGAAAGGG GGAGCCTACA AGAAAGTACG GTGCTTTTCA GCTTGACACA AAAATAACTC
TCCTGAACAT
GAAACCGTGC CAAAAGAcI'r GTCTCAGTGT
C-CAACTCTCT
TACAACCTCA
AAAGACGTCT
TTAATAAGGC
AACTTAI'GC
GAACCAGGGA
TGAAATCAGT.
CGGATGTAAC
AAATACTGAA
AGCGTGCAGC TGAGAGGCAT TGGAGCCA'rG
TGGCACAAAT
TACTCACTAA AGAcGAT CTGGCTTAGC
AAGGAGCCA
ATGCAGAAAA
A.AGTGTGACC
CCTTCACAGT
AGATTTAGTC
GGTTTGGAGT
CTAAAAGATG
CTACAGAGTG
AACCTTGGAA
GTCTACATTG
AGTGTGGGAG
TTGG.ATTCTG
CATCATCAAC
CCAGAAAAGT
PCTCATGCCA
A.ATGTAGAAA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 GGCGGACTCC CAGCCA. AAAAAGGTAG TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA
CTGCCATGCT
TAGAGATACT GAAGATGTTC CTTGGATAAC ACT.AAATAGC
AGCATTCAGA
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG
TTCTATGC
GGGAGTCTGA ATCAAATGCC. AAAGTAGCTG ATGTATTGGA
CGTTCTAAT
AATATTCTGG TTCTTCAGAQ AAAATAGACT TACTGGCCAG
TGATCCTCAT
TCACATGATG
GAGGTAGATG
GAGGCTTTAA
1320 1380
TATGTA)AG
TTGGGAAAC
TAAT'rATAGG
AATTAAAGCG
CAGATTTGGC
TGAAAGAGTT
CTATCGGAAG
AGCAT'rTGTT
TAAAAGGAGA
AGTTCAAAAG
AGAATGGTCA
AGTGATGAAT
CTATTCAGAA
AAACGAAAGC
ACAATTCAAA
TGAGAAAAAT
TGAACCTATA
AGCACCTAAA
ATGCGCTTGA
ACTAGTAGTC
TTGATAGTTc
GGCACAGCAG
GTAACAAGCC
AGTTAACAAA
TTGTCAATCC
CTAATAATGC
AAAGATCTGT
AAAGTATCTC
GTGTGAGTCA
ATAATAGAAA
GGGAAACAAG
TCAAGGTTTC
AATGTGCAAC
TTGA.ATGTGA
TTCTAGCAGT
AAACCTACAA
AAATGAACAG
TGCACCTGGT
TAGCCTTCCA
TGAAGACCCC
AGAGAGTAGC
GTTACTGGAA
GTGTGCAGCA
TGACACAGAA
CATAGAAATG
AAAGCGCCAG
ATTCTCtTGCC
ACAAAGGAA
C-ACTCCAAT
AAGGCAAGCC
ACTGAGCCAC
CCTACATCAG
ACTCCTGAAA
ATTACTAATA
CCTAACCCAA
A.GCAGCAGTA
AAGAATAGGC
A.GTAGAAAT~C
GAAGAGATAA
CTCATGGAAG
kCAAGTAAAA rCTTTTACTA kGAGAAGAA
LAAGATCTCA
.GTATTTCAC
'TTAGCACTC TTGAAAACC
C
;GCITrAAGT
LAAGAAAGTG
CATTTGCTC ACTCTGGGT C
A.AAATCAAGG
TAAGCAATAT
TGAGdAGGAA
TAAGCCCACC
AGAAAAAAAA
GTAAAGAAICC
GACATGACAG
,GTGTTCAAA
!.AGAAGAGAA
M'TTAAGTGG
L'GGTACCTGG
'AGGGAAGGC
CAAGGGACT
LTCCATTGG
ACTTGATGC
r-TTTTCAAA CTrAAAGAA
AAAGAATGA
GGAACTCGAA
GTCTTCTACC
TAATTGTACT
GTACAACCAA
TGCAACTG
TGATACTTTC
TACCAGTGAA
ACTAGAAACA
AGAAAGGG'T
TACTGATTAT
AAAAACAGAA
AATTCATGGT
A.CATGAAGTr
TCAGTATTTG
TCCAGGAAAT
LTTAAATATCC
AIGCTATC
rA~CA ATGCCAGTcA
GCCAAGAAGA
CCAGAGCTGA
C~AAAGAAT
GTTAAAGTGT
TTGCAAACTG
GGCACTCAGG
CCAAATAAAT
.TGTTCCAAAG
AACCACAGTC
CAGAATACAT
GCAGAAGAGG
CAGTAGAGAG
TCCCCAACT'r
AGATAATACA
GCCTTCATCc
TGATAAATCA
GTGGTCATGA
TAATAT'rG
AAGCCATGTA
AGAG.CGTCCC
TGAGGATTTr
GGGAACTAAC
q&ATAAAACA
GACAAAATAT
ACTGAAAT
CTCACAAATA
ATCA.AGAAAG
CAAACGGAGC
TAGAATCACT CGAAAAAGA TCTGCTTTCA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2 460 2520 2580 2640 2700 2760 2820 2880 A.CAAAGTCCA
AAAGTCACTT
GTCTAATATC AAGCCTGTAC AGACAGTTA TATCACTGCA GGCTTTCCTG TGGTTGGTr- GAAAGATAAG
CCAGTTGATA
2940 ATGCCAAT TAGTATCAAA. GGAGGCTCTA GGTTTTGTCT
ATCATCTCAC
ACGAAACTGG ACTCATTACT CCAAATAAAC
ATGGACTTT'
CACCACTTTT
AAAACTTTGA,
GTACAGTGAG
CAAGCAATAT
TAGGTTCCAG
-ATGCTATGCT
GTAATTGTAA
TCCCATCAAG
GGAACATTCA
CACAATTAGC
TAATGAAGTA
TGATGAAAAC
TAGATTAGGG
GCATCCTGAA
ATACAGATTT.CTCTCCATAT
ATGCATCTCA
GGTTGTTCT
.AAGATACTAG,
TTTTGCTGAA
TCCAGAGAGG
AGAGCTTAGC
GTTACCGAAG
AGGGGCCAAG
AAGAGCTTCC
CTGCTTCCAL
CTACTAGGCA TAGCACCGTr TATCATTGAA
GAATAGCTTA
AGGAACATCA
CCTTAGTGAG
GTGAATTGGP. AGACTTGACr CCAAACAAAT GAGGCATCAG TTTCAGATGA
TGAAGAAAGA
TGGATTCAA
CTTAGGTGAA
ACTGCTCAG
GCTATCCTCT
AACATAACCT
GATAAAGCTC
ATGGGAGCCA GCCTTCTAAC
I
TCATTTGTTj
ATGTCACCT(
CGTAATAC)
GGTTCCAGT;
ATTCAAGCAG-
GTT'rTGCAAC
ATAAAAAAGC
CTGATTTCAG
GAGACACCTG
AATGACATTA
A.GGAGTCCTA
kAATTAGAGT
:ACTTGTTAT
7CTACCG.AGT kATGACTGCA
;AAACAAAAT
;CAAATACAA
'CTGAAAGCC
;GAACGGGCT
;CAGCATCTG
AGAGTGACA
AGCAGGAAA
.GCTACCCTT
k. AA.ACTAAATC
;AAAGAGAAA
LTTAGAGAAAJ
LCTAATGAAGI
AACTAGGTAG
CTGAGGTCTA
AAGAATATGA
ATAACTTAGA
ATGACCTGTT~
AGGAAAGTTrC
GCCCTTTCAC
CCTCAGAAGA
TTGGTAAAGT
GTCTGTCTAA
GTAACCAGGT
GTTCTGCTAG
ACACCCAGGA
AGGGAGTTGG
TGGAAGAAAA
GGTGTGAGAG
TTTTAACCAC
TGGCTGAACT
CCATCATAAG
r~ ACAAAACCU,
TAAGAAA
GGGAATGAG
TGTTTTTAAA
GGGCTCCAGT
AAACAGAGGG
TAAACAAAGT
AGAAGTAGTT
ACAGCCT A
AGATGATGGT
TGCTGTTTTT~
CCATACACAT
GA.ACTTATCT
AAACAATATA
GAACACAGAG
AATATTGGCA
CTTGT'I2'TCT
TCCTTTCTTG
TTCAGAGGCA
LTATCGTATAC
kCrTGCTAGAIG
AACATTCCAA
GGAGCCAGCT
ATTAATGAAA
CCAAAATTGA
CTTCCTGGAA
CAGACTGTTA
GGAAGTAGTC
GAAATAAAGG-
AGCAAAAGCG:
TTGGCTCAGG
AGTGAGGATG
CCI'CTCAGT
GAGAATTTAT
AAGGCATCTC
TCACAGTGCA
ATTGGTTCTT
3000 3060.
3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 ?CTGAGTGAC AAGGAATTGG TAATCAAGAA
GAGCAAAGCA
,TGAAACAAGC GTCTCTGAAG TCGCAGAGG GATACCATGC AG-AAGCTGTG
TAGAACAGC
TGACTCCTCT GCCCTTGAGG ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAG AAAAGTA 4500 GTGAATACCC TATAAGCCAG AATCCAGAAG GCCT'TTCTGC TGACAAGTTT
GAGGTGTCTG,
4560
CAGATAGTTC
GCCCATCATT
ACTACCCATc
AGTCTGGGCC
CCCCTTACCT
TACCAGTAAA
AGATGATAGG
TCAAGAGGAG
ACACGATTTG
GGAATCTGGA
*AATAAAGAAC
TGGTACATGC
CTCATTAAGG
ACGGAAACAT
ATCAGCCTCT
GCTCGTGTTG
GCAGAATCTG
ATGGAAGAAA
AAAAGAATGT
CAGGAGTGG2
ACAGTTGCT(
TTGTTGATG1
CTTACTTGCC
TCTCTGATGA
GCAACATACC
CCCAGGGTCC
GTGTGAGCAG
CCATGGTGGT
AAGGTCATCC
TGGGAGTCT'r
CCTTCTAAAT
CAGA.ATAGAA
GGAGGAGCAA CAGCTGGAAG AAGACAGAGC CCCAGAGTCA
AAGTTCCCCA
ATACTGCTGG
CTTCAACAGA
AATTTATGCT
CTGAAGAGAC
TGAAATATTT
AGTCTATTAA
TCAATGcGAA
TCAGGGGGCT
AATGGATGGT
GCACAGGTGT
TCCATGCAAT
ATTGAAAGTT
GTATAATGCA
AAGGGTCAAC
CGTGTACAAG
TACTCATGTT
TCTAGGAATT
AGAAAGAAAA
AAACCACCAA
AGAAATCTGT
ACAGCTGTGT
CCACCCAATT
TGGGCAGATG
TTTGCCAGAA
AACACCACAT
GTTATGAAAACAGATGCTGA
GCGGGAGGAA,
AATGGGTAGT
ATGCTGAAT G AGCATGAT'TT GGTCCAAAGC GAGCAAGAGA TGCTATGGGC
CCTTCACCAA
GGTGCTTCTG TGGTGAAGGA GTGGTTGTGC
AGCCAGATGC
TGTG.AGGCAC
CTGTGGTGAC
AAGGCAAGAI
CCCTGtAATC'I ATCTrCAACC
AGCTGCTGCT
CaAGAAGCCA
GTCTGGCCTG
CACTTTAACT
GTTTGTGTGT
TAGCTATTTC
TGAAGTCAGA
ATCCCAGGAC
CATGCCCACA
GCTTTCATCA
CTGGACAGAG
CCGAGAGTGG
CTAGAGGGAA
GATCCTTCTG
TCTGCATTGA
CATACTACTG
GAATTGACAG
ACCCCAGAAG
AATCTAATTA
GAACGGACAC
TGGGTGACCC
GGAGATGTGG
AGAAAGATCT
GATCAACTGG
TTCACCCTTG
GACAATGGCT
GTGTTGGACA
4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 570.0 5711 GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG, ATCCCCCACA GCCACTACTG
A
INFORMATION FOR SEQ ID NO-2: SEQUENCE CHARACTERISTICS: LENGTH: 1863 amino acids TYPE: amino acid STRA1NDEDNESS: not relevant TOPOLOGY:-not relevant (ii) MOLECULE TYPE: protein (vi) ORIGINAL SOURCE: ORGANISM: Homo sapiens STRAIN:
BRCAI_
(viii) POSITION IN GENOME: CHROMOSOME/ SEGMENT: 17 MAP POSITION: 17q21 (xi) Met 1 Ala SEQUENCE DESCRIPTION: SEQ ID. NO:2:- Asp Leu Ser Ala Leu Arg Val Glu Glu 5 10 Met Gin Lys Ile Leu Glu Cys Pro Ile Gin Asn Asn Lys Giu Pro Val Sex Thx Lys Cys Leu Lys Gin Lys Asn Leu Leu Asp Val Leu Ile Giu Thx Gly Leu Giu Asn Gly Pro 145 Thr Ser Lys Pro Ala Ser 130 Ser Val Val Ala Gin 210 Cys 100 Pro Giu 115 Arg Asn Leu Gin Arg Thr Tyr Ile 180 Thr Tyr 195 Gly Thr Giu Phe Asn Gir Thx Lys 70 Giu Leu Tyr Ala His Leu Axg Ala Giu Thr 150 Leu Arg 165 Giu Leu ILys 55 Arg Leu Asp 40 Lys Ser Lys Ile le cys 90 Asn Sex Tyr Asn Phe 105 Lys Asp Giu Vai Sex 120 Lys Arg Leu Leu Gin 135 Sex Leu Sex Val Gin 155 Thx Lys Gin A-rg Ile 170 Gly Sex Asp Ser Ser 185 Val Gly Asp Gin Giu 200 Glu Ile Sex Leu Asp 215 Thx Asp Vai Thx Asn Ala Phe Gin Leu Ala Lys Lys Glu Ile le Gin Sex 125 Sex Giu Pro Glu 140 Leu Sex- Asn Leu Gin Pro Gin Lys 175 Giu Asp Th~r Val 190 Leu Leu Gin Ile 205 Sex Ala Lys Lys 220 Thx Giu His His Asp Asn Met Asn Giy 160 Thx Asn Thx Alda 31n His Gly Leu lePhe Pro Sex Gin Giu Cys Gin: Sex Phe Pro Ser Asp Glu 225 Pro His Asn Asn Glu Lys 260 Asp 245 Tyr 230 Leu Gln Asn Gly Thr Thr Glu 250 Ser Ser Val Arg Ala Asn Leu Glu 255 Val 240 Arg Glu 265 270 Pro Cys Giy Thr Asn Thr His Ala Ser Ser Leu Gin His 27 280 c o o r r s Ser Cys 305 Trp Glu Trp Asp Trp 385 Gly Asn Ala Ser Tyr 465 Leu Pro His I Leu 290 Asn Ala Lys Asn Val 370 Phe Glu Glu Ser Lys 450 A.rg Tie .eu ?ro Leu Leu Lys Ser Gly Ser Lys Val 340 Lys Gln.
355 Pro Trp Ser Arg Ser Glu Val Asp 420 Asp Pro E 435 Ser Val C Lys Lys Ile Gly A 4 Thr Asn L 500 Glu Asp P Th Lys Lys 325 Asp Lys Ile Ser Ser 105 'lu iis ;lu Ja da 85 'ys 'he Lys Gir 310 Glu Leu Leu Thr Asp 390 Asn Tyr Glu Ser Ser 470 Phe Leu Ile Asp 295 Pro Thr Asn Pro Leu.
375 Glu Ala Ser Ala I Asn 1 455 Leu I Val 'I Lys L Lye L Ar Gl? Cys Ala Cys 360 Asn Leu Lys 3iy jeu 140 le !ro 1hr rg 'ys Met Leu Asn Asp 345 Ser Ser Leu Val Ser 425 Ile Glu Asn Glu Lys 2 505 Ala 2 Asn Ala Asp- 330 Pro Glu Ser Gly Ala.
410 Ser Cys Asp i Leu Pro 190 .rg Val Arg 315 rg Leu Asn Ile Ser 395 Asp lu Lys Ser 175 ;In Lrg Glu 300 Ser Arg Cys Pro Gin 380 Asp.
Val Lys Ser Ile I 460 His Ile I Pro I 285 Lys Gin Thr Glu Arg 365.
Lys Asp Leu Ile ;lu 145 ?he ral :le ,hr i Glu Asn Ala Glu His Asn Pro Ser 335 Arg Lys 350 Asp Thr Val Asn Ser His Asp Val 415 Asp Leu 430 Arg Val I Gly Lye Thr Giu 2 Gin Giu I 495 Ser Giy I Ser Phe Arg 320 Thr Glu Glu Glu Asp 400 Leu Leu lis rhr Lsn 180 Lrg ~eu 510 sp Leu Ala Val Gin Lys Thr Pro Glu 530 Val Met 545 Ser Ile Glu Ser Asn Met Asn Arg 610 Leu Val 625 Ile Asp Gin. Met I Glu Pro a Ser Lys A 690 Ala Pro G 70'5 Phe Val A Thr Val L Ser Gly G 7! Ile Ser L 770 Leu Leu G: 785 Cys Val SE 515 Met Ile Asn Gin Asn Ile Thr Asn 550 Gin Asn Giu Lys 565 Ala Phe Lys Thr 580 Glu Leu Giu Leu, 595 Leu Arg Arg Lys Val Ser Arg Asn 630 Ser Cys Sex Ser 645 ?ro Val Arg His S 660 a Thr Gly Ala I Xrg His Asp Ser A 6 ly Ser Phe Thr L 710 Sn Pro Ser Leu P 725 ys Val Ser Asn A.
740 lu Arg Val Leu G: 55 eu Val Pro Gly T 7.
Lu Vai Ser T= Li .790 3r Gn Cys Ala Al 805 Gi 53 Se Asl Ly, Asi Sex 615 Leu 3er jer ,ys sp 95 ys ro sn in Ir 75 uu aa 520 y Thr Asn Gin Thr
S
r Gly His Giu Asn 555 n Pro Asn Pro Ile 57 0 s Ala Giu Pro Ile 585 I lie His Asn Ser 600 Ser Thr Arg His Ser Pro Pro Asn 635 Glu Giu Ile Lys 650 Arg Asn Leu Gin 665 Lys Ser Asn Lys 1 680 Thr Phe Pro Giu 1 7 Cys .Ser Asn Thr S 715 Arg Giu Glu Lys G 730 Ala Glu Asp Pro L 745 Thr Giu Arg Ser V 760 Asp Tyr Gly Thr G 7 Gly Lys Ala Lys T] 795 Phe Giu Asn Pro L 810 Gi 54 Ly Gl Se~ Lyr Ile 62C Cys Lys eu 'ro ~eu '00 ;er lu ys al In s' 52 u Gi 0 s Th u Se: r Se: .Al 60E SHis Thr Lys Met Asn 685 Lys Glu Glu Asp Glu 765 Glu Glu Gly n A~ rLl r Le r Se 59 I Pr Al
GI
Ly.
Glt 67C Glu Leu Leu Lys Leu 750 Ser Ser Pro Leu 3n Gly Gin 's Gly Asp 560 u Giu Lys 575 r Ile Sex 0 o Lys Lys a Leu Glu i Leu Gin 640 Tyr Asn 655 Gly Lys Gin Thr Thr Asn Lys Glu 720 Leu Glu 735 Met Leu Ser Ser Ile Sex Asn Lys 800 Ile His 815 Gly Cys Leu Gly Giu Ser' 850 Lys Arg 865 Giu Cys Pro Lys Asn Glu Phe Pro 930 Ser Ile Sex His 835 Giu Gin Ala Val Ser 915 Val Lys Lys Asp Asn Arg Asn Asp Thr Glu Gy Phe 820 '825 Glu Val Asn His Ser Arg Giu Th~r Ser le 840 845 Leu Asp Ala Gin Tyr Leu Gi-n As n Thz Phe 855 -860 Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn 870 875 Th~r Phe Ser Ala His Ser Gly Ser Leu Lys 885 890 Thr Phe Giu Cys Giu Gin Lys Glu Giu Asn 900 905 Asn Ile Lys Pro Val Gin *Thr Vai:-Asn le 920 925 Val Gly Gin Lys Asp Lys Pro Val Asp Asn 935 940 Giy Giy Ser Arg Phe Cys Leu Ser Ser Gin Lys 830 Glu Lys Ala Lys Gin 910 Thr Aila Phe Tyr Pro Met Giu Val Ser Giu Giu 880 Gin Ser 895 Giy Lys Ala Gly Lys Cys Arg Gly 950 Asn Giu Thr Giy Leu Ile Thr Pro Asn Lys His Giy Leu*Leu Gin Asn 965 970 975 Pro Ty'r Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Vai Lys Thr 980. 
985 990 LYS Cys Lys Lys Asn Leu Leu Giu Giu Asn Phe Giu Giu His Ser Met 995 1000 i1005 Ser Pro Glu Arg Giu Met Gly Asn Giu Asn Ile Pro 5cr Thr Val Ser 1010 1015 1.020 Thr Ile Ser Arg Asn Asn Ile Ar g Glu Asn Val;Phe Lys Giy Ala Scr 1025 -1030 I- 39 1 040 Ser 5cr Asn Ile Asn Giu Vai Giy Ser Ser Thr Asn Giu Val Gly Ser 1045 1050 1055 Ser Ile Asn Glu Ile Giy Ser Ser Asp Giu Asn Ile Gin Ala Giu Leu 1060 1065 1070 Giy Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Giy Val 1075 1080 1085 Leu Gin Pro Giu Vai Tyr Lys Gin Scr Leu Pro Gly Ser Asn Cys Lys 1090 1095 1100 His Pro Glu Ile Lys Lys Gin Giu Tyr Giu Glu Val Vai Gin Thr Val 1105 1110 1115 1120 Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Giu Gin Pro 1125 1130 1135 Met Gly Ser Ser His Ala Ser Gin Val Cys Ser Glu Thr Pro Asp Asp 1140 1145 1150 Leu Leu Asp Asp Giy Glu Ile Lys Glu Asp Thr Ser Phe Ala Giu Asn 1155 .1160 1165 Asp Ile Lys Glu Ser Ser Ala Vai Phe Ser Lys Ser Val Gin Arg Gly 1170 1175 1180 Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gin 1185 1190 1195 1200 Gly Tyr Arg Arg Gly Aia Lys Lys Leu Glu Ser--Ser Glu Glu Asn Leu 1205 12106 1215 Ser Ser Glu Asp Giu Giu Leu Pro Cys Phe Gin His Leu Leu Phe Gly 1220 1225 1230 Lys Val Asn Asn Ile Pro Ser Gin Ser Thr Arg His Ser Thr Val Ala 1235 1240 1245 Thr Giu Cys Leu Ser Lys Asn Thr Giu Glu Asn Leu Leu Ser Leu Lys 1250 1255 1260 Asn Ser Leu Asn Asp Cys Ser Asn Gn Val Ile Leu Ala Lys Ala Ser 1265 1270 1275 1280 Gin Giu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe 1285 1290 1295 Ser Ser Gin Cys Ser Gu Leu Gu Asp Leu Thr Ala Asn Thr Asn Thr 1300 1305 1310 Gin Asp Pro Phe-Leu Ile Gly Ser Ser Lys Gin Met Asg His Gin Ser 1315 1320 1325 Glu Ser Gin Gly Val Gly Leu Ser Asp Lys Glu -Leu Val Ser Asp Asp 1330 1335 1340 Glu Giu Arg Gly Thx Gly Leu Giu Glu Asn Asn Gin Glu Giu Gin Ser 1345 1350 1355 1360 Met Asp Ser Asn Leu Gly Giu Ala Ala Ser Gly Cys Glu Ser Giu Thz 1365 1370 1375 Ser Val Ser Glu Asp Cys Se: Giy Leu Ser Ser Gin Ser Asp Ile Leu 
1380 1385 1390 Thr Thr' Gin Gin Arg Asp Thr Met Gin His Asn Leu Ile Lys Leu Gin 54 1395 1400 1405 Gin Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gin His Gly Ser Gin 1410 1415 1420 Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu 1425 1430 1435 1440 Asp Leu Arg Asn Pro Glu Gin Ser Thr Ser Glu Lys Ala Val Leu Thr 1445 1450 1455 Ser Gin Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu 1460 1465 1470 Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn 1475 1480 1485 Lys Glu Pro Gly Val Glu Arg Ser Ser ProiSer-ys Cys Pro Ser Leu 1490 1495 -1500 Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gin Asn Arg 1505 1510 1515 1520 Asn Tyr Pro Ser Gin Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu 1525 1530 1535 Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr "1540 1545 1550 Leu Pro Arg Gin Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile 1555 1560 1565 Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala 1570 1575 1580 Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu 1585 1590 1595 1600 Lys Val Pro Gin Leu Lys Val Ala Glu Ser Ala Gin Gly Pro Ala Ala 4605 1610 1615 Ala His Thr Thr Asp-Thr Ala Gly Tyr Asn Ala Met-Glu Glu Ser Val 1620 1625 1630 Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys 1635 1640 1645 Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu 1650 1655 1660 Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile 1665 1670 1675 1680 Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val 1685 1690 1695 Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp 1700 1705 1710 Val Val Ser Tyr Phe Trp Val Thr Gin Ser Ile Lys Glu Arg Lys Met 1715 1720 1725 Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg 1730 1735 .1740 Asn His Gin Gly Pro Lys Arg Ala Arg Glu Ser Gin Asp Arg Lys lie 1745 1750 .1755 1760 Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro 1765 1770 1775 Thr Asp Gin Leu Glu Trp Met Val Gin Leu Cys Gly Ala Ser 
Val Val 1780 1785 1790 Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val 1795 1800 1805 Val Val Gin Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile 1810 1815 820 Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp S1825 1830 1835 .1840 Ser Val Ala Leu Tyr Gin Cys Gin Glu Leu Asp Thr Tyr Leu Ile Pro 1845 1850 1855 Gin Ile Pro His Ser His Tyr 1860 S(2) INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTHR-5711 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear S" (ii) MOLECULE TYPE: cDNA (vi) ORIGINAL
SOURCE:
ORGANISM: Homo sapiens STRAIN: BRCA1 (viii) POSITION IN GENOME: CHROMOSOME/SEGMENT: 17 MAP POSITION: 17q21 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: AGCTCGCTG GACTTCCTGG ACCCCGCACC AGGCTGTGG GTTTC!TCAGA
TAACTGGGCC
CCTGCGCTCA. GGAGGCCTTC ACCCTCTGC TGGATTTATC TGCTCTTcGC TCTTAGAGG
TCCCATCTGT
ACATATTTTG
CAAATTTTGC
GTCCTTTATG TAAGAATGAT AACTTGTTGA. AGAGCTATTG ATGCAAACAG.
CTATAATTTT
AAGTTTCTAT
CATCC.AAAGT
AACCGAAA..TCCTTCCTTG
CTGTGAGAAC
TCTGAGGACA
GTTGAAGAAC
CTGGAGTTG;
ATGCTGAAAC
ATAACCAAAA
AAAATCATT
GCAAAAAAGG
ATGGGCTACA
CAGGAAACCA
AAGCAGCGGA
GAAGATACCG
A CCCCTCAAG
TTTTCTGAGA
ACCACTGAGA
AACTTGCATG
AATTGGGATC
ATCAAATT
CAAAAAAGGC
CCAGTAATAA
ATCAGGGTAG
GCTCATTAcA AGGCTGAATr
GGGCTGGAAG
ATCTGAATGC
CAGAGAATCC
AAGTTAATGA
GGGAGTCTGA
AATATTCTGG
TGATTCTTCT
GTTACAAATC
TGCTTGTGAA
TGATTTGAAC
TTCTGTTTCA
r' CTGGGTAAAG
;TACAAAATGT
LTCAAGGAACC
TTCTCAACA
GGAGCCTACA
AAAATAACTC
GAAACCGTGC
GTCTCAGTGT
TACAACCTCA
TTAATAAGGC
GAACCAGGGA
CGGATGTAAC
AGCGTGCAGC
TGGAGCCATG'
TACTCACTAA2 CTGGCTTAGC2
GGCGGACTCCC
AAGAATGGAA
CTTGGATAAC
AACTGTTAGG
ATGTATrGG
C
TACTGGCCAG
TI
TTCATTGGA). CAGAAAGAAA
CATTAATGCT
TGTCTCCACA
GAAGAAAGGG
AGAAAGTACG
GCTTGACACA
TiCCTGAACAT CAAAAG.AcTT
CCAACTCTCT
AAAGACGTCT
AACTTATTGC
TGAAATcAGT AAATACTG7A
TGAGAGGCAT
rGGcAcAAAT kGACAGAATG
ILAGGAGCCAA
!A CACAGAA 7AAGCAGAAA
LCTAAATAGC)
.'TCTGATGAC
GTTCTAAATC
GATCCTCAT
ATGCAGAAA
AAGTGTGACC
CCTTCACAGT
AGATTTAGTC
GGTTTGGAGT
CTAAAAGATG
CTACAGAGTG
AACCTTGGAA
GTCTACATTG
AGTGTGGGAG
TTGGATTcTG
CATCATCAAC
CCAGAAAAGT
A-CTCATGCCA
kATGTAGAAA
:ATAACAGAT
U-AAAAGGTAG
:TGCC-ATGCT
LGCATTCAGA
?CACATGTG
!AGGTAGATG
;AGGCTTTAA
GCATGAGAAC AG-CAGTTTAT CTGTAATA& AGCAAACAGC TAAGGAAACA
TG-TAATGATA
TGATCCCCTG TGTGAGAGAA TAGAGATACT
GAAGATGTTC
GTGGTTTT'CC AGAAGTGATG ATCAAATGCC AAAGTAGCTG TCTTCAGAG
AAAATAGACT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320" 1380 1440 TATGTAAALAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG T.AATATTGAA GACAAAATAT 10 1500
TTGGGA.AAAC
TAATTATAGG
AATTAAAGCG
CAGATTTGGC
AGAATGGTCA
CTATCGGAAG
AGCATT'rGTT
TAAAAGGAGA
AGTTCAAAAG
AGTGATGAAT
AAGGCAAGCC
ACTGAGCCAC
CCTACATCAG
ACTCCTGAAA
ATTACTAATA
TCCCCACT AAGCCATGTA
ACTGAAAATC
AGATAATACA AGAGCGTCCC
CTCACAAATA
GCCTTCATCC TGAGGATTTT
ATCAAGAAG
TGATAAATCA GGGAACTAAC
CAAACGGAGC
GTGGTCAT GAATAAAACA
AAAGGTGATT
CTATTCAG;A~ TGAGAAAAAT CCTAACCCAA. TAGAATCACT CGAAAAAGA
AAACGAAAGC
ACAATTCA
TGAACCTATA
AGCACCTAAA
AGCAGCAGTA
AAGAATAGGC
TAAGCAATAT
TGAGGAGGAA
*ATGCGCTTGA
TTGATAGTTG
GGCACAGCAG
GTAACAAGCC
AGTTAACAAA
TTGTCAATcc
-CTAATAATGC
AAAGATCTGT
AAAGTATCTC
GTGTGAGTCA.
ATAATAGAAA
GGGAAACAAG
TCAAGGTTTC
TTGAATGTGA
AGACAGTTAA
ATGCCAAATG
ACAAACGG
TTCTAGCAGT
AAACCTACAA
AAATGAACAG
TGCACCTGGT
TAGCCT'rCCA
TGAAGACCCC
AGAGAGTAGC
GTTACTGGAA
GTGTGC-AGC-A
GAAGAGPATAA
CTCATGGAAG
ACAAGTAAAA
TCTTTTACTA
AGAGAAGAAA
AAAGATCTCA
AGTATTTCAT
GTTAGCACTC
TTTGAAAACC
AGAAAAAAAJ
GTAAAGAACC
GACATGACAG
AGTGTTCAAA
AAGAAGAGAA
TGTTAAGTGG
TGGTACCTG
TAGGGAAGGC
CCAAGGGACT
ATCCATTGGG
AACTTGATGC
TGTTTTCAAA
:CCTAAAGAA
GAAAGAATGA
rGGTTGGTCA ACTAGTAGTC AGTAGAAATC
TAAGCCCACC
GGAACTCGAA
GTCTTCTACC
TAATTGTACT
GtAC6ACCAA
TGCAACTGGA
CGATAcCTT
TACCAGTA
ACTAGAAACA
TCTGCTTCA
TAAATATCC
AGGCATArC
GAATTGCAAA
ATGCCAGTCA
GCCAAGAAGA
CCAGAGCTGA
CTTAAAGAAT
GTTAAAGTGT
1560 1620 1680 1740, 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 i~ I'GACACAGAA
GGCTTTAAGT
CATAGAAATG GAAGAAAGTG kAAGcGcCdAG TC-ATTTGcTc k.TTCTCTGCC CACTCTGGGT kCAAAAGGAA
GAAAATCA
E'ATCACTGCA
GGCTI'TCCTG
7'AGTATCAAA
GGAGGCTCTA
LCTCATTACT CCAAATAAAC AGAAAGGGT'r
TTGCAAACTG
TACTGAT'rAT GGCACTCAGG AAAAACAGAA
CCAAATAAAT
AATTCATGGT TGTTCCAAAG ACATGAAGTT AACCACAGTC TCAGTATTTIG CAGAATACAT TCCAdGAAAT
GCAGAAGAGG
ACAAAGTCCA
AAAGTCACTT
GTCTAATATC
AAGCCTGTAC
GAAAGATAAG CCAGTTGATA GGTTTTGTCT ATCATCTCAQ ATGGACTTTrT ACAAAACC
TTCAGAGGCA
CACCACTTrTr TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAT
CTGCTAGAGG
3120 AAAACTTTGA GGAACATTCA
ATGTCACCTG
GTACAGTGAG CACAATTAGC
CGTAATAACA
CAAGCAATAT TAATGAAGTA GGTTCCAGTA
AAAGAGAAAT
TTAGAGAAA
CTAATGALAGT
GGGAAATGAG AACATTCCAA TGTTTTTA
GAAGCCAGCT
GGGCTCCAGT ATTA.ATGAA 3.180 3240 TAGGTTCCAG TGATGAAAAC
ATTCAAGCAG
ATGCTATC
GTLAATTGTAj ATACAGAT'r
ATGCATCTC;
AAGATACTAC
TCCAGA)AG
GTTACCGAAG
AAGAGCTTC.C
C.
TATCATTGAA.
AGGAACATCA
GTGAATTGGA
CCAAACAAAT
TTTCAGATGA
TGGATTCAA
ACTGCTCAG
AACATAACCT
ATGGGAGCCA
ACCTGCGAAA
T' TAGATTAGG
~GCATCCTGA
LCTCTCC-ATA!
GGTTTGTTCI'
TTTTGCTS2U
~AGAGCTTAGC
AGGGGCCAAC
CTGCTTCCAJ
TAGCACCGT'I
GAATAGCTTA
CCTTAGTGAG
AGACTTGACT
GAGGCATCAG
TGAAGAAAGA
CTTAGGTGAA
GCTATCCTC-T
GATAA.AGCTC
GCCTTCTAAC
TCCAGAACA
GAGACACCTG
ATGACCTGT
k. AATGACATTA
AAATTAGAGT
LCACTTGTTAT
GCTACCGAGT
AATGACTGcA
GAAACAAAAT
GCAAATACAA
TCTQAAAGCc
GGAACGGGCT
GCAGCATCTG
CAGAGTGACA
CAGCAGGAAA
AGCTACCCT'r
AGCACATCAG
AATCCAGAAG
AATAAAGAAC
TGGTACATGC
AGGAAAGTC
GCCCTTTCAC
CCTCAGAAGA
TTGGTAAAGT
GTCTGTCTAA
GTAACCAGGT
GTTCTGCTAG
ACACCCAGGA
AGGGAGTTGG
TGGAAGAAAA
GGTGTGAGAG
TTlTTAACCAC TGGCTGA6ACT
CCATCATAAG
AAAAAGCZAGT
GCCTTTCTGC
CAGGAGTGGA
I
ACAGTTGCC
GTT'ITGCAAC
ATAAAAAAGC
CTGAT'rTCAG
AACTAGGTAG
CTGAGGTCTA
AAGAATATGA
ATAACTTAGA
AA6ACAGAGGG
.TAAACAAAGT
AGAAGTAG'TT
ACAGcCTATG
AGATGATGGT
TGCTGTTT
CC-ATfiCACAT
GAACTTATCT
AAACAATATA
GAACACAGAG
A.ATATTGGCA
:TTGTTTTCT
rcCTTTCTG2 E'CTGAGTGAC2
I'AATCAAGAAC
M'AAACAAGC
"~CAGCAGAGG
LCGA&GCTGTG
'GACTCTTCT
LTTAACTTCAC
IGACAAGT'TG
AGGTC-ATCC c G3GGAGTCTT
CCAAAATTGA
CTTCCTGGAA
CAGACTGTTA
GGAAGTAGTC
GAAATAAAGG
AGCAAAAGCG
TTGGCTCAGG
AGTGAGG'ATG'
c CTtCTCAGT
GAGAATTTAT
kAGGC-AT CTC
VCACAGTGCA'
CTTGGTTCTT
LAGGAATTGG
;AGCAAAGCA
YTCTCTGAAG
LZLTACCATGC
TAGAACAGC
CCCTTGAGG
AGAAAAGTA
AGGTGTCTG
CI'CTAAAT
AGAATAGIAA
3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320- 4380 4440 4500 4620 4680 4- .4w.
GTGAATACCC TATAAGCCAG
CAGATAGTTC
GCCCATCATT
TACCAGTAAA
AGATGATAGG
ACTACCCATC TCAAGAGGA.G CTCATTAAGG TTGTTGATGT GGAGGAGCAA~
CAGCTGGAAG
4740: 0 AGTCTGGGCC ACACGATTTG ACGGAAACAT
CTTAC~
CCCCTTACCT .GGAATCTGGA ATCAGCCTCT
"TCTCTC
AAGACAGAGC cC-CAGAGrCA"GCTCGTGTTG
GCAAC
AAGTTCCCCA ATTGAAAGTTz GCAGAATCTG
CCC-AGA
ATACTGCTG GTATAATGCA ATGGAAGWA
GTGTG
CTTCAACAA AAGGGTCAAC AAAAGAATGT
CCATGG
AATTTATGCT CGTGTACAAG ?1VrGCcAGAA
AACAC~
CTGAAGAGAC TACTCATGTT GTTATGAA CAGATG4 TGAAATATIT TCTAGGAATr GCGGGAGGAA.
AATGGG
AGTCTAT'I'A AGAAAGAAA ATGCTGAATG
AGCATGJ
TCAATGGAAG AAACCAcCAA. GGTCCAAAGC
GAGCAAC
TCAGGGGGCT AGAAATCTGT TGCTATGGGC
CCTTCAC
AATGGATGGT ACAGCTGTGT GGTGCTTCTG
TGGTGAA
GCACAGGTGT CCACCCAr'r GTGGTTGTC
AGCCAGA
TCCATGCAAT TGGGCAGATG TGTGAGGCAC
CTGTGGT
GTGTAGCACT CTACCAGG CAGGAGCTGG
ACACCTA
GCCACTACTG
A
(2) INFORMATION FOR SEQ ID NO:4:
SEQUENCE CHARACTERISTICS:
LENGTH: 1863 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein
(vi) ORIGINAL SOURCE:
ORGANISM: Homo sapiens
STRAIN: BRCA1
(viii) POSITION IN GENOME:
CHROMOSOME/SEGMENT: 17
MAP POSITION: 17q21
L'TGCC AAGGC!AAGAT
CTAGAGGA
ATGA CCCTGAATCT
GATCCTTCTG
ITACC ATCTTCAACC
TC!TGCATTGA
GTCC AGCTGCTGCT
CATACTACTG
GCAG GGAGA.AGcr-A
GAATTGACAG
TGTGTCTGGCCTG
ACCCCAGAAG
XCAT CACTTTAACT
AATCTAATTA
:TGA GTTrGTGTGT
GAACGGACAC
rAGT TAGCTATT
TGGGTGACCC
']TT GAAOTCAA
GGAGATGTG
GAATCCCAGGAC
AGAAAGATCT
CAA CATGCCCACA
GATCAACTGG
~GAGCTTTCATCA
TTCACCCTTG
.TCCTGGACAGAG
GACAATGGCT
GAC CCGAGAGTGG
GTGTTGGACA
CCT GATACCCCAG ATCCCCCACA 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5711 (xi) Met Ala Giu Leu Lys Gin Thr Asn SEQUENCE DESCRIPTION: SEQ ID NO:4: Asp Leu Ser Ala Leu Arg Val 'Glu Giu Val Gin 10 Met Gin Lys Ile Leu Glu Cys pro le Cys Leo 25 Pro Vai Ser Thr Lys Cys Asp His Ile Ph~e Cys 40- LYS Leu Leo Asn Gin Lys Lys Gly Pro Ser Gin 55 60 Asn Asp Ile Thx Lys Arg Ser Leu Gin Glu Ser 70 75 Leu Vai. Giu Glu Leu Leu Lys Ile, Ile -Ala 90 Gly Leo Glu Tyr Ala Asn Ser Tyr Asn Phe Ala 100 .105 Ser Pro Giu His Leu Lys Asp Giu Vai Ser Ile Asn Giu Lys Cys Thr Phe Lys VaJ.
Leu Phe Pro Arg Gin Lys 110 l lis Cys Leu Phe Leu Glu aAsn Lys Met Cys Ser Asp Asn .115 lie Gin 5er Met 125 .i Gly Tyr Arg Asn Arg Pro 145 Thr Ser Lys Pro Ala 225 Pro Hi s 130 Ser Val Val Ala Gin 210 Cys Ser Leo A~rg .Tyr Thr 195 Gly Glu Asn, Gin Glu Thr Leo 165 le Gbu 180 Tyr.-Cye Thr Arg Phe Ser Asn Asp 245 Ala Lys 135 Thr Ser 150 Arg Thr Leo Gly Ser Val Asp- Giu 215.
Glu Thr 230 Leo Asn Gin Gly Arg Leo Leu Gin Ser Glu Pro Leu Lys Ser Gly 20 0 Ile Asp Thr Ser Ser Val Gin Gin Asp 185 Asp Ser Val Thr Ser Arg 170 Ser Gin Leu Thr Glu 250 Val 155 Ile Ser Glb Asp Asn 235 Lys2 140 Leo Ser Gin Pro Gio Asp.
Leu Leo 205 Ser Ala 220 Th~r Gbu !.rg Ala Asn Gin Thr 190- Gin
LYS
His Ala .Gl Asn Leo Gly 160 Lys Thr 175 Val Asn le Thr Lys Ala His Gin 240 Glu Arg 255 Pro Glu Lys Tyr 260 Ser Asri Leo His Val Glu 270' Pro Cyc Gly Thr Asn Thr His Ala Ser 61 Ser Leo G-i His Glu Asn Ser Ser Leu 290 Cys Asn 305 Trp. Ala Giu Lys Trp Asn Asp Val 370 Trp Phe.
385 Gly Giu Asn Giu Let Lys Lys 355 Pro.
Ser Ser Tai iLeu Thr Lys Ser Lys Gin 310 Ser Lys Giu 325 Val Asp Leu, 340 Gin Lys Leu Trp 116 Thr A r; Ser Asp 390 Giu Ser Asn 405 Asp Giu Tyr 420 Asp 295 Pro Thr Asn Pro Leu 375 Giu Al1a 280 Arg Met Gly Lev Cys Asn Ala Asp 345 Cys Ser 360 Asn Ser Leu Leu Lys Val Gly Ser Asn Val Giu 300 Ala Arg Ser 315 Asp Ajg Ax; 330.
Pro Lou Cys Glu Asn Pro Ser2Zi-eGn -380 Gly Ser Asp 395 Ala Asp Val 410 Ser Giu Lys 281 Lys Gin Thr Giu Arg 365 Lye As~p Leu lie Ala Giu Phe His Asn Ar; 320 Pro Ser Thr 335 Ar; Lys Giu 350 Asp Thr Giu Val Asn Giu Ser His Asp 400 Asp Val Leui .415 Asp Leu Leu 430
425 Ala Ser Ser Lys 450 Tyr Ar; 465.
Leu Ile Pro Leu His Pro Pro Giu 530 Val met 545 Asp 435 Ser Lys Ile Thr Giu 515 met Asn Pro His Giu Ala Leu Ile Cys Lys Ser Giu Val. Giu Ser Lys Ala Ser 470 Gly-Aia Phe 485 Asn Lye Leu 500 Asp Phe Ile Ile Asn Gin Ile Thr Asn 550 Asn 455 Leu Val Lys Lys Gly 535 Ser 440 Ile Pro Thr Ax; Lys 520 Gl Giu Asn Giu
LYS
505 Ala Asn Asp Leu Pro 490 Ar; Asp Gin 445 Lye Ile Phe -460 Ser His Val 475 Gin Ile Ile Axg Plo Thr Leu Ala Val 525 Thl Giu Gin 540 Asn Lys Thr Ar; Gly Thr Gin.
Ser 510 Gin Asri Lys Val His Lys Thr Giu Asn 480 'Giu Ar; 495.
Gly Lou LYS Thr Gly Gin Gly Asp 555 Ser Ile Gin Aen Giu Lye Asn Pro Asn 62 Pro Ile Giu Ser Leu Giu Lye Giu Asn Asn Leu 625 Ile Gin Giu Ser 56 Ser. Ala Phe Ly 580 Met Glu Leu Gil 595 Arg Leu Ar; Ar 610 Val Val Ser Ar~ Asp Ser Cys Sex 645 Met Pro Val Arg 660 Pro Ala Thr Gly 675 Lys Arg His Asp 690 Thr Leu.
LYS
Asn 630 Ser His Ala, Ser Lys Asn 5cr 615 Leu Scr Ser Lys Ala le 600 Scr Ser Giu Ar; Lys 680 Thr 570 Glu Pro Ile Ser Scr 585 His -Asn Ser Lys Ala 605 Thr Arg His le His 620 Pro Pro Asn Cys Thr 635 Giu le Lys Lye. Lys 650 Asn Leu Gin Leu Met 665 Ser Asn Lys Pro Asn 685 Phe Pro Giu Leu Lys Ser 590 Pro Ala Giu Lys Giu 670 Glu 575 Ile Lye Leu Leu Tyr 655 Gly Gin Ser Lye Glu Gin 640 Asn Lys Thr k.sn 695.
700 Th A *AA A
Ala 705 Phe Thr Ser le Leu 785 Cye Gly Leu Pro Gly Ser Phe -Thr Lys Cys Ser Asn Th~r 5cr Glu Leu Lys Glu Val Val Giy Ser 770 Leu Val Cye Gly Asn Lye Giu 755 Leu Giu Ser Scr His *Pro Ser 725 Val Ser 740 Ar; Val Val Pro Val Ser- Gin Cye 805 Lye Asp 820 Giu Val 710 Leu Pro Ar; Asn Asn Ala Leu Gin Thr 760 Gly Thr Asp 775 -Thr Leu Gly 790 Ala Ala Phe Asn.Ar; Asn Asn His Ser 840 715 720 Giu Giu 745 Glu Tyr Lys Glu Asp 825 Arg Glu 730 Asp Ar; Gly Ala Asn 810 Thr Giu Lye Giu Glu Pro Lys Asp Ser Val Glu 765 Thr Gin Giu 780 Li'e Thr- Giu 795 Pro Lye Gly Glu Gly Phe Thr 5cr Ile 845 Lys Leu 750 5cr Ser Pro Leu
LYS
830 Glu Leu 735 Met Ser le le 815 Met Giu Leu Ser Ser Lye His Pro Glu Giu Ser 850 Glu Leu Asp Ala Gin .855 Tyr Leu Gin Asn Thr 860 Phe Lys Val Ser Lys Arg Gin Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Giu Glu 865 870 875. 880 Giu Cys Ala Thr Phe Ser Ala His Ser Giy Ser Leu Lys Lys Gin Ser 885 890 895 Pro Lys Val. Th~r Phe Glu Cys Glu Gin Lys Glu Glu Asn Gin Gly Lys 900 905 91.0 Asn Giu Ser Asn le' Lys Pro Val Gin Thr Val Asn Ile Thr Ala Gly 9215 920 925 Phe Pro Val Val Gly Gin Lys Asp Lys Pro Vai Asp Asn -Ala Lys Cys 930 935 9.40 Ser le Lys.Gly Gly Ser Arg Phe Cys Leu.Ser Ser Gin -Phe Azg Gly 945 950 955 960 Asn Glu'Thr Gly Leu Ile Thr Pro Asn Lys His Giy Leu Leu Gin Asn 965' 970 975 Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val. Lys Tlhr 980 985 .990 Lys Cys Lys Lys Asn'Leu Leu Giu Giu Asn Phe Glu Giu His Ser Met 995 1000 :1005: Ser Pro Giu Arg Giu Met Gly Asn Giu Asn Ile Pro Ser Thir Val Ser 1010 lois 1020 :Thr Ile Ser Arg Asn Asn Ile Arg Giu Asn Vai Phe Lys Glu Ala Ser 1025 1030 1035 1040 Ser Ser As Ile Asn Giu Vai Giy Ser Ser Thr Asn Glu Val Gly Ser *91045 1050 1055 .Ser Ile Asn Giu Ile Gly Ser Ser Asp Giu Asn Ile Gin Ala Glu Leu 1060 1065 1070 SGly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met- -Leu Arg Leu Giy Val .1075 1080 -1-08 Leu Gin Pro Giu Val Tyr Lys Gin Ser Leu Pro'Gly Ser Asn Cys Lys 9.1090 1095 1100 His Pro Glu Ile Lys Lys Gin Giu Tyr Giu Glu Val Val Gin Thr Val 1105 1110 1115 1120 Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Giu Gin Pro 1125 1130 1135 Met Gly Ser Ser His Ala Ser Gin Vai Cys Ser Glu Thr Pro Asp Asp 1140 1145 1150 Leu Leu Asp Asp Gly Giu Ile Lys Giu Asp Thr Ser Phe Ala Giu Asn 11551160 1165 As.p Ile Lys Giu Ser Ser Ala *Val Phe Ser Lys Ser Val Gin Lys Gly 1170 1175 1180 Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gin 1185 1190 1195 1200 Gly Tyr Arg Arg Gly Ala Lye Lys Leu Glu- Ser Ser Glu Giu Asn Leu 1205 1210 1215 Ser Ser Giu Asp Glu Giu Leu Pro Cys Phe Gin His Leu Leu Phe Gly' 1220 1225 .1230 Lys Val 
Asn Asn Ile Pro Ser Gin Ser Thr Ar g His Ser Th~r Val Ala 1235 1240 1245 Thr Giu.Cys Leu Ser Lye Asn Thr Giu Glu:L&n-Leu Leu Ser Leu Lys 1250 1255 :-1260 Asn Ser Leu Asn Asp Cys Ser Asn Gin Val le Leu Ala Lys Ala Ser 1265 1270 1275 1280 Gin Giu His His Leu S'er Giu Giu Thr Lys Cys Ser Ala Se~r Leu Phe 1285 1290 1295 0 Ser Ser Gin Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Th-r 1300 1305 1310 :006 Gin Asp Pro Phe Leu Ile Gly Ser Ser Lye Gin Met Arg His Gin Ser **1315 1320 1325 Giu Ser Gin Gly Val Gly Leu Ser Asp Lys Giu, Leu Val Ser Asp Asp 1330 1335 1340 Giu Giu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gin Glu Glu Gin Ser I..1345 1350 1355 1360 *Met Asp Ser Asn-Leu Gly Giu Ala Ala Ser Gly Cys Giu Ser Giu Thr 1365 1370 -1375 Ser Val Ser Giu Asp Cys Ser Gly Leu Ser Ser -in Ser Asp le Leu :1380 1385 1390 Thr Thr Gin Gin Arg Asp Thr Met Gin His Asn Leu le Lye Leu Gin .**1395 1400 1405 Gin Giu Met Ala Giu Leu Glu Ala Val Leu Giu Gin His Gly Ser Gin 1410 1415 1420 Pro Ser Asn Ser Tyr Pro Ser le Ile Ser Asp Ser Ser Ala Leu Glu 1425 1430 1435 1440 Asp Leu Arg Asn Pro Glu Gin Ser Thr Ser Glu Lye Ala Val Leu Thr 1445 1450 1455 Ser Gin Lys Ser Ser Glu Tyr Pro Ile Ser Gin Asn Pro Glu Gly Leu 1460 1465 1470 Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn 1475 1480 1485 Lys Giu Pro Gly Val Giu Arg Ser Ser Pro Sedr Lys Cys Pro Ser Leu 1490 1495 1500 Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gin Asn Arg 1505 1510 1515 1520 Asn Tyr Pro Ser Gin Giu Giu Leu Ile Lys Val Val Asp Val Glu Glu 1525 1530 1535 Gin Gin Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu. Thr Ser Tyr 1540 154S 1550 Leu Pro Arg Gin Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile 1555 1560 1565 Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala 1570 1575 1580 Pro Giu Ser Ala Arg Val Gly Asn Ile. 
Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600
Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala
1605 1610 1615
Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630
Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645
Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660
Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680
Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695
Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710
Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725
Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740
Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760
Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775
Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790
Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805
Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820
Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840
Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855
Gln Ile Pro His Ser His Tyr
1860

(2) INFORMATION FOR SEQ ID NO:5:
SEQUENCE CHARACTERISTICS:
LENGTH: 5711 base pairs
TYPE: nucleic acid
STRANDEDNESS: not relevant
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(vi) ORIGINAL SOURCE:
ORGANISM: Homo sapiens
STRAIN: BRCA1
(viii) POSITION IN GENOME:
CHROMOSOME/SEGMENT: 17
MAP POSITION: 17q21
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA
TAACTGGGCC
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT
ATGCAGAAAA
67 TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGAcC ACATATT'rrG CAAATTT1"rCcATGCTGAAAC TTCTCAACCA GAAGAAGG
CCTTCACAGT
GTCCTTTATG TAAGAATGAT ATAACCAAAIA
GGAGCCTACA
AGAAAGTACG
AACTTGTTGA AGAGCTATTG AAAATCATI'T GTGCTT'iA
GCTTGACACA
ATGCAAACA.G
AAGTTTCTAT
AACCCGAAAA
CTGTGAGAAC
~AATTGGGATC
ATCAAGAATTr
CAAAAAAGGC
CCAGTAATAA
ATCAGGGTAG
GCTCAkTTACA
AGGCTGAATT
GGGCTGGAAG
ATCTGAATGC
CAGAGAATCC
AAGTTAATGAC
GGGAGTCTGA
AATATTCTGG
TATGTAAAAG *TTGGGAAAA C C
TAATTATAGG
AAT'rAAAGCG
CAGATTTGGCA
AGAATGGTCAA
CTATAATTTI' GCA.AAAAAGC CATCCAAAGT ATGGGCTACA TCCTTCCTTG CAGGAAACCA TCTGAGGACA AAGcAGCGA TGATTCTTCT GAAGATACCG GTTACAAATC ACCCCTCAAG TGCTTGTGAA
TTTTCTGAGA
TGATTTGAAC
ACCACTGAGA
TTCTGTTTCA AACTTGCATG G3CATGAGAAC AGCAGTTTAT CTGTAATAAA AGCAAACAGC TAAGGAAACA TGTAATGATA E'GATCCCCTG TGTGAGAGAA ['AGAGATACT GAAGATGTTC ;TGGTTTCC AGAAGTGATG LTCAAATGCC
AAAGTAGCTG
'TCTTCAGAG AAAATAGACT .r=AAAGAGTT
CACTCCAAAT
TATCGGAAG
AAGGCAAGCC
GAAACCGMG
GTCTCAGTGj
TACAACCTCA
TAATAAGGC
GAACCAGGGA
CGGATGTAAC
AGCGTGC-AGC
TGGAGCCATG
TACTCACTA
CTGGCTTAGC
GGCGGACTCC-
AAGAATGGAA
CTTGGATAAC
*CAAA.AGACTT
CCAACTCTCT
AAAGACGTCT
AATTATTGC
TGAAATCAGT
AAATACTGAA
TGAGAGGCAT
TGGCACAAAT
AGACAGAATG
AAGGAGCCAA
CAGCACAGAA,
TAAGCAGAAAi
ACTAAATAGC,
AAAATAACTC TCCTGAACAT
AGATTTAGTC
GGTTTGGAGT
CTAAAAGATG
CTACAGAGTG
AACCTTGGAA
GTCTACATrG
AGTGTGGGAG
TTGG.ATTCTG
CATCATCAAC
CCAGAAAAGT
PICTCATGCCA
kCATGTAGAAA
:ATAACAGAT
LkAAAGGTAG
'TGCCATGCT
=GATTCAGA
I'CACATGATG
3AGGTAGATG 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 16.20 1680 1740 1800 AACTGTTAGG
TTCTGATGAC
ATGTATTGGA
CGTTCTAAA.T
,GCATTTGTT
'AAAAGGAGA
.GTTCAAAAG
GTGATGAAT
ACTGAGCCAC
CCTACATCAG
ACTCCTGAAA
ATTACTAATA
TAcTGGcr-AG TGATCeTCAT GAGGc.NTrAA CAGTAGAGAG TAATATTGAA GACAAAATAT TCCCCAACTT AAGCCATGTA
ACTGAAAATC
AGATAATACA AGAGCGTCCC
CTCACAAATA
GCCTTCATCC TGAGGATTT
ATCAAGAAAG
TGATAAATCA GGACTAC
CAAACGGAGC
GTGGTCATGA GAATAAAACA.
AAAGGTGATT
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAGA TCTGCTTTCA 1860
AAACGAAAGC
ACAATTCAAA
TGAACCTATA AGCAGCAGTA AGCACCTAAA AAGAATAGGC
TAAGCAATAT
TGAGGAGGAA
TAAGCCCACC
AGAAAAAAAA
ATGCGCTTGA
ACTAGTAGTC
TTGATAGTTG
GGCACAGCAG
GTAAcAAGc
AGTTAACAAA
-tTGTCAATCC
CTAATAATGC
AAAGATCTGT
AAAGTATCTC
GTGTGAGTCA
ATAATAGAAA
GGGAAACAAG
TCAAGGTTTC
AATGTGCAAC
TTGAATGTGA
AGACAGTTAA
ATGCCAAATG
ACGAAACTGG
CACC-ACTTTT
4 AAAACTTTGA
TTCTAGCAGT
AAACCTACAA
AAATGAACAG
TGCACCTIGGT
TAGCCTTCCA
TGAAGACCCC
AGAGAGTAGC
GTTACTGGAA
GTGTGCAGCA
TGACACAGAA
CATAGAAATG.
A.AAGCGCCAG
ATTCTCTGCC
ACAAAAGGAA
TATCACTGCA
TAGTATCAAA
ACTCATTACT
TCCCATCAAG
GGAACATTCA
CTCATGGAAG GTAAAGAACC
ACAAGTAAAA
TCTTTTACTA
AGAGAAGAAA
AAAGATCTCA
AGTATTTCAC
GTTAGCACTC
TTTGAAA.ACC
GGCTTrTAAGT
GAAGAAAGTG
TCATTTGCTC
CACTCTGGGT
GAAAATCAAG
GGCTTTCCTG
3GAGGCTCTA
CAAATAA.AC
I'CA'FrTGTTA k.TGTCACCTG
.GTAATAACA
3GTTCCAGTA kTTCAAGCAG
;TTTTGCAAC
GACATGACAG
AGTGTTCAAA
AAGAAGAGAA
TGTTAAGTGG
TGGTACCTGG
TAGGGAAGGC
CCAAGGGACT
ATCCAT'rGGG
AACTTGATGC
TGTTTTCAAA
AGTAGAAATC
GAAGAGATAA
GGAACTCGAA
GTCTTCTAcC
TAATTGTACT
GTACAACCAA
TGCAACTGGA
TGATACTTTC
TACCAGTA
ACACUAACA
AGAAAGGGT
TACTGAT'rAT
AAAAACAGAA
AATTCATGGT
kCATGAAGTT rcAGTATTTG
PCCAGGAAAT
TTAAATATCC
AGGCATAT'rC
GAATTGCAAA
ATGCCAGTCA
GCCAAGAAGA
CCAGAGCTGA
CTTAAAGAAT
GTTAAAGTGT
TTGCAAACTG
GGCACTCAGG
CCAAATAAAT
T'GTTCCAAAG'
A~ACCACAGTC
CAGAATACAT
GCAGALAGAGG
A.AAGTCACTT
AAGCCTGTAC
,CAGTTGATA
rTlCAGAGGCA rATCGTATAC
:TGCTAGAGG
kACATTCCAA 3GAGCCAGCT 1920 1980 2040 2100 2160 2220.
2280 2340 2400 2460 25'20 2580.
2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 CCTTAAAGAA AcAAAGTCCA GAAAGAATGA GTCTAATATC TGGT'rGGTCA
GG?T'GTCT
ATGGACTT
AAACTAAATG
AAAGAGAAAT
TTAGAGAAAA
CTAATGAAGT
AACTAGGTAG
CTGAGGTCTA
GAAAGATAAG
ATCATCTCAG
ACAAAAeccA
TAAGAAAAAT
GGGAAATGAG
TGTTTTTAAA
GTACAGTGAG CACAATTAGc
CAAGCAATAT
TAGGTTCCAG
ATGCTATGCT
TAATGAAGTA
TGATGAAAAC
TAGATTAGGG
GGGCTCCAGT ATTAATGAAA AAACAGAGGG CCAAAATGA TAAACAA.AGT CTTCCTGGAA GTAATTGTA GCATCCTGA.A ATAAAAG ATACAGATTT CTCTCCATAT
CTGATTTCAG
ATGCATCTCA GGTTTGTTCT
GAGACACCTG
AAGATACTAG TTTTGCTGAA
AATGACAT
TCCAGAGAGG AGAGCTTAGC
AGGAGTCCTA
GTTACCGAAG
AAGAGCTTCC
CTACTAGGCA
'ATCATTGAA
AGGAACATCA
GTGAATTGGA
CCAAACAAAT
TTTICAGATGA
TGGATTCAAA
ACTGCTCAGG
AACAT.AACCT
ATGGGAGCCA
ACCTGCGAAA
GTGAATACCC
CAGATAGTTC
GCCCATCATT2
ACTACCCATC
AGTCTGGGCC
CCCCTTACCTC
AAGACAGAGC
C
AGGGGCCAAG
CTGCTTCCAA
TAGCACCGTT
GAATAGCTTA
CCTTAGTGAG
AGACTTGACT
GAGGCATCAG
TGAAGAAAGA
CTTAGGTGAA
GCTATCCTCT
GATAAAGCTC
GCCT'rCTAAC
AAATTAGAGT
CACTTGTTAT
GCTACCGAGT.
AATGACTGCA
GAAACAAAAT
GCAAATACAA
TCTGAAAGCC
GGAACGGGCT
GCAGCATCTG
CAGAGTGACA
CAGCAGGAAA
A.GCTACCCTT
AAGAATATGA
AGAAGTAGTT
ATAACTTAGA ACAGCdTATG ATGACCTGTT
AGATGATGGT
AGGAAAGTTC TGCTGT'r'r'r GCCCTTTCAC
CCATACACAT
CCTCAGAAGA
GAACTATCT
TTGGTAAAGT
AAACAATATA
GTCTGTCTMA
GAACACAGAG
GTAACCAGGT
A-ATTGGCA.
GTTCTGCTAG
CTTGTTTTCT
ACACCCAGGA,
TCCTTTCTTG
AGGGAGTTGG,
TCTGAGTGAC
TGGAAGAAA 'TJAATCAI AGAA GGTGTGAGAG
TGAAACA.AGC
TTTI'AACCAC TCAGCAGAG TGGCTGAACT AGAAGCTGTG
CAGACTGTTA
GGAAGTAGTC
GAAATAAAGG
AGCAAAAGCG
TTGGCTCAGG
AGT.APGGATG
CCTTCTCAGT
GPLGAATI'TAT
AAGGCATCTC
TCACAGTGCA
ATTGGTTCTT
AAGGAATTGG
7AGCAAAGCA
-TCTCTGAAG
;ATACCATGC
'AGAACAGC
;CCCTrGAGG
:AGAAAAGTA
aGGTGTCTG
CTTCTKAAT
AGAATAGAA
AGCTGGAAG
TAGAGGGAA
ATCCTTCTG
CTGCATTGA
3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980
CCATCATAAG
TCCAGAACAA AGCAcATcAG TATAAGCCAG AATCCAGAAG I'ACCAGTAAA AATAAAGAAC k.GATGATAGG
TGGTACATGC
M'AAGAGGAG
CTCATTAAGG
LCACGATTTG
ACGGAAACAT
;GAATCTGGA ATCAGCCTCT CCAGAGTC-A GcTcGTGTTG
AAAAAGCAGT
GCCTCTGC.
CAGGAGTGGA
ACAGTTGCTc
TTGTTGATGT
CTTACTTGCC
TCTCTGATGA
GCAACATACC
CCCAGGGTCC
TGACTcTTCT
ATTAACTTCA
TGACAAG'rIT
AAGGTCATCC
TGGGAGTCTT
GGAGGAGCAA
AAGGCAAGAT
CCCTGAATCT
ATCTTCAACC
c c
G
AAGTTCCCr-A ATTGAAAGTT GCAGAATC AGCTGCTGCT CATACTAcTG ATACTGCTGG GTATAATGCA ATGGAAGAAA, GTGTr.GrCAG GGGAGC
GAATTGACAG
5040 CTTCAACAGA AAGGGTCAAC
AAAAGAATGT
AATTTATGCT CGTGTACAAG
TTGCCAGAA
CCATGGTGGT *GTCTGGCCTG
ACCCCAGAAG
AACACCACAT CACTTTAACT
AATCTAATTA
5100 CTGAAGAGAC TACTCATGTT GTTATGAAA CAGATG TGAAATATTT TCTAGGAATT GCGGGAGGA
AATGGG~
AGTCTATAA AGAAGAA ATGCTGAATG
AGCATGJ
TCAATGGAAG AAACCACCAA GGTCCAAAGC
GAGCAAC
TCAGGGGGT AGAAATCTGT TGCTATGGGC
CCPCAC
AATGGATGGT ACAGCTGTGT GGTGCTTCTG
TGGTGA.A
GCACAGGTGT. CCACCCAATT GTGGTTGTGC
AGCCAGA
TCCATGCAAT TGGGCAGATG TGTGAGGCAC
CTGTGGT
GTGTAGCACT ,CTACCAGTGC CAGGAGCTGG
ACACCTA
GCCACTACTG
A,
(2) INFORMATION FOR SEQ ID NO:6:
SEQUENCE CHARACTERISTICS:
LENGTH: 1863 amino acids
TYPE: amino acid
STRANDEDNESS: not relevant
TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein
(vi) ORIGINAL SOURCE:
ORGANISM: Homo sapiens
STRAIN: BRCA1
(viii) POSITION IN GENOME:
CHROMOSOME/SEGMENT: 17
MAP POSITION: 17q21
TGA
VAGT
AA
GAC
CCT
GTTTGTGTGT GAACGGACAC 5220 TAGCTATTTC TGGGTGACCC 5280 TGAAGTCAGA GGAGATGG 5340 ATCCCAGGAC AGAA.AGATCT 5400 CATGCCCACA GATCAACTGG 5460 GCTrTCATCA TTCACCCTTG 5520 CTGGAeAGAG GACAATGGCT 5580 CCGAGAGTGG GTGTTGGACA 5640 GATACCCC-AG ATCCCCCACA 5700o (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: Met Asp Leu Ser Ala Leu Arg Val Giu Giu Val Gin. Asn Val Ile Asn 1 5 10 Ala Met Gin Lys Ile Leu Glu Cys Pro le. Cys Leu Giu Leu Ile Lys 25 Giu Pro Val Ser Thr Lys Cys Asp His le Phe Cys. Lys Phe Cys Met Leu Lys Gin Thr Asn Lys so Asn Leu Gly Ser Leu Asp Val Leu Pro Leu Asn Gin Lys Ile Thr Lys Arg 70 Giu Giu Lou Leu Giu Tyr Ala Asn 100 Giu His Lou Lys Lys Gly Pro Ser. Leu Gin Lys Ile Ilie_ 90 Ser Tyr Asn 105 Asp Giu Val 120 Ser Giu 75 C) s Phe Gin Ser Ala Ala Pro Arg Gin
LYS
110 Leu Cys Phe Ser Leu Asp Giu Asn Ser Met 115 125 Gly Tyr, Arg Asn Arg Ala Lys Arg Lou LeuLn-Ser 130 :140 Glu Pro Glu Asn Pro Ser 145 Thr Val Ser Val Lys Ala Pro Gin 210 Ala Cys 225 Pro Ser His Pro Pro Cys Ser Lou 290 Cys Asn 305 Trp Ala Lou Arg Thr 195 Gly Giu Asn Giu Gly 275 Lou Lys Gly *Gin Glu Thr Lou 165 le Glu 180 Tyr Cys Thr Arg Phe Ser Asn--Asp 245 Lys Tyr 260 Thr Asn Lou Thr Sor Lys Sor Lys Thr 150 Arg Lou Sor Asp Giu 230 Lou Gin Thr Lys Gln 310 Glu Sor Lou Ser Th~tys- Gin- Gly Ser Asp 185 Val Gly Asp 200 Glu Ile Ser 215 Thr Asp Vai Asn Thr Th-r Gly Ser Sor 265 His Ala Ser 280 Asp Arg Met 295 Pro Gly Lou Thr Cys Asn 72 Vd-L Arg 170 Ser Gin Lou Th~r Glu 250 Val Ser Asri Ala Asp Gln £.eu Ser Asn 155 Lou 175 Sor Glu Asp Th~r Val Asn Glu Lou Leu 205 Asp Ser Ala 220 Asn Thr Glu 235 Lys Arg Ala Sor *"-Xsn Lou Lou Gin His 285 Val Glu Lys 300 Arg Sor Gin 315 Arg-Arg Thr 190 Gin Lys His Ala His 270 Glu Ala His Pro Ile Lys His Glu 255 Val Asn Giu A sn Ser Thr Ala GIn 240 Axg Giu Ser Phe Arg 320 Thr 325 Asp Giu Lys Lys Val 340 330 Leu Asn Ala ASP Pro Leu Cys Giu Arg 345 350 335 Lyi Giu Trp Asp 385 Gly Asn Ala -Ser 465 Leu Pro His I Pro C Val~ 545 Sex I Giu S Asn M~ Asn Val 370 Phe Giu Giu.
Sex 450 Arxg Ile eu ~ro C liu Do.
let le G erA et G
LYS
355 Pro Sex Ser Val Asp Gin Trp Axg Giu Asp 420 Pro Lys Leu Pro Cys 360 Ile Thx Leu Asn 375 Sex Asp Glu Leu 390 Ser Asn Ala Lys 405 Giu Tyr Ser Gly His Glu Ala Leu Sex' Giu Asn Pro Axg Asp Th~r 365 435 Ser Ele Dhx ;iu let Lsn in l~a iu Val Giu Lys Ala Giy Ala 485 Asn Lys 500 Asp Phe Iie--Asn Ile Thx Asn Glu 565 Phe Lys 580 Leu Giu Sex Sex 470 Phe Leu le Gin Asn 550 Lys Thz Leu Asn 455 Leu Val Lys Lys Giy 535 Ser Asn Lys Asn Sex 615S 440 Sle Pro Thx Arg Lys 520 Thr~ Gly Pro Ala C 5 le E~ 600 Se Let Val Sez 425S le Giu Asn Glu LJys 505 Alia ksn lis sn ;iu ~85 [is kSex- le Gin .380 .1 Gly Sex Asp 395 Ala Asp Val 410 Ser-Gamtys Cys Lys Sex Asp Lys Ile 460 Leu Sex His 475 Pro Gin le 490 Axg Arg Pro Asp Leu Ala Gin Thr Giu -7540 Glu Asn-ys Pro Ile Giu S 570 Pro Ile Sex S Asn Sex Lys
LYS
Asp Leu le Giu 445 Phe VIal Ile Thr lai 525 ;in hr ~er lex Va2 Se~ As; Asp 430 Axg Giy Thr Gin Sex 510 Gin Asn Lys Leu Sex 590 Pro I Asn *His Vai 415 Leu *Val Giu Giu 495 Gly Lys Gly Gly GiuI 575 le Lys Giu A-sp, 400 Leu Leu His The~ Asn 480 Arxg Leu Thr ;in k-sp 560 ~er Jys Glu 595 Asn Ax; 610 Leu Axg Arg Lys Sex Thr Ax; His His Ala Leu Glu
L~~i Leu 625 Ile Gin Glu Ser Ala 705 Phe Thr Ser C Ile S 7 Leu L 785 Cys V Gly C Leu G Glu Sc 8! Lys A 865 Glu C Pro L) As Me Pr Ly 69 Pr Val lal ;ly jer !70 .eu al ys ly er 50
PS
LI Vai Ser Arg Asa 630 p Ser Cys Ser Ser 645 t Pro Val Arg His 660 o Ala Thr Gly Ala 675 s Arg His Asp Ser 0 o Gly Ser Phe Thr 710 Asa Pro Ser Leu 725 Lys Val Ser Asa 2 740 *Glu Arg Val Leu C 755 Leu Val Pro Gly T 7 Glu Vai Ser Thr L 790 Ser Gin Cys Ala A 805 Ser Lys Asp Asa A 820 His Giu Val-Asn H 835 Glu Leu Asp Ala G: 8! Gin Ser Phe Ala L 870 Ala Thr Phe Ser A 885 Val Thr Phe Giu C) 900 Le Se Se Ly.
Asi 69! LyE Pro ksn ;In Ihr '75 eu la rg is In 55 eu La Ps !u Ser Pro Pro Asa Cys Thr Giu Leu Gin 635 640 .r Giu Giu Ile Lys LyS Lys Lys Tyr Asa 650 655 r Arg Asn Leu Gin Leu Met.Glu Gly Lys 665 670 s Lys Ser Asn Lys Pro Asa Gia Gin Thr 680 685 p Thr Phe Pro Giu Leu Lys Leu Thr Asa 5 700 Cys Ser Asn Thr Ser Giu Leu Lys Glu 720 Arg Giu Glu Lys Giu Giu Lys Leu Glu 730 735 Ala Giu Asp Pro Lys Asp Leu Met*Leu 745 750 Thr Glu Arg Ser Val Giu Ser Ser Ser 760 765 Asp Tyr Gly Thr Gin Giu Ser Ile Ser 780 Gly Lys Ala Lye Thr Giu Pro Asa Lye 795 800 Phe Giu Asa Pro Lys Gly Leu Ile His 810 815 Asn Asp Thr Giu Gly Phe Lye Tyr Pro 825 830 Ser Arg Glu Thir Ser-Ile Giu Met Glu 840 845 Tyr Leu Gin Asa Thr Phe Lys Val Ser 860 Phe Ser Asn Pro Gly Asn Ala Giu Glu 875 880 His Ser Gly Ser Leu Lys Lye Gin Ser 890 895 Glu Gin Lys Glu Glu Asn Gin Giy Lye 905 910 Asn Giu Ser Asn Ile Lys Pro Val Gin Thr Val Asn Ile Thr Ala Gly 92.5 920 925 Phe Pro Val. Vai Giy G-in Lys Asp Lys Pro Vai Asp Asn Ala Lys Cys 930 935 940 Ser Ile Lys Giy Giy Ser Arg Phe Cys Leu Ser Ser Gin Phe Arg Giy 945 950 955 960 Asn Giu Thr Giy Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gin Asn 965 970 975 Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Vai Lys Thr 980 985 990 Lys Cys Lys Lys Asn Leu Leu Giu Giu Asn Phe Giu Giu His Ser Met 995 1000 1.005 Ser Pro Giu Arg Giu Met Giy Asri Giu Asn Ile Pro Ser Thr Val Ser 1.010 1015 1020 Thr Ile Ser Arg Asn Asn Ile Arg Giu Asn Val Phe Lys Gly Ala Ser 1025 1.030 1035 1040 Ser Ser Asn Ile Asn Giu Vai.Giy Ser Ser Thr Asn Giu Val Giy Sor 1.045 1050 1.055 Ser Ile Asn Giu Ile Giy Ser Ser Asp Giu Asn le Gin Ala Giu Lou 1.060 1065 1070 Gly Arg Asn Arg Gly Pro Lys Lou Asn Ala Met Leu Arg Leu Giy Vai 1075 1 2080 1085 Leu Gin Pro Giu Val Tyr Lys Gin Ser Lou Pro Giy Ser Asn Cys Lys 1.090 1.095 1100 9His Pro Glu Ile Lys Lys Gin Giu Tyr Giu Giu Vai Val Gin Thr Val 110 1is2110 1115 1.120 Asn Th~r Asp Phe Ser Pro Tyr Lou Ile Ser Asp Asn Lou Giu Gin Pro 1.125 1.130 1135 Met Gly Ser Sor His Ala 
Sor Gin Val Cys Ser Giu Tbr Pro Asp Asp 1140 2.145 1150 *Lou Lou Asp Asp Gly Giu Ile Lys Giu Asp Thr Ser Phe Ala Glu Asn 2.155 1.160 1.165 Asp Ile Lys Giu Ser Ser A1a Val Phe Ser Lys Sor Val Gin Arg Giy 1170 1.175 2.180 Glu Lou Ser Arg Ser Pro Ser Pro Ph~e Thr His Thr His Lou Ala Gin 2.185 1190 1195 1200 Giy Tyr Arg Arg Giy Ala Lys Lys Lou Giu Ser Ser Glu Glu Asn Lou 1205 1110 1215 Ser Ser Glu Asp GlU Glu Leu Pro Cys Phe Gin His Leu Leu Phe Gly 1220 1225 1230 Lys Val Asn Asn Il~e Pro Ser Gin Ser Th~r Arg His Ser Thr Val Ala 1235 1240 1245 Thr Glu Cys Leu Ser Lys Asn Thr. Giu Glu Asn Leu Leu Ser Leu Lys 1250 1255 1260 Asri Ser Leu Asn Asp Cys Ser Asn Gin Val Ile Leu, Ala Lys Ala Ser 1265 1270 1275 1280 Gin Glu His His Leu Ser Giu Glu Thx Lys Cys Ser Ala Ser Leu Phe 1285 12-90 1295.
Ser Ser Gin Cys Ser Glu Leu Glu Asp Leu. Thr Ala Asn Thr Asn Thr 1300 1305, 1310 Gin Asp Pro Phe Leu Ile Gly Ser Ser Lys Gin Met Ar; His Gin Ser 1315 1320 1325 Glu Ser Gin Gly Val Gly Leu Ser Asp Lys Giu Leu Val Ser Asp Asp 1330 1335 1340 Glu Giu Ar; Gly Thr Gly Leu Glu Glu Asn Asn Gin Glu Glu Gin Ser *1345 1350: 1355 1360 Met Asp Ser Asn Leu Gly Giu Ala Ala Ser Giy Cys Glu Ser Giu Thr 1365 1370 1375 Ser Val Ser Giu Asp Cys Ser Gly Leu Ser Ser Gin Ser Asp Ile Leu *1380 1385 1390 Thx Thr Gin Gin Ax; Asp Th~r Met Gin His Asn Leu le Lys Leu Gin *1395 1400- 1405 Gin Giu Met Ala Giu Leu Giu Ala Val Leu Glu Gin His Gly Ser Gin 1410 1415 1420 .Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Lieu Glu 1425 1430 1435 1440 Asp Leu Ax; Asn Pro Giu Gin Ser Thr Ser Giu Lys.Ala Val Leu Th~r 1445 1450 1455 Ser Gin Lys Ser Ser Giu Tyr Pro Ile Ser Gin Asn Pro Giu Gly Leu.
1460 1465 1470
Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485
Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500
Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520
Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535
Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550
Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565
Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580
Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600
Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Gly Pro Ala Ala
1605 1610 1615
Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630
Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645
Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660
Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680
Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695
Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710
Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725
Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740
Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760
Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775
Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790
Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805
Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820
Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840
Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855
Gln Ile Pro His Ser His Tyr
1860

(2) INFORMATION FOR SEQ ID NO:7:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 base pairs
TYPE: nucleic acid
STRANDEDNESS: not relevant
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vi) ORIGINAL SOURCE:
STRAIN: 2F primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
GAAGTTGTCA TTTTATAAAC CTTT 24

(2) INFORMATION FOR SEQ ID NO:8:
SEQUENCE CHARACTERISTICS:
LENGTH: 22 base pairs
TYPE: nucleic acid
STRANDEDNESS: not relevant
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vi) ORIGINAL SOURCE:
STRAIN: 2R primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
TGTCTTTTCT TCCCTAGTAT GT 22

(2) INFORMATION FOR SEQ ID NO:9:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid
STRANDEDNESS: not relevant
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(vi) ORIGINAL SOURCE:
STRAIN: 3F primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
TCCTGACACA GCAGACATTT A 21

(2) INFORMATION FOR SEQ ID NO:10:
SEQUENCE CHARACTERISTICS:
LENGTH: 21 base pairs
TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) S* (vi) ORIGINAL SOURCE: STRAIN: 3R primer (xi) SEQUENCE DESCRIPTION: SEQ ID TTGGATTTTC GTTCTCACTT A 21 INFORMATION FOR SEQ ID NO:ll: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 5F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: CTCTTAAGGG
CAGTTGTGAG
INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 5R-M13* primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: TTCCTACTGT GGTTGCTTCC 20
INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 6/7F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: CTTATTTTAG TGTCCTTAAA AGG 23
INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 6R (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: TTTCATGGAC AGCACTTGAG TG 22
INFORMATION FOR SEQ ID NO:15: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 7F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: CACAACAAAG AGCATACATA GGG 23
INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 6/7R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: TCGGGTTCAC TCTGTAGAAG 20
INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 8F1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: TTCTCTTCAG GAGGAAAAGC A 21
INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 8R1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: GCTGCCTACC ACAAATACAA A 21
INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 9F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: CCACAGTAGA TGCTCAGTAA ATA 23
INFORMATION FOR SEQ ID NO:20: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 9R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: TAGGAAAATA CCAGCTTCAT AGA 23
INFORMATION FOR SEQ ID NO:21: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 10F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: TGGTCAGCTT TCTGTAATCG 20
INFORMATION FOR SEQ ID NO:22: SEQUENCE CHARACTERISTICS: LENGTH: 24 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 10R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: GTATCTACCC ACTCTCTTCT TCAG 24
INFORMATION FOR SEQ ID NO:23: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11AF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: CCACCTCCAA GGTGTATCA 19
INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11AR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: TGTTATGTTG GCTCCTTGCT 20
INFORMATION FOR SEQ ID NO:25: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11BF1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: CACTAAAGAC AGAATGAATC TA 22
INFORMATION FOR SEQ ID NO:26: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11BR1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: GAAGAACCAG AATATTCATC TA 22
INFORMATION FOR SEQ ID NO:27: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11CF1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: TGATGGGGAG TCTGAATCAA 20
INFORMATION FOR SEQ ID NO:28: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11CR1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: TCTGCTTTCT TGATAAAATC CT 22
INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11DF1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: AGCGTCCCCT CACAAATAAA 20
INFORMATION FOR SEQ ID NO:30: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11DR1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: TCAAGCGCAT GAATATGCCT 20
INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11EF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: GTATAAGCAA TATGGAACTC GA 22
INFORMATION FOR SEQ ID NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11ER primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: TTAAGTTCAC TGGTATTTGA ACA 23
INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11FF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: GACAGCGATA CTTTCCCAGA 20
INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11FR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: TGGAACAACC ATGAATTAGT C 21
INFORMATION FOR SEQ ID NO:35: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11GF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: GGAAGTTAGC ACTCTAGGGA 20
INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11GR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: GCAGTGATAT TAACTGTCTG TA 22
INFORMATION FOR SEQ ID NO:37: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11HF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: TGGGTCCTTA AAGAAACAAA GT 22
INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11HR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: TCAGGTGAA TGAATCTTC C 21
INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11IF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: CCACTTTTTC CCATCAAGTC A 21
INFORMATION FOR SEQ ID NO:40: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11IR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: TCAGGATGCT TACAATTACT TC 22
INFORMATION FOR SEQ ID NO:41: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11JF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: CAAAATTGAA TGCTATGCTT AGA 23
INFORMATION FOR SEQ ID NO:42: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11JR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: TCGGTAACCC TGAGCCAAAT 20
INFORMATION FOR SEQ ID NO:43: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11KF primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: GCAAAAGCGT CCAGAAAGGP. 20
INFORMATION FOR SEQ ID NO:44: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11KR-1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: TATTTGCAGT CAAGTCTTCC AA 22
INFORMATION FOR SEQ ID NO:45: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11LF-1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: GTAATATTGG CAAAGGCATC T 21
INFORMATION FOR SEQ ID NO:46: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 11LR primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: TAAAATGTGC TCCCCAAAAG CA 22
INFORMATION FOR SEQ ID NO:47: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 12F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: GTCCTGCCA TGAGAAGAAA 20
INFORMATION FOR SEQ ID NO:48: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 12R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: TGTCAGCA. CCTAAGAATG T 21
INFORMATION FOR SEQ ID NO:49: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 13F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: AATGGAAAGC TTCTCAAAGT A 21
INFORMATION FOR SEQ ID NO:50: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 13R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: ATGTTGGAGC TAGGTCCTTA C 21
INFORMATION FOR SEQ ID NO:51: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 14F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: CTAACCTGAA TTATCACTAT CA 22
INFORMATION FOR SEQ ID NO:52: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 14R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: GTGTATAAAT GCCTGTATGC A 21
INFORMATION FOR SEQ ID NO:53: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 15F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: TGGCTGCCCA GGAAGTATG 19
INFORMATION FOR SEQ ID NO:54: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 15R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: AACCAGAATA TCTTTATGTA GGA 23
INFORMATION FOR SEQ ID NO:55: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 16F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: AATTCTTAAC AGAGACCAGA AC 22
INFORMATION FOR SEQ ID NO:56: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 16R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: AAAACTCTTT CCAGAATGTT GT 22
INFORMATION FOR SEQ ID NO:57: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 17F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: GTGTAGAACG TGCAGGATTG 20
INFORMATION FOR SEQ ID NO:58: SEQUENCE CHARACTERISTICS: LENGTH: 18 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 17R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: TCGCCTCATG TGGTTTTA 18
INFORMATION FOR SEQ ID NO:59: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 18F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: GGCTCTTTAG CTTCTTAGGA C 21
INFORMATION FOR SEQ ID NO:60: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 18R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: GAGACCATTT TCCCAGCATC 20
INFORMATION FOR SEQ ID NO:61: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 19F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: CTGTCATTCT TCCTGTGCTC 20
INFORMATION FOR SEQ ID NO:62: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 19R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: CATTGTTAAG GAAAGTGGTG C 21
INFORMATION FOR SEQ ID NO:63: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 20F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: ATATGACGTG TCTGCTCCAC 20
INFORMATION FOR SEQ ID NO:64: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 20R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:64: GGGAATCCAA ATTACACAGC 20
INFORMATION FOR SEQ ID NO:65: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 21F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:65: AAGCTCTTCC TTTTTGAAAG TC 22
INFORMATION FOR SEQ ID NO:66: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 21R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: GTAGAGAAAT AGAATAGCCT CT 22
INFORMATION FOR SEQ ID NO:67: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 22F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: TCCCATTGAG AGGTCTTGCT 20
INFORMATION FOR SEQ ID NO:68: SEQUENCE CHARACTERISTICS: LENGTH: 20 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 22R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: GAGAAGACTT CTGAGGCTAC 20
INFORMATION FOR SEQ ID NO:69: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 23F-1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: TGAAGTGACA GTTCCAGTAG T 21
INFORMATION FOR SEQ ID NO:70: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 23R-1 primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: CATTTTAGCC ATTCATTCAA CAA 23
INFORMATION FOR SEQ ID NO:71: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 24F primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: ATGAATTGAC ACTAATCTCT GC 22
INFORMATION FOR SEQ ID NO:72: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: not relevant TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (vi) ORIGINAL SOURCE: STRAIN: 24R primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: GTAGCCAGGA CAGTAGAAGG A 21
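
Each primer record in the listing states a LENGTH and gives the sequence in space-separated blocks, so a transcribed record can be checked against its own stated length. The following is a minimal sketch of such a check, assuming the records have been copied by hand into tuples; the function name `check_primer`, the tuple layout, and the deliberately truncated last entry are illustrative assumptions and are not part of the listing.

```python
# Hypothetical transcription check; names and tuple layout are illustrative only.
VALID_BASES = frozenset("ACGT")


def check_primer(stated_length, sequence):
    """Return a list of problems found in one transcribed primer record."""
    # The listing prints sequences in blocks of ten, so drop the spaces first.
    bases = sequence.replace(" ", "").upper()
    problems = []
    unexpected = sorted(set(bases) - VALID_BASES)
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    if len(bases) != stated_length:
        problems.append(f"length {len(bases)} differs from stated {stated_length}")
    return problems


# (SEQ ID NO, primer name, stated LENGTH, sequence as printed in the listing)
records = [
    (7, "2F", 24, "GAAGTTGTCA TTTTATAAAC CTTT"),
    (8, "2R", 22, "TGTCTTTTCT TCCCTAGTAT GT"),
    (53, "15F", 19, "TGGCTGCCCA GGAAGTATG"),
    # Copy of SEQ ID NO:8 with one base dropped on purpose, to show a failure.
    (8, "2R (truncated copy)", 22, "TGTCTTTTCT TCCCTAGTAT G"),
]

for seq_id, name, stated_length, sequence in records:
    problems = check_primer(stated_length, sequence)
    status = "ok" if not problems else "; ".join(problems)
    print(f"SEQ ID NO:{seq_id} ({name}): {status}")
```

A record that fails this check may have lost or gained characters in transcription and would need to be compared against the original filing before use.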

Claims (21)

1. An isolated consensus DNA sequence of the BRCA1 coding sequence as set forth in SEQ. ID. NO.: 1.
2. A consensus protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 2.
3. An isolated coding sequence of the BRCA1 gene as set forth in SEQ. ID. NO.: 3.
4. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 4.
5. An isolated coding sequence of the BRCA1 gene as set forth in SEQ. ID. NO.:
6. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 6.
7. A BRCA1 gene with a BRCA1 coding sequence not associated with breast or ovarian cancer which comprises an alternative pair of codons, AGC and AGT, which occur at position 2201 at frequencies of about 35-45%, and from about 55-65%, respectively.
8. A BRCA1 gene according to Claim 7 wherein AGC occurs at a frequency of about
9. A set of at least two alternative codon pairs which occur at polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence not associated with breast or ovarian cancer, wherein codon pairs are selected from the group consisting of: AGC and AGT at position 2201; TTG and CTG at position 2430; CCG and CTG at position 2731; GAA and GGA at position 3232; AAA and AGA at position 3667; TCT and TCC at position 4427; and AGT and GGT at position 4956.
10. A set of at least two alternative codon pairs according to claim 9, wherein the codon pairs occur in the following frequencies, respectively, in a population of individuals free of disease: at position 2201, AGC and AGT occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
11. A set according to Claim 10 which is at least three codon pairs.
12. A set according to Claim 10 which is at least four codon pairs.
13. A set according to Claim 10 which is at least five codon pairs.
14. A set according to Claim 10 which is at least six codon pairs.
15. A set according to Claim 10 which is at least seven codon pairs.
16. A method of identifying individuals having a BRCA1 gene with a BRCA1 coding sequence not associated with disease, comprising: amplifying a DNA fragment of an individual's BRCA1 coding sequence using an oligonucleotide primer which specifically hybridizes to sequences within the gene; sequencing said amplified DNA fragment by dideoxy sequencing; repeating the amplifying and sequencing steps until said individual's BRCA1 coding sequence is completely sequenced; comparing the sequence of said amplified DNA fragment to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5; determining the presence or absence of each of the following polymorphic variations in said individual's BRCA1 coding sequence: AGC and AGT at position 2201, TTG and CTG at position 2430, CCG and CTG at position 2731, GAA and GGA at position 3232, AAA and AGA at position 3667, TCT and TCC at position 4427, and AGT and GGT at position 4956; determining any sequence differences between said individual's BRCA1 coding sequences and SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5; wherein the presence of said polymorphic variations and the absence of a variation outside of positions 2201, 2430, 2731, 3232, 3667, 4427, and 4956, is correlated with an absence of increased genetic susceptibility to breast or ovarian cancer resulting from a BRCA1 mutation in the BRCA1 coding sequence.
17. A method of claim 16 wherein codon variations occur at the following frequencies, respectively, in a Caucasian population of individuals free of disease: at position 2201, AGC and AGT occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
18. A method according to claim 16 wherein said oligonucleotide primer is labeled with a radiolabel, a fluorescent label, a bioluminescent label, a chemiluminescent label, or an enzyme label.
19. A method of detecting an increased genetic susceptibility to breast and ovarian cancer in an individual resulting from the presence of a mutation in the BRCA1 coding sequence, comprising: amplifying a DNA fragment of an individual's BRCA1 coding sequence using an oligonucleotide primer which specifically hybridizes to sequences within the gene; sequencing said amplified DNA fragment by dideoxy sequencing; repeating the amplifying and sequencing steps until said individual's BRCA1 coding sequence is completely sequenced; comparing the sequence of said amplified DNA fragment to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5; determining any sequence differences between said individual's BRCA1 coding sequences and SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, to determine the presence or absence of base changes in said individual's BRCA1 coding sequence; wherein a base change which is not any one of the following: AGC and AGT at position 2201, TTG and CTG at position 2430, CCG and CTG at position 2731, GAA and GGA at position 3232, AAA and AGA at position 3667, TCT and TCC at position 4427, and AGT and GGT at position 4956, is correlated with the potential of increased genetic susceptibility to breast or ovarian cancer resulting from a BRCA1 mutation in the BRCA1 coding sequence.
20. A method of claim 19 wherein codon variations occur at the following frequencies, respectively, in a population free of disease: at position 2201, AGC and AGT occur at frequencies of about 40%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
21. A method according to claim 19 wherein said oligonucleotide primer is labeled with a radiolabel, a fluorescent label, a bioluminescent label, a chemiluminescent label, or an enzyme label.
22. A set of codon pairs, which occur at polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence according to Claim 1, wherein said set of codon pairs is: AGC and AGT at position 2201; TTG and CTG at position 2430; CCG and CTG at position 2731; GAA and GGA at position 3232; AAA and AGA at position 3667; TCT and TCC at position 4427; and AGT and GGT at position 4956.
23. A set of at least two alternative codon pairs according to claim 22 wherein the set of at least two alternative codon pairs occur at the following frequencies: at position 2201, AGC and AGT occur at frequencies of about 40%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
24. A BRCA1 coding sequence according to claim 1 wherein the codon pairs occur at the following frequencies: at position 2201, AGC and AGT occur at frequencies of about 40%, and from about 55-65%, respectively; at position 2430, TTG and CTG occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 2731, CCG and CTG occur at frequencies from about 25-35%, and from about 65-75%, respectively; at position 3232, GAA and GGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 3667, AAA and AGA occur at frequencies from about 35-45%, and from about 55-65%, respectively; at position 4427, TCT and TCC occur at frequencies from about 45-55%, and from about 45-55%, respectively; and at position 4956, AGT and GGT occur at frequencies from about 35-45%, and from about 55-65%, respectively.
25. A method of determining the consensus genomic sequence or consensus coding sequence for a target gene, comprising: a) screening a number of individuals in a population for a family history which indicates inheritance of normal alleles for a target gene; b) isolating at least one allele of the target gene from individuals found to have a family history which indicates inheritance of normal alleles for a target gene; c) sequencing each allele; d) comparing the nucleic acid sequence of the genomic sequence or of the coding sequence of each allele of the target gene to determine similarities and differences in the nucleic acid sequence; and e) determining which allele of the target gene occurs with the greatest frequency.
26. A method of performing gene therapy, comprising: a) transfecting cancer cells in vivo with an effective amount of a vector transformed with a BRCA1 coding sequence of SEQ. ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: ; b) allowing the cells to take up the vector; and c) measuring a reduction in tumor growth.
27. A method of performing protein therapy, comprising: a) injecting into a patient an effective amount of BRCA1 tumor growth inhibiting protein of SEQ. ID. NO.: 2, SEQ. ID. NO.: 4, or SEQ. ID. NO.: 6; b) allowing the cells to take up the protein; and c) measuring a reduction in tumor growth.
DATED this 7th day of June 2001. ONCORMED, INC. Attorney: JACINTA FLATTERY-O'BRIEN, Registered Patent Attorney of The Institute of Patent and Trade Mark Attorneys of Australia, of BALDWIN SHELSTON WATERS
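
The comparison recited in claims 16 and 19 separates sequence differences that fall on the seven listed codon pairs from differences found anywhere else. A minimal sketch of that sorting step follows, assuming the differences have already been extracted as (position, observed codon) pairs; the table name `POLYMORPHIC_CODONS`, the function `classify_differences`, and the input layout are illustrative assumptions, and the claims do not specify how a codon is mapped to a position number. The sketch only sorts differences and makes no clinical determination.

```python
# Codon pairs and positions as recited in the claims; the data layout is an assumption.
POLYMORPHIC_CODONS = {
    2201: {"AGC", "AGT"},
    2430: {"TTG", "CTG"},
    2731: {"CCG", "CTG"},
    3232: {"GAA", "GGA"},
    3667: {"AAA", "AGA"},
    4427: {"TCT", "TCC"},
    4956: {"AGT", "GGT"},
}


def classify_differences(differences):
    """Split (position, codon) pairs into listed polymorphisms and other changes."""
    listed, unlisted = [], []
    for position, codon in differences:
        codon = codon.upper()
        allowed = POLYMORPHIC_CODONS.get(position)
        if allowed is not None and codon in allowed:
            listed.append((position, codon))
        else:
            # Either the position is not one of the seven, or the codon is
            # not one of the two alternatives recited for that position.
            unlisted.append((position, codon))
    return listed, unlisted


# Hypothetical input: two listed codons and one change at an unlisted position.
listed, unlisted = classify_differences([(2201, "AGT"), (2731, "ctg"), (3000, "TAA")])
print("listed polymorphisms:", listed)
print("other changes:", unlisted)
```

Under claim 16, an empty second list corresponds to the absence of a variation outside the seven positions; under claim 19, any entry in the second list is a base change that is not one of the listed codon pairs.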
AU51789/01A 1996-02-12 2001-06-07 Coding sequences of the human BRCA1 gene Ceased AU777341B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US598591 1996-02-12
AU19778/97A AU1977897A (en) 1996-02-12 1997-02-12 Coding sequences of the human BRCA1 gene

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU19778/97A Division AU1977897A (en) 1996-02-12 1997-02-12 Coding sequences of the human BRCA1 gene

Publications (2)

Publication Number Publication Date
AU5178901A (en) 2001-10-25
AU777341B2 (en) 2004-10-14

Family

ID=33163441

Family Applications (1)

Application Number Title Priority Date Filing Date
AU51789/01A Ceased AU777341B2 (en) 1996-02-12 2001-06-07 Coding sequences of the human BRCA1 gene

Country Status (1)

Country Link
AU (1) AU777341B2 (en)

Also Published As

Publication number Publication date
AU777341B2 (en) 2004-10-14

Similar Documents

Publication Publication Date Title
CA2218251C (en) Coding sequences of the human brca1 gene
Cawthon et al. A major segment of the neurofibromatosis type 1 gene: cDNA sequence, genomic structure, and point mutations
Simon et al. Gitelman's variant of Bartter's syndrome, inherited hypokalaemic alkalosis, is caused by mutations in the thiazide-sensitive Na–Cl cotransporter
Hansson et al. Hypertension caused by a truncated epithelial sodium channel γ subunit: genetic heterogeneity of Liddle syndrome
Hyland et al. Three unrelated Rh D gene polymorphisms identified among blood donors with Rhesus CCee (r'r') phenotypes
Price et al. Analysis of the HNF4 α gene in Caucasian Type II diabetic nephropathic patients
US20020192647A1 (en) Diagnostic method
Tao et al. HepG2/erythrocyte glucose transporter (GLUT1) gene in NIDDM: a population association study and molecular scanning in Japanese subjects
US20090269814A1 (en) Method of Analyzing a BRCA2 Gene in a Human Subject
JP2013143958A (en) Method for determining phenotype of human brca2 gene
US20020183268A1 (en) Coding sequences of the human BRCA1 gene
AU5178901A (en) Coding sequences of the human brca1 gene
US6831153B2 (en) Gene and methods for diagnosing neuropsychiatric disorders and treating such disorders
US6686163B2 (en) Coding sequence haplotype of the human BRCA1 gene
US5231009A (en) Cdnas coding for members of the carcinoembryonic antigen family
US20030022184A1 (en) Coding sequences of the human BRCA1 gene
WO2000006768A1 (en) Genetic polymorphisms in the human neurokinin 1 receptor gene and their uses in diagnosis and treatment of diseases
US20060154272A1 (en) Novel coding sequence haplotypes of the human BRCA2 gene
US20130280703A1 (en) Method of analyzing a brca2 gene in a human subject
Friedman Molecular genetics of human breast cancer: the search for BRCA1
Ikegami et al. A microsatellite polymorphism in the human insulin receptor gene: a highly informative marker for linkage analysis

Legal Events

Date Code Title Description
PC1 Assignment before grant (sect. 113)

Owner name: GENE LOGIC ACQUISITION CORPORATION

Free format text: THE FORMER OWNER WAS: ONCORMED, INC.