WO2012029080A1 - Sequence variants associated with prostate specific antigen levels - Google Patents

Sequence variants associated with prostate specific antigen levels

Info

Publication number
WO2012029080A1
WO2012029080A1 PCT/IS2011/050012 IS2011050012W WO2012029080A1 WO 2012029080 A1 WO2012029080 A1 WO 2012029080A1 IS 2011050012 W IS2011050012 W IS 2011050012W WO 2012029080 A1 WO2012029080 A1 WO 2012029080A1
Authority
WO
WIPO (PCT)
Prior art keywords
allele
psa
individual
rsl
rs17632542
Prior art date
Application number
PCT/IS2011/050012
Other languages
French (fr)
Inventor
Patrick Sulem
Daniel Gudbjartsson
Julius Gudmundsson
Original Assignee
Decode Genetics Ehf
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Decode Genetics Ehf
Publication of WO2012029080A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • Prostate cancer is among the leading causes of cancer death in men.
  • Prostate cancer has become the most frequently diagnosed cancer in men, with more than 192,000 predicted new cases (25% of all new male cancer diagnoses) and 27,360 deaths (9% of all cancer deaths in men) in 2009.
  • Early diagnosis and treatment are key factors in determining the survival and prognosis of prostate cancer patients, prompting intensive searches for biomarkers for screening.
  • PSA: Prostate-specific antigen
  • PSA is a protein produced by the cells of the prostate gland.
  • PSA is present in small quantities in serum of men with a healthy prostate, but is often elevated in individuals with prostate cancer and other prostate disorders.
  • A blood test to measure PSA is considered the most effective test currently available for the early detection of prostate cancer, although its clinical effectiveness has been questioned. Rising levels of PSA over time are associated with both localized and metastatic prostate cancer.
  • PSA values ranging from 2.5 ng/mL to 4 ng/mL are considered as cut-off values for suspected cancer, and levels above 10 ng/mL indicate higher risk.
  • The PSA screening test is limited in both specificity and sensitivity, and substantial controversy exists about its beneficial effect for patients.
  • PSA is not a specific marker of prostate cancer, since its serum levels increase in prostatic hyperplasia and are affected by many other factors such as medication, urologic manipulations and inflammation.
  • A recent study showed that 47% of men with PSA levels between 10 and 50 ng/mL were not diagnosed with prostate cancer (3).
  • Not all individuals with prostate cancer have raised levels of PSA.
  • PSA levels in the population are known to be variable.
  • One approach to increase the specificity and sensitivity of the PSA test is to work out a model that defines what is a "normal" PSA value for a given man. Genetic factors have been shown to account for as much as 40 to 45% of the variability in PSA levels among men in the general population.
  • The present invention provides methods for correcting PSA levels based on genetic factors.
  • The present invention relates to methods for determining corrected PSA quantity in humans.
  • The invention also provides methods for determining prostate cancer risk, and prognostic methods for prostate cancer.
  • the invention provides a method of determining corrected PSA quantity in a human individual, the method comprising obtaining data identifying an uncorrected PSA quantity in a first biological sample from the human individual, analyzing sequence data about at least one polymorphic marker from the first biological sample or a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker.
  • The at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
  • the invention provides a method of diagnosis of prostate cancer in a human individual, the method comprising (a) Detecting an uncorrected PSA quantity in a first biological sample from the human individual; (b) Obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; (c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; (d) Determining whether the corrected PSA quantity is greater than normal PSA quantity in humans; and (e) Performing a further diagnostic evaluation procedure selected from the group consisting of rectal ultrasound imaging and prostate biopsy on the individual if the corrected PSA quantity is determined to be greater than the reference range; wherein determination of a positive outcome of the ultrasound imaging or prostate biopsy is indicative of prostate cancer in the individual.
  • Also provided is a method of determining a susceptibility to prostate cancer comprising analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to prostate cancer in humans, and determining a susceptibility to prostate cancer from the nucleic acid sequence data.
  • identifying a human individual who is a candidate for further diagnostic evaluation for prostate cancer comprising the steps of (a) obtaining data representing uncorrected values of PSA quantity in the individual; (b) determining, in the genome of the human individual, the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; (c) determining a corrected PSA quantity in the individual based on the allelic identity of the at least one polymorphic marker; and (d) identifying the subject as a subject who is a candidate for further diagnostic evaluation for prostate cancer.
  • the invention also relates to computer-implemented aspects.
  • One such aspect provides an apparatus for determining PSA quantity in a human individual, comprising a processor, a computer-readable memory having instructions for execution on a processor, wherein the instructions relate to the determination of corrected PSA quantity for a human individual.
  • a computer-readable medium that comprises data representing uncorrected PSA values, data comprising sequence data about at least one polymorphic marker predictive of PSA quantity in humans, and a routine stored on the medium for execution on a processor to determine corrected PSA values.
  • a system for determining corrected PSA levels in a human subject comprising (i) at least one processor; (ii) at least one computer-readable medium; (iii) a susceptibility database operatively coupled to a computer-readable medium of the system and containing population information correlating the presence or absence of one or more alleles of at least one polymorphic marker with PSA levels in a population of humans; (iv) a measurement tool that receives an input about the human subject and generates information from the input about (a) uncorrected PSA levels in the human subject, and (b) the presence or absence of at least one allele of at least one polymorphic marker in the human subject that is correlated with PSA levels in humans; and (v) an analysis tool that (a) is operatively coupled to the susceptibility database and the measurement tool; (b) is stored on a computer-readable medium of the system; and (c) is adapted to be executed on a processor of the system, to compare the information about the human subject with the population information.
  • the invention also provides a system for assessing or selecting a treatment protocol for a subject diagnosed with, or at risk for, prostate cancer, comprising (i) at least one processor; (ii) at least one computer-readable medium; (iii) a medical treatment database operatively connected to a computer-readable medium of the system and containing information correlating values of corrected PSA levels and efficacy of treatment regimens for prostate cancer; (iv) a measurement tool to receive an input about the human subject and generate information from the input about genetically corrected PSA levels in humans; and (v) a medical protocol tool operatively coupled to the medical treatment database and the measurement tool, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the information with respect to the corrected PSA levels for the subject and the medical treatment database, and generate a conclusion with respect to at least one of (1) the probability that one or more medical treatments will be efficacious for treatment of prostate cancer for the patient; and (2) which of two or more medical treatments for the cancer will be more eff
  • FIG 1 provides a diagram illustrating a computer-implemented system utilizing risk variants as described herein.
  • FIG 2 shows the distribution of personalized PSA cutoff values after applying a genetic correction for the commonly used PSA cutoff of 4 ng/mL, based on the effect of four SNPs (rs2736098, rs10788160, rs11067228 and rs17632542) in samples from the Icelandic (ICE) and UK populations.
  • The Y-axis indicates personalized PSA cutoff values (ng/mL) based on the correction for the four SNPs, and the X-axis indicates % of the distribution.
  • FIG 3 shows results from analyses of the area under the receiver-operating-characteristic curve (AUC) for four biopsy outcome models.
  • The four different models included data on: 1) PSA levels (red line (1)), 2) the combined prostate cancer risk prediction of 23 established sequence variants (green line (2)), 3) genetic correction of PSA values based on the sequence variants rs2736098, rs10788160, rs11067228 and rs17632542 (blue line (3)), 4) both the genetic correction of PSA levels and the combined risk of the 23 prostate cancer risk variants (pink line (4)).
  • the black diagonal line indicates random classification, for comparison to the four different models.
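As an illustration of the AUC comparison described for FIG 3, the following is a minimal sketch of how an AUC can be computed from model scores and biopsy outcomes. It uses the standard rank interpretation of the AUC (the fraction of case/control pairs in which the case scores higher, ties counting one half); the function name and the example scores are hypothetical and are not taken from the patent data. A value of 0.5 corresponds to the diagonal line of random classification.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs (higher = more suspicious of cancer).
    labels: 1 for a positive biopsy, 0 for a negative biopsy.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0      # case ranked above control
            elif c == n:
                wins += 0.5      # ties count one half
    return wins / (len(cases) * len(controls))


# Hypothetical scores for four men, two with a positive biopsy.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Comparing models as in FIG 3 would amount to calling such a function once per model on the same set of biopsy outcomes.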
  • FIG 4 provides a diagram illustrating a system comprising computer implemented methods utilizing risk variants as described herein.
  • FIG 5 shows an exemplary system for determining corrected PSA levels as described further herein.
  • FIG 6 shows a system for selecting a treatment protocol for a subject diagnosed with, or at risk for, prostate cancer.
  • Nucleic acid sequences are written left to right in a 5' to 3' orientation.
  • Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer or any non-integer fraction within the defined range.
  • All technical and scientific terms used herein have the same meaning as commonly understood by the ordinary person skilled in the art to which the invention pertains. The following terms shall, in the present context, have the meaning as indicated:
  • The marker can comprise any allele of any variant type found in the genome, including SNPs, mini- or microsatellites, translocations and copy number variations (insertions, deletions, duplications).
  • Polymorphic markers can be of any measurable frequency in the population. For mapping of disease genes, polymorphic markers with population frequency higher than 5-10% are in general most useful.
  • Polymorphic markers may also have lower population frequencies, such as 1-5% frequency, or even lower frequency, in particular copy number variations (CNVs).
  • CNVs: copy number variations
  • the term shall, in the present context, be taken to include polymorphic markers with any population frequency.
  • sequence listing provided herein identifies polymorphic sites as described herein in the context of their genomic sequence, i.e. by providing information about the flanking sequence of the polymorphic site in the human genome assembly.
  • an “allele” refers to the nucleotide sequence of a given locus (position) on a chromosome.
  • a polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome.
  • The CEPH sample (Centre d'Etudes du Polymorphisme Humain, genomics repository, CEPH sample 1347-02) is used as a reference; the shorter allele of each microsatellite in this sample is set as 0, and all other alleles in other samples are numbered in relation to this reference.
  • allele 1 is 1 bp longer than the shorter allele in the CEPH sample
  • allele 2 is 2 bp longer than the shorter allele in the CEPH sample
  • allele 3 is 3 bp longer than the shorter allele in the CEPH sample
  • allele -1 is 1 bp shorter than the shorter allele in the CEPH sample
  • allele -2 is 2 bp shorter than the shorter allele in the CEPH sample, etc.
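The microsatellite naming convention above reduces to a signed length difference from the reference. A minimal sketch, assuming allele lengths are available in base pairs; the function name and the example lengths are hypothetical.

```python
def microsatellite_allele_label(allele_length_bp, reference_shorter_length_bp):
    """Return the allele number relative to the CEPH reference.

    The shorter allele of the microsatellite in the reference sample is
    allele 0; every other allele is named by how many base pairs longer
    (positive) or shorter (negative) it is.
    """
    return allele_length_bp - reference_shorter_length_bp


# Hypothetical example: the reference shorter allele is 150 bp long.
labels = [microsatellite_allele_label(n, 150) for n in (148, 149, 150, 151, 153)]
```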
  • Sequence nucleotide ambiguity codes as described herein are according to WIPO ST.25:
  • A nucleotide position at which more than one sequence is possible in a population is referred to herein as a “polymorphic site”.
  • A “Single Nucleotide Polymorphism” or “SNP” is a DNA sequence variation occurring when a single nucleotide at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e. both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e. the two homologous chromosomes of the individual contain different nucleotides).
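The homozygous/heterozygous distinction for a biallelic SNP can be expressed in a few lines. This is a sketch only; representing a genotype as a two-letter string such as "AG" is an assumption made for illustration.

```python
VALID_NUCLEOTIDES = {"A", "C", "G", "T"}


def zygosity(genotype):
    """Classify a two-allele SNP genotype such as "AA" or "AG"."""
    if len(genotype) != 2 or not set(genotype) <= VALID_NUCLEOTIDES:
        raise ValueError(f"not a two-nucleotide genotype: {genotype!r}")
    first, second = genotype
    # Same nucleotide on both chromosomal copies means homozygous.
    return "homozygous" if first == second else "heterozygous"
```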
  • The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).
  • a “variant”, as described herein, refers to a segment of DNA that differs from the reference DNA.
  • a “marker” or a “polymorphic marker”, as defined herein, is a variant. Alleles that differ from the reference are referred to as “variant” alleles.
  • A “microsatellite” is a polymorphic marker that has multiple small repeats of bases that are 2-8 nucleotides in length (such as CA repeats) at a particular site, in which the number of repeats varies in the general population.
  • An “indel” is a common form of polymorphism comprising a small insertion or deletion that is typically only a few nucleotides long.
  • A “haplotype” refers to a segment of genomic DNA that is characterized by a specific combination of alleles arranged along the segment.
  • a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus along the segment.
  • the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles.
  • “Susceptibility” refers to the proneness of an individual towards the development of a certain state (e.g., a certain trait, phenotype or disease), or towards being less able to resist a particular state than the average individual.
  • particular alleles at polymorphic markers may be characteristic of increased susceptibility (i.e., increased risk) of prostate cancer, as characterized by a relative risk (RR) or odds ratio (OR) of greater than one for the particular allele.
  • Alternatively, the markers are characteristic of decreased susceptibility (i.e., decreased risk) of prostate cancer, as characterized by a relative risk of less than one.
  • RR: relative risk
  • OR: odds ratio
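For reference, relative risk and odds ratio can both be computed from a 2x2 table of carriers and non-carriers of an allele. The sketch below uses the standard definitions; the function names and the counts are hypothetical and are not data from the patent.

```python
def relative_risk(carriers_affected, carriers_unaffected,
                  noncarriers_affected, noncarriers_unaffected):
    """RR: risk among carriers divided by risk among non-carriers."""
    risk_carriers = carriers_affected / (carriers_affected + carriers_unaffected)
    risk_noncarriers = noncarriers_affected / (
        noncarriers_affected + noncarriers_unaffected)
    return risk_carriers / risk_noncarriers


def odds_ratio(carriers_affected, carriers_unaffected,
               noncarriers_affected, noncarriers_unaffected):
    """OR: odds of disease among carriers divided by odds among non-carriers."""
    return (carriers_affected * noncarriers_unaffected) / (
        carriers_unaffected * noncarriers_affected)


# Hypothetical counts: 30 of 100 carriers and 10 of 100 non-carriers affected.
rr = relative_risk(30, 70, 10, 90)   # greater than one: increased susceptibility
or_ = odds_ratio(30, 70, 10, 90)
```

Values above one correspond to increased susceptibility and values below one to decreased susceptibility, as stated in the definitions above.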
  • the term "and/or" shall in the present context be understood to indicate that either or both of the items connected by it are involved
  • A “look-up table” is a table that correlates one form of data to another form, or one or more forms of data to a predicted outcome to which the data is relevant, such as phenotype or trait.
  • A look-up table can comprise a correlation between allelic data for at least one polymorphic marker and a particular trait or phenotype, such as a particular disease diagnosis, that an individual who comprises the particular allelic data is likely to display, or is more likely to display than individuals who do not comprise the particular allelic data.
  • Look-up tables can be multidimensional, i.e. they can contain information about multiple alleles for single markers simultaneously, or they can contain information about multiple markers, and they may also comprise other factors, such as particulars about disease diagnoses, racial information, biomarkers, biochemical measurements, therapeutic methods or drugs, etc.
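A look-up table of this kind could be represented as a nested mapping from marker to allele to a predicted effect. The sketch below is one possible layout, not the structure used by the inventors; the marker names and effect values are placeholders invented for illustration.

```python
# Hypothetical look-up table: marker -> allele -> relative effect on a trait.
# Marker names and numbers are placeholders, not values from the patent.
LOOKUP_TABLE = {
    "marker_1": {"A": 1.10, "G": 1.00},
    "marker_2": {"T": 0.95, "C": 1.00},
}


def lookup_effect(table, marker, allele):
    """Return the tabulated effect for one allele of one marker."""
    try:
        return table[marker][allele]
    except KeyError:
        raise KeyError(f"no entry for allele {allele!r} of {marker!r}") from None
```

Further dimensions (for example diagnosis or population) could be added as extra nesting levels or as tuple keys.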
  • A “computer-readable medium” is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface.
  • Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media.
  • Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information.
  • Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.
  • A “nucleic acid sample” refers to a sample obtained from an individual that contains nucleic acid (DNA or RNA).
  • the nucleic acid sample comprises genomic DNA.
  • a nucleic acid sample can be obtained from any source that contains genomic DNA, including a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa, placenta, gastrointestinal tract or other organs.
  • An “antisense agent” or “antisense oligonucleotide” refers, as described herein, to molecules, or compositions comprising molecules, which include a sequence of purine and pyrimidine heterocyclic bases, supported by a backbone, which are effective to hydrogen bond to corresponding contiguous bases in a target nucleic acid sequence.
  • The backbone is composed of subunit backbone moieties supporting the purine and pyrimidine heterocyclic bases at positions which allow such hydrogen bonding.
  • These backbone moieties are cyclic moieties of 5 to 7 atoms in size, linked together by phosphorus-containing linkage units of one to three atoms in length.
  • the antisense agent comprises an oligonucleotide molecule.
  • PSA quantity refers to the amount or level of a particular compound or substance.
  • PSA quantity refers to the amount of PSA in a particular object or sample.
  • the quantity may be determined as a mass or a molar quantity.
  • the quantity may also suitably be reported as a concentration, for example as mass/volume or molar quantity/volume.
  • PSA quantity is sometimes determined in units of ng/mL (nanograms per milliliter).
  • Although PSA is widely used as a screening test for prostate cancer, it is limited in both specificity and sensitivity. This is mainly due to the fact that PSA is not a specific marker for prostate cancer, since its levels increase due to other conditions, including prostatic hyperplasia, and PSA levels are also known to be affected by factors such as medication, urologic
  • the present inventors have discovered that certain genetic variants are predictive of PSA levels in humans. Such variants determine in part normal PSA levels in humans. By applying information about the effect of genetic variants on PSA levels, methods to determine corrected PSA levels can be developed. Results from estimating the combined relative effect of variants shown herein to be associated with PSA levels demonstrate a considerable variation in PSA levels between individuals based on their genotypes. By applying the combined genetic effect on commonly used PSA cutoff values, a personalized PSA cutoff value can be obtained. The data indicate that for a substantial fraction of men undergoing PSA-based prostate cancer screening, the personalized PSA cutoff value (for the decision of doing a biopsy or not) is shifted and hence men would be reclassified with respect to whether or not they should undergo a biopsy.
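The idea of a personalized cutoff and the resulting reclassification can be sketched as follows. The sketch assumes that the combined genetic effect is expressed as a relative factor (1.0 meaning no genetic shift) and that it scales the population cutoff multiplicatively; both the scaling rule and the function names are assumptions for illustration, and the numbers are hypothetical.

```python
def personalized_cutoff(population_cutoff_ng_ml, combined_relative_effect):
    """Scale a population PSA cutoff by an individual's combined genetic effect."""
    if combined_relative_effect <= 0:
        raise ValueError("relative effect must be positive")
    return population_cutoff_ng_ml * combined_relative_effect


def is_reclassified(measured_psa_ng_ml, population_cutoff_ng_ml,
                    combined_relative_effect):
    """True if the biopsy decision differs between the two cutoffs."""
    above_population = measured_psa_ng_ml > population_cutoff_ng_ml
    above_personal = measured_psa_ng_ml > personalized_cutoff(
        population_cutoff_ng_ml, combined_relative_effect)
    return above_population != above_personal


# Hypothetical man whose genotype predicts 20% higher PSA than average:
# a measured 4.5 ng/mL exceeds the 4 ng/mL cutoff but not his personal one.
changed = is_reclassified(4.5, 4.0, 1.2)
```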
  • the present invention provides a method of determining corrected PSA quantity in a human individual.
  • Such a method may in one aspect comprise steps of
  • an "uncorrected" PSA quantity is in this context a quantity of PSA that is determined in a biological sample, and is not corrected or adjusted based on the presence, absence or magnitude of other substances in the sample.
  • the uncorrected PSA quantity is a PSA quantity that has not been corrected based on the identity of genetic variants in the genome of the individual.
  • A "corrected" PSA quantity is, by consequence, a PSA quantity that has been corrected based on the identity of genetic variants in the genome of the individual, as described in detail herein.
  • the human individual is a male individual.
  • the step of obtaining data identifying an uncorrected PSA quantity comprises detecting an uncorrected PSA quantity in a first sample from the human individual.
  • The first sample is preferably a sample that comprises PSA protein.
  • The sample is selected from the group consisting of a blood sample, a serum sample, a semen sample, a saliva sample, a urine sample, and a prostate biopsy sample.
  • the sample is a serum sample.
  • the sample may also be any other biological sample from the individual that contains PSA protein.
  • the step of obtaining data identifying an uncorrected PSA quantity includes a sample collection step, i.e. a step of obtaining a first sample from the human individual prior to the detecting.
  • Determination of PSA quantity in human tissue can be done using any method available to the skilled person. Such methods include, but are not limited to, immunogenic tests such as Hybritech PSA test (Beckman Coulter) and Elecsys PSA assay (Roche). The skilled person will appreciate that the methods described herein are applicable for correction of PSA levels determined by any particular method that detects the amount or quantity of PSA protein.
  • Correction of PSA quantity is suitably done by using the determined allelic effect of any one allele of a polymorphic marker. For example, if a particular allele has been determined to lead to increased PSA levels by 15% in the population, then measured PSA values for an individual who carries one copy of the allele will be decreased by 15% to obtain a corrected PSA value.
  • the effect of multiple markers in general can be assumed to be independent, and the multiplicative model applied.
  • The magnitude of the PSA correction obtained by the current method depends on the genotype of the individual for the markers that are assessed to apply a genetic correction.
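The correction described above can be sketched in code. The sketch reads the text literally: an allele that raises PSA by a fraction e lowers the measured value by e for each copy carried, and the effects of different markers multiply (the multiplicative model). Applying the effect once per copy, the function name, the marker names and the effect sizes are all assumptions made for illustration; dividing by (1 + e) instead would be another plausible reading of the same description.

```python
def corrected_psa(measured_ng_ml, allele_copies, allele_effects):
    """Apply a multiplicative genetic correction to a measured PSA value.

    allele_copies:  marker -> copies (0, 1 or 2) of the effect allele carried.
    allele_effects: marker -> fractional effect of one copy on PSA levels
                    (0.15 means the allele raises PSA by 15%; negative
                    values describe alleles that lower PSA).
    """
    factor = 1.0
    for marker, copies in allele_copies.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {marker}: {copies}")
        # Each copy of a PSA-raising allele decreases the measured value
        # by the allele's fractional effect; markers are independent.
        factor *= (1.0 - allele_effects[marker]) ** copies
    return measured_ng_ml * factor


# Hypothetical: one copy of an allele that raises PSA by 15%.
example = corrected_psa(4.0, {"marker_a": 1}, {"marker_a": 0.15})
```

With the hypothetical inputs shown, a measured value of 4.0 ng/mL is reduced by 15%, matching the worked example in the text.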
  • The corrected PSA quantity differs from the uncorrected PSA quantity by at least 0.1 ng/mL. In certain embodiments, the corrected PSA quantity differs from the uncorrected PSA quantity by at least 0.5 ng/mL. In certain embodiments, the corrected PSA quantity differs from the uncorrected PSA quantity by at least 1.0 ng/mL. It will be appreciated that other values of the difference between uncorrected and corrected PSA values are possible and are also contemplated, including but not limited to at least 0.2 ng/mL, at least 0.3 ng/mL, at least 0.4 ng/mL, at least 0.6 ng/mL, at least 0.7 ng/mL, at least 0.8 ng/mL, at least 0.9 ng/mL, at least 1.1 ng/mL, and at least 1.2 ng/mL.
  • At least one allele of the at least one marker is predictive of an increased quantity of PSA in humans. In certain embodiments, at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans.
  • determining corrected PSA quantity in an individual comprises adjusting uncorrected PSA quantity based on the predicted effect of the particular alleles in the genome of the individual on PSA quantity in humans.
  • a further step comprising preparing a report containing results from the determination of corrected PSA quantity.
  • the report may be in any suitable format, including but not limited to a report written in a computer readable medium, printed on paper, or displayed on a visual display.
  • the allele that is detected can be the allele of the complementary strand of DNA, such that the nucleic acid sequence data includes the identification of at least one allele which is complementary to any of the alleles of the polymorphic markers referenced above.
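Because an allele may be reported on either DNA strand, a genotype call sometimes has to be mapped to its complement before it is compared with a listed marker allele. A minimal sketch using standard base pairing; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_allele(allele):
    """Return the allele as read on the complementary DNA strand."""
    try:
        return COMPLEMENT[allele.upper()]
    except KeyError:
        raise ValueError(f"not a nucleotide: {allele!r}") from None
```

For example, an A allele detected on one strand corresponds to a T allele on the complementary strand.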
  • the methods described herein for correcting PSA levels may be practiced using any one, or a combination of, polymorphic markers that are predictive of PSA levels in humans.
  • the markers may be independent, i.e. in linkage equilibrium.
  • The markers may also be in linkage disequilibrium.
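Linkage disequilibrium between two biallelic markers is commonly quantified with the coefficient D and the squared correlation r². The sketch below uses the standard definitions from haplotype and allele frequencies; the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b                      # zero under linkage equilibrium
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: independent markers versus perfectly linked ones.
independent = linkage_disequilibrium(0.25, 0.5, 0.5)
linked = linkage_disequilibrium(0.5, 0.5, 0.5)
```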
  • the skilled person will appreciate how to use any such marker in the methods described herein.
  • At least one allele of the marker is predictive of increased PSA levels in humans, compared with the general population. Certain other allele(s) of the marker may also be predictive of decreased PSA levels in humans.
  • Markers useful for correcting PSA levels are selected from the group consisting of rs401681 (which is identified in SEQ ID NO: 1 herein), rs2736098 (SEQ ID NO: 2), rs10788160 (SEQ ID NO: 3), rs11067228 (SEQ ID NO: 5), rs10993994 (SEQ ID NO: 4), rs4430796 (SEQ ID NO: 6), rs2735839 (SEQ ID NO: 7) and rs17632542 (SEQ ID NO: 8), and markers in linkage disequilibrium therewith.
  • the markers are selected from the group consisting of s.51165690, s.51172808, s.51175013, s.56037076, s.56054527, s.56058688, s.56060000, s.56066550, s.56066560, s.56066619, rs1058205, rs1061657, rs10749412, rs10749413, rs10763534, rs10763536, rs10763546, rs10763576, rs10763588, rs10788154, rs10788159, rs10788162, rs10788163, rs10788164, rs10788165, rs10788166, rs10788167, rs10825652, rs10826075, rs10826125, rs
  • The markers are selected from the group consisting of rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, and rs17632542, and markers in linkage disequilibrium therewith. In certain embodiments, the markers are selected from the group consisting of rs401681, rs2736098, rs10788160, rs17632542 and rs11067228, and markers in linkage disequilibrium therewith.
  • The markers are selected from the group consisting of rs401681, rs2736098, rs10788160 and rs11067228, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs2736098, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs10788160, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs11067228, and markers in linkage disequilibrium therewith.
  • The markers are selected from the group consisting of rs10993994, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs4430796, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith. Certain alleles at these polymorphic markers are predictive of an increased PSA quantity in humans.
  • Determination of the presence of a marker allele selected from the group consisting of the C allele of rs401681, the A allele of rs2736098, the A allele of rs10788160, the T allele of rs10993994, the A allele of rs11067228, the A allele of rs4430796, the G allele of rs2735839 and the T allele of rs17632542 is indicative of elevated PSA quantity in the human individual.
  • the allele is the C allele of rs401681.
  • the allele is the A allele of rs2736098.
  • the allele is the A allele of rs10788160.
  • the allele is the T allele of rs10993994.
  • the allele is the A allele of rs11067228. In one embodiment, the allele is the A allele of rs4430796. In one embodiment, the allele is the G allele of rs2735839. In one embodiment, the allele is the T allele of rs17632542. Marker alleles in linkage disequilibrium with any one of these marker alleles are also predictive of increased PSA quantity in humans, and are therefore also useful in the methods described herein.
  • a marker allele selected from the group consisting of s.51165690 allele C, s.51172808 allele G, s.51175013 allele A, s.56037076 allele T, s.56054527 allele T, s.56058688 allele T, s.56060000 allele A, s.56066550 allele T, s.56066560 allele C, s.56066619 allele G, rs1058205 allele T, rs1061657 allele T, rs10749412 allele T, rs10749413 allele T, rs10763534 allele C, rs10763536 allele G, rs10763546 allele C, rs10763576 allele A, rs10763588 allele G, rs10788154 allele C, rs10788159 allele G, rsl
  • marker alleles selected from the group consisting of s.122837469 allele A, rs2130779 allele T, s.122876448 allele A, s.122901140 allele T, s.122901142 allele C, s.122905335 allele A, rs10788149 allele G, rs10749408 allele C, rs2172071 allele C, rs11592107 allele A, rs1907218 allele T, rs1907220 allele A, rs1994655 allele T, rs1907221 allele C, rs1907225 allele C, rs1907226 allele G, rs10749409 allele C, rs11199835 allele G, s.122991926 allele C, rs729014 allele T, s.122993518 allele G, s.122994309
  • marker alleles selected from the group consisting of the T allele of rs401681, the G allele of rs2736098, the G allele of rsl0788160, the C allele of rsl0993994, the G allele of rsll067228, the G allele of rs4430796, the A allele of rs2735839 and the C allele of rsl7632542 are indicative of reduced PSA quantity in the individual.
  • a marker allele selected from the group consisting of s.51165690 allele
  • marker alleles selected from the group consisting of s.122837469 allele C, rs2130779 allele G, s.122876448 allele G, s.122901140 allele C, s.122901142 allele A, s.122905335 allele G, rs10788149 allele A, rs10749408 allele T, rs2172071 allele T, rs11592107 allele G, rs1907218 allele C, rs1907220 allele G, rs1994655 allele G, rs1907221 allele T, rs1907225 allele T, rs1907226 allele A, rs10749409 allele G, rs11199835 allele A, s.122991926 allele T, rs729014 allele C, s.122993518 allele A, s.
  • PSA: Prostate Specific Antigen
  • PSA is a protein that is secreted by the epithelial cells of the prostate gland, including cancer cells. PSA is concentrated in prostatic tissue, and serum PSA levels are normally very low. Disruption of the normal prostate architecture, for example by prostatic disease, inflammation or trauma, allows greater amounts of PSA to enter the circulation. Thus, an elevated level in the blood indicates an abnormal condition of the prostate, either benign or malignant. PSA is used to detect potential problems in the prostate gland and to follow the progress of prostate cancer therapy.
  • decisions about further diagnostic evaluation are usually made based on the results of a PSA assay, which is sometimes also followed by a Digital Rectal Examination (DRE).
  • Results of a PSA assay, alone or in combination with results of DRE, are used to select individuals for prostate biopsy. Further factors may be considered, including free and total PSA, age of the patient, the rate of PSA change with age (PSA velocity), family history, ethnicity, history of prior biopsy and comorbidity.
  • Prostate cancer is not limited to men with high PSA values. On the contrary, it has been found that even among men with PSA levels below 4.0ng/mL, prostate cancer is fairly common.
  • Given that PSA levels vary considerably in the population, and that this variation is to a large extent due to genetic factors, it is likely that a correction of PSA values of any particular individual based on the individual's genotype at genetic markers known to affect PSA levels could lead to significantly improved utility, through increased specificity and sensitivity, of PSA screening for reducing prostate cancer mortality in the population.
  • Correcting PSA levels by the methods described herein may in certain cases lead to corrected PSA values that are below the cutoff applied (such as 4ng/mL), even though the uncorrected PSA value is above the threshold. This means that some individuals, who otherwise would undergo further diagnostic evaluation might not be selected for such follow-up, since it is likely that their increased uncorrected PSA value is due to natural fluctuations in PSA levels in the population rather than an actual underlying disease. However, in some cases corrected PSA values will be significantly higher than uncorrected values, and this could mean that individuals who normally would not be selected for further follow-up because their uncorrected PSA level is below the threshold applied for further clinical evaluation would, based on the corrected PSA values, be considered at risk for prostate cancer and thus selected for further evaluation.
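The reclassification described above can be illustrated with a minimal sketch. It assumes a simple multiplicative correction in which the observed PSA value is scaled by a population-average genetic factor divided by the individual's own genetic factor. The correction formula, the function names, the per-allele effect sizes and the population-average factor are all hypothetical placeholders introduced for illustration; they are not values or methods taken from this document.

```python
from math import prod

CUTOFF_NG_ML = 4.0  # example cutoff mentioned in the text

# Hypothetical relative effect on PSA per copy of the PSA-raising allele.
# Placeholder numbers for illustration only, not estimates from any study.
EFFECT_PER_COPY = {
    "rs17632542": 1.30,
    "rs10993994": 1.10,
}

# Assumed population-average genetic factor used to centre the correction
# (also a placeholder).
POPULATION_MEAN_FACTOR = 1.50


def genetic_factor(allele_counts):
    """Product of per-copy effects for a marker -> copy-count mapping."""
    return prod(EFFECT_PER_COPY[m] ** n for m, n in allele_counts.items())


def corrected_psa(observed, allele_counts):
    """Scale the observed PSA by the population factor over the individual's factor."""
    return observed * POPULATION_MEAN_FACTOR / genetic_factor(allele_counts)


def needs_follow_up(observed, allele_counts, cutoff=CUTOFF_NG_ML):
    """True if the corrected PSA value exceeds the cutoff."""
    return corrected_psa(observed, allele_counts) > cutoff
```

Under these placeholder numbers, an individual carrying two copies of each PSA-raising allele with an observed value of 4.5ng/mL falls below the cutoff after correction, whereas a non-carrier with an observed value of 3.0ng/mL rises above it.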
  • the benefit of applying a correction to observed (uncorrected) PSA levels can be striking.
  • the personalized cutoff value of 4ng/mL is in some cases shifted dramatically when correction for variants affecting PSA levels is applied.
  • the corrected PSA value in those individuals may be as high as 5-8ng/mL or as low as 1-2ng/mL. Further examples illustrating the usefulness of applying the PSA correction are described in Example 5 and Example 6 herein.
  • PSA levels as determined by the methods described herein could have enormous implications for the management of prostate cancer, since PSA screening based on PSA values corrected for genetic background will better reflect physical changes in the individual (e.g., prostate cancer or other prostate disease) than do uncorrected PSA values, which may be largely dominated by inherent PSA levels, and not necessarily representing underlying disease.
  • the present invention provides diagnostic applications based on the determination of corrected PSA quantity.
  • a method of diagnostic evaluation of prostate cancer in a human individual is provided, the method comprising:
  • determination of a corrected PSA quantity that is greater than the reference range is indicative of suspected prostate cancer in the individual.
  • the invention provides a method of diagnosis of prostate cancer in humans, the method comprising: (a) obtaining an uncorrected PSA quantity in a first biological sample from the human individual;
  • determination of a positive outcome of the ultrasound imaging or prostate biopsy is indicative of prostate cancer in the individual.
  • the obtaining of uncorrected PSA quantity comprises detecting the PSA quantity in a first biological sample from the individual.
  • a further aspect provides a method of diagnosis of prostate cancer, the method comprising:
  • Analyzing corrected PSA quantity of a human individual wherein if the corrected PSA levels of the human individual are determined to be greater than normal PSA quantity in humans, a further diagnostic evaluation selected from the group consisting of rectal ultrasound imaging and prostate biopsy is performed; and
  • the corrected PSA quantity is determined using any one of the methods of determining corrected PSA quantity described herein.
  • a further diagnostic application relates to selection processes for individuals who are undergoing evaluation for prostate cancer.
  • an individual who is a candidate for further diagnostic evaluation for prostate cancer can be selected by (a) obtaining data representing uncorrected values of PSA quantity in the individual; (b) determining, in the genome of the human individual, the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; (c) determining a corrected PSA quantity in the individual based on the allelic identity of the at least one polymorphic marker; and (d) identifying the subject as a subject who is a candidate
  • the invention further provides methods of treatment of prostate cancer diagnosed by the diagnostic methods described herein.
  • methods of diagnosing prostate cancer as described herein may in certain embodiments comprise an additional step of treatment of prostate cancer, wherein the treatment is selected from the group consisting of surgery, radiation therapy, proton therapy, hormonal therapy and chemotherapy.
  • a further aspect of the invention relates to a method of treatment of prostate cancer, the method comprising (i) determining a corrected PSA quantity in the individual, wherein the corrected PSA quantity is determined based on the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; and (ii) performing a prostate biopsy if the corrected PSA quantity is greater than values of normal PSA quantity in humans; wherein if the individual is determined to have prostate cancer based on the prostate biopsy, the individual is selected for at least one treatment module selected from the group consisting of surgery, radiation therapy, proto
  • the range of normal PSA quantity in humans may in certain embodiments be less than 50ng/mL, less than 40ng/mL, less than 30ng/mL, less than 20ng/mL, less than 10ng/mL, less than 9ng/mL, less than 8ng/mL, less than 7ng/mL, less than 6ng/mL, less than 5ng/mL, less than 4ng/mL, less than 3.5ng/mL, less than 3.0ng/mL, less than 2.5ng/mL, less than 2.0ng/mL, less than 1.5ng/mL, less than 1.0ng/mL or less than 0.5ng/mL.
  • normal PSA quantity in humans is less than 4.0ng/mL
  • normal PSA quantity in humans is less than 3.5ng/mL
  • normal PSA quantity is less than 3.0ng/mL
  • normal PSA quantity is less than 2.5ng/mL
  • the human individual is in a particular age group.
  • the individual may be less than age 40, the individual may be age 40 - 49, age 50 - 59, age 60 - 69, age 70 - 79, age 70 or higher.
  • the normal PSA quantity is determined in the same age group as the individual.
  • the reference value of normal PSA quantity in humans is suitably determined in individuals age 40 - 49.
  • the invention is applicable to any particular age range, and all age ranges are contemplated and within scope of the invention.
  • normal PSA values are determined in the same age range as the individual who is undergoing diagnostic evaluation.
  • PSA is determined in human blood samples, in particular in human serum.
  • the present invention is applicable for correcting PSA levels determined in any human tissue.
  • the invention provides a method of determining a susceptibility to prostate cancer, the method comprising analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to prostate cancer in humans, and determining a susceptibility to prostate cancer from the nucleic acid sequence data.
  • markers in linkage disequilibrium with rs17632542 are in linkage disequilibrium as characterized by values of r² with rs17632542 of 0.2 or greater.
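The r² criterion above can be sketched as follows, using the standard definition of r² for two biallelic markers computed from haplotype and allele frequencies. The function names and the example frequencies are illustrative assumptions; in practice the frequencies would be estimated from phased genotype data in a reference population.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first marker
          and allele B at the second marker
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


def in_ld(p_ab, p_a, p_b, threshold=0.2):
    """True if r^2 meets the threshold (0.2 is the value given in the text)."""
    return r_squared(p_ab, p_a, p_b) >= threshold
```

For example, with both alleles at frequency 0.5, a haplotype frequency of 0.25 corresponds to independence (r² of 0), while a haplotype frequency of 0.4 gives an r² of 0.36, which satisfies the 0.2 threshold.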
  • markers in linkage disequilibrium with rs17632542 are selected from the group consisting of s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s.55637350, s.55640040, s
  • determination of the presence of the T allele of rs17632542 is indicative of increased susceptibility to prostate cancer in the individual.
  • Other marker alleles indicative of increased susceptibility to prostate cancer may also be suitably selected using the information provided in Table 1.
  • marker alleles indicative of increased susceptibility in humans are selected from the group consisting of s.55554247 allele A, s.55566277 allele T, s.55582344 allele C, rs2546552 allele G, s.55596785 allele T, s.55597645 allele A, s.55598078 allele A, s.55600121 allele A, s.55605246 allele G, s.55606024 allele A, s.55607242 allele G, s.55624341 allele C, s.55630396 allele T, s.55630578 allele T, s.55630679 allele T, s.55630791 allele T, s.55631170 allele C, s.55632347 allele A, s.55632363 allele A, s.55636052 allele T, s.55637350 allele C,
  • Determination of the absence of at least one of the at-risk alleles recited above is indicative of a decreased risk of prostate cancer for the human individual.
  • the analyzing comprises determining the presence or absence of at least one at-risk allele of the polymorphic marker. Individuals who are homozygous for at-risk alleles are at particularly high risk. Thus, in certain embodiments determination of the presence of two alleles of one or more of the above-recited risk alleles is indicative of particularly high risk.
  • the allele that is detected can be the allele of the complementary strand of DNA.
  • the nucleic acid sequence data may include the identification of at least one allele which is complementary to any of the alleles of the polymorphic markers referenced above.
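The carrier-status logic described above (presence or absence of an at-risk allele, homozygosity, and alleles reported on the complementary strand) can be sketched as below. The function names and the representation of a genotype as a pair of single-letter alleles are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_risk_alleles(genotype, risk_allele, reported_on_complement=False):
    """Count copies (0, 1 or 2) of the at-risk allele in a two-allele genotype.

    If the genotype was read from the complementary strand, each allele is
    first converted back to the strand on which the risk allele is defined.
    """
    if reported_on_complement:
        genotype = [COMPLEMENT[a] for a in genotype]
    return sum(1 for a in genotype if a == risk_allele)


def describe(copies):
    """Map a copy count to a short carrier-status label."""
    return {
        0: "no at-risk allele detected",
        1: "heterozygous carrier",
        2: "homozygous carrier",
    }[copies]
```

For rs17632542, where the T allele is the at-risk allele, a genotype reported as A/G on the complementary strand corresponds to T/C and therefore to one copy of the at-risk allele.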
  • the nucleic acid sequence data is obtained from a biological sample containing nucleic acid from the human individual.
  • the nucleic acid sequence data may suitably be obtained using a method that comprises at least one procedure selected from (i) amplification of nucleic acid from the biological sample; (ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample; and (iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of the biological sample.
  • the nucleic acid sequence data may also be obtained from a preexisting record.
  • the preexisting record may comprise a genotype dataset for at least one polymorphic marker.
  • the determining comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to the condition.
  • certain embodiments of the methods of the invention comprise a further step of preparing a report containing results from the
  • report is written in a computer readable medium, printed on paper, or displayed on a visual display.
  • it may be convenient to report results of susceptibility to at least one entity selected from the group consisting of the individual, a guardian of the individual, a genetic service provider, a physician, a medical organization, and a medical insurer.
  • determination of the presence of at least one copy of the T allele of rs17632542 in the genome of an individual is indicative of increased risk of prostate cancer with an early age of onset. In other embodiments, determination of the presence of at least one copy of a marker allele in linkage disequilibrium with the T allele of rs17632542 is indicative of increased risk of prostate cancer with an early age of onset. Individuals who are homozygous for such risk alleles are at particularly increased risk of prostate cancer with an early onset. In certain embodiments, the age of onset of prostate cancer is below 50 years. In certain embodiments, the age of onset of prostate cancer is below 45 years. In certain embodiments, the age of onset of prostate cancer is below 40 years.
  • An individual who is at an increased susceptibility (i.e., increased risk) for prostate cancer is an individual in whom at least one specific allele at one or more polymorphic marker, or haplotype, conferring increased susceptibility (increased risk) for the disease is identified (i.e., at-risk marker alleles or haplotypes).
  • the at-risk marker or haplotype is one that confers an increased risk (increased susceptibility) of the disease.
  • significance associated with a marker or haplotype is measured by a relative risk (RR).
  • significance associated with a marker or haplotype is measured by an odds ratio (OR).
  • the significance is measured by a percentage.
  • a significant increased risk is measured as a risk (relative risk and/or odds ratio) of at least 1.1, including but not limited to: at least 1.15, at least 1.20, at least 1.25, at least 1.30, at least 1.35, at least 1.40, at least 1.45, at least 1.5, at least 1.6, at least 1.7, at least 1.8, at least 1.9, and at least 2.0.
  • a risk (relative risk and/or odds ratio) of at least 1.2 is significant.
  • a risk of at least 1.30 is significant.
  • a risk of at least 1.35 is significant.
  • a relative risk of at least 1.5 is significant.
  • a significant increase in risk is at least about 20%, including but not limited to about 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, and 100%.
  • a significant increase in risk is characterized by a p-value, such as a p-value of less than 0.05, less than 0.01, less than 0.001, less than 0.0001, less than 0.00001, less than 0.000001, less than 0.0000001, less than 0.00000001, or less than 0.000000001.
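The risk measures listed above can be made concrete with a short sketch that computes an odds ratio from allele counts in cases and controls, a relative risk from incidence in carriers and non-carriers, and the corresponding percentage increase. The function names and all counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio from a 2x2 table of allele counts in cases and controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def relative_risk(carrier_affected, carrier_total,
                  noncarrier_affected, noncarrier_total):
    """Ratio of disease incidence in carriers to incidence in non-carriers."""
    return (carrier_affected / carrier_total) / (noncarrier_affected / noncarrier_total)


def percent_increase(ratio):
    """Express a risk ratio as a percentage increase (1.2 -> about 20%)."""
    return (ratio - 1.0) * 100.0


# Hypothetical counts: 60 risk vs 40 other alleles in cases,
# 50 risk vs 50 other alleles in controls.
example_or = odds_ratio(60, 40, 50, 50)
```

With these counts the odds ratio is 1.5, which exceeds the 1.1 to 2.0 thresholds listed above up to and including "at least 1.5", and a ratio of 1.2 corresponds to the 20% increase given as the lowest percentage threshold.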
  • An at-risk polymorphic marker as described herein is one where at least one allele of at least one marker or haplotype is more frequently present in an individual at risk for prostate cancer
  • the control group may in one embodiment be a population sample, i.e. a random sample from the general population.
  • the control group is represented by a group of individuals who are disease-free, i.e. not diagnosed with prostate cancer.
  • markers with two alleles present in the population being studied such as SNPs
  • the other allele of the marker will be found in decreased frequency in the group of individuals with the trait or disease, compared with controls.
  • one allele of the marker (the one found in increased frequency in individuals with the trait or disease) will be the at-risk allele, while the other allele will be a protective allele.
  • an individual who is at a decreased susceptibility (i.e., at a decreased risk) for prostate cancer is an individual in whom at least one specific allele at one or more polymorphic marker or haplotype conferring decreased susceptibility for prostate cancer is identified.
  • the marker alleles conferring decreased risk are also said to be protective.
  • the protective marker or haplotype is one that confers a significant decreased risk (or susceptibility) of prostate cancer.
  • significant decreased risk is measured as a relative risk (or odds ratio) of less than 0.9, including but not limited to less than 0.8, less than 0.7, less than 0.6, and less than 0.5. In one particular embodiment, significant decreased risk is less than 0.80.
  • significant decreased risk is less than 0.75. In yet another embodiment, significant decreased risk is less than 0.70. In another embodiment, the decrease in risk (or susceptibility) is at least 20%, including but not limited to at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, and at least 50%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are however also
  • relative risk and the population attributable risk (PAR) can be calculated assuming a multiplicative model (haplotype relative risk model) (Terwilliger, J.D. & Ott, J., Hum. Hered. 42:337-46 (1992) and Falk, C.T. & Rubinstein, P., Ann. Hum. Genet. 51 (Pt 3):227-33 (1987)), i.e., that the risks of the two alleles/haplotypes a person carries multiply.
  • haplotypes are independent, i.e., in Hardy-Weinberg equilibrium, within the affected population as well as within the control population.
  • haplotype counts of the affected and controls each have multinomial distributions, but with different haplotype frequencies under the alternative hypothesis.
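A minimal numerical sketch of the multiplicative model follows. It assumes a single biallelic marker, Hardy-Weinberg proportions in the general population, and the usual textbook expression for population attributable risk relative to non-carriers; the function names and example values are illustrative and are not taken from the cited references.

```python
def genotype_frequencies(f):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of an allele with frequency f."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)


def genotype_risks(rr):
    """Multiplicative model: risk relative to non-carriers for 0, 1 and 2 copies."""
    return (1.0, rr, rr ** 2)


def population_attributable_risk(f, rr):
    """Fraction of cases that would be avoided if everyone had non-carrier risk."""
    mean_risk = sum(
        p * r for p, r in zip(genotype_frequencies(f), genotype_risks(rr))
    )
    return 1.0 - 1.0 / mean_risk
```

Under these assumptions the average risk relative to non-carriers simplifies to (1 - f + f·RR)², so an allele with frequency 0.5 and a per-allele relative risk of 2 gives a PAR of 1 - 1/2.25, roughly 0.56.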
  • the methods can comprise obtaining sequence data about any number of polymorphic markers and/or about any number of genes.
  • the method can comprise obtaining sequence data for about at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 100, 500, 1000, 10,000 or more polymorphic markers.
  • the markers can be independent and/or the markers may be in linkage disequilibrium.
  • the markers may also form a haplotype.
  • the polymorphic markers can be the ones of the group specified herein or they can be different polymorphic markers that are not listed herein, including, for example, polymorphic markers in linkage disequilibrium with the markers described herein.
  • the method comprises obtaining sequence data about at least two polymorphic markers.
  • each of the markers may be associated with a different gene.
  • if the method comprises obtaining nucleic acid data about a human individual identifying at least one allele of a polymorphic marker, then the method comprises identifying at least one allele of at least one polymorphic marker.
  • the method can comprise obtaining sequence data about a human individual identifying alleles of multiple, independent markers or haplotypes, which are not in linkage disequilibrium.
  • the method comprises obtaining nucleic acid sequence data about at least one polymorphic marker associated with at least one gene selected from the group consisting of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene.
  • Sequence data can be nucleic acid sequence data, which may be obtained by means known in the art.
  • nucleic acid sequence data may be obtained through direct analysis of the sequence of the polymorphic position (allele) of a polymorphic marker.
  • Suitable methods include, for instance, whole genome analysis using a whole genome SNP chip (e.g., Infinium HD BeadChip), cloning for polymorphisms, non-radioactive PCR-single strand conformation polymorphism analysis, denaturing high pressure liquid chromatography (DHPLC), DNA hybridization, computational analysis, single-stranded conformational polymorphism (SSCP), restriction fragment length polymorphism (RFLP), automated fluorescent sequencing; clamped denaturing gel electrophoresis (CDGE); denaturing gradient gel electrophoresis (DGGE), mobility shift analysis, restriction enzyme analysis;
  • heteroduplex analysis, chemical mismatch cleavage (CMC), RNase protection assays, use of polypeptides that recognize nucleotide mismatches, such as E. coli mutS protein, allele-specific PCR, and direct manual and automated sequencing.
  • sequence data establishes the identity of a particular nucleotide along a nucleic acid molecule.
  • sequence data establishes the identity of particular alleles at the polymorphic site.
  • sequence data establishes whether particular alleles are present or absent at a polymorphic site.
  • sequence data may be obtained from a first sample that is also used to determine PSA values.
  • sequence data is obtained from a second sample.
  • Nucleic acid sequence data is preferably obtained from a sample that contains nucleic acid, preferably genomic nucleic acid.
  • High-throughput sequencing. Recent technological advances have resulted in technologies that allow massively parallel sequencing, also called high-throughput sequencing, to be performed in a relatively condensed format. These technologies share the sequencing-by-synthesis principle for generating sequence information, with different technological solutions implemented for extending, tagging and detecting sequences.
  • Exemplary high-throughput sequencing technologies include 454 pyrosequencing technology (Nyren, P. et al. Anal Biochem 208: 171-75 (1993);
  • sequence data useful for performing the present invention may be obtained by any such sequencing method, or other sequencing methods that are developed or made available.
  • any sequencing method that provides the allelic identity at particular polymorphic sites (e.g., the absence or presence of particular alleles at particular polymorphic sites) is useful in the methods described and claimed herein.
  • test sample genomic DNA, RNA, or cDNA
  • the subject can be an adult, child, or fetus.
  • a test sample of DNA from fetal cells or tissue can be obtained by appropriate methods, such as by amniocentesis or chorionic villus sampling.
  • the DNA, RNA, or cDNA sample is then examined.
  • the presence of a specific marker allele can be indicated by sequence-specific hybridization of a nucleic acid probe specific for the particular allele.
  • the presence of more than one specific marker allele or a specific haplotype can be indicated by using several sequence-specific nucleic acid probes, each being specific for a particular allele.
  • a haplotype can be indicated by a single nucleic acid probe that is specific for the specific haplotype (i.e., hybridizes specifically to a DNA strand comprising the specific marker alleles characteristic of the haplotype).
  • a sequence-specific probe can be directed to hybridize to genomic DNA, RNA, or cDNA.
  • a "nucleic acid probe", as used herein, can be a DNA probe or an RNA probe that hybridizes to a complementary sequence. One of skill in the art would know how to design such a probe so that sequence specific hybridization will occur only if a particular allele is present in a genomic sequence from a test sample.
  • a hybridization sample can be formed by contacting the test sample, such as a genomic DNA sample, with at least one nucleic acid probe.
  • a probe for detecting mRNA or genomic DNA is a labeled nucleic acid probe that is capable of hybridizing to mRNA or genomic DNA sequences described herein.
  • the nucleic acid probe can be, for example, a full-length nucleic acid molecule, or a portion thereof, such as an oligonucleotide of at least 10, 15, 30, 50, 100, 250 or 500 nucleotides in length that is sufficient to specifically hybridize under stringent conditions to appropriate mRNA or genomic DNA.
  • the nucleic acid probe is capable of hybridizing specifically under stringent conditions to a nucleic acid molecule with sequence as set forth in any one of SEQ ID NO: 1-728, or a nucleic acid molecule with the complementary sequence of any one of SEQ ID NO: 1-728.
  • Other suitable probes for use in the diagnostic assays of the invention are described herein.
  • Hybridization can be performed by methods well known to the person skilled in the art (see, e.g., Current Protocols in Molecular Biology, Ausubel et al., eds., John Wiley & Sons, including all supplements).
  • hybridization refers to specific hybridization, i.e., hybridization with no mismatches (exact hybridization).
  • the hybridization conditions for specific hybridization are high stringency.
  • Specific hybridization, if present, is detected using standard methods. If specific hybridization occurs between the nucleic acid probe and the nucleic acid in the test sample, then the sample contains the allele that is complementary to the nucleotide that is present in the nucleic acid probe. The process can be repeated for any markers of the invention, or markers that make up a haplotype of the invention, or multiple probes can be used concurrently to detect more than one marker allele at a time.
  • nucleic acid sequence data is obtained by a method that comprises at least one procedure selected from the group consisting of amplification of nucleic acid from a first or second biological sample, hybridization assay using a nucleic acid probe and nucleic acid from the first or second biological sample, and hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of nucleic acid from the first or second biological sample.
  • Allele-specific oligonucleotides can also be used to detect the presence of a particular allele in a nucleic acid.
  • An "allele-specific oligonucleotide" (also referred to herein as an "allele-specific oligonucleotide probe") is an oligonucleotide of approximately 10-50 base pairs or approximately 15-30 base pairs, that specifically hybridizes to a nucleic acid which contains a specific allele at a polymorphic site (e.g., a polymorphic marker as described herein).
  • An oligonucleotide probe that is specific for one or more particular alleles at polymorphic markers can be prepared using standard methods (see, e.g., Current Protocols in Molecular Biology, supra). PCR can be used to amplify the desired region. Specific hybridization of an allele-specific oligonucleotide probe to DNA from the subject is indicative of a specific allele at a polymorphic site (see, e.g., Gibbs et al., Nucleic Acids Res. 17:2437-2448 (1989) and WO 93/22456).
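The idea of an allele-specific oligonucleotide probe with exact (no-mismatch) hybridization can be illustrated in silico. The sketch below builds a probe complementary to a target strand carrying a chosen allele and checks whether it pairs perfectly with a sample strand. The flanking sequences are invented placeholders, not the genomic context of any marker named in this document, and the exact-match test is a simplification of hybridization under high stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_probe(left_flank, allele, right_flank):
    """Probe complementary to the target strand carrying the given allele."""
    target = left_flank + allele + right_flank
    if not 15 <= len(target) <= 30:
        raise ValueError("probe length outside the 15-30 nucleotide range")
    return reverse_complement(target)


def hybridizes_exactly(probe, sample_strand):
    """True if the probe pairs with a stretch of the sample with no mismatches."""
    return reverse_complement(probe) in sample_strand


# Hypothetical flanking sequences around a polymorphic site.
LEFT, RIGHT = "ACGTTGCA", "GGATCCTA"
probe_t = make_probe(LEFT, "T", RIGHT)
probe_c = make_probe(LEFT, "C", RIGHT)
sample_with_t = "TT" + LEFT + "T" + RIGHT + "CC"
```

Here only the T-specific probe matches the sample carrying the T allele; the C-specific probe differs at the polymorphic position and is rejected by the exact-match test.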
  • arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from a subject can be used to identify polymorphisms in a nucleic acid
  • the polymorphism may for example be any one or a combination of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
  • an oligonucleotide array can be used.
  • Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These arrays can generally be produced using mechanical synthesis methods or light directed synthesis methods that incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis methods, or by other methods known to the person skilled in the art (see, e.g., Bier et al., Adv Biochem Eng Biotechnol 109:433-53 (2008); Hoheisel, Nat Rev Genet 7:200-10 (2006); Fan et al., Methods Enzymol 410:57-73 (2006); Ragoussis & Elvidge, Expert Rev Mol Diagn 6:145-52 (2006); Mockler et al., Genomics 85:1-15 (2005), and references cited therein, the entire teachings of each of which are incorporated by reference herein).
  • Standard techniques for genotyping can be used, such as fluorescence-based techniques (e.g., Chen et al., Genome Res. 9(5):492-98 (1999); Kutyavin et al., Nucleic Acid Res. 34:e128 (2006)), utilizing PCR, LCR, Nested PCR and other techniques for nucleic acid amplification.
  • Methods for SNP genotyping include, but are not limited to, TaqMan genotyping assays and SNPlex platforms (Applied Biosystems), gel electrophoresis (Applied Biosystems), mass spectrometry (e.g., MassARRAY system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex system (BioRad), CEQ and SNPstream systems (Beckman), array hybridization technology (e.g., Affymetrix GeneChip; Perlegen), BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), array tag technology (e.g., Parallele), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave).
  • Some of the available array platforms include SNPs that tag certain copy number variations (CNVs). This allows detection of CNVs via surrogate SNPs included in these platforms.
  • the direct sequence analysis can be of the nucleic acid of a biological sample obtained from the human individual for which a susceptibility is being determined.
  • the biological sample can be any sample containing nucleic acid (e.g., genomic DNA) obtained from the human individual.
  • the biological sample can be a blood sample, a serum sample, a leukapheresis sample, an amniotic fluid sample, a cerebrospinal fluid sample, a hair sample, a tissue sample from skin, muscle, buccal, or conjunctival mucosa, placenta, gastrointestinal tract, or other organs, a semen sample, a urine sample, a saliva sample, a nail sample, a tooth sample, and the like.
  • obtaining nucleic acid sequence data comprises obtaining nucleic acid sequence information from a preexisting record, e.g., a preexisting medical record comprising genotype information of the human individual.
  • direct sequence analysis of the allele of the polymorphic marker can be accomplished by mining a pre-existing genotype dataset for the sequence of the allele of the polymorphic marker.
  • the nucleic acid sequence data may be obtained through indirect analysis of the nucleic acid sequence of the allele of the polymorphic marker.
  • the allele could be one which leads to the expression of a variant protein comprising an altered amino acid sequence, as compared to the non-variant (e.g., wild-type) protein, due to one or more amino acid substitutions, deletions, or insertions, or truncation (due to, e.g., splice variation).
  • the allele could be the T allele of rs17632542, which leads to a substitution of
  • nucleic acid sequence data about the allele of the polymorphic marker (e.g., rs17632542) can be obtained through detection of the amino acid substitution of the variant protein.
  • Methods of detecting variant proteins are known in the art. For example, direct amino acid sequencing of the variant protein followed by comparison to a reference amino acid sequence can be used.
  • Immunoassays, e.g., immunofluorescent immunoassays, immunoprecipitations, radioimmunoassays, ELISA, and Western blotting, in which an antibody specific for an epitope comprising the variant sequence distinguishes the variant protein from the non-variant or wild-type protein, can also be used.
  • the variant protein can demonstrate altered (e.g., upregulated or downregulated) biological activity, in comparison to the non-variant or wild-type protein.
  • the biological activity can be, for example, a binding activity or enzymatic activity.
  • nucleic acid sequence data about the allele of the polymorphic marker can be obtained through detection of the altered biological activity.
  • Methods of detecting binding activity and enzymatic activity include, for instance, ELISA, competitive binding assays, quantitative binding assays using instruments such as, for example, a Biacore® 3000 instrument, and chromatographic assays, e.g., HPLC and TLC.
  • the polymorphic variant (the allele of the polymorphic marker) could lead to an altered expression level, e.g., an increased expression level of an mRNA or protein, or a decreased expression level of an mRNA or protein.
  • Nucleic acid sequence data about the allele of the polymorphic marker can, in these instances, be obtained through detection of the altered expression level.
  • Methods of detecting expression levels are known in the art. For example, ELISA, radioimmunoassays, immunofluorescence, and Western blotting can be used to compare the expression of protein levels. Alternatively, Northern blotting can be used to compare the levels of mRNA.
  • the indirect sequence analysis can be of a nucleic acid (e.g., DNA, mRNA) or protein of a biological sample obtained from the human individual for which a susceptibility is being determined.
  • the biological sample can be any nucleic acid or protein containing sample obtained from the human individual.
  • the biological sample can be any of the biological samples described herein.
  • analyzing the sequence of at least one polymorphic marker can comprise determining the presence or absence of at least one allele of the marker.
  • the analyzing can comprise analyzing the sequence of the polymorphic marker in a particular sample.
  • analyzing the sequence of the at least one polymorphic marker can comprise determining the presence or absence of an amino acid substitution in the amino acid sequence encoded by the polymorphic marker, or it can comprise obtaining a biological sample from the human individual and analyzing the amino acid sequence encoded by at least one gene of the group.
  • analyzing sequence comprises determining the identity of both alleles of the at least one polymorphic marker. Such sequence analysis thus corresponds to establishing the genotype of a particular marker for an individual.
  • the nucleic acid sequence data may be obtained through other means of indirect analysis of the nucleic acid sequence of the allele of the polymorphic marker.
  • obtaining nucleic acid data can comprise identifying at least one allele of a marker in linkage disequilibrium with at least one polymorphic marker associated with PSA levels.
  • Linkage Disequilibrium refers to a non-random assortment of two genetic elements. For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person's having both elements is 0.25 (25%), assuming a random distribution of the elements.
  • Allele or haplotype frequencies can be determined in a population by genotyping individuals in a population and determining the frequency of the occurrence of each allele or haplotype in the population.
  • In populations of diploids, e.g., human populations, individuals will typically have two alleles for each genetic element (e.g., a marker, haplotype or gene).
  • the r² measure is arguably the most relevant measure for association mapping, because there is a simple inverse relationship between r² and the sample size required to detect association between susceptibility loci and SNPs. These measures are defined for pairs of sites, but for some applications a determination of how strong LD is across an entire region that contains many polymorphic sites might be desirable (e.g., testing whether the strength of LD differs significantly among loci or across populations, or whether there is more or less LD in a region than predicted under a particular model). Measuring LD across a region is not straightforward, but one approach is to use the measure r, which was developed in population genetics.
  • a significant r² value between markers can be at least 0.1, such as at least 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99 or 1.0.
  • the significant r² value can be at least 0.2. This means that markers are considered to be in LD if the correlation coefficient r² between the markers has a value of at least 0.2.
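The pairwise LD measures discussed above can be illustrated with a minimal sketch. It uses the standard population-genetics definitions of D, D' and r² for two biallelic markers; the function names `ld_measures` and `in_ld` and the input frequencies are hypothetical, and the 0.2 default simply mirrors the cutoff mentioned in the text.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1
          and allele B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected under random assortment (p_a * p_b).
    d = p_ab - p_a * p_b

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def in_ld(r2, cutoff=0.2):
    """Apply an r^2 cutoff (0.2 here, an illustrative default)."""
    return r2 >= cutoff
```

With both allele frequencies at 0.50, a haplotype frequency of 0.25 corresponds to random assortment (D = 0, r² = 0), whereas a haplotype frequency of 0.40 gives D = 0.15, D' = 0.6 and r² = 0.36, which passes the 0.2 cutoff.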
  • linkage disequilibrium refers to linkage
  • linkage disequilibrium characterized by values of
  • linkage disequilibrium represents a correlation between alleles of distinct markers. It is measured by correlation coefficient or
  • Linkage disequilibrium can be determined in a single human population, as defined herein, or it can be determined in a collection of samples comprising individuals from more than one human population. In one embodiment of the invention, LD is determined in a sample from one or more of the HapMap populations.
  • LD is determined in the Caucasian CEU population of the HapMap samples.
  • LD is determined in samples from the Icelandic population.
  • LD is determined in samples from the UK population.
  • Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as framework for mapping disease-genes (Risch, N. & Merikangas, K., Science 273:1516-1517 (1996); Maniatis, N., et al., Proc Natl Acad Sci USA 99:2228-2233 (2002); Reich, D.E. et al., Nature 411:199-204 (2001)).
  • blocks can be defined as regions of DNA that have limited haplotype diversity (see, e.g., Daly, M. et al., Nature Genet. 29:229-232 (2001); Patil, N. et al., Science 294:1719-1723 (2001); Dawson, E. et al., Nature 418:544-548 (2002); Zhang, K. et al., Proc. Natl. Acad. Sci. USA 99:7335-7339 (2002)), or as regions between transition zones having extensive historical recombination, identified using linkage disequilibrium (see, e.g., Gabriel, S.B.
  • The term "haplotype block" or "LD block" includes blocks defined by any of the above described characteristics, or other alternative methods used by the person skilled in the art to define such regions.
  • Haplotype blocks can be used to map associations between phenotype and haplotype status, using single markers or haplotypes comprising a plurality of markers.
  • the main haplotypes can be identified in each haplotype block, and a set of "tagging" SNPs or markers (the smallest set of SNPs or markers needed to distinguish among the haplotypes) can then be identified.
  • These tagging SNPs or markers can then be used in assessment of samples from groups of individuals, in order to identify association between phenotype and haplotype. If desired, neighboring haplotype blocks can be assessed concurrently, as there may also exist linkage disequilibrium among the haplotype blocks.
  • markers used to detect association thus in a sense represent "tags" for a genomic region (i.e., a haplotype block or LD block) that is associating with a given disease or trait, and as such are useful for use in the methods and kits of the invention .
  • One or more causative (functional) variants or mutations may reside within the region found to be associating to the disease or trait.
  • the functional variant may be another SNP, a tandem repeat polymorphism (such as a minisatellite or a microsatellite), a transposable element, or a copy number variation, such as an inversion, deletion or insertion.
  • Such variants in LD with other variants used to detect an association to a disease or trait may confer a higher relative risk (RR) or odds ratio (OR) than observed for the tagging markers used to detect the association .
  • the invention thus refers to the markers used for detecting association to the disease, as described herein, as well as markers in linkage disequilibrium with the markers.
  • markers that are in LD with the markers and/or haplotypes of the invention, as described herein may be used as surrogate markers.
  • the surrogate markers have in one embodiment relative risk (RR) and/or odds ratio (OR) values smaller than for the markers or haplotypes initially found to be associating with the disease, as described herein .
  • the surrogate markers have RR or OR values greater than those initially determined for the markers initially found to be associating with the disease, as described herein .
  • An example of such an embodiment would be a rare, or relatively rare (<10% allelic population frequency) variant in LD with a more common variant (>10% population frequency) initially found to be associating with the disease, such as the variants described herein. Identifying and using such markers for detecting the association discovered by the inventors as described herein can be performed by routine methods well known to the person skilled in the art, and are therefore within the scope of the invention.
  • the marker in linkage disequilibrium with a polymorphic marker associated with PSA levels may be one of the surrogate markers listed in Table 1.
  • the markers were selected using data for Caucasian CEU samples from the 1000 Genomes Project
  • Surrogate markers for the markers shown herein to be associated with PSA levels. Shown are (1) anchor marker name and the allele correlating with increased PSA levels; (2) the surrogate marker; (3) chromosome and position of the surrogate marker in NCBI Build 36; (4) identity of the surrogate allele predicted to correlate with reduced PSA levels; (5) identity of the surrogate allele predicted to correlate with elevated PSA levels; (6) D' values for the correlation between the anchor and the surrogate; and (7) r² values for the correlation between the anchor and the surrogate.
  • Suitable markers in linkage disequilibrium with any one of rs401681, rs2736098, rs10788160, rs10993994, rs11067228, rs4430796, rs2735839 and rs17632542 may for example be selected using the data provided in Table 1.
  • suitable markers in linkage disequilibrium with rs401681 are selected from the group consisting of rs2736098, rs31484, rs4635969, rs9418, s.1282167, s.1285240, s.1285775, s.1287049, s.1349759, s.1350079, rs2736108, s.1350854, rs2735948, rs2735846, s.1352392, s.1353401, rs2735946, rs2736102, rs2853666, rs2735945, s.1359165, rs4530805, s.1359765, rs61574973, s.1362904, s.1363152, rs12332579, rs6866783, s.1365329, rs133567
  • suitable markers in linkage disequilibrium with rs2736098 are selected from the group consisting of rs2735845, rs31484, rs401681, s.1030492, s.1233724, s.1251946, s.1257345, s.1258032, s.1292191, s.1334730, s.1407682, s.1426206, s.1426336, s.1428371, s.1428373, s.1472454, s.1518154, s.1557827, rs11743119, s.1583465, rs4551123, s.1589581, s.1591616, s.1607388, rs6893515, s.1618305, s.1621550, s.1621551, rs6892057, s.1638061, rs6898387, rs7724451,
  • suitable markers in linkage disequilibrium with rs10788160 are selected from the group consisting of rs11199892, rs11593067, s.122837469, rs2130779, s.122876448, s.122901140, s.122901142, s.122905335, rs10788149, rs10749408, rs2172071, rs11592107, rs1907218, rs1907220, rs1994655, rs1907221, rs1907225, rs1907226, rs10749409, rs11199835, s.122991926, rs729014, s.122993518, s.122994309, s.122994946, rs1873450, rs2901290, s.122998594, s.122
  • rs7900630, s.123074016, rs1896416, s.123074531, s.123074928, s.123076274, s.123076472, rs2420925, s.123077398, s.123077455, rs12779205, rs11199912, rs4752534, s.123078389, rs1896420, rs1896419, s.123079199, s.123081990, s.123081993, s.123081998, and s.123201870.
  • suitable markers in linkage disequilibrium with rs10993994 are selected from the group consisting of s.51157005, s.51159221, rs35716372, s.51159373, s.51159376, s.51159399, s.51159786, rs4935090, rs12781411, s.51162137, s.51162792, s.51162795, rs11004246, s.51165690, rs11004324, rs2843562, rs11004409, rs11004415, rs11004422, s.51168415, rs11004435, rs11599333, s.51170094, s.51170307, rs12763717, rs67289834, s.51172442, s.5117
  • suitable markers in linkage disequilibrium with rs11067228 are selected from the group consisting of rs12820376, s.113576401, s.113582477, s.113584188, s.113584539, s.113585097, rs12819162, rs11609105, rs514849, rs513061, s.113590733, rs1061657, rs8853, rs3741698, s.113594635, rs567223, rs551510, rs59336, s.113601412, rs515746, rs545076, and s.113614584.
  • suitable markers in linkage disequilibrium with rs4430796 are selected from the group consisting of rs757210, rs7213769, rs1016990, rs17626423, rs3744763, rs7405776, rs2005705, s.33170591, rs11263761, rs4239217, rs11651755, rs10908278, s.33174083, rs11657964, rs7501939, rs8064454, s.33175746, s.33176039, rs7405696, rs11651052, rs11263763, rs11658063, rs9913260, rs3760511, and s.33182344.
  • suitable markers in linkage disequilibrium with rs2735839 are selected from the group consisting of rs2659051, rs266849, rs17632542, and rs2659122.
  • suitable markers in linkage disequilibrium with rs17632542 are selected from the group consisting of rs273622, s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s
  • suitable surrogate markers may be selected based on suitable cutoff values for the LD measures r² and D'.
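Selecting surrogates by LD cutoff can be sketched as a simple filter over a table of anchor/surrogate pairs. This is only an illustration: the record type `Surrogate`, the function `select_surrogates`, the default cutoffs and the marker names and LD values used in the usage note are all hypothetical and are not taken from Table 1.

```python
from typing import NamedTuple


class Surrogate(NamedTuple):
    anchor: str     # anchor marker name
    marker: str     # surrogate marker name
    d_prime: float  # D' between anchor and surrogate
    r2: float       # r^2 between anchor and surrogate


def select_surrogates(rows, anchor, min_r2=0.2, min_d_prime=0.0):
    """Return surrogate marker names for `anchor` that meet both LD cutoffs."""
    return [
        row.marker
        for row in rows
        if row.anchor == anchor
        and row.r2 >= min_r2
        and row.d_prime >= min_d_prime
    ]
```

For instance, given the made-up rows `Surrogate("anchor_1", "surrogate_A", 1.0, 0.85)` and `Surrogate("anchor_1", "surrogate_B", 0.9, 0.15)`, calling `select_surrogates(rows, "anchor_1")` keeps only `surrogate_A`; raising `min_r2` to 0.9 would exclude both.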
  • Alleles for SNP markers as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site.
  • a haplotype refers to a single-stranded segment of DNA that is characterized by a specific combination of alleles arranged along the segment.
  • a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus.
  • the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles, each allele corresponding to a specific polymorphic marker along the segment.
  • Haplotypes can comprise a combination of various polymorphic markers, e.g., SNPs and microsatellites, having particular alleles at the polymorphic sites. The haplotypes thus comprise a combination of alleles at various genetic markers.
  • genotypes of un-genotyped relatives: For every un-genotyped case, it is possible to calculate the probability of the genotypes of its relatives given its four possible phased genotypes. In practice it may be preferable to include only the genotypes of the case's parents, children, siblings, half-siblings (and the half-sibling's parents), grand-parents, grand-children (and the grand-children's parents) and spouses. It will be assumed that the individuals in the small sub-pedigrees created around each case are not related through any path not included in the pedigree. It is also assumed that alleles that are not transmitted to the case have the same frequency - the population allele frequency. Let us consider a SNP marker with the alleles A and G. The probability of the genotypes of the case's relatives can then be computed by:
  • $\Pr(\text{genotypes of relatives};\ \theta) = \sum_{h \in \{AA, AG, GA, GG\}} \Pr(h;\ \theta)\,\Pr(\text{genotypes of relatives} \mid h)$
  • the likelihood function in (*) may be thought of as a pseudolikelihood approximation of the full likelihood function for $\theta$ which properly accounts for all dependencies.
  • genotyped cases and controls in a case-control association study are not independent and applying the case-control method to related cases and controls is an analogous approximation .
  • the method of genomic control (Devlin, B. et al., Nat Genet 36:1129-30; author reply 1131 (2004)) has proven to be successful at adjusting case-control test statistics for relatedness. We therefore apply the method of genomic control to account for the dependence between the terms in our
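As a rough illustration of genomic control in its commonly used form (not necessarily the exact procedure applied here), the sketch below estimates an inflation factor from the median of 1-degree-of-freedom chi-square association statistics and divides each statistic by it. The function name `genomic_control`, the constant name and the choice to clamp the factor at 1.0 are assumptions made for the example.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom (approx.).
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats):
    """Return (inflation_factor, adjusted_stats) for 1-df chi-square statistics.

    The inflation factor is the observed median divided by the median
    expected under the null; it is clamped at 1.0 here so that statistics
    are never inflated by the correction (a conventional, assumed choice).
    """
    inflation = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    adjusted = [stat / inflation for stat in chi2_stats]
    return inflation, adjusted
```

If the observed median is twice the expected median, the inflation factor is about 2 and every test statistic is halved before p-values are computed.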
  • a genetic variant associated with a disease or a trait such as PSA quantity can be used alone to predict the risk of the disease for a given genotype.
  • a biallelic marker such as a SNP
  • Risk associated with variants at multiple loci can be used to estimate overall risk.
  • For multiple SNP variants, there are k possible genotypes, $k = 3^n \times 2^p$, where n is the number of autosomal loci and p the number of gonosomal (sex chromosomal) loci.
  • Overall risk assessment calculations for a plurality of risk variants usually assume that the relative risks of different genetic variants multiply, i.e.
  • the overall risk (e.g., RR or OR) associated with a particular genotype combination is the product of the risk values for the genotype at each locus. If the risk presented is the relative risk for a person, or a specific genotype for a person, compared to a reference population with matched gender and ethnicity, then the combined risk is the product of the locus specific risk values and also corresponds to an overall risk estimate compared with the population. If the risk for a person is based on a comparison to non-carriers of the at risk allele, then the combined risk corresponds to an estimate that compares the person with a given combination of genotypes at all loci to a group of individuals who do not carry risk variants at any of those loci.
  • the group of non-carriers of any at risk variant has the lowest estimated risk and has a combined risk compared with itself (i.e., non-carriers) of 1.0, but has an overall risk, compared with the population, of less than 1.0. It should be noted that the group of non-carriers can potentially be very small, especially for a large number of loci, and in that case, its relevance is correspondingly small.
  • the multiplicative model is a parsimonious model that usually fits the data of complex traits reasonably well. Deviations from multiplicity have been rarely described in the context of common variants for common diseases, and if reported are usually only suggestive since very large sample sizes are usually required to be able to demonstrate statistical interactions between loci.
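The multiplicative model and the genotype count described above lend themselves to a short sketch. The function names `combined_risk` and `num_genotype_combinations` are hypothetical, and the per-locus risk values in the usage note are invented for illustration only.

```python
from math import prod


def combined_risk(per_locus_risks):
    """Overall risk under the multiplicative model: the product of the
    risk values (RR or OR) for the genotype observed at each locus."""
    return prod(per_locus_risks)


def num_genotype_combinations(n_autosomal, n_gonosomal):
    """Number of possible multi-locus genotypes, k = 3^n * 2^p, for
    n autosomal and p gonosomal (sex chromosomal) biallelic loci."""
    return 3 ** n_autosomal * 2 ** n_gonosomal
```

For example, made-up per-locus risks of 1.2, 0.9 and 1.5 combine to 1.2 × 0.9 × 1.5 = 1.62, and three autosomal loci plus one gonosomal locus give 3³ × 2 = 54 possible genotype combinations. Whether the product is relative to the population or to non-carriers depends on the reference used for the per-locus values, as noted in the text.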
  • the combined or overall effect of any plurality of variants associated with PSA quantity and prostate cancer risk, as described herein, may be assessed .
  • an absolute risk of developing a disease or trait is defined as the chance of a person developing the specific disease or trait over a specified time-period.
  • For example, a woman's lifetime absolute risk of breast cancer is one in nine. That is to say, one woman in every nine will develop breast cancer at some point in her life.
  • Risk is typically measured by looking at very large numbers of people, rather than at a particular individual. Risk is often presented in terms of Absolute Risk (AR) and Relative Risk (RR) .
  • Relative Risk is used to compare risks associating with two variants or the risks of two different groups of people. For example, it can be used to compare a group of people with a certain genotype with another group having a different genotype.
  • a relative risk of 2 means that one group has twice the chance of developing a disease as the other group.
  • the creation of a model to calculate the overall genetic risk involves two steps: i) conversion of odds-ratios for a single genetic variant into relative risk and ii) combination of risk from multiple variants in different genetic loci into a single relative risk value.
  • Deriving risk from odds-ratios
  • allelic odds ratio equals the risk factor:
  • $RR(aa) = \Pr(A \mid aa)/\Pr(A) = \ldots$
  • $RR(g_1, g_2) = RR(g_1)\,RR(g_2)$
  • $\Pr(A \mid g_1, g_2) = \Pr(A \mid g_1)\,\Pr(A \mid g_2)/\Pr(A)$ and $\Pr(g_1, g_2) = \Pr(g_1)\,\Pr(g_2)$
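The two steps above can be sketched numerically. The example assumes that the allelic odds ratio equals a per-allele risk factor r under the multiplicative model, and it additionally assumes Hardy-Weinberg genotype frequencies for a risk allele a with population frequency f; the Hardy-Weinberg assumption, the function name `genotype_relative_risks` and the numbers in the usage note are not taken from the source. Under these assumptions the population-average risk factor is (f·r + 1 − f)², so that RR(g) = Pr(A | g)/Pr(A) follows directly.

```python
def genotype_relative_risks(r, f):
    """Relative risk of each genotype compared with the population average.

    r: per-allele risk factor (taken equal to the allelic odds ratio)
    f: population frequency of the risk allele a

    Genotype frequencies are assumed to follow Hardy-Weinberg proportions:
    Pr(aa) = f^2, Pr(ab) = 2f(1-f), Pr(bb) = (1-f)^2.
    """
    # Population-average risk relative to bb: f^2*r^2 + 2f(1-f)*r + (1-f)^2.
    average = (f * r + (1 - f)) ** 2
    return {"aa": r * r / average, "ab": r / average, "bb": 1 / average}
```

With r = 1.5 and f = 0.2, the average is 1.1² = 1.21, giving RR(aa) ≈ 1.86, RR(ab) ≈ 1.24 and RR(bb) ≈ 0.83; weighting these by the genotype frequencies returns exactly 1, as expected for risks expressed relative to the population. For two markers that are not in linkage disequilibrium, the combined value is the product of the two per-marker relative risks, in line with RR(g1, g2) = RR(g1)RR(g2).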
  • Obvious violations of this assumption are markers that are closely spaced on the genome, i.e., in linkage disequilibrium, such that the concurrence of two or more risk alleles is correlated.
  • the model applied is not expected to be exactly true since it is not based on an underlying bio-physical model.
  • the multiplicative model has so far been found to fit the data adequately, i.e. no significant deviations are detected for many common diseases for which many risk variants have been discovered.
  • certain polymorphic markers and haplotypes comprising such markers are found to be useful for risk assessment of prostate cancer. Certain markers have also been found to be useful for correcting PSA quantity to establish a corrected PSA quantity based on the genotype of individuals at particular polymorphic markers. Markers in linkage disequilibrium with any such marker are, by necessity, also useful in such applications. This fact is obvious to the skilled person, who thus knows that surrogate markers may be suitably selected to detect the effect of any particular anchor marker. The stronger the linkage disequilibrium to the anchor marker, the better the surrogate, and thus the more similar the results obtained by detecting the surrogate will be to that of the anchor marker.
  • Markers with values of r² equal to 1 are perfect surrogates for the anchor marker, i.e., genotypes for the surrogate marker perfectly predict genotypes for the anchor marker. Markers with smaller values of r² than 1 can also be useful surrogates, although they are expected to give rise to observed effects that are smaller than for the anchor marker. Alternatively, such surrogate markers may represent variants with effects (e.g., OR, RR for prostate cancer, or effect on PSA levels) as high as or possibly even higher than that of the anchor marker. In this scenario, the anchor variant identified may not be the functional variant itself, but is in this instance in linkage disequilibrium with the true functional variant.
  • the functional variant may be a SNP, but may also for example be a tandem repeat, such as a minisatellite or a microsatellite, a transposable element (e.g., an Alu element), or a structural alteration, such as a deletion, insertion or inversion (sometimes also called copy number variations, or CNVs) .
  • the present invention encompasses the assessment of such surrogate markers for the markers as disclosed herein.
  • markers are annotated, mapped and listed in public databases, as well known to the skilled person, or can alternatively be readily identified by sequencing a genomic region or a part of the region identified by the markers of the present invention in a group of individuals, and identifying polymorphisms in the resulting group of sequences.
  • the person skilled in the art can readily and without undue experimentation identify and genotype surrogate markers in linkage disequilibrium with the markers described herein.
  • Detection of nucleic acid sequence as described herein can in certain embodiments be practiced by assessing a sample comprising genomic DNA from an individual for the presence of certain variants described herein to be associated with PSA levels and risk of prostate cancer. Such assessment typically includes steps that detect the presence or absence of at least one allele of at least one polymorphic marker, using methods well known to the skilled person and further described herein, and based on the outcome of such assessment, determine whether the individual from whom the sample is derived is at increased or decreased risk (i.e., increased or decreased susceptibility) of prostate cancer, or determine a corrected PSA value based on the outcome.
  • nucleic acid sequence data can comprise nucleic acid sequence at a single nucleotide position, which is sufficient to identify alleles at SNPs.
  • the nucleic acid sequence data can also comprise sequence at any other number of nucleotide positions, in particular for genetic markers that comprise multiple nucleotide positions, and can be anywhere from two to hundreds of thousands, possibly even millions, of nucleotides (in particular, in the case of copy number variations (CNVs)) .
  • the invention can be practiced utilizing a dataset comprising information about the genotype status of at least one polymorphic marker.
  • a dataset containing information about particular polymorphic markers for example in the form of genotype counts at a certain polymorphic marker, or a plurality of markers (e.g., an indication of the presence or absence of certain at-risk alleles, or the presence or absence of certain alleles predictive of increased or decreased PSA quantity), or actual genotypes for one or more markers, can be queried for the presence or absence of certain alleles.
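Querying a genotype dataset for the presence or absence of an allele can be sketched as a simple lookup. The data layout (a mapping from sample identifier to per-marker genotype strings), the function name `carries_allele`, the sample identifiers and the genotypes shown in the usage note are hypothetical; only the marker name comes from the text.

```python
def carries_allele(genotypes, sample_id, marker, allele):
    """Return True/False for presence/absence of `allele` at `marker` in
    the given sample, or None if no genotype is recorded.

    genotypes: dict mapping sample id -> dict mapping marker -> genotype
               string such as "CT" (both alleles, order not significant)
    """
    genotype = genotypes.get(sample_id, {}).get(marker)
    if genotype is None:
        return None
    return allele in genotype
```

For a made-up dataset `{"sample_1": {"rs17632542": "TC"}, "sample_2": {"rs17632542": "TT"}}`, the query `carries_allele(data, "sample_1", "rs17632542", "C")` reports that the C allele is present, the same query for `sample_2` reports that it is absent, and a query for an unknown sample or marker returns `None` rather than guessing.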
  • the methods described herein for determining corrected PSA quantity and methods of assessing prostate cancer susceptibility may be performed using multiple markers.
  • any one, or a combination of the markers described herein may be used.
  • the use of additional polymorphic markers useful in the method is contemplated. Methods known in the art and described herein may be used to determine the overall effect of such multiple markers.
  • the Icelandic population is a Caucasian population of Northern European ancestry.
  • a large number of studies reporting results of genetic linkage and association in the Icelandic population have been published in the last few years. Many of those studies show replication of variants, originally identified in the Icelandic population as being associating with a particular disease, in other populations (Sulem, P., et al. Nat Genet May 17 2009 (Epub ahead of print); Rafnar, T., et al. Nat Genet 41:221-7 (2009); Gretarsdottir, S., et al. Ann Neurol 64:402-9 (2008); Stacey, S.N., et al.
  • Chromosome 2p15 (rs2710646), Chromosome 11q13 (rs10896450) and Chromosome Xp11.22 (rs5945572), all of which had originally been identified in samples from the Icelandic population, have been confirmed as risk variants of prostate cancer in many other populations.
  • Such embodiments relate to human individuals that are from one or more human population including, but not limited to, Caucasian populations, European populations, American populations, Eurasian populations, Asian populations, Central/South Asian populations, East Asian populations, Middle Eastern populations, African populations, Hispanic populations, and Oceanian populations.
  • the invention relates to markers and/or haplotypes identified in specific populations, as described in the above.
  • linkage disequilibrium may vary across human populations. This is due to different population history of different human populations as well as differential selective pressures that may have led to differences in LD in specific genomic regions.
  • certain markers, e.g., SNP markers, have different population frequencies in different populations, or are polymorphic in one population but not in another.
  • selecting markers in LD with an anchor marker may in certain embodiments be done using Caucasian samples.
  • markers in LD with an anchor marker may be suitably selected using LD determined in a particular population that is intended for study.
  • a particular anchor marker e.g., any of the markers shown herein to be predictive of PSA quantity in humans
  • Such selection of markers is well known to the skilled person, and can be done using data from the public domain, for example data from the HapMap project (http://www.hapmap.org), utilizing methods known in the art.
  • certain embodiments of the invention pertain to markers that are in linkage disequilibrium with a marker selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, wherein linkage disequilibrium is determined in samples from the same human population as the individual being studied.
  • the individual is Caucasian and the population is a Caucasian population.
  • the population may also suitably be a European population, for example in cases where the individual is European or of European origin .
  • Certain other embodiments relate to populations with a European origin.
  • nucleic acids and polypeptides described herein can be used in methods and kits of the present invention.
  • An "isolated" nucleic acid molecule is one that is separated from nucleic acids that normally flank the gene or nucleotide sequence (as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (e.g., as in an RNA library).
  • an isolated nucleic acid of the invention can be substantially isolated with respect to the complex cellular milieu in which it naturally occurs, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized.
  • the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix.
  • the material can be purified to essential homogeneity, for example as determined by polyacrylamide gel electrophoresis (PAGE) or column chromatography (e.g., HPLC).
  • An isolated nucleic acid molecule of the invention can comprise at least about 50%, at least about 80% or at least about 90% (on a molar basis) of all macromolecular species present.
  • With regard to genomic DNA, the term "isolated" also can refer to nucleic acid molecules that are separated from the chromosome with which the genomic DNA is naturally associated.
  • the isolated nucleic acid molecule can contain less than about 250 kb, 200 kb, 150 kb, 100 kb, 75 kb, 50 kb, 25 kb, 10 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of the nucleotides that flank the nucleic acid molecule in the genomic DNA of the cell from which the nucleic acid molecule is derived .
  • nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered isolated.
  • recombinant DNA contained in a vector is included in the definition of "isolated" as used herein.
  • isolated nucleic acid molecules include recombinant DNA molecules in heterologous host cells or heterologous organisms, as well as partially or substantially purified DNA molecules in solution .
  • isolated nucleic acid molecules also encompass in vivo and in vitro RNA transcripts of the DNA molecules of the present invention .
  • An isolated nucleic acid molecule or nucleotide sequence can include a nucleic acid molecule or nucleotide sequence that is synthesized chemically or by recombinant means.
  • Such isolated nucleotide sequences are useful, for example, in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g. , from other mammalian species), for gene mapping (e.g. , by in situ hybridization with chromosomes), or for detecting expression of the gene in tissue (e.g. , human tissue), such as by Northern blot analysis or other hybridization techniques.
  • the invention also pertains to nucleic acid molecules that hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein (e.g. , nucleic acid molecules that specifically hybridize to a nucleotide sequence containing a polymorphic site associated with a marker or haplotype described herein) .
  • nucleic acid molecules can be detected and/or isolated by allele- or sequence-specific hybridization (e.g. , under high stringency conditions) .
  • Stringency conditions and methods for nucleic acid hybridizations are well known to the skilled person (see, e.g. , Current Protocols in Molecular Biology, Ausubel, F.
  • the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%, of the length of the reference sequence.
  • the actual comparison of the two sequences can be accomplished by well-known methods, for example, using a mathematical algorithm.
  • a non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90: 5873-5877 (1993) .
  • Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25: 3389-3402 (1997) .
  • Another example of an algorithm is BLAT (Kent, W.J. Genome Res. 12: 656-64 (2002)) .
  • Other examples include the algorithm of Myers and Miller, CABIOS (1989), ADVANCE and ADAM as described in Torellis, A. and Robotti, C, Comput. Appl. Biosci.
  • the percent identity between two amino acid sequences can be determined using the GAP program in the GCG software package (Accelrys, Cambridge, UK).
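  • As a minimal sketch of the comparison described above, the following assumes the common definition of percent identity (identical positions divided by compared positions, times 100) applied to two sequences that have already been aligned. The function names and the `-` gap symbol are illustrative assumptions, not part of the disclosure, and this is not the GAP, BLAST or BLAT algorithm.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not counted as compared positions
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def aligned_fraction(aligned_query: str, aligned_ref: str, reference_length: int) -> float:
    """Fraction of the full reference length that is aligned to query residues."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    paired = sum(1 for q, r in zip(aligned_query, aligned_ref) if q != "-" and r != "-")
    return paired / reference_length


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    print(percent_identity("ACGT-A", "ACCTTA"))
    print(aligned_fraction("ACGT-A", "ACCTTA", reference_length=10))
```

  • The second function corresponds to the requirement that the aligned length be at least a stated percentage (for example 30% to 95%) of the length of the reference sequence.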
  • the present invention also provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid that comprises, or consists of, the nucleotide sequence of any one of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene, or a nucleotide sequence comprising, or consisting of, the complement of the nucleotide sequence of any one of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene.
  • the nucleotide sequence comprises at least one polymorphic allele contained in the markers described herein .
  • the nucleic acid fragments of the invention are at least about 15, at least about 18, 20, 23 or 25 nucleotides, and can be 30, 40, 50, 100, 200, 500, 1000, 10,000 or more nucleotides in length . In a specific embodiment, the nucleic acid fragments are 15-500 nucleotides in length .
  • probes or primers are oligonucleotides that hybridize in a base-specific manner to a complementary strand of a nucleic acid molecule.
  • probes and primers include peptide nucleic acids (PNA), as described in Nielsen, P. et al., Science 254: 1497-1500 (1991).
  • a probe or primer comprises a region of nucleotide sequence that hybridizes to at least about 15, typically about 20-25, and in certain embodiments about 40, 50 or 75, consecutive nucleotides of a nucleic acid molecule.
  • the probe or primer comprises at least one allele of at least one polymorphic marker or at least one haplotype described herein, or the complement thereof.
  • a probe or primer can comprise 100 or fewer nucleotides; for example, in certain embodiments from 6 to 50 nucleotides, or, for example, from 12 to 30 nucleotides.
  • the probe or primer is at least 70% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence.
  • the probe or primer is capable of selectively hybridizing to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence.
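  • The identity criteria for a probe or primer can be illustrated with a simple ungapped comparison against a target sequence and its reverse complement. This is a sketch under stated assumptions: the function names are hypothetical, the sequences are toy examples, and real probe design would also weigh melting temperature and hybridization conditions, which are not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_ungapped_identity(probe: str, target: str) -> float:
    """Highest percent identity of the probe against any equal-length window
    of the target or of the target's reverse complement (no gaps allowed)."""
    probe = probe.upper()
    if not probe:
        raise ValueError("probe must not be empty")
    best = 0.0
    for strand in (target.upper(), reverse_complement(target)):
        for start in range(len(strand) - len(probe) + 1):
            window = strand[start:start + len(probe)]
            matches = sum(p == w for p, w in zip(probe, window))
            best = max(best, 100.0 * matches / len(probe))
    return best


if __name__ == "__main__":
    target = "AAACCCGGGTTTACGT"  # hypothetical contiguous sequence
    print(best_ungapped_identity("CCCGGG", target))     # exact match on the given strand
    print(best_ungapped_identity("CGTAAACCC", target))  # exact match on the complement
    print(best_ungapped_identity("CCCGAG", target))     # one mismatch in six positions
```

  • A probe would meet, for example, the "at least 80% identical" criterion if the returned value is 80.0 or higher.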
  • the probe or primer further comprises a label, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, an epitope label .
  • the nucleic acid molecules of the invention can be identified and isolated using standard molecular biology techniques well known to the skilled person.
  • the amplified DNA can be labeled (e.g. , radiolabeled, fluorescently labeled) and used as a probe for screening a cDNA library derived from human cells.
  • the cDNA can be derived from mRNA and contained in a suitable vector.
  • Corresponding clones can be isolated, DNA obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods to identify the correct reading frame encoding a polypeptide of the appropriate molecular weight. Using these or similar methods, the polypeptide and the DNA encoding the polypeptide can be isolated, sequenced and further characterized.
  • Kits useful in the methods of the invention comprise components useful in any of the methods described herein, including, for example, primers for nucleic acid amplification, hybridization probes, restriction enzymes (e.g., for RFLP analysis), allele-specific oligonucleotides, antibodies useful for detecting PSA (e.g., antibodies that bind to PSA epitopes, antibodies that bind to an altered PSA polypeptide, such as antibodies that bind to PSA epitopes that comprise an I179T variation, or to a non-altered (native) polypeptide), means for analyzing the nucleic acid sequence of a nucleic acid, etc.
  • kits can, for example, include necessary buffers, nucleic acid primers for amplifying nucleic acids of the invention, reagents for allele-specific detection of the fragments amplified using such primers, and necessary enzymes (e.g., DNA polymerase).
  • kits can provide reagents for assays to be used in combination with the methods of the present invention, e.g. , reagents for use with other diagnostic assays.
  • kits provide reagents for performing a PSA assay.
  • the invention pertains to a kit for assaying a sample from a subject to detect the presence or absence of certain alleles at certain polymorphic markers in the subject, wherein the kit comprises reagents necessary for selectively detecting at least one allele of at least one polymorphism as described herein in the genome of the individual.
  • the reagents comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising at least one polymorphism of the present invention .
  • the reagents comprise at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from a subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes at least one polymorphism that is useful in the methods described herein .
  • the polymorphism is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
  • the fragment is at least 20 base pairs in size.
  • oligonucleotides or nucleic acids e.g. , oligonucleotide primers
  • the kit comprises one or more labeled nucleic acids capable of allele-specific detection of one or more specific polymorphic markers, and reagents for detection of the label.
  • Suitable labels include, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, an epitope label.
  • the polymorphic marker or haplotype to be detected by the reagents of the kit comprises one or more markers, two or more markers, three or more markers, four or more markers, five or more markers, six or more markers, seven or more markers, eight or more markers, nine or more markers, or ten or more markers.
  • a pack comprising (i) reagents for determining PSA levels in humans, and (ii) reagents for determining sequence information about at least one polymorphic marker, wherein the at least one polymorphic marker is correlated with PSA quantity in humans.
  • the reagents for determining sequence information comprise reagents for determining the presence or absence of at least one allele of at least one polymorphic marker.
  • the kit further comprises a set of instructions for using the reagents comprising the kit.
  • the kit further comprises instructions for interpreting results obtained by using reagents in the kit.
  • the instructions in one embodiment comprise instructions for determining corrected PSA levels based on (a) uncorrected PSA levels obtained using reagents provided in the kit and (b) sequence information obtained using reagents provided in the kit.
  • the kit contains a data sheet providing information on corrected PSA values based on results on uncorrected PSA values and sequence information about at least one polymorphic marker obtained using the reagents provided in the kit.
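  • One way such instructions or a data sheet could derive a corrected PSA value is sketched below. The multiplicative per-allele model, the function name and every numeric effect shown are assumptions made purely for illustration; the actual correction factors would have to come from the association data for each marker and are not reproduced here.

```python
def corrected_psa(measured_psa, allele_counts, per_allele_effect):
    """Divide a measured PSA value by the combined genotype effect.

    measured_psa      -- uncorrected PSA quantity (e.g., ng/mL)
    allele_counts     -- {marker: copies of the PSA-associated allele, 0, 1 or 2}
    per_allele_effect -- {marker: assumed multiplicative effect per allele copy}
    """
    if measured_psa < 0:
        raise ValueError("measured_psa must be non-negative")
    factor = 1.0
    for marker, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {marker} must be 0, 1 or 2")
        # Assumption: effects multiply across allele copies and across markers.
        factor *= per_allele_effect[marker] ** count
    return measured_psa / factor


if __name__ == "__main__":
    # PLACEHOLDER markers and effects, not taken from the source.
    effects = {"marker_1": 2.0, "marker_2": 1.5}
    genotype = {"marker_1": 2, "marker_2": 0}
    print(corrected_psa(4.0, genotype, effects))
```

  • Under this assumed model, an individual carrying alleles that raise PSA quantity receives a corrected value lower than the measured value, and an individual carrying none of them is left unchanged.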
  • the invention also provides antibodies which bind to an epitope comprising either a variant amino acid sequence (e.g., comprising an amino acid substitution) encoded by a variant allele or the reference amino acid sequence encoded by the corresponding non-variant or wild-type allele.
  • antibody refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e. , molecules that contain antigen-binding sites that specifically bind an antigen.
  • a molecule that specifically binds to a polypeptide of the invention is a molecule that binds to that polypeptide or a fragment thereof, but does not substantially bind other molecules in a sample, e.g. , a biological sample, which naturally contains the polypeptide.
  • immunologically active portions of immunoglobulin molecules include F(ab) and F(ab')2 fragments, which can be generated by treating the antibody with an enzyme such as pepsin.
  • the invention provides polyclonal and monoclonal antibodies that bind to a polypeptide of the invention.
  • the term "monoclonal antibody” or “monoclonal antibody composition”, as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope of a polypeptide of the invention. A monoclonal antibody composition thus typically displays a single binding affinity for a particular polypeptide of the invention with which it immunoreacts.
  • Polyclonal antibodies can be prepared as described above by immunizing a suitable subject with a desired immunogen, e.g. , polypeptide of the invention or a fragment thereof.
  • the antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme linked immunosorbent assay (ELISA) using immobilized polypeptide.
  • the antibody molecules directed against the polypeptide can be isolated from the mammal (e.g., from the blood) and further purified by well-known techniques, such as protein A
  • antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein, Nature 256: 495-497 (1975), the human B cell hybridoma technique (Kozbor et al., Immunol. Today 4: 72 (1983)), the EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., 1985, pp. 77-96) or trioma techniques.
  • The technology for producing hybridomas is well known (see generally Current Protocols in Immunology (1994) Coligan et al., (eds.) John Wiley & Sons, Inc., New York, NY).
  • an immortal cell line typically a myeloma
  • lymphocytes typically splenocytes
  • the culture supernatants of the resulting hybridoma cells are screened to identify a hybridoma producing a monoclonal antibody that binds a polypeptide of the invention .
  • a monoclonal antibody to a polypeptide of the invention can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide to thereby isolate immunoglobulin library members that bind the polypeptide.
  • Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening antibody display libraries can be found in, for example, U.S.
  • recombinant antibodies such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention .
  • chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.
  • antibodies of the invention can be used to isolate a polypeptide of the invention by standard techniques, such as affinity chromatography or immunoprecipitation .
  • a polypeptide-specific antibody can facilitate the purification of natural polypeptide from cells and of recombinantly produced polypeptide expressed in host cells.
  • an antibody specific for a polypeptide of the invention can be used to detect the polypeptide (e.g. , in a cellular lysate, cell supernatant, or tissue sample) in order to evaluate the abundance and pattern of expression of the polypeptide.
  • Antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen.
  • the antibody can be coupled to a detectable substance to facilitate its detection. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material is luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.
  • Antibodies may also be useful in pharmacogenomic analysis.
  • antibodies against variant proteins encoded by nucleic acids according to the invention such as variant proteins that are encoded by nucleic acids that contain at least one polymorphic marker of the invention, can be used to identify individuals that require modified treatment modalities.
  • Antibodies can furthermore be useful for assessing expression of variant proteins in disease states, such as in active stages of a disease, or in an individual with a predisposition to a disease related to the function of the protein, in particular prostate cancer.
  • antibodies are useful for assessing PSA quantity in humans.
  • Antibodies specific for a variant protein of the present invention can be used to screen for the presence of the variant protein, for example to screen for a predisposition to prostate cancer as indicated by the presence of the variant protein .
  • the variant protein is an I179T variant of the KLK3 protein.
  • Antibodies can be used in other methods. Thus, antibodies are useful as diagnostic tools for evaluating proteins, such as variant proteins of the invention, in conjunction with analysis by electrophoretic mobility, isoelectric point, tryptic or other protease digest, or for use in other physical assays known to those skilled in the art. Antibodies may also be used in tissue typing . In one such embodiment, a specific variant protein has been correlated with expression in a specific tissue type, and antibodies specific for the variant protein can then be used to identify the specific tissue type.
  • Subcellular localization of proteins can also be determined using antibodies, and can be applied to assess aberrant subcellular localization of the protein in cells in various tissues. Such use can be applied in genetic testing, but also in monitoring a particular treatment modality. In the case where treatment is aimed at correcting the expression level or presence of the variant protein or aberrant tissue distribution or developmental expression of the variant protein, antibodies specific for the variant protein or fragments thereof can be used to monitor therapeutic efficacy.
  • Antibodies are further useful for inhibiting variant protein function, for example by blocking the binding of a variant protein to a binding molecule or partner. Such uses can also be applied in a therapeutic context in which treatment involves inhibiting a variant protein's function .
  • An antibody can, for example, be used to block or competitively inhibit binding, thereby modulating (i.e., agonizing or antagonizing) the activity of the protein.
  • Antibodies can be prepared against specific protein fragments containing sites required for specific function or against an intact protein that is associated with a cell or cell membrane.
  • an antibody may be linked with an additional therapeutic payload, such as a radionuclide, an enzyme, an immunogenic epitope, or a cytotoxic agent, including bacterial toxins (e.g., diphtheria toxin) or plant toxins (such as ricin).
  • the in vivo half-life of an antibody or a fragment thereof may be increased by pegylation through conjugation to polyethylene glycol.
  • kits for using antibodies in the methods described herein include, but are not limited to, kits for detecting the quantity of protein in a sample, and kits for detecting the presence of a variant protein in a sample.
  • One preferred embodiment comprises antibodies such as a labelled or labelable antibody and a compound or agent for detecting PSA in a biological sample and/or means for determining the quantity of PSA protein in the sample, as well as instructions for use of the kit.
  • antisense agents are comprised of single-stranded oligonucleotides (RNA or DNA) that are capable of binding to a complementary nucleotide segment.
  • the antisense oligonucleotides are complementary to the sense or coding strand of a gene. It is also possible to form a triple helix, where the antisense oligonucleotide binds to duplex DNA.
  • one class of antisense oligonucleotide binds to target RNA sites and activates intracellular nucleases (e.g., RNase H or RNase L) that cleave the target RNA.
  • Blockers bind to target RNA and inhibit protein translation by steric hindrance of the ribosomes. Examples of blockers include nucleic acids, morpholino compounds, locked nucleic acids and methylphosphonates (Thompson, Drug
  • Antisense oligonucleotides are useful directly as therapeutic agents, and are also useful for determining and validating gene function, for example by gene knock-out or gene knock-down experiments. Antisense technology is further described in Lavery et al., Curr. Opin. Drug Discov. Devel. 6: 561-569 (2003), Stephens et al., Curr. Opin. Mol. Ther. 5: 118-122 (2003), Kurreck, Eur. J. Biochem. 270: 1628-44 (2003), Dias et al., Mol. Cancer Ther. 1: 347-55 (2002), Chen, Methods Mol. Med. 75: 621-636 (2003), Wang et al., Curr. Cancer Drug Targets 1: 177-96 (2001), and Bennett, Antisense Nucleic Acid Drug Dev. 12: 215-24 (2002).
  • the antisense agent is an oligonucleotide that is capable of binding to a particular nucleotide segment.
  • the nucleotide segment comprises a fragment of a gene selected from the group consisting of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene.
  • the antisense nucleotide is capable of binding to a nucleotide segment as set forth in SEQ ID NO: 1-728.
  • Antisense nucleotides can be from 5-500 nucleotides in length, including 5-200 nucleotides, 5-100 nucleotides, 10-50 nucleotides, and 10-30 nucleotides. In certain preferred embodiments, the antisense nucleotides are from 14-50 nucleotides in length, including 14-40 nucleotides and 14-30 nucleotides.
  • the variants described herein can also be used for the selection and design of antisense reagents that are specific for particular variants. Using information about the variants described herein, antisense oligonucleotides or other antisense molecules that specifically target mRNA molecules that contain one or more variants of the invention can be designed. In this manner, expression of mRNA molecules that contain one or more variants of the present invention (i.e., certain marker alleles and/or haplotypes) can be inhibited or blocked.
  • the antisense molecules are designed to specifically bind a particular allelic form (i.e., one or several variants (alleles and/or haplotypes)) of the target nucleic acid, thereby inhibiting translation of a product originating from this specific allele or haplotype, but which do not bind other or alternate variants at the specific polymorphic sites of the target nucleic acid molecule.
  • the molecules can be used for disease treatment.
  • the methodology can involve cleavage by means of ribozymes containing nucleotide sequences complementary to one or more regions in the mRNA that attenuate the ability of the mRNA to be translated .
  • Such mRNA regions include, for example, protein-coding regions, in particular protein-coding regions corresponding to catalytic activity, substrate and/or ligand binding sites, or other functional domains of a protein .
  • RNA interference (RNAi), also called gene silencing, is based on using double-stranded RNA molecules (dsRNA) to turn off specific genes.
  • siRNA small interfering RNA
  • the siRNA guide the targeting of a protein-RNA complex to specific sites on a target mRNA, leading to cleavage of the mRNA (Thompson, Drug Discovery Today, 7 : 912-917 (2002)) .
  • the siRNA molecules are typically about 20, 21, 22 or 23 nucleotides in length .
  • one aspect of the invention relates to isolated nucleic acid molecules, and the use of those molecules for RNA interference, i.e. as small interfering RNA molecules (siRNA) .
  • the isolated nucleic acid molecules are 18-26 nucleotides in length, preferably 19-25 nucleotides in length, more preferably 20-24 nucleotides in length, and more preferably 21, 22 or 23 nucleotides in length .
  • RNAi-mediated gene silencing originates in endogenously encoded primary microRNA (pri-miRNA) transcripts, which are processed in the cell to generate precursor miRNA (pre-miRNA) .
  • miRNA molecules are exported from the nucleus to the cytoplasm, where they undergo processing to generate mature miRNA molecules (miRNA), which direct translational inhibition by recognizing target sites in the 3' untranslated regions of mRNAs, and subsequent mRNA degradation by processing P-bodies (reviewed in Kim & Rossi, Nature Rev. Genet. 8: 173-204 (2007)) .
  • Clinical applications of RNAi include the incorporation of synthetic siRNA duplexes, which preferably are approximately 20-23 nucleotides in size, and preferably have 3' overhangs of 2 nucleotides. Knockdown of gene expression is established by sequence-specific design for the target mRNA. Several commercial sites for optimal design and synthesis of such molecules are known to those skilled in the art.
  • siRNA molecules typically 25-30 nucleotides in length, preferably about 27 nucleotides
  • shRNAs small hairpin RNAs
  • siRNAs and shRNAs are substrates for in vivo processing, and in some cases provide more potent gene-silencing than shorter designs (Kim et al., Nature Biotechnol. 23: 222-226 (2005); Siolas et al., Nature Biotechnol. 23: 227-231 (2005)).
  • siRNAs provide for transient silencing of gene expression, because their intracellular concentration is diluted by subsequent cell divisions.
  • expressed shRNAs mediate long-term, stable knockdown of target transcripts, for as long as transcription of the shRNA takes place (Marques et al., Nature Biotechnol. 23: 559-565 (2006); Brummelkamp et al., Science 296: 550-553 (2002)).
  • RNAi molecules including siRNA, miRNA and shRNA
  • the variants presented herein can be used to design RNAi reagents that recognize specific nucleic acid molecules comprising specific alleles and/or haplotypes (e.g., the alleles and/or haplotypes of the present invention), while not recognizing nucleic acid molecules comprising other alleles or haplotypes.
  • RNAi reagents can thus recognize and destroy the target nucleic acid molecules.
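  • A first step in designing allele-specific RNAi or antisense reagents of this kind is to enumerate target windows that span the polymorphic position, so that the reagent pairs perfectly with one allele and mismatches the other. The sketch below only lists such windows; the function name, the 0-based indexing, the default length of 21 nucleotides and the toy sequence are illustrative assumptions, and no potency or off-target scoring is attempted.

```python
def allele_specific_windows(mrna: str, variant_index: int, length: int = 21):
    """Return (start, window) pairs for every window of the given length
    that contains the variant position (0-based index into mrna)."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= variant_index < len(mrna):
        raise ValueError("variant_index is outside the sequence")
    first = max(0, variant_index - length + 1)     # leftmost window still covering the variant
    last = min(variant_index, len(mrna) - length)  # rightmost window that fits in the sequence
    return [(start, mrna[start:start + length]) for start in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical transcript fragment with the variant base (G) at index 10.
    fragment = "A" * 10 + "G" + "C" * 10
    for start, window in allele_specific_windows(fragment, variant_index=10, length=5):
        print(start, window)
```

  • Each returned window contains the variant base at a different offset, which lets a designer compare candidates that place the mismatch at different positions of the reagent.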
  • RNAi reagents can be useful as therapeutic agents (i.e., for turning off disease-associated genes or disease-associated gene variants), but may also be useful for characterizing and validating gene function (e.g., by gene knock-out or gene knockdown experiments) .
  • RNAi may be performed by a range of methodologies known to those skilled in the art. Methods utilizing non-viral delivery include cholesterol, stable nucleic acid-lipid particle (SNALP), heavy-chain antibody fragment (Fab), aptamers and nanoparticles. Viral delivery methods include use of lentivirus, adenovirus and adeno-associated virus.
  • the siRNA molecules are in some embodiments chemically modified to increase their stability. This can include modifications at the 2' position of the ribose, including 2'-O-methylpurines and 2'-fluoropyrimidines, which provide resistance to RNase activity. Other chemical modifications are possible and known to those skilled in the art.
  • the polymorphic markers of the invention are useful in determining prognosis of human individuals. Accurate pretreatment staging is important for prostate cancer treatment. Serum PSA levels correlate with aggressiveness of disease. Thus, individuals with serum PSA levels less than 10 ng/mL are most likely to respond to local therapy. Further, the PSA velocity (change in levels per year) is an independent predictor of mortality following treatment.
  • the invention therefore provides a method for determining the prognosis of an individual diagnosed with prostate cancer, the method comprising (i) detecting an uncorrected PSA quantity in a first biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of the prognosis of the individual.
  • a corrected PSA quantity of 10 ng/mL or greater is indicative of a worse prognosis.
  • the method further comprises determining corrected PSA velocity by repeating steps (i) - (iii) using a first sample and/or a second sample taken at a different time than the first set of first and/or second sample, and calculating a corrected PSA velocity based on the corrected PSA quantity determined for samples obtained at different times.
  • the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
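  • The corrected PSA velocity described above can be sketched as the change in corrected PSA quantity per year between two sampling dates. The function names, the use of 365.25 days per year and the example values are assumptions for illustration; the 10 ng/mL cut-off is the one stated in the text.

```python
from datetime import date


def corrected_psa_velocity(first, second):
    """Change in corrected PSA per year.

    first, second -- (sampling date, corrected PSA quantity in ng/mL),
                     with the second sample taken after the first.
    """
    (date_1, psa_1), (date_2, psa_2) = first, second
    years = (date_2 - date_1).days / 365.25  # assumption: average year length
    if years <= 0:
        raise ValueError("second sample must be taken after the first")
    return (psa_2 - psa_1) / years


def indicates_worse_prognosis(corrected_psa, threshold=10.0):
    """True when the corrected PSA quantity is at or above the threshold (ng/mL)."""
    return corrected_psa >= threshold


if __name__ == "__main__":
    # Hypothetical corrected values for two samples taken two years apart.
    velocity = corrected_psa_velocity((date(2010, 1, 1), 2.0), (date(2012, 1, 1), 4.0))
    print(round(velocity, 3), "ng/mL per year")
    print(indicates_worse_prognosis(4.0))
```

  • Both inputs are corrected quantities, so the genotype-based correction is applied to each sample before the velocity is calculated.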
  • PSA quantity is a useful tool for assessing recurrence risk in individuals who have undergone treatment for prostate cancer. Following treatment, PSA levels should decrease and remain at a low and steady level over time. Detection of increased PSA levels in individuals who have undergone treatment is thus an indication of disease recurrence.
  • the invention in a further aspect provides a method of assessing recurrence risk of prostate cancer in a human individual who has undergone treatment for prostate cancer, the method comprising (i) detecting an uncorrected PSA quantity in a first biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of recurrence risk of the individual.
  • a corrected PSA quantity above a certain threshold is indicative of recurrence in the individual .
  • a corrected PSA quantity of 0.5 or greater is indicative of recurrence in the individual.
  • a corrected PSA quantity of 1.0 or greater is indicative of recurrence in the individual.
  • a corrected PSA quantity of 2.0 or greater is indicative of recurrence in the individual.
  • a corrected PSA quantity of 3.0 or greater is indicative of recurrence in the individual.
  • a corrected PSA quantity of 4.0 or greater is indicative of recurrence in the individual.
  • the method further comprises determining corrected PSA velocity by repeating steps (i) - (iii) using a first sample and/or a second sample taken at a different time than the first set of first and/or second sample, and calculating a corrected PSA velocity based on the corrected PSA quantity determined for samples obtained at said different times.
  • the at least one polymorphic marker is suitably selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
  • the methods and information described herein may be implemented, in all or in part, as computer executable instructions on known computer readable media.
  • the methods described herein may be implemented in hardware.
  • the method may be implemented in software stored in, for example, one or more memories or other computer readable medium and implemented on one or more processors.
  • the processors may be associated with one or more controllers, calculation units and/or other units of a computer system, or implemented in firmware as desired.
  • the routines may be stored in any computer readable memory such as in RAM, ROM, flash memory, a magnetic disk, a laser disk, or other storage medium, as is also known.
  • this software may be delivered to a computing device via any known delivery method including, for example, over a communication channel such as a telephone line, the Internet, a wireless connection, etc., or via a transportable medium, such as a computer readable disk, flash drive, etc.
  • the various steps described above may be implemented as various blocks, operations, tools, modules and techniques which, in turn, may be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software.
  • some or all of the blocks, operations, techniques, etc. may be implemented in, for example, a custom integrated circuit (IC), an application specific integrated circuit (ASIC), a field programmable logic array (FPGA), a programmable logic array (PLA), etc.
  • When implemented in software, the software may be stored in any known computer readable medium such as on a magnetic disk, an optical disk, or other storage medium, in a RAM or ROM or flash memory of a computer, processor, hard disk drive, optical disk drive, tape drive, etc. Likewise, the software may be delivered to a user or a computing system via any known delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism.
  • Fig. 1 illustrates an example of a suitable computing system environment 100 on which a system for the steps of the claimed method and apparatus may be implemented.
  • the computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the method or apparatus of the claims. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
  • the steps of the claimed method and system are operational with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including memory storage devices.
  • an exemplary system for implementing the steps of the claimed method and system includes a general purpose computing device in the form of a computer 110.
  • Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120.
  • the system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
  • Computer 110 typically includes a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
  • the system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132.
  • RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120.
  • Fig. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
  • the computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • Fig. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media.
  • removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • the hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
  • hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
  • a monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190.
  • computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
  • the computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180.
  • the remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Fig. 1.
  • the logical connections depicted in Fig. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks.
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
  • When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet.
  • the modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism.
  • program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device.
  • Fig. 1 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • although the risk evaluation system and method, and other elements, have been described as preferably being implemented in software, they may be implemented in hardware, firmware, etc., and may be implemented by any other processor.
  • the elements described herein may be implemented in a standard multi-purpose CPU or on specifically designed hardware or firmware such as an application-specific integrated circuit (ASIC) or other hard-wired device as desired, including, but not limited to, the computer 110 of Fig. 1.
  • the software routine may be stored in any computer readable memory such as on a magnetic disk, a laser disk, or other storage medium, in a RAM or ROM of a computer or processor, in any database, etc.
  • this software may be delivered to a user or a diagnostic system via any known or desired delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism or over a communication channel such as a telephone line, the internet, wireless communication, etc. (which are viewed as being the same as or interchangeable with providing such software via a transportable storage medium).
  • the invention provides an apparatus for determining corrected PSA quantity in a human individual, comprising (a) a processor; and (b) a computer readable memory having computer executable instructions adapted to be executed on the processor, wherein said instructions comprise steps of (i) obtaining data representing uncorrected PSA quantity in a biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the genome of the human individual, wherein different alleles of the at least one polymorphic marker are predictive of different PSA quantity in humans; (iii) determining a corrected PSA quantity based on the sequence data about the at least one polymorphic marker.
  • the at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans, and wherein at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans.
  • a system of the invention includes one or more machines used for analysis of biological material (e.g., genetic material), as described herein. In some variations, this analysis of the biological material involves a chemical analysis and/or a nucleic acid amplification.
  • an exemplary system of the invention which may be used to implement one or more steps of methods of the invention, includes a computing device in the form of a computer 110.
  • Components shown in dashed outline are not technically part of the computer 110, but are used to illustrate the exemplary embodiment of Fig. 4.
  • Components of computer 110 may include, but are not limited to, a processor 120, a system memory 130, a memory/graphics interface 121 (also known as a Northbridge chip), and an I/O interface 122 (also known as a Southbridge chip).
  • the system memory 130 and a graphics processor 190 may be coupled to the memory/graphics interface 121.
  • a monitor 191 or other graphic output device may be coupled to the graphics processor 190.
  • a series of system busses may couple various system components including a high speed system bus 123 between the processor 120, the memory/graphics interface 121 and the I/O interface 122, a front-side bus 124 between the memory/graphics interface 121 and the system memory 130, and an advanced graphics processing (AGP) bus 125 between the memory/graphics interface 121 and the graphics processor 190.
  • the system bus 123 may be any of several types of bus structures including, by way of example and not limitation, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus and an Enhanced ISA (EISA) bus.
  • the computer 110 typically includes a variety of computer-readable media.
  • Computer-readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical medium which can be used to store the desired information and which can be accessed by computer 110.
  • the system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132.
  • the system ROM 131 may contain permanent system data 143, such as identifying and manufacturing information.
  • a basic input/output system (BIOS) may also be stored in system ROM 131.
  • RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 120.
  • Fig. 4 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
  • the I/O interface 122 may couple the system bus 123 with a number of other busses 126, 127 and 128 that couple a variety of internal and external devices to the computer 110.
  • a serial peripheral interface (SPI) bus 126 may connect to a basic input/output system (BIOS) memory 133 containing the basic routines that help to transfer information between elements within computer 110, such as during start-up.
  • a super input/output chip 160 may be used to connect to a number of 'legacy' peripherals, such as floppy disk 152, keyboard/mouse 162, and printer 196, as examples.
  • the super I/O chip 160 may be connected to the I/O interface 122 with a bus 127, such as a low pin count (LPC) bus, in some embodiments.
  • Various embodiments of the super I/O chip 160 are widely available in the commercial marketplace.
  • bus 128, which may be a Peripheral Component Interconnect (PCI) bus or a variation thereof, may be used to connect higher speed peripherals to the I/O interface 122.
  • a PCI bus may also be known as a Mezzanine bus.
  • Variations of the PCI bus include the Peripheral Component Interconnect-Express (PCI-E) and the Peripheral Component Interconnect - Extended (PCI-X) busses, the former having a serial interface and the latter being a backward compatible parallel interface.
  • bus 128 may be an advanced technology attachment (ATA) bus, in the form of a serial ATA bus (SATA) or parallel ATA (PATA).
  • the computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
  • Fig. 4 illustrates a hard disk drive 140 that reads from or writes to non-removable, nonvolatile magnetic media.
  • the hard disk drive 140 may be a conventional hard disk drive.
  • Removable media such as a universal serial bus (USB) memory 153, firewire (IEEE 1394), or CD/DVD drive 156 may be connected to the PCI bus 128 directly or through an interface 150.
  • a storage media 154 may be coupled through interface 150.
  • Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
  • the drives and their associated computer storage media discussed above and illustrated in Fig. 4 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110.
  • hard disk drive 140 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
  • a user may enter commands and information into the computer 110 through input devices such as a mouse/keyboard 162 or other input device combination.
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processor 120 through one of the I/O interface busses, such as the SPI 126, the LPC 127, or the PCI 128, but other busses may be used. In some embodiments, other devices may be coupled to parallel ports, infrared interfaces, game ports, and the like (not depicted), via the super I/O chip 160.
  • the computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180 via a network interface controller (NIC) 170.
  • the remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110.
  • the logical connection between the NIC 170 and the remote computer 180 depicted in Fig. 4 may include a local area network (LAN), a wide area network (WAN), or both, but may also include other networks.
  • Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
  • the remote computer 180 may also represent a web server supporting interactive sessions with the computer 110, or in the specific case of location-based applications may be a location server or an application server.
  • the network interface may use a modem (not depicted) when a broadband connection is not available or is not used. It will be appreciated that the network connection shown is exemplary and other means of establishing a communications link between the computers may be used.
  • the invention is a system for determining corrected PSA levels in a human subject.
  • the system includes tools for performing at least one step, preferably two or more steps, and in some aspects all steps of a method of the invention, where the tools are operably linked to each other.
  • Operable linkage describes a linkage through which components can function with each other to perform their purpose.
  • a system of the invention is a system for determining corrected PSA levels in a human subject, and comprises:
  • a susceptibility database operatively coupled to a computer-readable medium of the system and containing population information correlating the presence or absence of one or more alleles of at least one polymorphic marker with PSA levels in a population of humans;
  • a measurement tool that receives an input about the human subject and generates information from the input about (i) uncorrected PSA levels in the human subject and (ii) the presence or absence of at least one allele of at least one polymorphic marker in the human subject that is correlated with PSA levels in humans;
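  • an analysis tool or routine that (i) is operatively coupled to the susceptibility database and the measurement tool, (ii) is stored on a computer-readable medium of the system, and (iii) is adapted to be executed on a processor of the system, to compare the information about the human subject with the population information in the susceptibility database and generate a conclusion with respect to corrected PSA levels for the human subject.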
  • the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
  • Exemplary processors include all variety of microprocessors and other processing units used in computing devices. Exemplary computer-readable media are described above.
  • When two or more components of the system involve a processor or a computer-readable medium, the system generally can be created where a single processor and/or computer readable medium is dedicated to a single component of the system; or where two or more functions share a single processor and/or share a single computer readable medium, such that the system contains as few as one processor and/or one computer readable medium. In some variations, it is advantageous to use multiple processors or media, for example, where it is convenient to have components of the system at different locations.
  • some components of a system may be located at a testing laboratory dedicated to laboratory or data analysis, whereas other components, including components (optional) for supplying input information or obtaining an output communication, may be located at a medical treatment or counseling facility (e.g., doctor's office, health clinic, HMO, pharmacist, geneticist, hospital) and/or at the home or business of the human subject (patient) for whom the testing service is performed.
  • an exemplary system includes a susceptibility database 208 that is operatively coupled to a computer-readable medium of the system and that contains population information correlating the presence or absence of one or more alleles associated with PSA levels in a population of humans, for example alleles of the polymorphic markers rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542.
  • the susceptibility database 208 contains data relating to the correlation between a particular marker allele and PSA levels in humans.
  • the correlation may suitably be contained in a form of percentage or fractional increase for a particular marker allele.
  • the alternate allele, by necessity, will then be correlated with decreased PSA levels by the same percentage or fraction.
  • Such data provides an indication as to the genetic contribution of observed PSA levels for the subject having the allele in question.
  • the susceptibility database includes similar data with respect to two or more polymorphic markers, thus providing information about the contribution of two or more markers to PSA levels.
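One way such database entries could be represented is sketched below: each record stores a fractional increase for one allele, and the alternate allele is treated as decreasing PSA by the same fraction, as stated above. The class name, the marker identifiers `marker_A` and `marker_B`, the allele letters, and the effect sizes are all hypothetical placeholders and are not values from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerEffect:
    """One susceptibility-database record (hypothetical layout)."""
    marker: str
    increasing_allele: str
    fractional_increase: float  # e.g. 0.10 means PSA higher by 10%

    def multiplier(self, allele: str) -> float:
        """Relative PSA effect attributed to a single allele."""
        if allele == self.increasing_allele:
            return 1.0 + self.fractional_increase
        # Alternate allele: decreased by the same fraction.
        return 1.0 - self.fractional_increase


# Placeholder entries; the effect sizes are illustrative only.
SUSCEPTIBILITY_DB = {
    "marker_A": MarkerEffect("marker_A", "T", 0.10),
    "marker_B": MarkerEffect("marker_B", "G", 0.05),
}

print(SUSCEPTIBILITY_DB["marker_A"].multiplier("T"))
print(SUSCEPTIBILITY_DB["marker_A"].multiplier("C"))
```

Keying the table by marker identifier lets data for two or more markers sit side by side, which matches the multi-marker variant described above.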
  • the susceptibility database includes additional quantitative personal, medical, or genetic information about the individuals in the database diagnosed with prostate cancer or those who are free of prostate cancer.
  • information includes, but is not limited to, information about parameters such as age, sex, ethnicity, race, medical history, weight, diabetes status, blood pressure, family history of prostate cancer, smoking history, and alcohol use in humans and impact of the at least one parameter on susceptibility to prostate cancer and/or PSA levels.
  • the information also can include information about other genetic risk factors for prostate cancer.
  • the system further includes a measurement tool 206 programmed to receive an input 204 from or about the human subject and generate an output that contains information about the presence or absence of the at least one allele of at least one polymorphic marker.
  • the input 204 is not part of the system per se but is illustrated in the schematic Figure 5.
  • the input 204 will contain a specimen or contain data from which the presence or absence of the at least one allele can be directly read, or analytically determined.
  • the input contains annotated information about genotypes or allele counts for at least one polymorphic marker in the genome of the human subject, in which case no further processing by the measurement tool 206 is required, except possibly
  • the input 204 from the human subject contains data that is unannotated or insufficiently annotated with respect to particular polymorphic markers, requiring analysis by the measurement tool 206.
  • the input can be genetic sequence of a chromosomal region or chromosome on which the particular polymorphic markers of interest reside, or whole genome sequence information, or unannotated information from a gene chip analysis of variable loci in the human subject's genome.
  • measurement tool 206 comprises a tool, preferably stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to receive a data input about a subject and determine information about the presence or absence of the at least one allele of at least one polymorphic marker in a human subject from the data.
  • the measurement tool 206 contains instructions, preferably executable on a processor of the system, for analyzing the unannotated input data and determining the presence or absence of at least one allele of interest in the human subject.
  • the measurement tool optionally comprises a sequence analysis tool stored on a computer readable medium of the system and executable by a processor of the system with instructions for determining the presence or absence of the at least one allele from the genomic sequence information.
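For the simple case of already annotated genotype input, the software portion of the measurement tool could be sketched as below. The genotype string format (`"C/T"` or `"A|A"`), the function names, and the marker identifiers are assumptions made for illustration; analysis of raw sequence or gene chip data would need considerably more than this.

```python
import re


def allele_count(genotype: str, allele: str) -> int:
    """Count copies of `allele` in a genotype string such as "C/T" or "A|A"."""
    called = [a for a in re.split(r"[/|]", genotype.strip().upper()) if a]
    return called.count(allele.upper())


def allele_presence(genotypes: dict, alleles_of_interest: dict) -> dict:
    """Map each genotyped marker of interest to True (present) or False (absent)."""
    return {
        marker: allele_count(genotypes[marker], allele) > 0
        for marker, allele in alleles_of_interest.items()
        if marker in genotypes  # markers with no data are skipped
    }


# Hypothetical annotated input for one subject.
genotypes = {"marker_A": "C/T", "marker_B": "A|A"}
print(allele_presence(genotypes, {"marker_A": "T", "marker_B": "G"}))
```

Returning allele counts as well as presence or absence keeps the output usable for per-allele effect calculations downstream.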
  • the input 204 from the human subject comprises a biological sample, such as a fluid (e.g., blood) or tissue sample, that contains genetic material that can be analyzed to determine the presence or absence of the allele of interest.
  • an exemplary measurement tool 206 includes laboratory equipment for processing and analyzing the sample to determine the presence or absence (or identity) of the allele(s) in the human subject.
  • the measurement tool includes: an oligonucleotide microarray (e.g., "gene chip") containing a plurality of oligonucleotide probes attached to a solid support; a detector for measuring interaction between nucleic acid obtained from or amplified from the biological sample and one or more oligonucleotides on the oligonucleotide microarray to generate detection data; and an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one allele of interest based on the detection data.
  • the input 204 from the human subject comprises a biological sample that is suitable for determining PSA levels, such as a fluid (e.g. blood) or tissue sample that can be analyzed to determine uncorrected PSA levels.
  • the exemplary measurement tool 206 includes laboratory equipment and reagents for processing and analyzing the sample to determine uncorrected PSA levels in the human subject.
  • the reagents may comprise an antibody assay for determining PSA levels.
  • the measurement tool 206 includes: a nucleotide sequencer (e.g., an automated DNA sequencer) that is capable of determining nucleotide sequence information from nucleic acid obtained from or amplified from the biological sample; and an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one allele associated with PSA levels, based on the nucleotide sequence information.
  • the measurement tool 206 further includes additional equipment and/or chemical reagents for processing the biological sample to purify and/or amplify nucleic acid of the human subject for further analysis using a sequencer, gene chip, or other analytical equipment. In further variations, the measurement tool 206 further includes additional equipment and/or chemical reagents for processing the biological sample to purify protein of the human subject for determining PSA levels using appropriate analytical equipment.
  • the exemplary system further includes an analysis tool or routine 210 that: is operatively coupled to the susceptibility database 208 and operatively coupled to the measurement tool 206, is stored on a computer-readable medium of the system, is adapted to be executed on a processor of the system to compare the information about the human subject with the population information in the susceptibility database 208 and generate a conclusion with respect to corrected PSA levels for the human subject.
  • the analysis tool 210 looks at the alleles identified by the measurement tool 206 for the human subject, and compares this information to the susceptibility database 208, to determine corrected PSA levels for the subject.
  • the susceptibility can be based on the single parameter (the identity of one or more marker alleles), or can involve a calculation based on multiple genetic markers and/or other genetic and non-genetic data, as described above, that is collected and included as part of the input 204 from the human subject, and that also is stored in the susceptibility database 208 with respect to a population of other humans.
  • each parameter of interest is weighted to provide a conclusion with respect to susceptibility to PSA levels.
  • system as just described further includes a
  • the communication tool is operatively connected to the analysis routine 210 and comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to: generate a communication containing the conclusion; and to transmit the communication to the human subject 200 or the medical practitioner 202, and/or enable the subject or medical practitioner to access the communication.
  • the subject and medical practitioner are depicted in the schematic Fig. 2, but are not part of the system per se, though they may be considered users of the system.
  • the communication tool 212 provides an interface for communicating to the subject, or to a medical practitioner for the subject (e.g., doctor, nurse, genetic counselor), the conclusion generated by the analysis tool 210 with respect to corrected PSA levels for the subject.
  • the medical practitioner will share the communication with the human subject 200 and/or counsel the human subject about the medical significance of the communication.
  • the communication is provided in a tangible form, such as a printed report or report stored on a computer readable medium such as a flash drive or optical disk.
  • the communication is provided electronically with an output that is visible on a video display or audio output (e.g., speaker).
  • the communication is transmitted to the subject or the medical practitioner, e.g., electronically or through the mail.
  • the system is designed to permit the subject or medical practitioner to access the communication, e.g., by telephone or computer.
  • the system may include software residing on a memory and executed by a processor of a computer used by the human subject or the medical practitioner, with which the subject or practitioner can access the communication, preferably securely, over the internet or other network connection.
  • this computer will be located remotely from other components of the system, e.g., at a location of the human subject's or medical practitioner's choosing.
  • the system as described further includes components that add a treatment or prophylaxis utility to the system.
  • value is added to a determination of corrected PSA levels and/or susceptibility to prostate cancer when a medical practitioner can prescribe or administer a standard of care that can reduce susceptibility to the cancer; and/or delay onset of the cancer; and/or increase the likelihood of detecting the cancer at an early stage, to facilitate early treatment when the cancer has not spread and is most curable.
  • Exemplary lifestyle change protocols include loss of weight, increase in exercise, cessation of unhealthy behaviors such as smoking, and change of diet.
  • Exemplary medicinal and surgical intervention protocols include administration of pharmaceutical agents for prophylaxis; and surgery, including in extreme cases surgery to remove a tissue or organ before it has become cancerous.
  • Exemplary diagnostic protocols include non-invasive and invasive imaging; monitoring metabolic biomarkers; and biopsy screening.
  • the system further includes a medical protocol database 214 operatively connected to a computer-readable medium of the system and containing information correlating the presence or absence of the at least one marker allele of interest and medical protocols for human subjects at risk for prostate cancer.
  • medical protocols include any variety of medicines, lifestyle changes, diagnostic tests, increased frequencies of diagnostic tests, and the like that are designed to achieve one of the aforementioned goals.
  • the information correlating marker alleles with protocols could include, for example, information about PSA levels and the success with which the cancer is avoided or delayed, or success with which the cancer is detected early and treated, if a subject has particular corrected PSA levels and follows a protocol.
  • the system of this embodiment further includes a medical protocol tool or routine 216, operatively connected to the medical protocol database 214 and to the analysis tool or routine 210.
  • the medical protocol tool or routine 216 preferably is stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to: (i) compare (or correlate) the conclusion that is obtained from the analysis routine 210 (with respect to corrected PSA levels for the subject) and the medical protocol database 214, and (ii) generate a protocol report with respect to the probability that one or more medical protocols in the medical protocol database will achieve one or more of the goals of reducing susceptibility to prostate cancer; delaying onset of prostate cancer; and increasing the likelihood of detecting the cancer at an early stage to facilitate early treatment.
  • the probability can be based on empirical evidence collected from a population of humans and expressed either in absolute terms (e.g., compared to making no intervention), or expressed in relative terms, to highlight the comparative or additive benefits of two or more protocols.
  • Some variations of the system just described include the communication tool 212.
  • the communication tool generates a communication that includes the protocol report in addition to, or instead of, the conclusion with respect to susceptibility.
  • Information about marker allele status not only can provide useful information about identifying or quantifying PSA levels and/or determine susceptibility to prostate cancer; it can also provide useful information about possible causative factors for a human subject identified with a cancer, and useful information about therapies for the cancer patient. In some variations, systems of the invention are useful for these purposes.
  • the invention is a system for assessing or selecting a treatment protocol for a subject diagnosed with a cancer.
  • An exemplary system, schematically depicted in Figure 3, comprises:
  • a medical treatment database 308 operatively connected to a computer-readable medium of the system and containing information correlating values of corrected PSA levels and efficacy of treatment regimens for prostate cancer;
  • a measurement tool 306 to receive an input (304, depicted in Fig. 3 but not part of the system per se) about a human subject and generate information from the input 304 about genetically corrected PSA levels in humans;
  • a medical protocol routine or tool 310 operatively coupled to the medical treatment database 308 and the measurement tool 306, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the information with respect to corrected PSA levels for the human subject, and generate a conclusion with respect to at least one of: (i) the probability that one or more medical treatments will be efficacious for treatment of the prostate cancer for the patient; and
  • such a system further includes a communication tool 312 operatively connected to the medical protocol tool or routine 310 for communicating the conclusion to the subject 300, or to a medical practitioner for the subject 302 (both depicted in the schematic of Fig. 3, but not part of the system per se).
  • An exemplary communication tool comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to generate a communication containing the conclusion; and transmit the
  • the markers useful in the computer-implemented functions described herein are selected from the group consisting of rs7193343, rs7618072, rs10077199, rs10490066, rs10516002, rs10519674, rs1394796, rs2935888, rs4560443, rs6010770 and rs7733337, and markers in linkage disequilibrium therewith.
  • GWAS genome-wide association study
  • the allele frequency was comparable in the Icelandic and UK populations with frequencies ranging from 24% to 93% (Table 4) and their observed effect on the PSA level ranges from 7% to 39% per allele in the Icelandic samples and from 5% to 102% per allele in the UK samples (see Table 4 and Table 5 for the genotype effects of the variants).
  • the strongest overall association effect observed in the present study is for two SNPs, rs2735839 and rs17632542, located near or in the PSA coding gene KLK3 (Table 4), of which rs2735839-G (and highly correlated markers) has previously been reported to associate with PSA levels (18-20, 26).
  • the SNP rs17632542 is a missense mutation (an amino acid change denoted as I179T) in KLK3. This amino acid alteration is defined as either neutral or deleterious by different online protein structure algorithms (see Table 6). A deleterious mutation could conceivably destabilize the protein, affecting circulating PSA levels. Alternatively, the mutation might affect the antigenicity of the protein and thereby influence its detectability in PSA tests.
  • MSMB 10q11
  • HNF1B 17q12
  • rs10788160-A and rs12413088-T were genome-wide significant and had similar effects on PSA levels.
  • the two variants are located within an LD-region not known to contain any genes, 324 and 305 Kb centromeric to the start of the FGFR2 gene, respectively.
  • the most significant variant on 12q24, the second novel PSA locus, is rs11067228-A.
  • This SNP is located in an LD-block that contains the gene TBX3 in which mutations have been found to cause the ulnar-mammary syndrome (OMIM #181450) but not previously shown to affect PSA levels.
  • Variants at two other loci 11q13 also have greater effects on PSA levels but the effects did not reach genome-wide significance levels.
  • These six loci can roughly be divided into two groups: those with a moderate effect on the PSA levels compared to their effect on prostate cancer risk (8q24, 11q13, 10q11 and 17q12) and those comprised of variants that have a relatively strong PSA effect compared to their effect on prostate cancer risk (i.e. variants at: KLK3 on 19q13.33, and TERT on 5p15).
  • Benign prostatic hyperplasia can affect PSA levels.
  • BPH Benign prostatic hyperplasia
  • drugs in the G04C group of the ATC classification e.g. Tamsulosin, Finasteride and Dutasteride
  • BPH is unlikely to account for a significant fraction of the observed association with PSA levels for the variants discussed here.
  • loci that associate with PSA levels with genome-wide significance. Variants at three of these loci had previously been shown to associate with PSA levels whereas three of the loci, at 10q26, 5p15 and 12q24, are novel. Unlike the variants previously reported to associate with PSA levels, two of the novel loci, i.e. 12q24 and 10q26, do not associate with prostate cancer risk and the third locus, at 5p15, has only a moderate effect on prostate cancer. Furthermore, we have shown that two of these variants (rs10788160-A on 10q26 and rs11067228-A on 12q24), together with the KLK3 variant, are associated with a greater probability of having a normal prostate biopsied.
  • these new markers primarily predict the outcome of the PSA-based prostate cancer screening process, i.e. the decision of performing a biopsy or not, and the outcome of the biopsy, rather than predisposition to prostate cancer.
  • a missense mutation, rs17632542-T in the KLK3 gene on 19q13.33, is associated with higher PSA levels.
  • This variant has a stronger effect on PSA than the variant rs2735839, previously reported at this locus.
  • the KLK3 variant was also found to predispose to prostate cancer but the association effect was confined to the group of cases primarily diagnosed after the introduction of the PSA test.
  • biopsy negative 960 62 (5) 1 4.10 (3.50, 5.07) 1999-2007
  • Shown in part a) of the table are genome-wide association results for SNPs with P < 1E-05, the number of individuals (n) with PSA measurement and either genotyped using the Illumina 317K chip (on average 4,599 men) or by the in-silico genotyping method (on average 2,918 men), the allele associated with increased PSA levels, the association effect per allele and the two-sided P-value.
  • Shown in part b) of the table are association results for the three SNPs that showed a stronger effect than the chip-genotyped SNPs.
  • the imputation analysis was based on 2.5M HapMap SNPs, testing all SNPs within a window of 500 Kb for all six loci shown in section a) of this table.
  • results for SNPs present on the Illumina chips are based on genotypes from chip (~50%), in-silico genotyping using family imputation (~30%), and single track assay genotyping (~20%)
  • nsSNPs are predicted by a support vector machine (SVM) trained on OMIM amino-acid variants and putatively neutral nsSNPs from dbSNP.
  • SVM support vector machine
  • the SNPeffect database uses sequence- and structure-based bioinformatics tools to predict the effect of non-synonymous SNPs on the molecular phenotype of proteins. Reumers J, et al., Bioinformatics 22:2183-2185, 2006. g SNPs3D assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. Peng Y and John M, J Mol Biol. 356(5):1263-74, 2006. h ESEfinder uses position weighted matrices to predict putative human exonic splicing enhancers (ESEs). Cartegni L, et al., Nucleic Acids Res 31(13):3568-3571, 2003.
  • ESRSearch uses the evolutionary conservation of wobble positions between human and mouse orthologous exons and the analysis of the overabundance of sequence motifs, compared with their random expectation, given by their codon relative frequency, to predict ESEs.
  • PESX compares the frequency of all 65536 8-mers in internal non-coding exons against their adjacent pseudo exons and in internal non-coding exons against 5'UTR of intronless genes to predict ESEs.
  • the average number of persons with in-silico derived genotypes is 332; the remaining individuals were directly genotyped using the Illumina chip or single track SNP assays.
  • the OR and P-values were estimated using the Mantel-Haenszel model.
  • the measured PSA levels are estimated to be decreased by 30% to 56% compared to the population average.
  • the estimated relative effect on PSA levels is even greater; the range of increase is 40% to 92% for the top 5% of the distribution with the greatest genotypic effect compared to the population average, whereas for the bottom 5% of the distribution, the range of decrease is 53% to 80% compared to the population average.
  • a personalized PSA cutoff value corresponding to the commonly used cutoff of 4 ng/ml. This was done by multiplying the value of 4 ng/ml with the estimated relative genetic effect for the PSA SNPs. For individuals with the highest (top 5% of the distribution) genotypic effect, the personalized PSA cutoff value increased from 4 ng/ml to cutoff values between 4.9 and 5.9 ng/ml based on the estimates from Iceland, and to cutoff values between 5.6 and 7.7 ng/ml based on the UK estimates.
  • the personalized PSA cutoff values move from 4 ng/ml to cutoff values between 1.7 and 2.8 ng/ml according to the Icelandic estimates, and to cutoff values between 0.8 and 1.9 ng/ml according to the UK estimates (see Fig. 2).
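The scaling of the 4 ng/ml cutoff described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the relative effects in the usage note are back-calculated from the cutoff values quoted above, not read from the underlying tables.

```python
# Commonly used PSA cutoff cited in the text (ng/ml).
STANDARD_CUTOFF_NG_ML = 4.0


def personalized_cutoff(relative_genetic_effect, standard_cutoff=STANDARD_CUTOFF_NG_ML):
    """Scale the standard cutoff by an individual's relative genetic effect.

    relative_genetic_effect is the individual's genetically expected PSA level
    divided by the population average (1.0 means no genetic shift).
    """
    if relative_genetic_effect <= 0:
        raise ValueError("relative genetic effect must be positive")
    return standard_cutoff * relative_genetic_effect


def implied_relative_effect(cutoff, standard_cutoff=STANDARD_CUTOFF_NG_ML):
    """Invert the scaling: which relative effect corresponds to a given cutoff."""
    return cutoff / standard_cutoff
```

For instance, `implied_relative_effect(7.7)` gives about 1.925 (roughly the 92% increase quoted for the UK estimates), and `implied_relative_effect(0.8)` gives 0.2 (the 80% decrease).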
  • Icelandic men diagnosed with prostate cancer were identified based on a nationwide list from the ICR that contained all 4,732 Icelandic prostate cancer patients diagnosed from January 1, 1955, to December 31, 2008.
  • the Icelandic prostate cancer sample collection included 2,289 patients (diagnosed from December 1974 to December 2008) who were recruited from November 2000 until June 2009.
  • a total of 2,249 patients were included in the study, all of whom had genotypes from a genome-wide SNP genotyping effort, using the Infinium II assay method and the Sentrix HumanHap300 BeadChip (Illumina, San Diego, CA, USA) or a Centaurus single SNP genotyping assay (see Supplementary Materials).
  • the mean age at diagnosis for the consenting patients is 70.7 years (ranging from 40 to 96 years), while the mean age at diagnosis is 73 years for all prostate cancer patients in the ICR.
  • the median time from diagnosis to blood sampling is 2 years (range 0 to 26 years).
  • aggressive prostate cancer is defined as: Gleason >7 and/or T3 or higher and/or node positive and/or metastatic disease, while the less aggressive disease is defined as Gleason <7 and T2 or lower.
  • BPH benign hyperplasia of the prostate
  • the 35,470 controls (15,359 men (43.3%) and 20,111 women (56.7%)) used in this study consisted of individuals recruited through different genetic research projects at deCODE.
  • the individuals have been diagnosed with common diseases of the cardiovascular system (e.g. stroke or myocardial infarction), psychiatric and neurological diseases (e.g. schizophrenia, bipolar disorder), endocrine and autoimmune system (e.g. type 2 diabetes, asthma), malignant diseases other than prostate cancer as well as individuals randomly selected from the Icelandic genealogical database.
  • No single disease project represented more than 6% of the total number of controls.
  • the controls had a mean age of 84 years and the range was from 8 to 105 years.
  • the controls were absent from the nation-wide list of prostate cancer patients according to the ICR.
  • the DNA for both the Icelandic cases and controls was isolated from whole blood using standard methods.
  • the total number of Dutch prostate cancer cases used in this study was 1,100.
  • the Dutch study population consisted of two recruitment-sets of prostate cancer cases; Group-A was comprised of 360 hospital-based cases recruited from January 1999 to June 2006 at the Urology Outpatient Clinic of the Radboud University Nijmegen Medical Centre (RUNMC); Group-B consisted of 707 cases recruited from June 2006 to December 2006 through a population-based cancer registry held by the Comprehensive Cancer Centre IKO. Both groups were of self-reported European descent.
  • the average age at diagnosis for patients in Group-A was 63 years (median 63 years; range 43 to 83 years).
  • the average age at diagnosis for patients in Group-B was 65 years (median 66 years; range 43 to 75 years).
  • the 2,021 control individuals (1,004 men and 1,017 women) were cancer free and were matched for age with the cases. They were recruited within a project entitled "The Nijmegen Biomedical Study", in the Netherlands. This is a population-based survey conducted by the Department of Epidemiology and Biostatistics and the Department of Clinical Chemistry of RUNMC, in which 9,371 individuals participated from a total of 22,500 age and sex stratified, randomly selected inhabitants of Nijmegen. Control individuals from the Nijmegen Biomedical Study were invited to participate in a study on gene-environment interactions in multifactorial diseases, such as cancer. All the 2,021 participants in the present study are of self-reported European descent and were fully informed about the goals and the procedures of the study. The study protocol was approved by the Institutional Review Board of Radboud University and all study subjects gave written informed consent.
  • the Spanish study population used in this study consisted of 618 prostate cancer cases. The cases were recruited from the Oncology Department of Zaragoza Hospital in Zaragoza, Spain, from June 2005 to September 2007. All patients were of self-reported European descent. Clinical information including age at onset, grade and stage was obtained from medical records. The average age at diagnosis for the patients was 69 years (median 70 years) and the range was from 44 to 83 years. The 1,605 Spanish control individuals (737 men and 868 women) were approached at the University Hospital in Zaragoza, and the men were prostate cancer free at the time of recruitment. Study protocols were approved by the Institutional Review Board of Zaragoza University Hospital. All subjects gave written informed consent.
  • the Chicago study population used consisted of 1,560 prostate cancer cases. The cases were recruited from the Pathology Core of Northwestern University's Prostate Cancer Specialized Program of Research Excellence (SPORE) from May 2002 to May 2009. The average age at diagnosis for the patients was 60 years (median 59 years) and the range was from 39 to 87 years.
  • the 1,172 European American controls (781 men and 391 women) were recruited as healthy control subjects for genetic studies at the University of Chicago and
  • the Romanian study population used in this study consisted of 362 prostate cancer cases.
  • the cases were recruited from the Urology Clinic "Theodor Burghele" of The University of Medicine and Pharmacy "Carol Davila" Bucharest, Romania, from May 2008 to November 2009. All patients were of self-reported European descent.
  • Clinical information including age at onset, grade and stage was obtained from medical records at the hospital. The average age at diagnosis for the cases was 70 years (median 71 years) and the range was from 46 to 89 years.
  • the 182 Romanian controls were recruited at the General Surgery Clinic "St.
  • The quality of each Centaurus SNP assay was evaluated by genotyping each assay in the CEU and/or YRI HapMap samples and comparing the results with the HapMap publicly released data. Assays with > 1.5% mismatch rate were not used and a linkage disequilibrium (LD) test was used for markers known to be in LD.
  • LD linkage disequilibrium
  • Two populations were used to study PSA levels: Iceland and UK.
  • PSA levels among unaffected men in Iceland we excluded subjects who had been diagnosed with prostate cancer as recorded by the ICR (between 1955 and 2008) or were known to have undergone TURP between 1983 and 2008.
  • PSA levels were corrected for age at measurement for each center separately, using a generalized additive model with a smooth component on the age. Also, the PSA levels were standardized so that they had a normal distribution, using a quantile
  • case control association analysis for example when comparing prostate cancer cases, benign prostatic hyperplasia cases or biopsied individuals to population controls and within group comparisons (aggressive vs. non-aggressive, biopsy pos. vs. biopsy neg.), we used a standard likelihood ratio statistic, implemented in the NEMO software to calculate two-sided P values for each individual allele, assuming a multiplicative model for risk (Gretarsdottir, S. et al. Nat Genet 35:131-8 (2003)). Combined significance levels were calculated using a Mantel-Haenszel model. Heterogeneity was examined using a likelihood ratio test by comparing the null hypothesis of the effect being the same in all populations to the alternative hypothesis of each population having a different effect.
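For readers unfamiliar with combining results across study populations, the textbook Mantel-Haenszel pooled odds ratio can be sketched as below. This is not the NEMO implementation, the text does not spell out the exact formula used, and the allele counts in the usage note are invented purely for illustration.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata (for example, one stratum per population).

    Each stratum is a tuple (a, b, c, d) of allele counts:
      a = risk allele in cases,    b = other allele in cases,
      c = risk allele in controls, d = other allele in controls.
    Uses the standard estimator sum(a*d/n) / sum(b*c/n), with n = a+b+c+d.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these counts")
    return numerator / denominator
```

With two hypothetical populations, `mantel_haenszel_or([(20, 10, 10, 20), (40, 20, 20, 40)])` returns 4.0, the same as the odds ratio within each stratum, which is the expected behaviour when the effect is homogeneous.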
  • AUC area under the receiver-operating-characteristic curve
  • the variables included in the models are (1) PSA value, (2) prostate cancer multi-marker genetic risk prediction and (3) PSA with genetic correction.
  • To calculate the prostate cancer multi-marker genetic risk prediction for each individual we use published estimates of the allelic frequencies and effects of 23 markers associated with prostate cancer (list of SNPs: rs10086908, rs10486567, rs10896450, rs10934853, rs10993994, rs12621278, rs1447295, rs1512268, rs16901979, rs16902104, rs1859962, rs2660753, rs2710646, rs4430796, rs445114, rs5759167, rs5945572, rs6465657, rs6983267, rs7127900, rs7679673, rs8102476, rs9364554).
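One common way to turn per-marker allelic effects and frequencies into a combined risk is a multiplicative model, sketched below. The text states only that published frequencies and effects were used, so the normalisation to the population average, the Hardy-Weinberg assumption, the independence between markers, and all names and numbers here are assumptions made for illustration.

```python
def genotype_relative_risk(risk_allele_count, allelic_or, risk_allele_freq):
    """Risk for one marker relative to the population average.

    Assumes a multiplicative model (each risk allele multiplies risk by
    allelic_or) and Hardy-Weinberg genotype proportions, under which the
    population-average risk is (1 - f + f * OR) ** 2.
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1 or 2")
    population_average = (1 - risk_allele_freq + risk_allele_freq * allelic_or) ** 2
    return allelic_or ** risk_allele_count / population_average


def multi_marker_risk(genotypes, markers):
    """Multiply per-marker relative risks, assuming independent markers.

    genotypes: {marker_name: risk_allele_count}
    markers:   {marker_name: (allelic_or, risk_allele_freq)}
    """
    risk = 1.0
    for name, count in genotypes.items():
        allelic_or, freq = markers[name]
        risk *= genotype_relative_risk(count, allelic_or, freq)
    return risk
```

A sanity check on the normalisation: for a hypothetical marker with odds ratio 2.0 and risk-allele frequency 0.5, the genotype-frequency-weighted mean of the three relative risks is 1, so an average individual has a combined risk of 1.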
  • ROC curves and calculate the area under the curve (AUC) to assess the discriminative ability of each model.
  • AUC area under the curve
  • The model with genetic correction of PSA levels (model-3) has an AUC of 70.9% and 58.5% in Iceland and UK, respectively (Fig. 3).
  • model-1 which has an AUC of 70.4% and 57.1% in Iceland and UK, respectively
  • the inclusion of PSA levels corrected for sequence variants increases the discriminatory power by 0.5 and 1.4 percentage points in Iceland and UK, respectively.
  • model-4 has the greatest discriminatory power; with an AUC of 73.2% and 63.6% in Iceland and UK, respectively.
  • Compared to model-1 the increased AUC of model-4 is 2.8 and 6.5 percentage points in Iceland and UK, respectively. Hence, the most gain in discriminatory power is achieved by including both the 23 prostate cancer risk variants and the genetic correction of PSA levels.
  • this type of modeling would have to be done in a population where biopsies are done systematically, irrespective of individual PSA levels, similar to what was done in the PCPT study(3). Nevertheless, the results indicate that genetic correction of PSA levels leads to improved specificity of the models.
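The AUC values compared above measure how often a model ranks a case above a control. A minimal sketch of that calculation, using the rank-based (Mann-Whitney) definition, is given below; the scores in the usage note are made up, and the text does not say which software computed the reported AUCs.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    case scores higher than a randomly chosen control (ties count half)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

For example, `auc([1, 3], [2, 4])` is 0.25 because only one of the four case-control pairs is ranked correctly, while perfectly separated scores give 1.0 and uninformative scores give 0.5.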

Abstract

Certain sequence variants have been found to be useful for correcting Prostate Specific Antigen levels in humans. The invention provides diagnostic applications based on such correction, including methods of diagnosis of prostate cancer.

Description

SEQUENCE VARIANTS ASSOCIATED WITH PROSTATE
SPECIFIC ANTIGEN LEVELS
INTRODUCTION
Prostate cancer is among the leading causes of cancer death in men. In the US, prostate cancer has become the most frequent cause of cancer in men with more than 192,000 predicted new cases (25% of all new male cancer diagnoses) and 27,360 deaths (9% of all cancer deaths in men) in 2009. Early diagnosis and treatment are key factors in determining the survival and prognosis of prostate cancer patients, prompting intensive searches for biomarkers for screening.
Prostate-specific antigen (PSA) is a protein produced by the cells of the prostate gland. PSA is present in small quantities in serum of men with a healthy prostate, but is often elevated in individuals with prostate cancer and other prostate disorders. A blood test to measure PSA is considered the most effective test currently available for the early detection of prostate cancer, although its clinical effectiveness has been questioned. Rising levels of PSA over time are associated with both localized and metastatic prostate cancer. In general, PSA values ranging from 2.5 ng/mL to 4 ng/mL are considered as cut-off values for suspected cancer, and levels above 10 ng/mL indicate higher risk. However, despite the widespread use of the PSA screening test, it is limited both in specificity and sensitivity and substantial controversy exists about its beneficial effect for patients. This is mainly due to the fact that PSA is not a specific marker of prostate cancer since its serum levels increase in prostatic hyperplasia and are affected by many other factors such as medication, urologic manipulations and inflammation. Notably, a recent study showed that 47% of men with PSA levels between 10 and 50 ng/ml were not diagnosed with prostate cancer(3). Furthermore, not all individuals with prostate cancer have raised levels of PSA.
PSA levels in the population are known to be variable. One approach to increase the specificity and sensitivity of the PSA test is to work out a model that defines what is a "normal" PSA value for a given man. Genetic factors have been shown to account for as much as 40 to 45% of the variability in PSA levels among men in the general population.
Knowledge about genetic variants that affect PSA levels is important for establishing PSA levels that are considered normal, taking into account the genetic background of any given individual. The present invention provides methods for correcting PSA levels based on genetic factors.
SUMMARY OF THE INVENTION
The present invention relates to methods for determining corrected PSA quantity in humans. The invention also provides methods for determining prostate cancer risk, and prognostic methods for prostate cancer.
In a first aspect, the invention provides a method of determining corrected PSA quantity in a human individual, the method comprising obtaining data identifying an uncorrected PSA quantity in a first biological sample from the human individual, analyzing sequence data about at least one polymorphic marker from the first biological sample or a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker. In one embodiment, the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
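By way of illustration only, one way such a correction could be computed is sketched below in Python. It assumes a multiplicative per-allele effect on PSA and divides the measured value by the individual's combined genetic effect; the marker names and effect sizes are hypothetical placeholders, not values from this document, and the sketch is not presented as the claimed method.

```python
from math import prod

# Hypothetical per-allele multiplicative effects on PSA (placeholder values).
ILLUSTRATIVE_EFFECTS = {
    "marker_A": 1.10,  # each copy of the PSA-raising allele multiplies PSA by 1.10
    "marker_B": 1.20,
}


def relative_genetic_effect(allele_counts, effects):
    """Product over markers of effect ** (number of PSA-raising alleles carried)."""
    for marker, count in allele_counts.items():
        if marker not in effects:
            raise KeyError(f"no effect estimate for {marker}")
        if count not in (0, 1, 2):
            raise ValueError("allele count must be 0, 1 or 2")
    return prod(effects[m] ** allele_counts.get(m, 0) for m in effects)


def corrected_psa(uncorrected_psa, allele_counts, effects, population_mean_effect=1.0):
    """Divide the measured PSA by the individual's genetic effect, expressed
    relative to the population average (population_mean_effect is assumed known)."""
    relative = relative_genetic_effect(allele_counts, effects) / population_mean_effect
    return uncorrected_psa / relative
```

Under these placeholder effects, a measured value of 4.4 ng/ml in a carrier of one `marker_A` allele corresponds to a corrected value of about 4.0 ng/ml.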
In a second aspect, the invention provides a method of diagnosis of prostate cancer in a human individual, the method comprising (a) Detecting an uncorrected PSA quantity in a first biological sample from the human individual; (b) Obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; (c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; (d) Determining whether the corrected PSA quantity is greater than normal PSA quantity in humans; and (e) Performing a further diagnostic evaluation procedure selected from the group consisting of rectal ultrasound imaging and prostate biopsy on the individual if the corrected PSA quantity is determined to be greater than the reference range; wherein determination of a positive outcome of the ultrasound imaging or prostate biopsy is indicative of prostate cancer in the individual.
Also provided is a method of determining a susceptibility to prostate cancer, the method comprising analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to prostate cancer in humans, and determining a susceptibility to prostate cancer from the nucleic acid sequence data.
Further provided is a method for identifying a human individual who is a candidate for further diagnostic evaluation for prostate cancer, the method comprising the steps of (a) obtaining data representing uncorrected values of PSA quantity in the individual; (b) determining, in the genome of the human individual, the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; (c) determining a corrected PSA quantity in the individual based on the allelic identity of the at least one polymorphic marker; and (d) identifying the subject as a subject who is a candidate for further diagnostic evaluation for prostate cancer if said corrected PSA quantity is greater than values of normal PSA quantity in humans.
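Steps (a) to (d) can be pictured as a small decision routine. The sketch below is an assumption-laden illustration: the relative genetic effect is taken as an already computed input, the default upper limit of 4.0 ng/ml is a placeholder for "normal PSA quantity", and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    uncorrected_psa: float  # step (a): measured value, ng/ml
    corrected_psa: float    # step (c): value after genetic correction, ng/ml
    candidate: bool         # step (d): flagged for further diagnostic evaluation


def assess_candidate(uncorrected_psa, relative_genetic_effect, normal_upper_limit=4.0):
    """Correct the measured PSA and compare it with an assumed normal upper limit.

    relative_genetic_effect summarises step (b): the individual's genetically
    expected PSA relative to the population average (assumed multiplicative).
    """
    if relative_genetic_effect <= 0:
        raise ValueError("relative genetic effect must be positive")
    corrected = uncorrected_psa / relative_genetic_effect
    return Assessment(uncorrected_psa, corrected, corrected > normal_upper_limit)
```

In this sketch a measured 4.5 ng/ml with a relative effect of 1.25 corrects to 3.6 ng/ml and is not flagged, whereas a measured 3.8 ng/ml with a relative effect of 0.8 corrects to 4.75 ng/ml and is flagged.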
The invention also relates to computer-implemented aspects. One such aspect provides an apparatus for determining PSA quantity in a human individual, comprising a processor and a computer-readable memory having instructions for execution on the processor, wherein the instructions relate to the determination of corrected PSA quantity for a human individual.
Further provided is a computer-readable medium that comprises data representing uncorrected PSA values, data comprising sequence data about at least one polymorphic marker predictive of PSA quantity in humans, and a routine stored on the medium for execution on a processor to determine corrected PSA values.
Also provided is a system for determining corrected PSA levels in a human subject, the system comprising (i) at least one processor; (ii) at least one computer-readable medium; (iii) a susceptibility database operatively coupled to a computer-readable medium of the system and containing population information correlating the presence or absence of one or more alleles of at least one polymorphic marker with PSA levels in a population of humans; (iv) a measurement tool that receives an input about the human subject and generates information from the input about (a) uncorrected PSA levels in the human subject, and (b) the presence or absence of at least one allele of at least one polymorphic marker in the human subject that is correlated with PSA levels in humans; and (v) an analysis tool that (a) is operatively coupled to the susceptibility database and the measurement tool; (b) is stored on a computer-readable medium of the system; and (c) is adapted to be executed on a processor of the system, to compare the information about the human subject with the population information in the susceptibility database and generate a conclusion with respect to corrected PSA levels for the human subject.
The invention also provides a system for assessing or selecting a treatment protocol for a subject diagnosed with, or at risk for, prostate cancer, comprising (i) at least one processor; (ii) at least one computer-readable medium; (iii) a medical treatment database operatively connected to a computer-readable medium of the system and containing information correlating values of corrected PSA levels and efficacy of treatment regimens for prostate cancer; (iv) a measurement tool to receive an input about the human subject and generate information from the input about genetically corrected PSA levels in humans; and (v) a medical protocol tool operatively coupled to the medical treatment database and the measurement tool, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the information with respect to the corrected PSA levels for the subject and the medical treatment database, and generate a conclusion with respect to at least one of (1) the probability that one or more medical treatments will be efficacious for treatment of prostate cancer for the patient; and (2) which of two or more medical treatments for the cancer will be more efficacious for the patient.
It should be understood that all combinations of features described herein are contemplated, even if the combination of features is not specifically found in the same sentence or paragraph herein. This includes in particular the use of all markers disclosed herein, alone or in combination, for use in all aspects of the invention as described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention.
FIG 1 provides a diagram illustrating a computer-implemented system utilizing risk variants as described herein.
FIG 2 shows the distribution of personalized PSA cutoff values after applying a genetic correction for the commonly used PSA cutoff of 4 ng/mL, based on the effect of four SNPs (rs2736098, rs10788160, rs11067228 and rs17632542) in samples from the Icelandic (ICE) and UK populations. The Y-axis indicates personalized PSA cutoff values (ng/mL) based on the correction for the four SNPs, and the X-axis indicates % of the distribution.
FIG 3 shows results from analyses of the area under the receiver-operating-characteristic curve (AUC) for four biopsy outcome models. The four models included data on: 1) PSA levels (red line (1)), 2) the combined prostate cancer risk prediction of 23 established sequence variants (green line (2)), 3) genetic correction of PSA values based on the sequence variants rs2736098, rs10788160, rs11067228 and rs17632542 (blue line (3)), 4) both the genetic correction of PSA levels and the combined risk of the 23 prostate cancer risk variants (pink line (4)). The black diagonal line indicates random classification, for comparison to the four different models. (A) results from Iceland (n = 415): AUC for model-1 = 70.4%, AUC for model-2 = 63.0%, AUC for model-3 = 70.9%, AUC for model-4 = 73.2%. (B) results from the UK (n = 1,291): AUC for model-1 = 57.1%, AUC for model-2 = 61.1%, AUC for model-3 = 58.5%, AUC for model-4 = 63.3%.
FIG 4 provides a diagram illustrating a system comprising computer implemented methods utilizing risk variants as described herein.
FIG 5 shows an exemplary system for determining corrected PSA levels as described further herein.
FIG 6 shows a system for selecting a treatment protocol for a subject diagnosed with, or at risk for, prostate cancer.
DETAILED DESCRIPTION
Definitions
Unless otherwise indicated, nucleic acid sequences are written left to right in a 5' to 3' orientation. Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer or any non-integer fraction within the defined range. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by the ordinary person skilled in the art to which the invention pertains. The following terms shall, in the present context, have the meaning as indicated:
A "polymorphic marker", sometimes referred to as a "marker", as described herein, refers to a genomic polymorphic site. Each polymorphic marker has at least two sequence variations characteristic of particular alleles at the polymorphic site. Thus, genetic association to a polymorphic marker implies that there is association to at least one specific allele of that particular polymorphic marker. The marker can comprise any allele of any variant type found in the genome, including SNPs, mini- or microsatellites, translocations and copy number variations (insertions, deletions, duplications). Polymorphic markers can be of any measurable frequency in the population. For mapping of disease genes, polymorphic markers with population frequency higher than 5-10% are in general most useful. However, polymorphic markers may also have lower population frequencies, such as 1-5% frequency, or even lower frequency, in particular copy number variations (CNVs). The term shall, in the present context, be taken to include polymorphic markers with any population frequency. The sequence listing provided herein identifies polymorphic sites as described herein in the context of their genomic sequence, i.e. by providing information about the flanking sequence of the polymorphic site in the human genome assembly.
An "allele" refers to the nucleotide sequence of a given locus (position) on a chromosome. A polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome. Genomic DNA from an individual contains two alleles (e.g., allele-specific sequences) for any given polymorphic marker, representative of each copy of the marker on each chromosome. Sequence codes for nucleotides used herein are: A = 1, C = 2, G = 3, T = 4. For microsatellite alleles, the CEPH sample (Centre d'Etudes du Polymorphisme Humain, genomics repository, CEPH sample 1347-02) is used as a reference: the shorter allele of each microsatellite in this sample is set as 0 and all other alleles in other samples are numbered in relation to this reference. Thus, e.g., allele 1 is 1 bp longer than the shorter allele in the CEPH sample, allele 2 is 2 bp longer than the shorter allele in the CEPH sample, allele 3 is 3 bp longer than the shorter allele in the CEPH sample, etc., and allele -1 is 1 bp shorter than the shorter allele in the CEPH sample, allele -2 is 2 bp shorter than the shorter allele in the CEPH sample, etc.
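The microsatellite numbering convention described above amounts to a signed length difference against the reference. The following is a minimal sketch of that arithmetic; the function name and the fragment lengths are hypothetical and are chosen only for illustration.

```python
def microsatellite_allele_number(allele_length_bp: int, ceph_shorter_allele_bp: int) -> int:
    """Number an allele relative to the shorter allele of the CEPH reference sample.

    The shorter CEPH allele is defined as allele 0; longer alleles get positive
    numbers and shorter alleles get negative numbers, in base pairs.
    """
    return allele_length_bp - ceph_shorter_allele_bp


# Hypothetical lengths: the reference length of 150 bp is an assumption.
reference_bp = 150
for length_bp in (148, 150, 153):
    print(length_bp, microsatellite_allele_number(length_bp, reference_bp))
```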
Nucleotide ambiguity codes as described herein are according to WIPO ST.25:
[Table of nucleotide ambiguity codes; provided as an image (imgf000006_0001) in the original publication]
A nucleotide position at which more than one sequence is possible in a population (either a natural population or a synthetic population, e.g., a library of synthetic molecules) is referred to herein as a "polymorphic site".
A "Single Nucleotide Polymorphism" or "SNP" is a DNA sequence variation occurring when a single nucleotide at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e. both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e. the two sister chromosomes of the individual contain different nucleotides). The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).
A "variant", as described herein, refers to a segment of DNA that differs from the reference DNA. A "marker" or a "polymorphic marker", as defined herein, is a variant. Alleles that differ from the reference are referred to as "variant" alleles.
A "microsatellite" is a polymorphic marker that has multiple small repeats of bases that are 2-8 nucleotides in length (such as CA repeats) at a particular site, in which the number of repeat lengths varies in the general population. An "indel" is a common form of polymorphism comprising a small insertion or deletion that is typically only a few nucleotides long.
A "haplotype," as described herein, refers to a segment of genomic DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus along the segment. In a certain embodiment, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles.
Allelic identities are described herein in the context of the marker name and the particular allele of the marker, e.g., "4 rs17632542" refers to the 4 allele of marker rs17632542, and is equivalent to "rs17632542 allele 4". Furthermore, allelic codes are as for individual markers, i.e. 1 = A, 2 = C, 3 = G and 4 = T.
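The two equivalent notations and the numeric allele codes can be illustrated with a short parser. This is only a sketch: the function name is hypothetical, and it handles just the two label forms quoted above.

```python
from typing import Tuple

# Allelic codes as stated in the text: 1 = A, 2 = C, 3 = G, 4 = T.
ALLELE_CODES = {1: "A", 2: "C", 3: "G", 4: "T"}


def parse_allele_label(label: str) -> Tuple[str, str]:
    """Return (marker, nucleotide) for '4 rs17632542' or 'rs17632542 allele 4'."""
    tokens = label.split()
    if len(tokens) == 2:
        code_token, marker = tokens          # form: "<code> <marker>"
    elif len(tokens) == 3 and tokens[1] == "allele":
        marker, _, code_token = tokens       # form: "<marker> allele <code>"
    else:
        raise ValueError(f"unrecognized allele label: {label!r}")
    return marker, ALLELE_CODES[int(code_token)]


print(parse_allele_label("4 rs17632542"))
print(parse_allele_label("rs17632542 allele 4"))
```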
The term "susceptibility", as described herein, refers to the proneness of an individual towards the development of a certain state (e.g., a certain trait, phenotype or disease), or towards being less able to resist a particular state than the average individual. The term, also referred to as "risk", encompasses both increased susceptibility and decreased susceptibility. Thus, particular alleles at polymorphic markers may be characteristic of increased susceptibility (i.e., increased risk) of prostate cancer, as characterized by a relative risk (RR) or odds ratio (OR) of greater than one for the particular allele. Alternatively, the markers are characteristic of decreased susceptibility (i.e., decreased risk) of prostate cancer, as characterized by a relative risk of less than one.
The term "and/or" shall in the present context be understood to indicate that either or both of the items connected by it are involved. In other words, the term herein shall be taken to mean "one or the other or both".
The term "look-up table", as described herein, is a table that correlates one form of data to another form, or one or more forms of data to a predicted outcome to which the data is relevant, such as phenotype or trait. For example, a look-up table can comprise a correlation between allelic data for at least one polymorphic marker and a particular trait or phenotype, such as a particular disease diagnosis, that an individual who comprises the particular allelic data is likely to display, or is more likely to display than individuals who do not comprise the particular allelic data. Look-up tables can be multidimensional, i.e. they can contain information about multiple alleles for single markers simultaneously, or they can contain information about multiple markers, and they may also comprise other factors, such as particulars about disease diagnoses, racial information, biomarkers, biochemical measurements, therapeutic methods or drugs, etc.
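A look-up table of this kind could be held in memory as a simple mapping. The sketch below uses the eight markers and PSA-increasing alleles named elsewhere in this description; the dictionary layout, the function name, and the example genotypes are assumptions made for illustration only.

```python
from typing import Dict

# Marker -> allele described in this document as indicative of elevated PSA.
PSA_INCREASING_ALLELE: Dict[str, str] = {
    "rs401681": "C",
    "rs2736098": "A",
    "rs10788160": "A",
    "rs10993994": "T",
    "rs11067228": "A",
    "rs4430796": "A",
    "rs2735839": "G",
    "rs17632542": "T",
}


def count_increasing_alleles(genotypes: Dict[str, str]) -> Dict[str, int]:
    """Count copies (0, 1 or 2) of the PSA-increasing allele per marker.

    `genotypes` maps a marker name to a two-letter genotype such as "AG".
    Markers that are absent from the look-up table are ignored.
    """
    return {
        marker: genotype.count(PSA_INCREASING_ALLELE[marker])
        for marker, genotype in genotypes.items()
        if marker in PSA_INCREASING_ALLELE
    }


# Hypothetical genotypes for one individual.
print(count_increasing_alleles({"rs17632542": "TT", "rs2736098": "AG"}))
```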
A "computer-readable medium" is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.
A "nucleic acid sample", as described herein, refers to a sample obtained from an individual that contains nucleic acid (DNA or RNA). In certain embodiments, i.e. the detection of specific polymorphic markers and/or haplotypes, the nucleic acid sample comprises genomic DNA. Such a nucleic acid sample can be obtained from any source that contains genomic DNA, including a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa, placenta, gastrointestinal tract or other organs.
The term "antisense agent" or "antisense oligonucleotide" refers, as described herein, to molecules, or compositions comprising molecules, which include a sequence of purine and pyrimidine heterocyclic bases, supported by a backbone, which are effective to hydrogen bond to corresponding contiguous bases in a target nucleic acid sequence. The backbone is composed of subunit backbone moieties supporting the purine and pyrimidine heterocyclic bases at positions which allow such hydrogen bonding. These backbone moieties are cyclic moieties of 5 to 7 atoms in size, linked together by phosphorus-containing linkage units of one to three atoms in length. In certain preferred embodiments, the antisense agent comprises an oligonucleotide molecule.
The term "quantity", as described herein, refers to the amount or level of a particular compound or substance. For example, PSA quantity refers to the amount of PSA in a particular object or sample. The quantity may be determined as a mass or a molar quantity. The quantity may also suitably be reported as a concentration, for example as mass/volume or molar quantity/volume. As an example, PSA quantity is sometimes determined in units of ng/mL (nanograms per milliliter).
Methods of determining corrected PSA values
Although PSA is widely used as a screening test for prostate cancer, it is limited in both specificity and sensitivity. This is mainly due to the fact that PSA is not a specific marker for prostate cancer, since its levels increase due to other conditions, including prostatic hyperplasia, and PSA levels are also known to be affected by factors such as medication, urologic manipulation and inflammation. Further, it has been established that between 40 and 45% of the variability in PSA levels in the general population is due to inherited factors.
One approach to increase the specificity and sensitivity of the PSA test is to work out a model that defines what is a "normal" PSA value for a given human. Such a model would have to take into account a number of factors, including genetic variants. However, to date these genetic variants have remained largely unknown, and methods for applying such variants for correcting PSA values have not been established.
The present inventors have discovered that certain genetic variants are predictive of PSA levels in humans. Such variants determine in part normal PSA levels in humans. By applying information about the effect of genetic variants on PSA levels, methods to determine corrected PSA levels can be developed. Results from estimating the combined relative effect of variants shown herein to be associated with PSA levels demonstrate a considerable variation in PSA levels between individuals based on their genotypes. By applying the combined genetic effect on commonly used PSA cutoff values, a personalized PSA cutoff value can be obtained. The data indicate that for a substantial fraction of men undergoing PSA-based prostate cancer screening, the personalized PSA cutoff value (for the decision of doing a biopsy or not) is shifted and hence men would be reclassified with respect to whether or not they should undergo a biopsy. This reclassification is likely to affect both the sensitivity and the specificity of the PSA test, and thereby, also the long term outcome of the patients since early diagnosis is the most powerful way to improve the patient's prognosis. For a screening test as important and widely used as the PSA test, having a better way to interpret the measured PSA level is likely to improve substantially the clinical performance of the test.
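The idea of a personalized cutoff can be sketched in a few lines. The sketch assumes that the combined genetic effect is expressed as a relative factor (1.0 meaning no net genetic effect) and that the personalized cutoff is the standard cutoff scaled by that factor; both the function names and the example factors are hypothetical and are not values from this document.

```python
def personalized_cutoff(standard_cutoff_ng_ml: float, combined_relative_effect: float) -> float:
    """Scale a population-wide PSA cutoff by an individual's combined genetic effect."""
    if combined_relative_effect <= 0:
        raise ValueError("combined_relative_effect must be positive")
    return standard_cutoff_ng_ml * combined_relative_effect


def exceeds_personalized_cutoff(
    measured_psa_ng_ml: float,
    standard_cutoff_ng_ml: float,
    combined_relative_effect: float,
) -> bool:
    """Return True if the measured PSA is above the individual's personalized cutoff."""
    return measured_psa_ng_ml > personalized_cutoff(standard_cutoff_ng_ml, combined_relative_effect)


# Hypothetical factors applied to the commonly used 4 ng/mL cutoff.
for effect in (0.9, 1.0, 1.1):
    cutoff = personalized_cutoff(4.0, effect)
    flagged = exceeds_personalized_cutoff(4.2, 4.0, effect)
    print(f"effect={effect:.2f} cutoff={cutoff:.2f} ng/mL flagged={flagged}")
```

Under these assumptions, the same measured value of 4.2 ng/mL falls above the cutoff for some genotypes and below it for others, which is the reclassification effect the paragraph above describes.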
As a consequence, methods are described herein for correcting PSA levels determined in humans to determine a PSA value that reflects the genetic composition of individuals at variants known to influence normal PSA levels.
Accordingly, the present invention provides a method of determining corrected PSA quantity in a human individual. Such a method may in one aspect comprise steps of
(a) Obtaining data identifying an uncorrected PSA quantity in a first sample from the human individual;
(b) Analyzing sequence data about at least one polymorphic marker from the first sample or a second sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker.
An "uncorrected" PSA quantity is in this context a quantity of PSA that is determined in a biological sample, and is not corrected or adjusted based on the presence, absence or magnitude of other substances in the sample. In one preferred embodiment, the uncorrected PSA quantity is a PSA quantity that has not been corrected based on the identity of genetic variants in the genome of the individual. A "corrected" PSA quantity is, by consequence, a PSA quantity that has been corrected based on the identity of genetic variants in the genome of the individual, as described in detail herein.
In certain embodiments, the human individual is a male individual.
In certain embodiments, the step of obtaining data identifying an uncorrected PSA quantity comprises detecting an uncorrected PSA quantity in a first sample from the human individual.
The first sample is preferably a sample that comprises PSA protein. In certain embodiments, the sample is selected from the group consisting of a blood sample, a serum sample, a semen sample, a saliva sample, a urine sample and a prostate biopsy sample. Preferably, the sample is a serum sample. The sample may also be any other biological sample from the individual that contains PSA protein. In certain embodiments, the step of obtaining data identifying an uncorrected PSA quantity includes a sample collection step, i.e. a step of obtaining a first sample from the human individual prior to the detecting.
Determination of PSA quantity in human tissue can be done using any method available to the skilled person. Such methods include, but are not limited to, immunoassay-based tests such as the Hybritech PSA test (Beckman Coulter) and the Elecsys PSA assay (Roche). The skilled person will appreciate that the methods described herein are applicable for correction of PSA levels determined by any particular method that detects the amount or quantity of PSA protein.
Correction of PSA quantity is suitably done by using the determined allelic effect of any one allele of a polymorphic marker. For example, if a particular allele has been determined to lead to increased PSA levels by 15% in the population, then measured PSA values for an individual who carries one copy of the allele will be decreased by 15% to obtain a corrected PSA value. The effect of multiple markers in general can be assumed to be independent, and the multiplicative model applied.
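The correction described above can be sketched as follows. The sketch follows the wording of the example literally (an allele that raises PSA by 15% lowers the measured value by 15%), applies the factor once per allele copy, and multiplies the factors across markers under the independence assumption. Applying the factor per copy, the marker names, and the effect sizes are all assumptions for illustration; an alternative reading would divide by 1.15 rather than multiply by 0.85.

```python
from typing import Dict


def corrected_psa(
    measured_psa_ng_ml: float,
    allele_copies: Dict[str, int],
    percent_effect_per_copy: Dict[str, float],
) -> float:
    """Apply a multiplicative genetic correction to a measured PSA value.

    `allele_copies` maps marker -> number of copies (0, 1 or 2) of the effect allele.
    `percent_effect_per_copy` maps marker -> percent change in PSA attributed to one
    copy of that allele (positive raises PSA, negative lowers it).
    """
    corrected = measured_psa_ng_ml
    for marker, copies in allele_copies.items():
        # A +15% allele gives a factor of 0.85, as in the example in the text.
        factor = 1.0 - percent_effect_per_copy[marker] / 100.0
        corrected *= factor ** copies
    return corrected


# Hypothetical markers and effect sizes, not values from this document.
effects = {"marker_x": 15.0, "marker_y": 10.0}
print(round(corrected_psa(4.0, {"marker_x": 1}, effects), 2))
print(round(corrected_psa(4.0, {"marker_x": 1, "marker_y": 1}, effects), 2))
```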
As a consequence, the magnitude of the PSA correction obtained by the current method depends on the genotype of the individual for the markers that are assessed to apply a genetic correction. In certain embodiments, the corrected PSA quantity differs from the uncorrected PSA quantity by at least 0.1 ng/mL. In certain embodiments, the corrected PSA quantity differs from the uncorrected PSA quantity by at least 0.5 ng/mL. In certain embodiments, the corrected PSA quantity differs from the uncorrected PSA quantity by at least 1.0 ng/mL. It will be appreciated that other values of the difference between uncorrected and corrected PSA values are possible and are also contemplated, including but not limited to at least 0.2 ng/mL, at least 0.3 ng/mL, at least 0.4 ng/mL, at least 0.6 ng/mL, at least 0.7 ng/mL, at least 0.8 ng/mL, at least 0.9 ng/mL, at least 1.1 ng/mL, and at least 1.2 ng/mL.
In certain embodiments, at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans. In certain embodiments, at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans. Thus, determining corrected PSA quantity in an individual comprises adjusting uncorrected PSA quantity based on the predicted effect of the particular alleles in the genome of the individual on PSA quantity in humans.
In certain embodiments, a further step is included, comprising preparing a report containing results from the determination of corrected PSA quantity. The report may be in any suitable format, including but not limited to a report written in a computer readable medium, printed on paper, or displayed on a visual display.
The skilled person will appreciate that for any polymorphic marker, the allele that is detected can be the allele of the complementary strand of DNA, such that the nucleic acid sequence data includes the identification of at least one allele which is complementary to any of the alleles of the polymorphic markers referenced above.
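Reading an allele from the complementary strand amounts to a base-pair lookup. The helper names in this sketch are hypothetical, and the caller is assumed to know which strand was assayed.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_allele(allele: str) -> str:
    """Return the allele as it would be read on the complementary strand."""
    return COMPLEMENT[allele.upper()]


def matches_marker_allele(observed: str, marker_allele: str, on_complementary_strand: bool) -> bool:
    """Check an observed base against a referenced marker allele.

    If the observation was made on the complementary strand, it is converted
    back to the referenced strand before comparison.
    """
    if on_complementary_strand:
        observed = complement_allele(observed)
    return observed.upper() == marker_allele.upper()


# The T allele of a marker appears as A on the complementary strand.
print(complement_allele("T"))
print(matches_marker_allele("A", "T", on_complementary_strand=True))
```

For markers whose two alleles are themselves complementary (A/T or C/G), the observed base alone does not reveal the strand, so the strand has to be known from the assay design.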
Suitable polymorphic markers
The methods described herein for correcting PSA levels may be practiced using any one, or a combination of, polymorphic markers that are predictive of PSA levels in humans. The markers may be independent, i.e. in linkage equilibrium. The markers may also be in linkage disequilibrium. The skilled person will appreciate how to use any such marker in the methods described herein. In certain embodiments, if a marker is predictive of PSA levels in humans, at least one allele of the marker is predictive of increased PSA levels in humans, compared with the general population. Certain other allele(s) of the marker may also be predictive of decreased PSA levels in humans. Identifying which allele(s) is predictive of increased PSA level, and which allele(s) is predictive of decreased PSA levels is a trivial exercise for the skilled person, once the marker has been identified, since a simple correlation with the particular allele(s) and PSA levels will in such cases be observed.
In preferred embodiments, markers useful for correcting PSA levels are selected from the group consisting of rs401681 (which is identified in SEQ ID NO: 1 herein), rs2736098 (SEQ ID NO: 2), rs10788160 (SEQ ID NO: 3), rs11067228 (SEQ ID NO: 5), rs10993994 (SEQ ID NO: 4), rs4430796 (SEQ ID NO: 6), rs2735839 (SEQ ID NO: 7) and rs17632542 (SEQ ID NO: 8), and markers in linkage disequilibrium therewith.
In certain embodiments, the markers are selected from the group consisting of s.51165690, s.51172808, s.51175013, s.56037076, s.56054527, s.56058688, s.56060000, s.56066550, s.56066560, s.56066619, rs1058205, rs1061657, rs10749412, rs10749413, rs10763534, rs10763536, rs10763546, rs10763576, rs10763588, rs10788154, rs10788159, rs10788162, rs10788163, rs10788164, rs10788165, rs10788166, rs10788167, rs10825652, rs10826075, rs10826125, rs10826127, rs10886880, rs10886882, rs10886883, rs10886885, rs10886886, rs10886887, rs10886890, rs10886893, rs10886894, rs10886895, rs10886896, rs10886897, rs10886898, rs10886899, rs10886900, rs10886901, rs10886902, rs10886903, rs10908278, rs11004246, rs11004324, rs11004409, rs11004415, rs11004422, rs11004435, rs11006207, rs11006274, rs11199862, rs11199866, rs11199867, rs11199868, rs11199869, rs11199871, rs11199872, rs11199874, rs11199879, rs11199881, rs1125527, rs1125528, rs11263761, rs11263763, rs11593361, rs11598592, rs11599333, rs11609105, rs11651052, rs11651755, rs11657964, rs11658063, rs12146156, rs12146366, rs12413088, rs12413648, rs12415826, rs12761612, rs12763717, rs12781411, rs174776, rs17632542, rs1873450, rs1873451, rs1873452, rs2005705, rs2125770, rs2201026, rs2249986, rs2569735, rs2611489, rs2611506, rs2611507, rs2611508, rs2611509, rs2611512, rs2611513, rs2659051, rs2659122, rs2659124, rs266849, rs266878, rs27068, rs2735839, rs2735846, rs2735945, rs2736102, rs2736108, rs2843549, rs2843550, rs2843551, rs2843554, rs2843560, rs2843562, rs2901290, rs2926494, rs3101227, rs3123078, rs35716372, rs3741698, rs3744763, rs3760511, rs3925042, rs4131357, rs4237529, rs4239217, rs4304716, rs4306255, rs4393247, rs4465316, rs4468286, rs4486572, rs4489674, rs4512771, rs4554834, rs4581397, rs4630240, rs4630241, rs4630243, rs4631830, rs4752520, rs4935090, rs4935162, rs515746, rs545076, rs551510, rs567223, rs57263518, rs57858801, rs59336, rs62113216, rs6481329, rs67289834, rs7071471, rs7074985, rs7075009, rs7075697, rs7076500, rs7077830, rs7081532, rs7081844, rs7090326, rs7091083, rs7098889, rs7405696, rs7405776, rs7501939, rs7896156, rs7910704, rs7915008, rs7920517, rs7922901, rs7923130, rs8064454, rs8853, rs9630106, rs9787697, and rs9913260, which are the markers listed in Table 13 herein.
In certain embodiments, the markers are selected from the group consisting of rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, and rs17632542, and markers in linkage disequilibrium therewith. In certain embodiments, the markers are selected from the group consisting of rs401681, rs2736098, rs10788160, rs17632542 and rs11067228, and markers in linkage disequilibrium therewith. In certain embodiments, the markers are selected from the group consisting of rs401681, rs2736098, rs10788160 and rs11067228, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs2736098, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs10788160, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs11067228, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs10993994, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs4430796, and markers in linkage disequilibrium therewith. In one embodiment, the markers are selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith.
Certain alleles at these polymorphic markers are predictive of an increased PSA quantity in humans. In certain embodiments, determination of the presence of a marker allele selected from the group consisting of the C allele of rs401681, the A allele of rs2736098, the A allele of rs10788160, the T allele of rs10993994, the A allele of rs11067228, the A allele of rs4430796, the G allele of rs2735839 and the T allele of rs17632542 is indicative of elevated PSA quantity in the human individual. In one embodiment, the allele is the C allele of rs401681. In one embodiment, the allele is the A allele of rs2736098. In one embodiment, the allele is the A allele of rs10788160. In one embodiment, the allele is the T allele of rs10993994. In one embodiment, the allele is the A allele of rs11067228. In one embodiment, the allele is the A allele of rs4430796. In one embodiment, the allele is the G allele of rs2735839. In one embodiment, the allele is the T allele of rs17632542. Marker alleles in linkage disequilibrium with any one of these marker alleles are also predictive of increased PSA quantity in humans, and are therefore also useful in the methods described herein.
For example, a marker allele selected from the group consisting of s.51165690 allele C, s.51172808 allele G, s.51175013 allele A, s.56037076 allele T, s.56054527 allele T, s.56058688 allele T, s.56060000 allele A, s.56066550 allele T, s.56066560 allele C, s.56066619 allele G, rs1058205 allele T, rs1061657 allele T, rs10749412 allele T, rs10749413 allele T, rs10763534 allele C, rs10763536 allele G, rs10763546 allele C, rs10763576 allele A, rs10763588 allele G, rs10788154 allele C, rs10788159 allele G, rs10788162 allele G, rs10788163 allele G, rs10788164 allele T, rs10788165 allele G, rs10788166 allele T, rs10788167 allele A, rs10825652 allele A, rs10826075 allele G, rs10826125 allele G, rs10826127 allele G, rs10886880 allele C, rs10886882 allele T, rs10886883 allele G, rs10886885 allele T, rs10886886 allele G, rs10886887 allele T, rs10886890 allele G, rs10886893 allele C, rs10886894 allele C, rs10886895 allele A, rs10886896 allele A, rs10886897 allele C, rs10886898 allele G, rs10886899 allele T, rs10886900 allele G, rs10886901 allele C, rs10886902 allele C, rs10886903 allele G, rs10908278 allele A, rs11004246 allele C, rs11004324 allele G, rs11004409 allele C, rs11004415 allele A, rs11004422 allele G, rs11004435 allele A, rs11006207 allele T, rs11006274 allele T, rs11199862 allele A, rs11199866 allele A, rs11199867 allele T, rs11199868 allele A, rs11199869 allele G, rs11199871 allele A, rs11199872 allele A, rs11199874 allele A, rs11199879 allele C, rs11199881 allele C, rs1125527 allele A, rs1125528 allele A, rs11263761 allele A, rs11263763 allele A, rs11593361 allele A, rs11598592 allele A, rs11599333 allele C, rs11609105 allele A, rs11651052 allele G, rs11651755 allele T, rs11657964 allele G, rs11658063 allele G, rs12146156 allele C, rs12146366 allele T, rs12413088 allele T, rs12413648 allele A, rs12415826 allele C, rs12761612 allele A, rs12763717 allele G, rs12781411 allele T, rs174776 allele C, rs17632542 allele T, rs1873450 allele G, rs1873451 allele C, rs1873452 allele C, rs2005705 allele G, rs2125770 allele T, rs2201026 allele G, rs2249986 allele T, rs2569735 allele G, rs2611489 allele G, rs2611506 allele C, rs2611507 allele T, rs2611508 allele T, rs2611509 allele G, rs2611512 allele A, rs2611513 allele C, rs2659051 allele G, rs2659122 allele T, rs2659124 allele T, rs266849 allele A, rs266878 allele C, rs27068 allele C, rs2735839 allele G, rs2735846 allele G, rs2735945 allele C, rs2736102 allele C, rs2736108 allele T, rs2843549 allele C, rs2843550 allele C, rs2843551 allele C, rs2843554 allele G, rs2843560 allele G, rs2843562 allele C, rs2901290 allele A, rs2926494 allele T, rs3101227 allele C, rs3123078 allele C, rs35716372 allele A, rs3741698 allele C, rs3744763 allele A, rs3760511 allele G, rs3925042 allele T, rs4131357 allele C, rs4237529 allele G, rs4239217 allele A, rs4304716 allele A, rs4306255 allele A, rs4393247 allele A, rs4465316 allele A, rs4468286 allele A, rs4486572 allele A, rs4489674 allele G, rs4512771 allele C, rs4554834 allele A, rs4581397 allele A, rs4630240 allele G, rs4630241 allele G, rs4630243 allele T, rs4631830 allele C, rs4752520 allele T, rs4935090 allele T, rs4935162 allele G, rs515746 allele A, rs545076 allele A, rs551510 allele T, rs567223 allele T, rs57263518 allele A, rs57858801 allele T, rs59336 allele A, rs62113216 allele T, rs6481329 allele G, rs67289834 allele T, rs7071471 allele T, rs7074985 allele A, rs7075009 allele T, rs7075697 allele C, rs7076500 allele A, rs7077830 allele G, rs7081532 allele A, rs7081844 allele T, rs7090326 allele T, rs7091083 allele A, rs7098889 allele C, rs7405696 allele C, rs7405776 allele G, rs7501939 allele C, rs7896156 allele A, rs7910704 allele C, rs7915008 allele A, rs7920517 allele G, rs7922901 allele G, rs7923130 allele A, rs8064454 allele C, rs8853 allele C, rs9630106 allele G, rs9787697 allele C, rs9913260 allele G, rs1016990 allele C, rs17626423 allele C, rs2012677 allele A, and rs757210 allele G is predictive of increased PSA levels.
In certain embodiments, marker alleles selected from the group consisting of s.122837469 allele A, rs2130779 allele T, s.122876448 allele A, s.122901140 allele T, s.122901142 allele C, s.122905335 allele A, rsl0788149 allele G, rsl0749408 allele C, rs2172071 allele C, rsll592107 allele A, rsl907218 allele T, rsl907220 allele A, rsl994655 allele T, rsl907221 allele C, rsl907225 allele C, rsl907226 allele G, rsl0749409 allele C, rslll99835 allele G, s.122991926 allele C, rs729014 allele T, s.122993518 allele G, s.122994309 allele A, s.122994946 allele G, rsl873450 allele G, rs2901290 allele A, s.122998594 allele A, s.122998678 allele T, s.122998978 allele T, rs2201026 allele G, rs4237529 allele G, s.122999386 allele G, rsl873451 allele C, rsl873452 allele C, rs4752520 allele T, rsl0886880 allele C, rsl0749412 allele T, s.123008216 allele A, rs3925042 allele T, rsll25527 allele A, rsll25528 allele A, rs4319451 allele G, rsl0788154 allele C, rs7081844 allele T, rs7076500 allele A, s.123011774 allele T, s.123011879 allele T, rslll99862 allele A, s.123014171 allele C, rsl2146156 allele C, s.123014499 allele G, s.123014519 allele A, rsl2146366 allele T, s.123014684 allele A, rs7091083 allele A, rs7074985 allele A, rs7915008 allele A, s.123015342 allele A, s.123015365 allele A, rsl0749413 allele T, rslll99866 allele A, s.123016003 allele A, rs7923130 allele A, rs7922901 allele G, rsl0886882 allele T, rsl0886883 allele G, rslll99867 allele T, s.123017698 allele T, s.123018111 allele C, rs4393247 allele A, s.123018188 allele T, rs4489674 allele G, rslll99868 allele A, s.123018670 allele T, s.123019408 allele G, s.123019759 allele G, rslll99869 allele G, s.123020245 allele T, s.123020365 allele T, rsl0886885 allele T, rsl0788159 allele G, rsl0886886 allele G, rslll99871 allele A, rslll99872 allele A, rsl2761612 allele A, rs4575197 allele G, rslll99874 allele A, rsl0886887 allele T, s.123023625 allele T, s.123023836 allele C, rs4465316 allele A, rs4468286 allele A, rsl0886890 
allele G, rsl0788162 allele G, s.123028135 allele A, rsl2413648 allele A, s.123029102 allele C, rsl0788163 allele G, s.123031617 allele T, s.123031811 allele T, rsl0788164 allele T, rsll598592 allele A, rsl0788165 allele G, rs9630106 allele G, rsl0886893 allele C, s.123034821 allele C, rslll99879 allele C, rslll99881 allele C, rsl2415826 allele C, rsl0788166 allele G, rsl0886894 allele C, rsl0886895 allele A, rsl0886896 allele A, rsl0886897 allele C, rsl0886898 allele G, rsl0886899 allele T, rsl0886900 allele G, rsl0886901 allele C, rsl0886902 allele C, rsl0886903 allele G, rsl2413088 allele T, rsl0788167 allele A, s.123047182 allele T, rs7085073 allele T, rs7071101 allele A, rsl2570783 allele A, rslll99884 allele A, rs7085506 allele G, rsl0886905 allele C, rsl0736302 allele C, s.123061811 allele T, s.123062031 allele C, rslll99886 allele T, s.123063327 allele T, s.123063715 allele A, rsl0886907 allele C, s.123064252 allele T, s.123064345 allele T, s.123064780 allele T, s.123064783 allele C, s.123066424 allele C, s.123066700 allele C, rs3981043 allele T, rslll99896 allele T, rslll99897 allele A, rslll99898 allele C, s.123067963 allele A, rslll99900 allele T, rslll99901 allele T, s.123068178 allele T, s.123068222 allele A, s.123068236 allele T, s.123068424 allele G, s.123068619 allele T, s.123068743 allele G, s.123068926 allele T, s.123068997 allele A, s.123069012 allele T, s.123069326 allele T, s.123069570 allele T, s.123069989 allele C, s.123070105 allele T, s.123071090 allele A, s.123071347 allele C, rs4254007 allele A, s.123071495 allele A, s.123071914 allele T, s.123072804 allele A, rs7900630 allele T, s.123074016 allele C, rsl896416 allele A, s.123074531 allele T, s.123074928 allele T, s.123076274 allele C, s.123076472 allele G, rs2420925 allele C, s.123077398 allele G, s.123077455 allele C, rsl2779205 allele T, rslll99912 allele T, rs4752534 allele C, s.123078389 allele T, rsl896420 allele T, rsl896419 allele C, s.123079199 allele A, s.123081990 allele A, 
s.123081993 allele A, s.123081998 allele G, s.123201870 allele C, s.51157005 allele G, s.51159221 allele C, rs35716372 allele A, s.51159373 allele C, s.51159376 allele C, s.51159399 allele T, s.51159786 allele C, rs4935090 allele T, rsl2781411 allele T, s.51162137 allele G, s.51162792 allele A, s.51162795 allele A, rsll004246 allele C, s.51165690 allele C, rsll004324 allele G, rs2843562 allele C, rsll004409 allele C, rsll004415 allele A, rsll004422 allele G, s.51168415 allele T, rsll004435 allele A, rsll599333 allele C, s.51170094 allele G, s.51170307 allele A, rsl2763717 allele G, rs67289834 allele T, s.51172442 allele A, s.51172558 allele G, rs57858801 allele T, s.51172618 allele A, s.51172808 allele G, s.51173184 allele G, rs7071471 allele T, rs7090326 allele T, s.51173565 allele G, s.51173983 allele C, s.51174391 allele G, s.51174499 allele C, s.51174610 allele T, s.51174944 allele A, s.51175013 allele A, s.51175409 allele G, s.51176290 allele T, s.51176963 allele C, s.51180209 allele A, rsl0825652 allele A, s.51180819 allele A, rs2843560 allele G, rs2125770 allele T, rs2611513 allele C, rs2611512 allele A, rs2611509 allele G, s.51186305 allele G, rs2926494 allele T, rs2611508 allele T, rs2611507 allele T, s.51188694 allele A, rs2611506 allele C, rs57263518 allele A, s.51189522 allele G, rs3101227 allele C, rs2843549 allele C, rs2843550 allele C, rs2249986 allele T, rs2843551 allele C, s.51192126 allele C, rs7077830 allele G, s.51193219 allele A, rs2843554 allele G, s.51194280 allele C, rs2611489 allele G, rs3123078 allele C, rs4935162 allele G, rs7081532 allele A, rsl0826075 allele G, rs7896156 allele A, s.51199599 allele A, rs6481329 allele G, rs7910704 allele C, rs4554834 allele A, rsl0826125 allele G, rsl0826127 allele G, rs4486572 allele A, rs4581397 allele A, rs4630240 allele G, rs7920517 allele G, rs4630241 allele G, rs9787697 allele C, rsl0763534 allele C, rsl0763536 allele G, s.51205998 allele C, rsl0763546 allele C, s.51206890 allele C, rs4131357 
allele C, s.51207437 allele C, s.51207481 allele G, s.51208175 allele A, rsll006207 allele T, rsl0763576 allele A, s.51208921 allele G, rsll593361 allele A, rsl0763588 allele G, rsll006274 allele T, s.51210619 allele A, s.51210866 allele G, rs4630243 allele T, rs4512771 allele C, rs4306255 allele A, s.51213076 allele T, rs4631830 allele C, rs7075009 allele T, rs7098889 allele C, rs4304716 allele A, s.51214689 allele A, s.51214690 allele T, rs7477953 allele G, s.51215034 allele G, s.51216121 allele A, s.51216342 allele A, rs7075697 allele C, s.51219226 allele C, s.51219227 allele T, s.51219230 allele C, s.51219320 allele T, s.51221179 allele C, s.113576401 allele A, s.113582477 allele G, s.113584188 allele G, s.113584539 allele G, s.113585097 allele T, rsl2819162 allele A, rsl l609105 allele A, rs514849 allele G, rs513061 allele T, s.113590733 allele A, rsl061657 allele T, rs8853 allele C, rs3741698 allele C, s.113594635 allele G, rs567223 allele T, rs551510 allele T, rs59336 allele A, s.113601412 allele G, rs515746 allele A, rs545076 allele A, s.113614584 allele C, rs3744763 allele A, rs7405776 allele G, rs2005705 allele G, s.33170591 allele T, rsl l263761 allele A, rs4239217 allele A, rsl l651755 allele T, rsl0908278 allele A, s.33174083 allele T, rsll657964 allele G, rs7501939 allele C, rs8064454 allele C, s.33175746 allele T, s.33176039 allele A, rs7405696 allele C, rsll651052 allele G, rsll263763 allele A, rsl l658063 allele G, rs9913260 allele G, rs3760511 allele G, s.33182344 allele C, s.55554247 allele A, s.55566277 allele T, s.55582344 allele C, rs2546552 allele G, s.55596785 allele T, s.55597645 allele A, s.55598078 allele A, s.55600121 allele A, s.55605246 allele G, s.55606024 allele A, s.55607242 allele G, s.55624341 allele C, s.55630396 allele T, s.55630578 allele T, s.55630679 allele T, s.55630791 allele T, s.55631170 allele C, s.55632347 allele A, s.55632363 allele A, s.55636052 allele T, s.55637350 allele C, s.55640040 allele T, s.55646568 allele A, 
s.55649132 allele T, s.55650629 allele A, s.55650844 allele G, s.55652397 allele G, s.55653401 allele T, s.55653991 allele A, s.55654907 allele A, s.55657973 allele G, s.55659043 allele A, s.55660011 allele G, s.55660013 allele T, s.55660139 allele T, s.55660143 allele T, s.55661660 allele C, s.55661718 allele T, rs6509476 allele A, s.55664020 allele G, s.55664897 allele T, s.55665723 allele G, s.55665726 allele G, s.55672641 allele C, s.55673254 allele G, s.55674252 allele G, s.55674254 allele A, s.55674727 allele T, s.55676073 allele A, s.55683393 allele G, s.55687122 allele A, s.55695317 allele A, s.55697027 allele C, s.55701748 allele C, rs7257447 allele T, s.55702308 allele A, s.55703568 allele T, s.55706751 allele T, s.55708051 allele T, s.55709067 allele A, s.55709498 allele T, s.55709766 allele T, s.55710030 allele C, s.55710848 allele T, s.55710851 allele A, s.55711749 allele A, s.55712802 allele G, s.55713451 allele T, s.55713453 allele G, s.55713458 allele C, s.55713862 allele T, s.55716007 allele G, s.55718272 allele A, s.55723496 allele C, s.55724346 allele T, s.55726794 allele G, s.55729556 allele A, s.55729562 allele G, s.55729563 allele A, s.55731588 allele G, s.55733658 allele G, s.55741403 allele C, s.55743524 allele T, s.55745833 allele A, s.55746123 allele T, s.55747079 allele T, s.55748269 allele T, s.55748274 allele T, s.55748844 allele T, s.55749193 allele G, s.55752178 allele T, s.55752271 allele A, s.55770158 allele A, rs7247686 allele T, s.55771401 allele T, s.55772266 allele C, s.55775314 allele C, s.55778756 allele G, s.55788661 allele G, s.55790622 allele T, s.55791942 allele A, rsl0413426 allele G, s.55798366 allele G, s.55818900 allele G, s.55822129 allele C, s.55825528 allele G, s.55825624 allele T, s.55833489 allele T, s.55833938 allele G, s.55848124 allele G, s.55848125 allele G, s.55849044 allele A, s.55857289 allele T, s.55857585 allele A, s.55861107 allele G, s.55861111 allele A, s.55861196 allele T, s.55862851 allele T, 
s.55865439 allele T, s.55867208 allele A, s.55867650 allele G, s.55868902 allele G, s.55870429 allele C, rs73598616 allele G, s.55874339 allele T, s.55875249 allele C, s.55875725 allele C, s.55881262 allele A, s.55882788 allele T, s.55883542 allele C, s.55886467 allele T, s.55887498 allele T, s.55889175 allele G, s.55892113 allele A, s.55892618 allele T, s.55892866 allele T, s.55893305 allele G, s.55896443 allele G, s.55896826 allele A, s.55898241 allele T, s.55898245 allele A, s.55899120 allele T, s.55900597 allele G, s.55900764 allele A, s.55912567 allele T, s.55914840 allele A, s.55915776 allele G, s.55936192 allele T, s.55940336 allele C, s.55946316 allele G, s.55949971 allele C, s.55955333 allele G, s.55962188 allele T, s.55963864 allele G, s.55969754 allele T, s.55979135 allele T, rs67367861 allele C, s.55989580 allele A, s.56004001 allele A, s.56006528 allele G, s.56012046 allele G, s.56013739 allele G, rs2411330 allele G, rs3212825 allele G, s.56018053 allele G, s.56019106 allele C, rs7246740 allele A, s.56025860 allele G, s.56026713 allele T, rs55786312 allele T, s.56026881 allele A, s.56026882 allele A, s.56027319 allele A, s.56029265 allele C, s.56029362 allele G, s.56032778 allele G, s.56032963 allele T, s.56032964 allele G, s.56033138 allele G, s.56033138 allele G, s.56033664 allele T, s.56033664 allele T, s.56036363 allele G, s.56037076 allele T, s.56037076 allele T, rs2659051 allele G, s.56038334 allele A, s.56038334 allele A, s.56039736 allele C, rs266849 allele A, s.56042100 allele C, s.56042603 allele A, s.56042603 allele A, rs2659124 allele T, rs2659124 allele T, s.56046798 allele C, rs266878 allele C, rs266878 allele C, rsl74776 allele C, rsl74776 allele C, s.56052630 allele T, s.56052630 allele T, s.56052652 allele C, s.56052652 allele C, rsl7632542 allele T, s.56053983 allele C, s.56054527 allele T, s.56054527 allele T, rs2659122 allele T, rsl058205 allele T, rsl058205 allele T, rs2569735 allele G, rs2569735 allele G, rs2735839 allele G, 
rs62113216 allele T, rs62113216 allele T, s.56058308 allele G, s.56058606 allele A, s.56058688 allele T, s.56058866 allele T, s.56060000 allele A, s.56061277 allele G, s.56062250 allele C, s.56066550 allele T, s.56066560 allele C, s.56066619 allele G, s.56067024 allele C, s.56067024 allele C, rs73592873 allele G, s.56076121 allele G, s.56076122 allele G, s.56078845 allele G, s.56085550 allele G, s.56093594 allele G, s.56472259 allele C, s.1030492 allele G, s.1233724 allele C, s.1251946 allele C, s.1257345 allele A, s.1258032 allele G, rs9418 allele T, s.1282167 allele T, s.1285240 allele T, s.1285775 allele A, s.1287049 allele A, s.1292191 allele C, s.1334730 allele A, s.1349759 allele T, s.1350079 allele A, rs2736108 allele T, s.1350854 allele T, rs2735948 allele G, rs2735846 allele G, s.1352392 allele G, s.1353401 allele C, rs2735946 allele G, rs2736102 allele C, rs2853666 allele A, rs2735945 allele C, s.1359165 allele C, rs4530805 allele C, s.1359765 allele G, rs61574973 allele C, s.1362904 allele A, s.1363152 allele A, rsl2332579 allele T, rs6866783 allele C, s.1365329 allele C, rsl3356727 allele A, rsl3355267 allele C, s.1366701 allele G, rsl0078017 allele T, rs4975615 allele A, rs4975616 allele A, rs6554759 allele A, rs3816659 allele G, rsl801075 allele T, rs451360 allele C, rs421629 allele G, rs380286 allele G, rs402710 allele C, rsl0073340 allele C, rs414965 allele G, rs421284 allele T, rs466502 allele A, rs465498 allele A, rs452932 allele T, rs452384 allele T, rs370348 allele A, s.1386077 allele A, s.1386169 allele G, s.1386204 allele G, s.1386674 allele G, rs457130 allele A, rs467095 allele T, s.1389243 allele A, rs462608 allele T, rs456366 allele T, s.1390106 allele T, s.1390174 allele T, rs31487 allele G, s.1395154 allele T, rs31489 allele C, rs31490 allele G, rs27996 allele A, rs27071 allele T, rs27070 allele G, rs27068 allele C, s.1401106 allele T, rs37011 allele A, s.1402130 allele G, s.1402535 allele A, rs37009 allele C, rs40182 allele G, rs37008 
allele G, rs37007 allele G, s.1407027 allele A, rs40181 allele G, s.1407682 allele A, rs37006 allele C, s.1408859 allele C, rs37005 allele C, s.1409771 allele A, rs37002 allele C, s.1411822 allele C, s.1411901 allele T, s.1412098 allele C, rs31494 allele G, s.1418662 allele T, s.1419748 allele G, s.1426206 allele T, s.1426336 allele T, s.1428371 allele A, s.1428373 allele A, s.1472454 allele T, s.1518154 allele C, s.1557827 allele A, rsll743119 allele C, s.1583465 allele A, rs4551123 allele G, s.1589581 allele G, s.1591616 allele C, s.1607388 allele T, rs6893515 allele T, s.1618305 allele C, s.1621550 allele C, s.1621551 allele A, rs6892057 allele G, s.1638061 allele C, rs6898387 allele C, rs7724451 allele G, rs2937006 allele A, s.1663985 allele T, s.1667254 allele A, s.1668831 allele T, s.1673499 allele A, s.1737379 allele G, s.1756873 allele A, s.1782909 allele G, s.1788485 allele C, s.1799150 allele A, s.1800043 allele T, s.1804565 allele A, s.1812409 allele G, s.886453 allele G, and s.887600 allele C, which are marker alleles as shown in Table 1, are indicative of increased PSA levels in the individual. These alleles are predicted to lead to elevated PSA levels in humans. Thus, a corrected PSA value for the individual for the particular marker allele will be lower than an uncorrected PSA value.
Certain other alleles at these markers are predictive of decreased PSA quantity in humans. In certain embodiments, marker alleles selected from the group consisting of the T allele of rs401681, the G allele of rs2736098, the G allele of rs10788160, the C allele of rs10993994, the G allele of rs11067228, the G allele of rs4430796, the A allele of rs2735839 and the C allele of rs17632542 are indicative of reduced PSA quantity in the individual.
In further embodiments, a marker allele selected from the group consisting of s.51165690 allele
A, s.51172808 allele C, s.51175013 allele G, s.56037076 allele C, s.56054527 allele G, s.56058688 allele A, s.56060000 allele C, s.56066550 allele A, s.56066560 allele G, s.56066619 allele T, rsl058205 allele C, rsl061657 allele C, rsl0749412 allele A, rsl0749413 allele A, rsl0763534 allele T, rsl0763536 allele A, rsl0763546 allele G, rsl0763576 allele T, rsl0763588 allele T, rsl0788154 allele A, rsl0788159 allele A, rsl0788162 allele A, rsl0788163 allele T, rsl0788164 allele C, rsl0788165 allele T, rsl0788166 allele A, rsl0788167 allele T, rsl0825652 allele G, rsl0826075 allele C, rsl0826125 allele A, rsl0826127 allele A, rsl0886880 allele T, rsl0886882 allele C, rsl0886883 allele C, rsl0886885 allele G, rsl0886886 allele T, rsl0886887 allele C, rsl0886890 allele A, rsl0886893 allele T, rsl0886894 allele T, rsl0886895 allele C, rsl0886896 allele C, rsl0886897 allele T, rsl0886898 allele T, rsl0886899 allele G, rsl0886900 allele A, rsl0886901 allele T, rsl0886902 allele T, rsl0886903 allele C, rsl0908278 allele T, rsll004246 allele T, rsll004324 allele T, rsll004409 allele G, rsll004415 allele G, rsll004422 allele A, rsll004435 allele C, rsll006207 allele C, rsll006274 allele C, rslll99862 allele G, rslll99866 allele G, rslll99867 allele G, rslll99868 allele T, rslll99869 allele A, rslll99871 allele C, rslll99872 allele G, rslll99874 allele G, rslll99879 allele T, rslll99881 allele T, rsll25527 allele G, rsll25528 allele T, rsll263761 allele G, rsll263763 allele G, rsll593361 allele G, rsll598592 allele G, rsll599333 allele A, rsll609105 allele C, rsll651052 allele A, rsll651755 allele C, rsll657964 allele A, rsll658063 allele C, rsl2146156 allele T, rsl2146366 allele C, rsl2413088 allele C, rsl2413648 allele G, rsl2415826 allele T, rsl2761612 allele G, rsl2763717 allele C, rsl2781411 allele C, rsl74776 allele T, rsl7632542 allele C, rsl873450 allele T, rsl873451 allele T, rsl873452 allele T, rs2005705 allele A, rs2125770 allele C, rs2201026 allele T, rs2249986 
allele G, rs2569735 allele A, rs2611489 allele A, rs2611506 allele T, rs2611507 allele C, rs2611508 allele A, rs2611509 allele A, rs2611512 allele G, rs2611513 allele T, rs2659051 allele C, rs2659122 allele C, rs2659124 allele A, rs266849 allele G, rs266878 allele G, rs27068 allele T, rs2735839 allele A, rs2735846 allele C, rs2735945 allele T, rs2736102 allele T, rs2736108 allele C, rs2843549 allele A, rs2843550 allele T, rs2843551 allele A, rs2843554 allele T, rs2843560 allele C, rs2843562 allele T, rs2901290 allele G, rs2926494 allele C, rs3101227 allele A, rs3123078 allele T, rs35716372 allele G, rs3741698 allele G, rs3744763 allele G, rs3760511 allele T, rs3925042 allele C, rs4131357 allele A, rs4237529 allele A, rs4239217 allele G, rs4304716 allele G, rs4306255 allele G, rs4393247 allele G, rs4465316 allele C, rs4468286 allele C, rs4486572 allele G, rs4489674 allele A, rs4512771 allele A, rs4554834 allele C, rs4581397 allele G, rs4630240 allele A, rs4630241 allele A, rs4630243 allele C, rs4631830 allele T, rs4752520 allele C, rs4935090 allele A, rs4935162 allele C, rs515746 allele G, rs545076 allele G, rs551510 allele C, rs567223 allele G, rs57263518 allele G, rs57858801 allele A, rs59336 allele T, rs62113216 allele A, rs6481329 allele A, rs67289834 allele C, rs7071471 allele C, rs7074985 allele T, rs7075009 allele G, rs7075697 allele G, rs7076500 allele G, rs7077830 allele C, rs7081532 allele G, rs7081844 allele C, rs7090326 allele A, rs7091083 allele G, rs7098889 allele T, rs7405696 allele G, rs7405776 allele A, rs7501939 allele T, rs7896156 allele G, rs7910704 allele T, rs7915008 allele G, rs7920517 allele A, rs7922901 allele C, rs7923130 allele G, rs8064454 allele A, rs8853 allele T, rs9630106 allele A, rs9787697 allele T, rs9913260 allele A, rsl016990 allele G, rsl7626423 allele T, rs2012677 allele T, and rs757210 allele A is predictive of reduced PSA levels.
In certain embodiments, marker alleles selected from the group consisting of s.122837469 allele C, rs2130779 allele G, s.122876448 allele G, s.122901140 allele C, s.122901142 allele A, s.122905335 allele G, rsl0788149 allele A, rsl0749408 allele T, rs2172071 allele T, rsl l592107 allele G, rsl907218 allele C, rsl907220 allele G, rsl994655 allele G, rsl907221 allele T, rsl907225 allele T, rsl907226 allele A, rsl0749409 allele G, rsl l l99835 allele A, s.122991926 allele T, rs729014 allele C, s.122993518 allele A, s.122994309 allele G, s.122994946 allele T, rsl873450 allele T, rs2901290 allele G, s.122998594 allele G, s.122998678 allele G, s.122998978 allele A, rs2201026 allele T, rs4237529 allele A, s.122999386 allele A, rsl873451 allele T, rsl873452 allele T, rs4752520 allele C, rsl0886880 allele T, rsl0749412 allele A, s.123008216 allele G, rs3925042 allele C, rsl l25527 allele G, rsl l25528 allele T, rs4319451 allele A, rsl0788154 allele A, rs7081844 allele C, rs7076500 allele G, s.123011774 allele C, s.123011879 allele C, rsl ll99862 allele G, s.123014171 allele T, rsl2146156 allele T, s.123014499 allele A, s.123014519 allele G, rsl2146366 allele C, s.123014684 allele C, rs7091083 allele G, rs7074985 allele T, rs7915008 allele G, s.123015342 allele C, s.123015365 allele G, rsl0749413 allele A, rsl ll99866 allele G, s.123016003 allele G, rs7923130 allele G, rs7922901 allele C, rsl0886882 allele C, rsl0886883 allele C, rsll l99867 allele G, s.123017698 allele C, s.123018111 allele G, rs4393247 allele G, s.123018188 allele C, rs4489674 allele A, rslll99868 allele T, s.123018670 allele G, s.123019408 allele T, s.123019759 allele C, rslll99869 allele A, s.123020245 allele G, s.123020365 allele A, rsl0886885 allele G, rsl0788159 allele A, rsl0886886 allele T, rslll99871 allele C, rslll99872 allele G, rsl2761612 allele G, rs4575197 allele A, rslll99874 allele G, rsl0886887 allele C, s.123023625 allele G, s.123023836 allele T, rs4465316 allele C, rs4468286 allele C, 
rsl0886890 allele A, rsl0788162 allele A, s.123028135 allele C, rsl2413648 allele G, s.123029102 allele T, rsl0788163 allele T, s.123031617 allele G, s.123031811 allele A, rsl0788164 allele C, rsll598592 allele G, rsl0788165 allele T, rs9630106 allele A, rsl0886893 allele T, s.123034821 allele T, rslll99879 allele T, rslll99881 allele T, rsl2415826 allele T, rsl0788166 allele A, rsl0886894 allele T, rsl0886895 allele C, rsl0886896 allele C, rsl0886897 allele T, rsl0886898 allele T, rsl0886899 allele G, rsl0886900 allele A, rsl0886901 allele T, rsl0886902 allele T, rsl0886903 allele C, rsl2413088 allele C, rsl0788167 allele T, s.123047182 allele C, rs7085073 allele C, rs7071101 allele G, rsl2570783 allele G, rslll99884 allele G, rs7085506 allele C, rsl0886905 allele T, rsl0736302 allele T, s.123061811 allele C, s.123062031 allele G, rslll99886 allele G, s.123063327 allele A, s.123063715 allele G, rsl0886907 allele G, s.123064252 allele C, s.123064345 allele G, s.123064780 allele C, s.123064783 allele T, s.123066424 allele T, s.123066700 allele T, rs3981043 allele A, rslll99896 allele C, rslll99897 allele G, rslll99898 allele T, s.123067963 allele T, rslll99900 allele A, rslll99901 allele C, s.123068178 allele G, s.123068222 allele G, s.123068236 allele C, s.123068424 allele A, s.123068619 allele C, s.123068743 allele A, s.123068926 allele A, s.123068997 allele G, s.123069012 allele C, s.123069326 allele G, s.123069570 allele C, s.123069989 allele T, s.123070105 allele C, s.123071090 allele G, s.123071347 allele G, rs4254007 allele T, s.123071495 allele G, s.123071914 allele G, s.123072804 allele G, rs7900630 allele C, s.123074016 allele T, rsl896416 allele G, s.123074531 allele C, s.123074928 allele C, s.123076274 allele T, s.123076472 allele C, rs2420925 allele T, s.123077398 allele A, s.123077455 allele G, rsl2779205 allele A, rslll99912 allele G, rs4752534 allele T, s.123078389 allele A, rsl896420 allele C, rsl896419 allele A, s.123079199 allele G, s.123081990 
allele T, s.123081993 allele T, s.123081998 allele A, s.123201870 allele T, s.51157005 allele A, s.51159221 allele T, rs35716372 allele G, s.51159373 allele T, s.51159376 allele G, s.51159399 allele G, s.51159786 allele G, rs4935090 allele A, rsl2781411 allele C, s.51162137 allele A, s.51162792 allele C, s.51162795 allele C, rsll004246 allele T, s.51165690 allele A, rsll004324 allele T, rs2843562 allele T, rsll004409 allele G, rsll004415 allele G, rsll004422 allele A, s.51168415 allele C, rsll004435 allele C, rsll599333 allele A, s.51170094 allele T, s.51170307 allele G, rsl2763717 allele C, rs67289834 allele C, s.51172442 allele T, s.51172558 allele T, rs57858801 allele A, s.51172618 allele C, s.51172808 allele C, s.51173184 allele A, rs7071471 allele C, rs7090326 allele A, s.51173565 allele C, s.51173983 allele T, s.51174391 allele A, s.51174499 allele A, s.51174610 allele C, s.51174944 allele G, s.51175013 allele G, s.51175409 allele A, s.51176290 allele C, s.51176963 allele T, s.51180209 allele G, rsl0825652 allele G, s.51180819 allele C, rs2843560 allele C, rs2125770 allele C, rs2611513 allele T, rs2611512 allele G, rs2611509 allele A, s.51186305 allele T, rs2926494 allele C, rs2611508 allele A, rs2611507 allele C, s.51188694 allele C, rs2611506 allele T, rs57263518 allele G, s.51189522 allele A, rs3101227 allele A, rs2843549 allele A, rs2843550 allele T, rs2249986 allele G, rs2843551 allele A, s.51192126 allele T, rs7077830 allele C, s.51193219 allele T, rs2843554 allele T, s.51194280 allele T, rs2611489 allele A, rs3123078 allele T, rs4935162 allele C, rs7081532 allele G, rsl0826075 allele C, rs7896156 allele G, s.51199599 allele C, rs6481329 allele A, rs7910704 allele T, rs4554834 allele C, rsl0826125 allele A, rsl0826127 allele A, rs4486572 allele G, rs4581397 allele G, rs4630240 allele A, rs7920517 allele A, rs4630241 allele A, rs9787697 allele T, rsl0763534 allele T, rsl0763536 allele A, s.51205998 allele T, rsl0763546 allele G, s.51206890 allele A, 
rs4131357 allele A, s.51207437 allele T, s.51207481 allele A, s.51208175 allele C, rsll006207 allele C, rsl0763576 allele T, s.51208921 allele T, rsl l593361 allele G, rsl0763588 allele T, rsll006274 allele C, s.51210619 allele C, s.51210866 allele A, rs4630243 allele C, rs4512771 allele A, rs4306255 allele G, s.51213076 allele G, rs4631830 allele T, rs7075009 allele G, rs7098889 allele T, rs4304716 allele G, s.51214689 allele G, s.51214690 allele C, rs7477953 allele A, s.51215034 allele A, s.51216121 allele G, s.51216342 allele G, rs7075697 allele G, s.51219226 allele G, s.51219227 allele G, s.51219230 allele G, s.51219320 allele C, s.51221179 allele T, s.113576401 allele T, s.113582477 allele A, s.113584188 allele A, s.113584539 allele A, s.113585097 allele C, rsl2819162 allele G, rsl l609105 allele C, rs514849 allele A, rs513061 allele C, s.113590733 allele C, rsl061657 allele C, rs8853 allele T, rs3741698 allele G, s.113594635 allele T, rs567223 allele G, rs551510 allele C, rs59336 allele T, s.113601412 allele T, rs515746 allele G, rs545076 allele G, s.113614584 allele G, rs3744763 allele G, rs7405776 allele A, rs2005705 allele A, s.33170591 allele C, rsl l263761 allele G, rs4239217 allele G, rsl l651755 allele C, rsl0908278 allele T, s.33174083 allele C, rsl l657964 allele A, rs7501939 allele T, rs8064454 allele A, s.33175746 allele G, s.33176039 allele G, rs7405696 allele G, rsl l651052 allele A, rsl l263763 allele G, rsll658063 allele C, rs9913260 allele A, rs3760511 allele T, s.33182344 allele T, s.55554247 allele G, s.55566277 allele C, s.55582344 allele G, rs2546552 allele T, s.55596785 allele G, s.55597645 allele T, s.55598078 allele C, s.55600121 allele T, s.55605246 allele T, s.55606024 allele C, s.55607242 allele A, s.55624341 allele A, s.55630396 allele C, s.55630578 allele C, s.55630679 allele C, s.55630791 allele C, s.55631170 allele A, s.55632347 allele T, s.55632363 allele T, s.55636052 allele C, s.55637350 allele A, s.55640040 allele C, 
s.55646568 allele G, s.55649132 allele C, s.55650629 allele C, s.55650844 allele C, s.55652397 allele A, s.55653401 allele C, s.55653991 allele T, s.55654907 allele C, s.55657973 allele A, s.55659043 allele G, s.55660011 allele A, s.55660013 allele C, s.55660139 allele A, s.55660143 allele A, s.55661660 allele T, s.55661718 allele A, rs6509476 allele C, s.55664020 allele C, s.55664897 allele A, s.55665723 allele C, s.55665726 allele C, s.55672641 allele T, s.55673254 allele A, s.55674252 allele C, s.55674254 allele T, s.55674727 allele A, s.55676073 allele T, s.55683393 allele A, s.55687122 allele T, s.55695317 allele T, s.55697027 allele A, s.55701748 allele A, rs7257447 allele A, s.55702308 allele T, s.55703568 allele A, s.55706751 allele A, s.55708051 allele A, s.55709067 allele T, s.55709498 allele G, s.55709766 allele A, s.55710030 allele G, s.55710848 allele A, s.55710851 allele T, s.55711749 allele G, s.55712802 allele C, s.55713451 allele G, s.55713453 allele T, s.55713458 allele A, s.55713862 allele A, s.55716007 allele T, s.55718272 allele T, s.55723496 allele T, s.55724346 allele C, s.55726794 allele T, s.55729556 allele C, s.55729562 allele T, s.55729563 allele C, s.55731588 allele A, s.55733658 allele T, s.55741403 allele G, s.55743524 allele G, s.55745833 allele T, s.55746123 allele C, s.55747079 allele G, s.55748269 allele A, s.55748274 allele C, s.55748844 allele G, s.55749193 allele A, s.55752178 allele C, s.55752271 allele T, s.55770158 allele G, rs7247686 allele C, s.55771401 allele C, s.55772266 allele G, s.55775314 allele A, s.55778756 allele C, s.55788661 allele A, s.55790622 allele C, s.55791942 allele G, rsl0413426 allele A, s.55798366 allele T, s.55818900 allele C, s.55822129 allele T, s.55825528 allele A, s.55825624 allele G, s.55833489 allele C, s.55833938 allele A, s.55848124 allele C, s.55848125 allele C, s.55849044 allele G, s.55857289 allele G, s.55857585 allele T, s.55861107 allele T, s.55861111 allele C, s.55861196 allele C, 
s.55862851 allele C, s.55865439 allele C, s.55867208 allele T, s.55867650 allele T, s.55868902 allele A, s.55870429 allele G, rs73598616 allele T, s.55874339 allele A, s.55875249 allele G, s.55875725 allele A, s.55881262 allele T, s.55882788 allele G, s.55883542 allele T, s.55886467 allele G, s.55887498 allele A, s.55889175 allele A, s.55892113 allele G, s.55892618 allele A, s.55892866 allele A, s.55893305 allele C, s.55896443 allele A, s.55896826 allele T, s.55898241 allele G, s.55898245 allele T, s.55899120 allele C, s.55900597 allele A, s.55900764 allele C, s.55912567 allele C, s.55914840 allele G, s.55915776 allele T, s.55936192 allele G, s.55940336 allele T, s.55946316 allele A, s.55949971 allele G, s.55955333 allele A, s.55962188 allele A, s.55963864 allele A, s.55969754 allele A, s.55979135 allele A, rs67367861 allele T, s.55989580 allele T, s.56004001 allele G, s.56006528 allele C, s.56012046 allele T, s.56013739 allele A, rs2411330 allele C, rs3212825 allele C, s.56018053 allele T, s.56019106 allele A, rs7246740 allele T, s.56025860 allele A, s.56026713 allele C, rs55786312 allele A, s.56026881 allele G, s.56026882 allele G, s.56027319 allele G, s.56029265 allele A, s.56029362 allele T, s.56032778 allele C, s.56032963 allele G, s.56032964 allele T, s.56033138 allele A, s.56033138 allele A, s.56033664 allele A, s.56033664 allele A, s.56036363 allele T, s.56037076 allele C, s.56037076 allele C, rs2659051 allele C, s.56038334 allele G, s.56038334 allele G, s.56039736 allele G, rs266849 allele G, s.56042100 allele G, s.56042603 allele G, s.56042603 allele G, rs2659124 allele A, rs2659124 allele A, s.56046798 allele T, rs266878 allele G, rs266878 allele G, rsl74776 allele T, rsl74776 allele T, s.56052630 allele C, s.56052630 allele C, s.56052652 allele T, s.56052652 allele T, rsl7632542 allele C, s.56053983 allele G, s.56054527 allele G, s.56054527 allele G, rs2659122 allele C, rsl058205 allele C, rsl058205 allele C, rs2569735 allele A, rs2569735 allele A, 
rs2735839 allele A, rs62113216 allele A, rs62113216 allele A, s.56058308 allele A, s.56058606 allele T, s.56058688 allele A, s.56058866 allele C, s.56060000 allele C, s.56061277 allele C, s.56062250 allele A, s.56066550 allele A, s.56066560 allele G, s.56066619 allele T, s.56067024 allele T, s.56067024 allele T, rs73592873 allele A, s.56076121 allele C, s.56076122 allele C, s.56078845 allele C, s.56085550 allele C, s.56093594 allele T, s.56472259 allele A, s.1030492 allele A, s.1233724 allele G, s.1251946 allele G, s.1257345 allele G, s.1258032 allele A, rs9418 allele C, s.1282167 allele C, s.1285240 allele C, s.1285775 allele T, s.1287049 allele G, s.1292191 allele T, s.1334730 allele C, s.1349759 allele C, s.1350079 allele C, rs2736108 allele C, s.1350854 allele C, rs2735948 allele A, rs2735846 allele C, s.1352392 allele A, s.1353401 allele T, rs2735946 allele T, rs2736102 allele T, rs2853666 allele G, rs2735945 allele T, s.1359165 allele T, rs4530805 allele T, s.1359765 allele C, rs61574973 allele T, s.1362904 allele G, s.1363152 allele G, rsl2332579 allele C, rs6866783 allele T, s.1365329 allele T, rsl3356727 allele G, rsl3355267 allele T, s.1366701 allele A, rsl0078017 allele C, rs4975615 allele G, rs4975616 allele G, rs6554759 allele G, rs3816659 allele A, rsl801075 allele C, rs451360 allele A, rs421629 allele A, rs380286 allele A, rs402710 allele T, rsl0073340 allele T, rs414965 allele A, rs421284 allele C, rs466502 allele G, rs465498 allele G, rs452932 allele C, rs452384 allele C, rs370348 allele G, s.1386077 allele G, s.1386169 allele A, s.1386204 allele A, s.1386674 allele C, rs457130 allele T, rs467095 allele C, s.1389243 allele G, rs462608 allele A, rs456366 allele C, s.1390106 allele A, s.1390174 allele C, rs31487 allele C, s.1395154 allele C, rs31489 allele A, rs31490 allele A, rs27996 allele G, rs27071 allele C, rs27070 allele C, rs27068 allele T, s.1401106 allele C, rs37011 allele T, s.1402130 allele C, s.1402535 allele G, rs37009 allele T, rs40182 
allele A, rs37008 allele A, rs37007 allele C, s.1407027 allele G, rs40181 allele T, s.1407682 allele T, rs37006 allele T, s.1408859 allele T, rs37005 allele T, s.1409771 allele C, rs37002 allele T, s.1411822 allele T, s.1411901 allele C, s.1412098 allele T, rs31494 allele T, s.1418662 allele C, s.1419748 allele A, s.1426206 allele A, s.1426336 allele C, s.1428371 allele C, s.1428373 allele C, s.1472454 allele C, s.1518154 allele A, s.1557827 allele C, rsll743119 allele G, s.1583465 allele T, rs4551123 allele A, s.1589581 allele C, s.1591616 allele G, s.1607388 allele C, rs6893515 allele C, s.1618305 allele G, s.1621550 allele T, s.1621551 allele G, rs6892057 allele C, s.1638061 allele T, rs6898387 allele T, rs7724451 allele A, rs2937006 allele G, s.1663985 allele G, s.1667254 allele G, s.1668831 allele C, s.1673499 allele G, s.1737379 allele A, s.1756873 allele C, s.1782909 allele A, s.1788485 allele G, s.1799150 allele G, s.1800043 allele G, s.1804565 allele G, s.1812409 allele A, s.886453 allele A, and s.887600 allele T, which are marker alleles listed in Table 1 herein, are indicative of reduced PSA levels in the individual. These alleles are predicted to lead to reduced PSA levels. Thus, a corrected PSA value for the individual for the particular marker allele will be greater than an uncorrected PSA value.
Methods of diagnosing Prostate Cancer
Prostate Specific Antigen (PSA) is a protein that is secreted by the epithelial cells of the prostate gland, including cancer cells. PSA is concentrated in prostatic tissue, and serum PSA levels are normally very low. Disruption of the normal prostate architecture, for example by prostatic disease, inflammation or trauma, allows greater amounts of PSA to enter the circulation. Thus, an elevated level in the blood indicates an abnormal condition of the prostate, either benign or malignant. PSA is used to detect potential problems in the prostate gland and to follow the progress of prostate cancer therapy.
After the introduction of PSA testing, a dramatic increase in diagnosis of prostate cancer was observed. Subsequently, a gradual decline in prostate cancer mortality in the US has been observed (Ries, L.A., et al. SEER Cancer Statistics Review, 1975 - 2005, National Cancer Institute, Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/). Most cases of prostate cancer in the US are identified based on results of PSA testing. There is also evidence that PSA screening has led to a substantial shift towards detection of prostate cancer at earlier stages (Etzioni, R., et al. Med Decis Making 28: 323 (2008)). Recent studies have also indicated that there is a modest reduction in prostate cancer deaths among those screened for PSA compared with those that were not (Schroder, F.H., et al. N Engl J Med 360: 1320-8 (2009); Andriole, G.L. et al. N Engl J Med 360: 1310-19 (2009)). A cutoff of 4ng/mL PSA in human serum is typically used for selection of individuals for further screening, including prostate biopsy.
The decision to proceed with prostate biopsy is usually made based on results of a PSA assay, which is sometimes also followed by a Digital Rectal Examination (DRE). Results of the PSA assay, alone or in combination with results of DRE, are used to select individuals for prostate biopsy. Further factors may be considered, including free and total PSA, age of the patient, the rate of PSA change with age (PSA velocity), family history, ethnicity, history of prior biopsy and comorbidity.
Currently, the specificity of PSA testing using a cutoff level of 4ng/mL is about 60 to 70% (Brawer, M.K., CA Cancer J Clin 49: 264 (1999)). Because PSA levels tend to increase with age, ranging from 0-2.5ng/mL in individuals age 40-49 to 0-6.5ng/mL in individuals age 70-79 (Caucasians), it has been suggested that a higher "normal" value of PSA should be used for older individuals. However, it is clear that such an increase in the applied cutoff values will lead to an increased number of missed cancers in older men.
Prostate cancer is not limited to men with high PSA values. On the contrary, it has been found that even among men with PSA levels below 4.0ng/mL, prostate cancer is fairly common (Thompson, I.M., et al. N Engl J Med 350: 2239 (2004)), and in fact as much as 50 to 80% of prostate cancer is missed by applying this cutoff. Thus, while widespread PSA testing has been criticized as leading to overdetection of prostate cancer, possibly leading to overtreatment, it is also clear that many cases of prostate cancer are silent to current guidelines of PSA testing. As a consequence, biopsies are sometimes also done at PSA levels lower than 4ng/mL.
Since it is known that PSA levels vary considerably in the population, and that this variation is to a large extent due to genetic factors, it is likely that a correction of PSA values of any particular individual based on the individual's genotype at genetic markers known to affect PSA levels could lead to significantly improved utility, through increased specificity and sensitivity, of PSA screening for reducing prostate cancer mortality in the population.
Correcting PSA levels by the methods described herein may in certain cases lead to corrected PSA values that are below the cutoff applied (such as 4ng/mL), even though the uncorrected PSA value is above the threshold. This means that some individuals who otherwise would undergo further diagnostic evaluation might not be selected for such follow-up, since it is likely that their increased uncorrected PSA value is due to natural variation in PSA levels in the population rather than an actual underlying disease. However, in some cases corrected PSA values will be significantly higher than uncorrected values, and this could mean that individuals who normally would not be selected for further follow-up because their uncorrected PSA level is below the threshold applied for further clinical evaluation would, based on the corrected PSA values, be considered at risk for prostate cancer and thus selected for further evaluation. For example, consider a case where an individual is determined to have an uncorrected PSA value of 3.0ng/mL. If this individual is determined not to carry the T allele of rs17632542, which leads to significantly elevated PSA levels (39-100% increase per allele), i.e. the individual is homozygous for the alternate C allele of rs17632542, then it is clear that the individual's PSA level is lower compared with the population in general because of the lack of the T allele in the individual's genome. The T allele is very common in the population (91% in Iceland, 93% in the UK), which means that the average PSA levels in the population are greatly affected by this allele. The corrected PSA value for this particular individual would be above the threshold of 4.0ng/mL that is routinely used for screening, and therefore the individual would undergo further testing, either DRE or biopsy, or both.
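The arithmetic behind this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the correction procedure defined elsewhere in this document: it assumes that each copy of the T allele multiplies PSA by a fixed factor, and that the correction rescales the observed value to the level expected for a person carrying the population-average number of T alleles. The function name, the per-allele factor of 1.39 (the lower end of the 39-100% range quoted above) and the use of the Icelandic allele frequency of 0.91 are illustrative choices.

```python
def corrected_psa(observed, n_effect_alleles, per_allele_factor, allele_freq):
    """Rescale an observed PSA value to the population-average genotype.

    Assumption (not taken from the source): the effect allele acts
    multiplicatively on PSA, i.e. additively on the log scale.
    """
    if n_effect_alleles not in (0, 1, 2):
        raise ValueError("a diploid genotype carries 0, 1 or 2 copies")
    # Average number of effect-allele copies per person in the population.
    mean_copies = 2.0 * allele_freq
    # PSA of this genotype relative to the population-average genotype.
    genotype_factor = per_allele_factor ** (n_effect_alleles - mean_copies)
    return observed / genotype_factor


# Individual from the example: uncorrected PSA 3.0 ng/mL, genotype CC
# (zero copies of T) at rs17632542.
psa = corrected_psa(3.0, 0, 1.39, 0.91)
print(f"corrected PSA: {psa:.2f} ng/mL; above the 4.0 cutoff: {psa > 4.0}")
```

Under these assumptions the corrected value comes out at roughly 5.5ng/mL, which is above the 4.0ng/mL cutoff, in line with the conclusion drawn in the text. A larger per-allele factor within the quoted range would push the corrected value higher still.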
As further illustrated herein, the benefit of applying a correction to observed (uncorrected) PSA levels can be striking. For example, when considering the exemplary data as described in Example 2 herein, the personalized cutoff value of 4ng/mL is in some cases shifted dramatically when correction for variants affecting PSA levels is applied. Thus, in the particular example shown in Example 2 herein, for some individuals with apparent PSA levels of 4.0ng/mL, the corrected PSA value may be as high as 5-8ng/mL or as low as 1-2ng/mL. Further examples illustrating the usefulness of applying the PSA correction are described in Example 5 and Example 6 herein.
Thus, corrected PSA levels as determined by the methods described herein could have enormous implications for the management of prostate cancer, since PSA screening based on PSA values corrected for genetic background will better reflect physical changes in the individual (e.g., prostate cancer or other prostate disease) than do uncorrected PSA values, which may be largely dominated by inherent PSA levels and do not necessarily represent underlying disease.
As a consequence, the present invention provides diagnostic applications based on the determination of corrected PSA quantity. In one such application, a method of diagnostic evaluation of prostate cancer in a human individual is provided, the method comprising:
(a) Detecting an uncorrected PSA quantity in a first sample from the human individual;
(b) Obtaining sequence data about at least one polymorphic marker in the first sample or in a second sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA levels in humans;
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker;
(d) Comparing the corrected PSA quantity determined in (c) with a reference range of normal PSA quantity in humans;
wherein determination of a corrected PSA quantity that is greater than the reference range is indicative of suspected prostate cancer in the individual.
In another aspect, the invention provides a method of diagnosis of prostate cancer in humans, the method comprising:
(a) Obtaining an uncorrected PSA quantity in a first biological sample from the human individual;
(b) Obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans;
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker;
(d) Determining whether the corrected PSA quantity is greater than normal PSA quantity in humans;
(e) Performing a further diagnostic evaluation procedure selected from the group consisting of rectal ultrasound imaging and prostate biopsy on the individual if the corrected PSA quantity is determined to be greater than the reference range;
wherein determination of a positive outcome of the ultrasound imaging or prostate biopsy is indicative of prostate cancer in the individual.
In certain embodiments, the obtaining of uncorrected PSA quantity comprises detecting the PSA quantity in a first biological sample from the individual.
A further aspect provides a method of diagnosis of prostate cancer, the method comprising:
Analyzing corrected PSA quantity of a human individual, wherein if the corrected PSA levels of the human individual are determined to be greater than normal PSA quantity in humans, a further diagnostic evaluation selected from the group consisting of rectal ultrasound imaging and prostate biopsy is performed; and
wherein determination of a positive outcome of the further diagnostic evaluation is indicative of prostate cancer in the individual. Preferably, the corrected PSA quantity is determined using any one of the methods of determining corrected PSA quantity described herein.
A further diagnostic application relates to selection processes for individuals who are undergoing evaluation for prostate cancer. For example, an individual who is a candidate for further diagnostic evaluation for prostate cancer can be selected by (a) obtaining data representing uncorrected values of PSA quantity in the individual; (b) determining, in the genome of the human individual, the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; (c) determining a corrected PSA quantity in the individual based on the allelic identity of the at least one polymorphic marker; and (d) identifying the subject as a subject who is a candidate for further diagnostic evaluation for prostate cancer if said corrected PSA quantity is greater than values of normal PSA quantity in humans.
The invention further provides methods of treatment of prostate cancer diagnosed by the diagnostic methods described herein. Thus, methods of diagnosing prostate cancer as described herein may in certain embodiments comprise an additional step of treatment of prostate cancer, wherein the treatment is selected from the group consisting of surgery, radiation therapy, proton therapy, hormonal therapy and chemotherapy.
A further aspect of the invention relates to a method of treatment of prostate cancer, the method comprising (i) determining a corrected PSA quantity in the individual, wherein the corrected PSA quantity is determined based on the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; and (ii) performing a prostate biopsy if the corrected PSA quantity is greater than values of normal PSA quantity in humans; wherein if the individual is determined to have prostate cancer based on the prostate biopsy, the individual is selected for at least one treatment module selected from the group consisting of surgery, radiation therapy, proton therapy, hormonal therapy and chemotherapy.
The range of normal PSA quantity in humans may in certain embodiments be less than 50ng/mL, less than 40ng/mL, less than 30ng/mL, less than 20ng/mL, less than 10ng/mL, less than 9ng/mL, less than 8ng/mL, less than 7ng/mL, less than 6ng/mL, less than 5ng/mL, less than 4ng/mL, less than 3.5ng/mL, less than 3.0ng/mL, less than 2.5ng/mL, less than 2.0ng/mL, less than 1.5ng/mL, less than 1.0ng/mL or less than 0.5ng/mL. In one preferred embodiment, normal PSA quantity in humans is less than 4.0ng/mL. In another preferred embodiment, normal PSA quantity in humans is less than 3.5ng/mL. In another preferred embodiment, normal PSA quantity is less than 3.0ng/mL. In another preferred embodiment, normal PSA quantity is less than 2.5ng/mL. Other appropriate cutoff values bridging any of the above numbers may also suitably be selected as appropriate values for normal PSA levels in humans.
In certain cases, the human individual is in a particular age group. For example, the individual may be less than age 40, or the individual may be age 40 - 49, age 50 - 59, age 60 - 69, age 70 - 79, or age 70 or higher. In certain such embodiments, the normal PSA quantity is determined in the same age group as the individual. For example, if the individual is in the age group 40 - 49, the reference value of normal PSA quantity in humans is suitably determined in individuals age 40 - 49. The invention is applicable to any particular age range, and all age ranges are contemplated and within scope of the invention. In preferred embodiments, normal PSA values are determined in the same age range as the individual who is undergoing diagnostic evaluation.
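An age-matched comparison of this kind can be sketched as a simple lookup. The sketch below is illustrative only: the table holds just the two age-specific upper limits quoted earlier in this document (2.5ng/mL for age 40-49 and 6.5ng/mL for age 70-79), any other age falls back to the commonly used 4.0ng/mL cutoff, and the names of the table and functions are hypothetical.

```python
# Upper limit of the normal PSA range (ng/mL) per age group. Only the two
# ranges quoted in this document are filled in; further groups would have
# to be supplied from an appropriate reference population.
AGE_SPECIFIC_UPPER_LIMIT = {
    (40, 49): 2.5,
    (70, 79): 6.5,
}
DEFAULT_UPPER_LIMIT = 4.0  # commonly used cutoff when no age group matches


def upper_limit_for_age(age, table=AGE_SPECIFIC_UPPER_LIMIT,
                        default=DEFAULT_UPPER_LIMIT):
    """Return the reference upper limit for the age group containing `age`."""
    for (low, high), limit in table.items():
        if low <= age <= high:
            return limit
    return default


def exceeds_reference(corrected_psa_value, age):
    """True if a corrected PSA value lies above the age-matched limit."""
    return corrected_psa_value > upper_limit_for_age(age)


print(exceeds_reference(3.0, 45))  # compared against 2.5 ng/mL
print(exceeds_reference(3.0, 75))  # compared against 6.5 ng/mL
```

The same corrected value of 3.0ng/mL is flagged for a 45-year-old but not for a 75-year-old, which is the practical consequence of determining the reference range in the individual's own age group.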
In preferred embodiments, PSA is determined in human blood samples, in particular in human serum. However, the present invention is applicable for correcting PSA levels determined in any human tissue.
Methods of determining a susceptibility to Prostate Cancer
The present invention also provides methods of determining a susceptibility to prostate cancer. It has been discovered that allele T of the marker rs17632542 is indicative of increased susceptibility to prostate cancer in humans (OR = 1.39; P-value 1.8x10^-10). This marker, and other markers in linkage disequilibrium therewith, is therefore useful for determining a susceptibility to prostate cancer.
As a consequence, in one aspect the invention provides a method of determining a susceptibility to prostate cancer, the method comprising analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to prostate cancer in humans, and determining a susceptibility to prostate cancer from the nucleic acid sequence data.
In certain embodiments, markers in linkage disequilibrium with rs17632542 are in linkage disequilibrium as characterized by values of r² with rs17632542 of 0.2 or greater. In certain embodiments, markers in linkage disequilibrium with rs17632542 are selected from the group consisting of s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s.55637350, s.55640040, s.55646568, s.55649132, s.55650629, s.55650844, s.55652397, s.55653401, s.55653991, s.55654907, s.55657973, s.55659043, s.55660011, s.55660013, s.55660139, s.55660143, s.55661660, s.55661718, rs6509476, s.55664020, s.55664897, s.55665723, s.55665726, s.55672641, s.55673254, s.55674252, s.55674254, s.55674727, s.55676073, s.55683393, s.55687122, s.55695317, s.55697027, s.55701748, rs7257447, s.55702308, s.55703568, s.55706751, s.55708051, s.55709067, s.55709498, s.55709766, s.55710030, s.55710848, s.55710851, s.55711749, s.55712802, s.55713451, s.55713453, s.55713458, s.55713862, s.55716007, s.55718272, s.55723496, s.55724346, s.55726794, s.55729556, s.55729562, s.55729563, s.55731588, s.55733658, s.55741403, s.55743524, s.55745833, s.55746123, s.55747079, s.55748269, s.55748274, s.55748844, s.55749193, s.55752178, s.55752271, s.55770158, rs7247686, s.55771401, s.55772266, s.55775314, s.55778756, s.55788661, s.55790622, s.55791942, rs10413426, s.55798366, s.55818900, s.55822129, s.55825528, s.55825624, s.55833489, s.55833938, s.55848124, s.55848125, s.55849044, s.55857289, s.55857585, s.55861107, s.55861111, s.55861196, s.55862851, s.55865439, s.55867208, s.55867650, s.55868902, s.55870429, rs73598616, s.55874339, s.55875249, s.55875725, s.55881262, s.55882788, s.55883542, s.55886467, s.55887498, s.55889175, s.55892113, s.55892618, s.55892866, s.55893305, s.55896443, s.55896826, s.55898241, s.55898245, s.55899120, s.55900597, s.55900764, s.55912567, s.55914840, s.55915776, s.55936192, s.55940336, s.55946316, s.55949971, s.55955333, s.55962188, s.55963864, s.55969754, s.55979135, rs67367861, s.55989580, s.56004001, s.56006528, s.56012046, s.56013739, rs2411330, rs3212825, s.56018053, s.56019106, rs7246740, s.56025860, s.56026713, rs55786312, s.56026881, s.56026882, s.56027319, s.56029265, s.56029362, s.56032778, s.56032963, s.56032964, s.56033138, s.56033664, s.56036363, s.56037076, s.56038334, s.56039736, s.56042100, s.56042603, rs2659124, s.56046798, rs266878, rs174776, s.56052630, s.56052652, s.56053983, s.56054527, rs1058205, rs2569735, rs2735839, rs62113216, s.56058308, s.56058606, s.56058688, s.56058866, s.56060000, s.56061277, s.56062250, s.56066550, s.56066560, s.56066619, s.56067024, rs73592873, s.56076121, s.56076122, s.56078845, s.56085550, s.56093594, s.56472259, and rs273622.
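The r² criterion used to define this group can be illustrated with the standard textbook definition of r² for two biallelic markers. The sketch below assumes that phased haplotype frequencies are available; the function names and the example frequencies are hypothetical and are not taken from Table 1.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first marker
          and allele B at the second marker
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def in_linkage_disequilibrium(p_ab, p_a, p_b, threshold=0.2):
    """Apply an r^2 cutoff such as the 0.2 value mentioned in the text."""
    return r_squared(p_ab, p_a, p_b) >= threshold


# Hypothetical frequencies: the two alleles always occur together.
print(in_linkage_disequilibrium(0.3, 0.3, 0.3))
# Hypothetical frequencies: the two alleles occur independently.
print(in_linkage_disequilibrium(0.09, 0.3, 0.3))
```

In practice the haplotype frequency is estimated from genotype data rather than observed directly, so the value of r² carries sampling uncertainty that this sketch ignores.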
In certain embodiments, determination of the presence of the T allele of rsl7632542 is indicative of increased susceptibility to prostate cancer in the individual. Other marker alleles indicative of increased susceptibility to prostate cancer may also be suitably selected using the information provided in Table 1. In certain embodiments, marker alleles indicative of increased susceptibility in humans are selected from the group consisting of s.55554247 allele A, s.55566277 allele T, s.55582344 allele C, rs2546552 allele G, s.55596785 allele T, s.55597645 allele A, s.55598078 allele A, s.55600121 allele A, s.55605246 allele G, s.55606024 allele A, s.55607242 allele G, s.55624341 allele C, s.55630396 allele T, s.55630578 allele T, s.55630679 allele T, s.55630791 allele T, s.55631170 allele C, s.55632347 allele A, s.55632363 allele A, s.55636052 allele T, s.55637350 allele C, s.55640040 allele T, s.55646568 allele A, s.55649132 allele T, s.55650629 allele A, s.55650844 allele G, s.55652397 allele G, s.55653401 allele T, s.55653991 allele A, s.55654907 allele A, s.55657973 allele G, s.55659043 allele A, s.55660011 allele G, s.55660013 allele T, s.55660139 allele T, s.55660143 allele T, s.55661660 allele C, s.55661718 allele T, rs6509476 allele A, s.55664020 allele G, s.55664897 allele T, s.55665723 allele G, s.55665726 allele G, s.55672641 allele C, s.55673254 allele G, s.55674252 allele G, s.55674254 allele A, s.55674727 allele T, s.55676073 allele A, s.55683393 allele G, s.55687122 allele A, s.55695317 allele A, s.55697027 allele C, s.55701748 allele C, rs7257447 allele T, s.55702308 allele A, s.55703568 allele T, s.55706751 allele T, s.55708051 allele T, s.55709067 allele A, s.55709498 allele T, s.55709766 allele T, s.55710030 allele C, s.55710848 allele T, s.55710851 allele A, s.55711749 allele A, s.55712802 allele G, s.55713451 allele T, s.55713453 allele G, s.55713458 allele C, s.55713862 allele T, s.55716007 allele G, s.55718272 allele A, s.55723496 allele C, 
s.55724346 allele T, s.55726794 allele G, s.55729556 allele A, s.55729562 allele G, s.55729563 allele A, s.55731588 allele G, s.55733658 allele G, s.55741403 allele C, s.55743524 allele T, s.55745833 allele A, s.55746123 allele T, s.55747079 allele T, s.55748269 allele T, s.55748274 allele T, s.55748844 allele T, s.55749193 allele G, s.55752178 allele T, s.55752271 allele A, s.55770158 allele A, rs7247686 allele T, s.55771401 allele T, s.55772266 allele C, s.55775314 allele C, s.55778756 allele G, s.55788661 allele G, s.55790622 allele T, s.55791942 allele A, rsl0413426 allele G, s.55798366 allele G, s.55818900 allele G, s.55822129 allele C, s.55825528 allele G, s.55825624 allele T, s.55833489 allele T, s.55833938 allele G, s.55848124 allele G, s.55848125 allele G, s.55849044 allele A, s.55857289 allele T, s.55857585 allele A, s.55861107 allele G, s.55861111 allele A, s.55861196 allele T, s.55862851 allele T, s.55865439 allele T, s.55867208 allele A, s.55867650 allele G, s.55868902 allele G, s.55870429 allele C, rs73598616 allele G, s.55874339 allele T, s.55875249 allele C, s.55875725 allele C, s.55881262 allele A, s.55882788 allele T, s.55883542 allele C, s.55886467 allele T, s.55887498 allele T, s.55889175 allele G, s.55892113 allele A, s.55892618 allele T, s.55892866 allele T, s.55893305 allele G, s.55896443 allele G, s.55896826 allele A, s.55898241 allele T, s.55898245 allele A, s.55899120 allele T, s.55900597 allele G, s.55900764 allele A, s.55912567 allele T, s.55914840 allele A, s.55915776 allele G, s.55936192 allele T, s.55940336 allele C, s.55946316 allele G, s.55949971 allele C, s.55955333 allele G, s.55962188 allele T, s.55963864 allele G, s.55969754 allele T, s.55979135 allele T, rs67367861 allele C, s.55989580 allele A, s.56004001 allele A, s.56006528 allele G, s.56012046 allele G, s.56013739 allele G, rs2411330 allele G, rs3212825 allele G, s.56018053 allele G, s.56019106 allele C, rs7246740 allele A, s.56025860 allele G, s.56026713 allele T, 
rs55786312 allele T, s.56026881 allele A, s.56026882 allele A, s.56027319 allele A, s.56029265 allele C, s.56029362 allele G, s.56032778 allele G, s.56032963 allele T, s.56032964 allele G, s.56033138 allele G, s.56033664 allele T, s.56036363 allele G, s.56037076 allele T, s.56038334 allele A, s.56039736 allele C, s.56042100 allele C, s.56042603 allele A, rs2659124 allele T, s.56046798 allele C, rs266878 allele C, rs174776 allele C, s.56052630 allele T, s.56052652 allele C, s.56053983 allele C, s.56054527 allele T, rs1058205 allele T, rs2569735 allele G, rs2735839 allele G, rs62113216 allele T, s.56058308 allele G, s.56058606 allele A, s.56058688 allele T, s.56058866 allele T, s.56060000 allele A, s.56061277 allele G, s.56062250 allele C, s.56066550 allele T, s.56066560 allele C, s.56066619 allele G, s.56067024 allele C, rs73592873 allele G, s.56076121 allele G, s.56076122 allele G, s.56078845 allele G, s.56085550 allele G, s.56093594 allele G, s.56472259 allele C, and rs273622 allele A. Determination of the absence of at least one of the at-risk alleles recited above is indicative of a decreased risk of prostate cancer for the human individual. As a consequence, in certain embodiments, the analyzing comprises determining the presence or absence of at least one at-risk allele of the polymorphic marker. Individuals who are homozygous for at-risk alleles are at particularly high risk. Thus, in certain embodiments determination of the presence of two alleles of one or more of the above-recited risk alleles is indicative of particularly high risk (susceptibility) of prostate cancer.
Alternatively, the allele that is detected can be the allele of the complementary strand of DNA. This means that the nucleic acid sequence data may include the identification of at least one allele which is complementary to any of the alleles of the polymorphic markers referenced above. In certain embodiments, the nucleic acid sequence data is obtained from a biological sample containing nucleic acid from the human individual. The nucleic acid sequence data may suitably be obtained using a method that comprises at least one procedure selected from (i) amplification of nucleic acid from the biological sample; (ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample; and (iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of the biological sample. The nucleic acid sequence data may also be obtained from a preexisting record. For example, the preexisting record may comprise a genotype dataset for at least one polymorphic marker. In certain embodiments, the determining comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to the condition.
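Reading an allele from the complementary strand amounts to a base-by-base substitution, as the following minimal sketch shows. The helper names are hypothetical. The second function reflects a general property of SNP genotyping rather than anything stated in this document: for A/T and C/G polymorphisms the two alleles are each other's complements, so the strand cannot be inferred from the allele letters alone.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_allele(allele):
    """Return the allele as read on the complementary strand.

    For a single-base allele this is the complementary base; for a longer
    allele it is the reverse complement.
    """
    return "".join(_COMPLEMENT[base] for base in reversed(allele.upper()))


def is_strand_ambiguous(allele_1, allele_2):
    """True for A/T and C/G SNPs, whose alleles complement each other."""
    return complement_allele(allele_1) == allele_2.upper()


# The T and C alleles of rs17632542 read as A and G on the other strand.
print(complement_allele("T"), complement_allele("C"))
print(is_strand_ambiguous("T", "C"))
```

For a T/C marker such as rs17632542 the complementary-strand alleles A and G do not overlap with the original pair, so the strand of a reported genotype can be recognized from the letters themselves.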
It is contemplated that in certain embodiments of the invention, it may be convenient to prepare a report of results of risk assessment. Thus, certain embodiments of the methods of the invention comprise a further step of preparing a report containing results from the determination, wherein said report is written in a computer readable medium, printed on paper, or displayed on a visual display. In certain embodiments, it may be convenient to report results of susceptibility to at least one entity selected from the group consisting of the individual, a guardian of the individual, a genetic service provider, a physician, a medical organization, and a medical insurer.
In certain embodiments, determination of the presence of at least one copy of the T allele of rs17632542 in the genome of an individual is indicative of increased risk of prostate cancer with an early age of onset. In other embodiments, determination of the presence of at least one copy of a marker allele in linkage disequilibrium with the T allele of rs17632542 is indicative of increased risk of prostate cancer with an early age of onset. Individuals who are homozygous for such risk alleles are at particularly increased risk of prostate cancer with an early onset. In certain embodiments, the age of onset of prostate cancer is below 50 years. In certain embodiments, the age of onset of prostate cancer is below 45 years. In certain embodiments, the age of onset of prostate cancer is below 40 years.
An individual who is at an increased susceptibility (i.e., increased risk) for prostate cancer is an individual in whom at least one specific allele at one or more polymorphic markers, or haplotypes, conferring increased susceptibility (increased risk) for the disease is identified (i.e., at-risk marker alleles or haplotypes). The at-risk marker or haplotype is one that confers an increased risk (increased susceptibility) of the disease. In one embodiment, significance associated with a marker or haplotype is measured by a relative risk (RR). In another embodiment, significance associated with a marker or haplotype is measured by an odds ratio (OR). In a further embodiment, the significance is measured by a percentage. In one embodiment, a significant increased risk is measured as a risk (relative risk and/or odds ratio) of at least 1.1, including but not limited to: at least 1.15, at least 1.20, at least 1.25, at least 1.30, at least 1.35, at least 1.40, at least 1.45, at least 1.5, at least 1.6, at least 1.7, at least 1.8, at least 1.9, and at least 2.0. In a particular embodiment, a risk (relative risk and/or odds ratio) of at least 1.2 is significant. In another particular embodiment, a risk of at least 1.30 is significant. In yet another embodiment, a risk of at least 1.35 is significant. In a further embodiment, a relative risk of at least 1.5 is significant. However, other cutoffs are also contemplated, e.g., at least 1.15, 1.25, 1.35, and so on, and such cutoffs are also within scope of the present invention. In other embodiments, a significant increase in risk is at least about 20%, including but not limited to about 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, and 100%. In certain embodiments, a significant increase in risk is characterized by a p-value, such as a p-value of less than 0.05, less than 0.01, less than 0.001, less than 0.0001, less than 0.00001, less than 0.000001, less than 0.0000001, less than 0.00000001, or less than 0.000000001.
An at-risk polymorphic marker as described herein is one where at least one allele of at least one marker or haplotype is more frequently present in an individual at risk for prostate cancer (affected), or diagnosed with prostate cancer, compared to the frequency of its presence in a comparison group (control), such that the presence of the at least one allele of the at least one marker or haplotype is indicative of susceptibility to prostate cancer. The control group may in one embodiment be a population sample, i.e. a random sample from the general population. In another embodiment, the control group is represented by a group of individuals who are disease-free, i.e. not diagnosed with prostate cancer.
The person skilled in the art will appreciate that for markers with two alleles present in the population being studied (such as SNPs), and wherein one allele is found in increased frequency in a group of individuals with a trait or disease in the population, compared with controls, the other allele of the marker will be found in decreased frequency in the group of individuals with the trait or disease, compared with controls. In such a case, one allele of the marker (the one found in increased frequency in individuals with the trait or disease) will be the at-risk allele, while the other allele will be a protective allele.
Thus, in other embodiments of the invention, an individual who is at a decreased susceptibility (i.e., at a decreased risk) for prostate cancer is an individual in whom at least one specific allele at one or more polymorphic marker or haplotype conferring decreased susceptibility for prostate cancer is identified. The marker alleles conferring decreased risk are also said to be protective. In one aspect, the protective marker or haplotype is one that confers a significant decreased risk (or susceptibility) of prostate cancer. In one embodiment, significant decreased risk is measured as a relative risk (or odds ratio) of less than 0.9, including but not limited to less than 0.8, less than 0.7, less than 0.6, and less than 0.5. In one particular embodiment, significant decreased risk is less than 0.80. In another embodiment, significant decreased risk is less than 0.75. In yet another embodiment, significant decreased risk is less than 0.70. In another embodiment, the decrease in risk (or susceptibility) is at least 20%, including but not limited to at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, and at least 50%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are however also contemplated, and those are also within the scope of the present invention.
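By way of non-limiting illustration, the relationship between an at-risk allele and the corresponding protective allele of a biallelic marker can be sketched as follows. The allele counts are hypothetical and are not data from this application; the function names and the default cutoff of 0.9 are illustrative assumptions taken from one of the embodiments above.

```python
def allelic_odds_ratio(case_x, case_y, ctrl_x, ctrl_y):
    """Odds ratio of allele x versus allele y from allele (chromosome) counts."""
    return (case_x / case_y) / (ctrl_x / ctrl_y)


def is_protective(odds_ratio, cutoff=0.9):
    """True if the odds ratio falls below the chosen cutoff (assumed 0.9 here)."""
    return odds_ratio < cutoff


# Hypothetical allele counts for a marker with alleles A and a.
cases = {"A": 600, "a": 400}
controls = {"A": 500, "a": 500}

or_A = allelic_odds_ratio(cases["A"], cases["a"], controls["A"], controls["a"])
or_a = allelic_odds_ratio(cases["a"], cases["A"], controls["a"], controls["A"])

# For a biallelic marker the two odds ratios are reciprocals of each other.
percent_decrease = (1 - or_a) * 100

print(f"OR(A) = {or_A:.2f}, OR(a) = {or_a:.2f}, decrease = {percent_decrease:.0f}%")
```

In this sketch allele A is enriched among the affected individuals, so allele a necessarily shows the reciprocal odds ratio and would be classed as protective under the assumed cutoff.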
For both single-marker and haplotype analyses, relative risk (RR) and the population attributable risk (PAR) can be calculated assuming a multiplicative model (haplotype relative risk model) (Terwilliger, J.D. & Ott, J., Hum. Hered. 42: 337-46 (1992) and Falk, C.T. & Rubinstein, P, Ann. Hum. Genet. 51 (Pt 3): 227-33 (1987)), i.e., that the risks of the two alleles/haplotypes a person carries multiply. For example, if RR is the risk of A relative to a, then the risk of a person homozygous AA will be RR times that of a heterozygote Aa and RR² times that of a homozygote aa. The multiplicative model has a nice property that simplifies analysis and computations: haplotypes are independent, i.e., in Hardy-Weinberg equilibrium, within the affected population as well as within the control population. As a consequence, haplotype counts of the affected and controls each have multinomial distributions, but with different haplotype frequencies under the alternative hypothesis. Specifically, for two haplotypes, h_i and h_j, risk(h_i)/risk(h_j) = (f_i/p_i)/(f_j/p_j), where f and p denote, respectively, frequencies in the affected population and in the control population. While there is some power loss if the true model is not multiplicative, the loss tends to be mild except for extreme cases. Most importantly, p-values are always valid since they are computed with respect to the null hypothesis.
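The multiplicative model can be illustrated with a short sketch. The function names and the numerical values are hypothetical; the expression for the allele frequency among affected individuals follows from the multiplicative model under the added assumption of Hardy-Weinberg equilibrium in the control population.

```python
def genotype_risks(rr, baseline=1.0):
    """Risks of genotypes aa, Aa and AA when the risks of the two alleles multiply."""
    return {"aa": baseline, "Aa": baseline * rr, "AA": baseline * rr ** 2}


def affected_frequency(p, rr):
    """Frequency of allele A among affected individuals, given control frequency p."""
    return p * rr / (p * rr + (1 - p))


def haplotype_relative_risk(f_i, p_i, f_j, p_j):
    """risk(h_i)/risk(h_j) = (f_i/p_i)/(f_j/p_j)."""
    return (f_i / p_i) / (f_j / p_j)


# Hypothetical values: control frequency of A is 0.20 and RR is 1.5.
p_A, rr = 0.20, 1.5
risks = genotype_risks(rr)
f_A = affected_frequency(p_A, rr)

# Recover RR from the frequencies in affected individuals and controls.
estimated_rr = haplotype_relative_risk(f_A, p_A, 1 - f_A, 1 - p_A)
print(risks, round(f_A, 4), round(estimated_rr, 4))
```

The round trip shows that, under these assumptions, the ratio of frequency ratios returns the same RR that generated the frequencies among the affected.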
Number of Polymorphic Markers/Genes Analyzed
With regard to the methods described herein, the methods can comprise obtaining sequence data about any number of polymorphic markers and/or about any number of genes. For example, the method can comprise obtaining sequence data for at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 100, 500, 1000, 10,000 or more polymorphic markers. The markers can be independent and/or the markers may be in linkage disequilibrium. The markers may also form a haplotype. The polymorphic markers can be the ones of the group specified herein or they can be different polymorphic markers that are not listed herein, including, for example, polymorphic markers in linkage disequilibrium with the markers described herein. In a specific embodiment, the method comprises obtaining sequence data about at least two polymorphic markers. In certain embodiments, each of the markers may be associated with a different gene. For example, in some instances, if the method comprises obtaining nucleic acid data about a human individual identifying at least one allele of a polymorphic marker, then the method comprises identifying at least one allele of at least one polymorphic marker. Also, for example, the method can comprise obtaining sequence data about a human individual identifying alleles of multiple, independent markers or haplotypes, which are not in linkage disequilibrium. In another specific embodiment of the invention, the method comprises obtaining nucleic acid sequence data about at least one polymorphic marker associated with at least one gene selected from the group consisting of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene.
Obtaining nucleic acid sequence data
Sequence data can be nucleic acid sequence data, which may be obtained by means known in the art. For example, nucleic acid sequence data may be obtained through direct analysis of the sequence of the polymorphic position (allele) of a polymorphic marker. Suitable methods, some of which are described herein, include, for instance, whole genome analysis using a whole genome SNP chip (e.g., Infinium HD BeadChip), cloning for polymorphisms, non-radioactive PCR-single strand conformation polymorphism analysis, denaturing high pressure liquid chromatography (DHPLC), DNA hybridization, computational analysis, single-stranded conformational polymorphism (SSCP), restriction fragment length polymorphism (RFLP), automated fluorescent sequencing, clamped denaturing gel electrophoresis (CDGE), denaturing gradient gel electrophoresis (DGGE), mobility shift analysis, restriction enzyme analysis, heteroduplex analysis, chemical mismatch cleavage (CMC), RNase protection assays, use of polypeptides that recognize nucleotide mismatches, such as E. coli mutS protein, allele-specific PCR, and direct manual and automated sequencing. These and other methods are described in the art (see, for instance, Li et al., Nucleic Acids Research, 28(2): e1 (i-v) (2000); Liu et al., Biochem Cell Bio 80: 17-22 (2000); and Burczak et al., Polymorphism Detection and Analysis, Eaton Publishing, 2000; Sheffield et al., Proc. Natl. Acad. Sci. USA, 86: 232-236 (1989); Orita et al., Proc. Natl. Acad. Sci. USA, 86: 2766-2770 (1989); Flavell et al., Cell, 15: 25-41 (1978); Geever et al., Proc. Natl. Acad. Sci. USA, 78: 5081-5085 (1981); Cotton et al., Proc. Natl. Acad. Sci. USA, 85: 4397-4401 (1985); Myers et al., Science 230: 1242-1246 (1985); Church and Gilbert, Proc. Natl. Acad. Sci. USA, 81: 1991-1995 (1988); Sanger et al., Proc. Natl. Acad. Sci. USA, 74: 5463-5467 (1977); and Beavis et al., U.S. Patent No. 5,288,644).
In a general sense, sequence data establishes the identity of particular nucleotides along a nucleic acid molecule. For polymorphic sites, sequence data establishes the identity of particular alleles at the polymorphic site. In certain embodiments, sequence data establishes whether particular alleles are present or absent at a polymorphic site.
The sequence data may be obtained from a first sample that is also used to determine PSA values. Alternatively, the sequence data is obtained from a second sample. Nucleic acid sequence data is preferably obtained from a sample that contains nucleic acid, preferably genomic nucleic acid.
Recent technological advances have resulted in technologies that allow massively parallel sequencing, also called high-throughput sequencing, to be performed in relatively condensed format. These technologies share the sequencing-by-synthesis principle for generating sequence information, with different technological solutions implemented for extending, tagging and detecting sequences. Exemplary high-throughput sequencing technologies include 454 pyrosequencing technology (Nyren, P. et al. Anal Biochem 208: 171-75 (1993); http://www.454.com), Illumina Solexa sequencing technology (Bentley, D.R. Curr Opin Genet Dev 16: 545-52 (2006); http://www.illumina.com), and the SOLiD technology developed by Applied Biosystems (ABI) (http://www.appliedbiosystems.com; see also Strausberg, R.L., et al. Drug Disc Today 13: 569-77 (2008)). Other sequencing technologies include those developed by Pacific Biosciences (http://www.pacificbiosciences.com), Complete Genomics (http://www.completegenomics.com), Intelligent Bio-Systems (http://www.intelligentbiosystems.com), Oxford Nanopore Technologies (http://www.nanoporetech.com/), Genome Corp (http://www.genomecorp.com), ION Torrent Systems (http://www.iontorrent.com) and Helicos Biosciences (http://www.helicosbio.com). It is contemplated that sequence data useful for performing the present invention may be obtained by any such sequencing method, or other sequencing methods that are developed or made available. Thus, any sequencing method that provides the allelic identity at particular polymorphic sites (e.g., the absence or presence of particular alleles at particular polymorphic sites) is useful in the methods described and claimed herein.
Alternatively, determination of the presence or absence of particular alleles can be accomplished using a hybridization method (see Current Protocols in Molecular Biology , Ausubel et al., eds., John Wiley & Sons, including all supplements) . A biological sample of genomic DNA, RNA, or cDNA (a "test sample") is obtained from a test subject or individual suspected of having, being susceptible to, experiencing symptoms associated with, or predisposed for eosinophilia, asthma, and/or myocardial infarction (the "test subject") . The subject can be an adult, child, or fetus. A test sample of DNA from fetal cells or tissue can be obtained by appropriate methods, such as by amniocentesis or chorionic villus sampling. The DNA, RNA, or cDNA sample is then examined. The presence of a specific marker allele can be indicated by sequence-specific hybridization of a nucleic acid probe specific for the particular allele. The presence of more than one specific marker allele or a specific haplotype can be indicated by using several sequence-specific nucleic acid probes, each being specific for a particular allele. In one embodiment, a haplotype can be indicated by a single nucleic acid probe that is specific for the specific haplotype (i.e., hybridizes specifically to a DNA strand comprising the specific marker alleles characteristic of the haplotype) . A sequence-specific probe can be directed to hybridize to genomic DNA, RNA, or cDNA. A "nucleic acid probe", as used herein, can be a DNA probe or an RNA probe that hybridizes to a complementary sequence. One of skill in the art would know how to design such a probe so that sequence specific hybridization will occur only if a particular allele is present in a genomic sequence from a test sample.
To determine whether particular alleles are present at a polymorphic site, a hybridization sample can be formed by contacting the test sample, such as a genomic DNA sample, with at least one nucleic acid probe. A non-limiting example of a probe for detecting mRNA or genomic DNA is a labeled nucleic acid probe that is capable of hybridizing to mRNA or genomic DNA sequences described herein. The nucleic acid probe can be, for example, a full-length nucleic acid molecule, or a portion thereof, such as an oligonucleotide of at least 10, 15, 30, 50, 100, 250 or 500 nucleotides in length that is sufficient to specifically hybridize under stringent conditions to appropriate mRNA or genomic DNA. In certain embodiments, the nucleic acid probe is capable of hybridizing specifically under stringent conditions to a nucleic acid molecule with sequence as set forth in any one of SEQ ID NO: 1-728, or a nucleic acid molecule with the complementary sequence of any one of SEQ ID NO: 1-728. Other suitable probes for use in the diagnostic assays of the invention are described herein . Hybridization can be performed by methods well known to the person skilled in the art (see, e.g., Current Protocols in Molecular Biology, Ausubel et al., eds., John Wiley & Sons, including all supplements) . In one embodiment, hybridization refers to specific hybridization, i.e., hybridization with no mismatches (exact hybridization) . In one embodiment, the hybridization conditions for specific hybridization are high stringency.
Specific hybridization, if present, is detected using standard methods. If specific hybridization occurs between the nucleic acid probe and the nucleic acid in the test sample, then the sample contains the allele that is complementary to the nucleotide that is present in the nucleic acid probe. The process can be repeated for any markers of the invention, or markers that make up a haplotype of the invention, or multiple probes can be used concurrently to detect more than one marker alleles at a time.
In certain embodiments, nucleic acid sequence data is obtained by a method that comprises at least one procedure selected from the group consisting of amplification of nucleic acid from a first or second biological sample, hybridization assay using a nucleic acid probe and nucleic acid from the first or second biological sample, and hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of nucleic acid from the first or second biological sample.
Allele-specific oligonucleotides can also be used to detect the presence of a particular allele in a nucleic acid. An "allele-specific oligonucleotide" (also referred to herein as an "allele-specific oligonucleotide probe") is an oligonucleotide of approximately 10-50 base pairs or approximately 15-30 base pairs, that specifically hybridizes to a nucleic acid which contains a specific allele at a polymorphic site (e.g., a polymorphic marker as described herein). An allele-specific oligonucleotide probe that is specific for one or more particular alleles at polymorphic markers can be prepared using standard methods (see, e.g., Current Protocols in Molecular Biology, supra). PCR can be used to amplify the desired region. Specific hybridization of an allele-specific oligonucleotide probe to DNA from the subject is indicative of a specific allele at a polymorphic site (see, e.g., Gibbs et al., Nucleic Acids Res. 17: 2437-2448 (1989) and WO 93/22456).
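As a non-limiting, purely computational illustration of allele-specific oligonucleotide probes, the following sketch builds one probe per allele with the polymorphic base at the center and tests for exact (mismatch-free) matching against either strand of a target. The flanking sequences, the 10-base flank length and the function names are hypothetical assumptions; real probe design would also consider melting temperature and stringency conditions, which are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_probes(left_flank, alleles, right_flank, flank=10):
    """Return {allele: probe}; each probe is flank + allele + flank bases long."""
    left = left_flank[-flank:]
    right = right_flank[:flank]
    return {allele: left + allele + right for allele in alleles}


def hybridizes_exactly(probe, target):
    """True if the probe matches either strand of the target with no mismatches."""
    return probe in target or reverse_complement(probe) in target


# Hypothetical flanking sequences around a C/T polymorphic site.
left_flank = "GATTACAGGCTTAACGT"
right_flank = "CCGATTGCAAGGTTCAA"
probes = allele_specific_probes(left_flank, "CT", right_flank)

# Hypothetical test sample carrying the T allele.
sample = left_flank + "T" + right_flank
for allele, probe in probes.items():
    print(allele, len(probe), hybridizes_exactly(probe, sample))
```

With a 10-base flank each probe is 21 bases long, which falls within the approximately 15-30 base range mentioned above.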
In another embodiment, arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from a subject can be used to identify polymorphisms in a nucleic acid. The polymorphism may for example be any one or a combination of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith. For example, an oligonucleotide array can be used. Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These arrays can generally be produced using mechanical synthesis methods or light directed synthesis methods that incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis methods, or by other methods known to the person skilled in the art (see, e.g., Bier et al., Adv Biochem Eng Biotechnol 109: 433-53 (2008); Hoheisel, Nat Rev Genet 7: 200-10 (2006); Fan et al., Methods Enzymol 410: 57-73 (2006); Ragoussis & Elvidge, Expert Rev Mol Diagn 6: 145-52 (2006); Mockler et al., Genomics 85: 1-15 (2005), and references cited therein, the entire teachings of each of which are incorporated by reference herein). Many additional descriptions of the preparation and use of oligonucleotide arrays for detection of polymorphisms can be found, for example, in US 6,858,394, US 6,429,027, US 5,445,934, US 5,700,637, US 5,744,305, US 5,945,334, US 6,054,270, US 6,300,063, US 6,733,977, US 7,364,858, EP 619 321, and EP 373 203, the entire teachings of which are incorporated by reference herein.
Also, standard techniques for genotyping can be used, such as fluorescence-based techniques (e.g., Chen et al., Genome Res. 9(5): 492-98 (1999); Kutyavin et al., Nucleic Acid Res. 34: e128 (2006)), utilizing PCR, LCR, Nested PCR and other techniques for nucleic acid amplification. Specific commercial methodologies available for SNP genotyping include, but are not limited to, TaqMan genotyping assays and SNPlex platforms (Applied Biosystems), gel electrophoresis (Applied Biosystems), mass spectrometry (e.g., MassARRAY system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex system (BioRad), CEQ and SNPstream systems (Beckman), array hybridization technology (e.g., Affymetrix GeneChip; Perlegen), BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), array tag technology (e.g., Parallele), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave). Some of the available array platforms, including Affymetrix SNP Array 6.0 and Illumina CNV370-Duo and 1M BeadChips, include SNPs that tag certain copy number variations (CNVs). This allows detection of CNVs via surrogate SNPs included in these platforms. Thus, by use of these or other methods available to the person skilled in the art, one or more alleles at polymorphic markers, including microsatellites, SNPs or other types of polymorphic markers, can be identified.
The direct sequence analysis can be of the nucleic acid of a biological sample obtained from the human individual for which a susceptibility is being determined. The biological sample can be any sample containing nucleic acid (e.g., genomic DNA) obtained from the human individual. For example, the biological sample can be a blood sample, a serum sample, a leukapheresis sample, an amniotic fluid sample, a cerebrospinal fluid sample, a hair sample, a tissue sample from skin, muscle, buccal, or conjunctival mucosa, placenta, gastrointestinal tract, or other organs, a semen sample, a urine sample, a saliva sample, a nail sample, a tooth sample, and the like.
In a specific aspect of the invention, obtaining nucleic acid sequence data comprises obtaining nucleic acid sequence information from a preexisting record, e.g., a preexisting medical record comprising genotype information of the human individual . For example, direct sequence analysis of the allele of the polymorphic marker can be accomplished by mining a pre-existing genotype dataset for the sequence of the allele of the polymorphic marker.
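By way of illustration only, obtaining allele information from a preexisting record can be as simple as a lookup. The record layout (marker name mapped to a two-letter genotype), the genotype values and the function names below are hypothetical and do not represent any particular medical record format.

```python
def alleles_on_record(record, marker):
    """Return the recorded alleles for a marker as a tuple, or None if absent."""
    genotype = record.get(marker)
    return tuple(genotype) if genotype else None


def carries_allele(record, marker, allele):
    """True or False if the marker is on record, None if it was never genotyped."""
    alleles = alleles_on_record(record, marker)
    return None if alleles is None else allele in alleles


# Hypothetical preexisting record: marker name -> two-letter genotype.
record = {"rs17632542": "TC", "rs2735839": "GG"}

print(carries_allele(record, "rs17632542", "T"))
print(carries_allele(record, "rs2735839", "A"))
print(carries_allele(record, "rs401681", "C"))
```

Returning None for a marker that is absent from the record keeps "not genotyped" distinct from "allele absent", which matters when deciding whether a further sample needs to be analyzed.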
Indirect analysis
Alternatively, the nucleic acid sequence data may be obtained through indirect analysis of the nucleic acid sequence of the allele of the polymorphic marker. For example, the allele could be one which leads to the expression of a variant protein comprising an altered amino acid sequence, as compared to the non-variant (e.g., wild-type) protein, due to one or more amino acid substitutions, deletions, or insertions, or truncation (due to, e.g., splice variation). For example, the allele could be the T allele of rs17632542, which leads to a substitution of Isoleucine to Threonine at position 179 of GenBank Accession No. NP_001639. In this instance, nucleic acid sequence data about the allele of the polymorphic marker (e.g., rs17632542) can be obtained through detection of the amino acid substitution of the variant protein. Methods of detecting variant proteins are known in the art. For example, direct amino acid sequencing of the variant protein followed by comparison to a reference amino acid sequence can be used. Also, immunoassays, e.g., immunofluorescent immunoassays, immunoprecipitations, radioimmunoassays, ELISA, and Western blotting, in which an antibody specific for an epitope comprising the variant sequence distinguishes the variant protein from the non-variant or wild-type protein, can be used.
It is also possible, for example, for the variant protein to demonstrate altered (e.g., upregulated or downregulated) biological activity, in comparison to the non-variant or wild-type protein. The biological activity can be, for example, a binding activity or enzymatic activity. In this instance, nucleic acid sequence data about the allele of the polymorphic marker can be obtained through detection of the altered biological activity. Methods of detecting binding activity and enzymatic activity are known in the art and include, for instance, ELISA, competitive binding assays, quantitative binding assays using instruments such as, for example, a Biacore® 3000 instrument, chromatographic assays, e.g ., HPLC and TLC.
Alternatively or additionally, the polymorphic variant (the allele of the polymorphic marker) could lead to an altered expression level, e.g., an increased expression level of an mRNA or protein, a decreased expression level of an mRNA or protein. Nucleic acid sequence data about the allele of the polymorphic marker can, in these instances, be obtained through detection of the altered expression level. Methods of detecting expression levels are known in the art. For example, ELISA, radioimmunoassays, immunofluorescence, and Western blotting can be used to compare the expression of protein levels. Alternatively, Northern blotting can be used to compare the levels of mRNA. These processes are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (2001) .
The indirect sequence analysis can be of a nucleic acid (e.g., DNA, mRNA) or protein of a biological sample obtained from the human individual for which a susceptibility is being determined. The biological sample can be any nucleic acid or protein containing sample obtained from the human individual. For example, the biological sample can be any of the biological samples described herein.
In view of the foregoing, analyzing the sequence of at least one polymorphic marker can comprise determining the presence or absence of at least one allele of the marker. Alternatively, the analyzing can comprise analyzing the sequence of the polymorphic marker in a particular sample. Further, analyzing the sequence of the at least one polymorphic marker can comprise determining the presence or absence of an amino acid substitution in the amino acid sequence encoded by the polymorphic marker, or it can comprise obtaining a biological sample from the human individual and analyzing the amino acid sequence encoded by at least one gene of the group. In certain embodiments, analyzing sequence comprises determining the identity of both alleles of the at least one polymorphic marker. Such sequence analysis thus corresponds to establishing the genotype of a particular marker for an individual.
Linkage Disequilibrium
The nucleic acid sequence data may be obtained through other means of indirect analysis of the nucleic acid sequence of the allele of the polymorphic marker. For example, obtaining nucleic acid data can comprise identifying at least one allele of a marker in linkage disequilibrium with at least one polymorphic marker associated with PSA levels.
Linkage Disequilibrium (LD) refers to a non-random assortment of two genetic elements. For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person's having both elements is 0.25 (25%), assuming a random distribution of the elements. However, if it is discovered that the two elements occur together at a frequency higher than 0.25, then the elements are said to be in linkage disequilibrium, since they tend to be inherited together at a higher rate than what their independent frequencies of occurrence (e.g., allele or haplotype frequencies) would predict. Roughly speaking, LD is generally correlated with the frequency of recombination events between the two elements. Allele or haplotype frequencies can be determined in a population by genotyping individuals in a population and determining the frequency of the occurrence of each allele or haplotype in the population. For populations of diploids, e.g., human populations, individuals will typically have two alleles for each genetic element (e.g., a marker, haplotype or gene).
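The numerical example above can be restated as a short sketch. The observed joint frequency of 0.40 is a hypothetical value chosen for illustration, and the function names are not part of the described methods.

```python
def expected_joint_frequency(p_first, p_second):
    """Joint frequency predicted when the two elements assort independently."""
    return p_first * p_second


def disequilibrium(p_first, p_second, p_joint):
    """D: observed joint frequency minus the frequency expected under independence."""
    return p_joint - expected_joint_frequency(p_first, p_second)


expected = expected_joint_frequency(0.50, 0.50)
# Hypothetical observation: the two elements occur together at a frequency of 0.40.
d = disequilibrium(0.50, 0.50, 0.40)

print(f"expected = {expected:.2f}, D = {d:.2f}, in LD: {d > 0}")
```

A value of D above zero corresponds to the situation described above, in which the two elements occur together more often than their individual frequencies would predict.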
Many different measures have been proposed for assessing the strength of linkage disequilibrium (LD; reviewed in Devlin, B. & Risch, N., Genomics 29: 311-22 (1995)). Most capture the strength of association between pairs of biallelic sites. Two important pairwise measures of LD are r² (sometimes denoted Δ²) and |D'| (Lewontin, R., Genetics 49: 49-67 (1964); Hill, W.G. & Robertson, A. Theor. Appl. Genet. 22: 226-231 (1968)). Both measures range from 0 (no disequilibrium) to 1 ('complete' disequilibrium), but their interpretation is slightly different. |D'| is defined in such a way that it is equal to 1 if just two or three of the possible haplotypes are present, and it is < 1 if all four possible haplotypes are present. Therefore, a value of |D'| that is < 1 indicates that historical recombination may have occurred between two sites (recurrent mutation can also cause |D'| to be < 1, but for single nucleotide polymorphisms (SNPs) this is usually regarded as being less likely than recombination). The measure r² represents the statistical correlation between two sites, and takes the value of 1 if only two haplotypes are present.
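The behaviour of the two measures can be sketched using the standard definitions of |D'| and r² for two biallelic sites. The haplotype frequencies below are hypothetical, and the function assumes that the four frequencies sum to 1 and that both sites are polymorphic.

```python
def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (|D'|, r^2) for two biallelic sites from the four haplotype frequencies."""
    p_A = f_AB + f_Ab            # frequency of allele A at the first site
    p_B = f_AB + f_aB            # frequency of allele B at the second site
    d = f_AB - p_A * p_B         # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


# Hypothetical haplotype frequencies (AB, Ab, aB, ab).
two_haplotypes = ld_measures(0.3, 0.0, 0.0, 0.7)
three_haplotypes = ld_measures(0.3, 0.0, 0.2, 0.5)
four_haplotypes = ld_measures(0.4, 0.1, 0.1, 0.4)

for label, (d_prime, r2) in [("two", two_haplotypes),
                             ("three", three_haplotypes),
                             ("four", four_haplotypes)]:
    print(f"{label} haplotypes: |D'| = {d_prime:.2f}, r2 = {r2:.2f}")
```

With three haplotypes present |D'| remains 1 while r² falls below 1, which is the difference in interpretation noted above.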
The r² measure is arguably the most relevant measure for association mapping, because there is a simple inverse relationship between r² and the sample size required to detect association between susceptibility loci and SNPs. These measures are defined for pairs of sites, but for some applications a determination of how strong LD is across an entire region that contains many polymorphic sites might be desirable (e.g., testing whether the strength of LD differs significantly among loci or across populations, or whether there is more or less LD in a region than predicted under a particular model). Measuring LD across a region is not straightforward, but one approach is to use the measure ρ, which was developed in population genetics. Roughly speaking, ρ measures how much recombination would be required under a particular population model to generate the LD that is seen in the data. This type of method can potentially also provide a statistically rigorous approach to the problem of determining whether LD data provide evidence for the presence of recombination hotspots.
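The inverse relationship between r² and sample size is commonly approximated as follows: if N individuals suffice to detect association when the susceptibility variant itself is typed, roughly N/r² individuals are needed when a correlated marker is typed instead. The sketch below applies that approximation; the sample size of 1000 and the function name are hypothetical.

```python
import math


def required_sample_size(n_direct, r2):
    """Approximate sample size needed at a surrogate marker: n_direct / r2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return math.ceil(n_direct / r2)


# Hypothetical study that would need 1000 individuals at the causal variant.
for r2 in (1.0, 0.5, 0.25):
    print(r2, required_sample_size(1000, r2))
```

Under this approximation, halving r² doubles the number of individuals required, which is why r² is the natural measure when choosing surrogate markers for an association study.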
For the methods described herein, a significant r² value between markers can be at least 0.1, such as at least 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99 or 1.0. In one specific embodiment of the invention, the significant r² value can be at least 0.2. This means that markers are considered to be in LD if the correlation coefficient r² between the markers has a value of at least 0.2. Alternatively, linkage disequilibrium as described herein refers to linkage disequilibrium characterized by values of |D'| of at least 0.2, such as 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99. Thus, linkage disequilibrium represents a correlation between alleles of distinct markers. It is measured by the correlation coefficient r² or by |D'| (r² up to 1.0 and |D'| up to 1.0). Linkage disequilibrium can be determined in a single human population, as defined herein, or it can be determined in a collection of samples comprising individuals from more than one human population. In one embodiment of the invention, LD is determined in a sample from one or more of the HapMap populations. These include samples from the Yoruba people of Ibadan, Nigeria (YRI), samples from individuals from the Tokyo area in Japan (JPT), samples from individuals from Beijing, China (CHB), and samples from U.S. residents with northern and western European ancestry (CEU), as described (The International HapMap Consortium, Nature 426: 789-796 (2003)). In one such embodiment, LD is determined in the Caucasian CEU population of the HapMap samples. In yet another embodiment, LD is determined in samples from the Icelandic population. In another embodiment, LD is determined in samples from the UK population.
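As a non-limiting sketch, applying such cutoffs amounts to filtering candidate markers by their r² (and optionally |D'|) values. The five entries below are surrogates of rs10788160 with the D' and r² values listed in Table 1; the function name and its parameters are illustrative assumptions.

```python
# (surrogate marker, D', r2) for five surrogates of rs10788160 listed in Table 1.
surrogates = [
    ("rs2130779", 0.73, 0.21),
    ("rs729014", 0.88, 0.34),
    ("rs2901290", 0.80, 0.42),
    ("rs2201026", 0.86, 0.47),
    ("s.122993518", 0.83, 0.66),
]


def markers_in_ld(rows, min_r2=0.2, min_d_prime=None):
    """Names of markers meeting the r2 cutoff and, if given, the |D'| cutoff."""
    selected = []
    for name, d_prime, r2 in rows:
        if r2 < min_r2:
            continue
        if min_d_prime is not None and abs(d_prime) < min_d_prime:
            continue
        selected.append(name)
    return selected


print(markers_in_ld(surrogates))
print(markers_in_ld(surrogates, min_r2=0.4))
print(markers_in_ld(surrogates, min_r2=0.2, min_d_prime=0.85))
```

Raising the r² cutoff narrows the set to the surrogates that track the anchor marker most closely, at the cost of having fewer alternative markers available for genotyping.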
If all polymorphisms in the genome were independent at the population level (i.e., no LD between polymorphisms), then every single one of them would need to be investigated in association studies, to assess all different polymorphic states. However, due to linkage disequilibrium between polymorphisms, tightly linked polymorphisms are strongly correlated, which reduces the number of polymorphisms that need to be investigated in an association study to observe a significant association. Another consequence of LD is that many polymorphisms may give an association signal due to the fact that these polymorphisms are strongly correlated.
Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as a framework for mapping disease genes (Risch, N. & Merikangas, K, Science 273: 1516-1517 (1996); Maniatis, N., et al., Proc Natl Acad Sci USA 99: 2228-2233 (2002); Reich, DE et al., Nature 411: 199-204 (2001)).
It is now established that many portions of the human genome can be broken into series of discrete haplotype blocks containing a few common haplotypes; for these blocks, linkage disequilibrium data provides little evidence indicating recombination (see, e.g., Wall, J.D. and Pritchard, J.K., Nature Reviews Genetics 4: 587-597 (2003); Daly, M. et al., Nature Genet. 29: 229-232 (2001); Gabriel, S.B. et al., Science 296: 2225-2229 (2002); Patil, N. et al., Science 294: 1719-1723 (2001); Dawson, E. et al., Nature 418: 544-548 (2002); Phillips, M.S. et al., Nature Genet. 33: 382-387 (2003)).
There are two main methods for defining these haplotype blocks: blocks can be defined as regions of DNA that have limited haplotype diversity (see, e.g., Daly, M. et al., Nature Genet. 29: 229-232 (2001); Patil, N. et al., Science 294: 1719-1723 (2001); Dawson, E. et al., Nature 418: 544-548 (2002); Zhang, K. et al., Proc. Natl. Acad. Sci. USA 99: 7335-7339 (2002)), or as regions between transition zones having extensive historical recombination, identified using linkage disequilibrium (see, e.g., Gabriel, S.B. et al., Science 296: 2225-2229 (2002); Phillips, M.S. et al., Nature Genet. 33: 382-387 (2003); Wang, N. et al., Am. J. Hum. Genet. 71: 1221-1234 (2002); Stumpf, M.P., and Goldstein, D.B., Curr. Biol. 13: 1-8 (2003)). More recently, a fine-scale map of recombination rates and corresponding hotspots across the human genome has been generated (Myers, S., et al., Science 310: 321-324 (2005); Myers, S. et al., Biochem Soc Trans 34: 526-530 (2006)). The map reveals the enormous variation in recombination across the genome, with recombination rates as high as 10-60 cM/Mb in hotspots, while closer to 0 in intervening regions, which thus represent regions of limited haplotype diversity and high LD. The map can therefore be used to define haplotype blocks/LD blocks as regions flanked by recombination hotspots. As used herein, the terms "haplotype block" or "LD block" include blocks defined by any of the above described characteristics, or other alternative methods used by the person skilled in the art to define such regions.
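One way to picture the definition of LD blocks as regions flanked by recombination hotspots is the following sketch. The marker positions, the interval recombination rates and the 10 cM/Mb hotspot threshold are hypothetical assumptions chosen for illustration; other thresholds or block definitions may be used.

```python
def ld_blocks(positions, rates, hotspot_rate=10.0):
    """Split ordered marker positions into blocks separated by hotspots.

    rates[i] is the recombination rate (cM/Mb) in the interval between
    positions[i] and positions[i + 1]; an interval at or above hotspot_rate
    is treated as a block boundary.
    """
    if not positions:
        return []
    blocks = [[positions[0]]]
    for position, rate in zip(positions[1:], rates):
        if rate >= hotspot_rate:
            blocks.append([position])      # hotspot: start a new block
        else:
            blocks[-1].append(position)    # low recombination: same block
    return blocks


# Hypothetical marker positions (bp) and interval recombination rates (cM/Mb).
positions = [100, 200, 300, 400, 500]
rates = [0.1, 0.2, 35.0, 0.3]

print(ld_blocks(positions, rates))
```

Here the single interval with a rate of 35 cM/Mb separates the markers into two blocks, within each of which high LD would be expected.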
Haplotype blocks (LD blocks) can be used to map associations between phenotype and haplotype status, using single markers or haplotypes comprising a plurality of markers. The main haplotypes can be identified in each haplotype block, and then a set of "tagging" SNPs or markers (the smallest set of SNPs or markers needed to distinguish among the haplotypes) can then be identified . These tagging SNPs or markers can then be used in assessment of samples from groups of individuals, in order to identify association between phenotype and haplotype. If desired, neighboring haplotype blocks can be assessed concurrently, as there may also exist linkage disequilibrium among the haplotype blocks.
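The selection of a smallest set of tagging markers can be sketched by exhaustive search over marker subsets, which is practical only for small blocks. The three haplotypes below are hypothetical, and the function name is an illustrative assumption rather than a method described in the cited references.

```python
from itertools import combinations


def tagging_snps(haplotypes):
    """Smallest set of site indices whose alleles distinguish all haplotypes.

    Each haplotype is a string with one allele per site; subsets are tried in
    order of increasing size, so the first subset found is a minimal one.
    """
    distinct = set(haplotypes)
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            projected = {tuple(h[i] for i in sites) for h in distinct}
            if len(projected) == len(distinct):
                return sites
    return tuple(range(n_sites))


# Hypothetical main haplotypes of a block with four polymorphic sites.
haplotypes = ["ACGT", "ACAT", "GTGC"]

print(tagging_snps(haplotypes))
```

In this example no single site separates all three haplotypes, but the first and third sites together do, so genotyping those two sites identifies the haplotype.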
It has thus become apparent that for any given observed association to a polymorphic marker in the genome, it is likely that additional markers in the genome also show association . This is a natural consequence of the uneven distribution of LD across the genome, as observed by the large variation in recombination rates. The markers used to detect association thus in a sense represent "tags" for a genomic region (i.e., a haplotype block or LD block) that is associating with a given disease or trait, and as such are useful for use in the methods and kits of the invention . One or more causative (functional) variants or mutations may reside within the region found to be associating to the disease or trait. The functional variant may be another SNP, a tandem repeat polymorphism (such as a minisatellite or a microsatellite), a transposable element, or a copy number variation, such as an inversion, deletion or insertion. Such variants in LD with other variants used to detect an association to a disease or trait (e.g., the variants described herein to be associated with risk of eosinophilia, asthma, myocardial infarction, and/or hypertension) may confer a higher relative risk (RR) or odds ratio (OR) than observed for the tagging markers used to detect the association . The invention thus refers to the markers used for detecting association to the disease, as described herein, as well as markers in linkage disequilibrium with the markers. Thus, in certain embodiments of the invention, markers that are in LD with the markers and/or haplotypes of the invention, as described herein, may be used as surrogate markers. The surrogate markers have in one embodiment relative risk (RR) and/or odds ratio (OR) values smaller than for the markers or haplotypes initially found to be associating with the disease, as described herein . 
In other embodiments, the surrogate markers have RR or OR values greater than those initially determined for the markers initially found to be associating with the disease, as described herein. An example of such an embodiment would be a rare, or relatively rare (< 10% allelic population frequency) variant in LD with a more common variant (> 10% population frequency) initially found to be associating with the disease, such as the variants described herein. Identifying and using such markers for detecting the association discovered by the inventors as described herein can be performed by routine methods well known to the person skilled in the art, and is therefore within the scope of the invention.
In view of the foregoing, the marker in linkage disequilibrium with a polymorphic marker associated with PSA levels may be one of the surrogate markers listed in Table 1. The markers were selected using data for Caucasian CEU samples from the 1000 Genomes Project (http://1000genomes.org) and the HapMap dataset (http://www.hapmap.org).
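The D' and r2 measures reported for surrogate markers can be computed from haplotype and allele frequencies using the standard definitions of these LD statistics. The sketch below is illustrative only: the function name `ld_measures` and the example frequencies are assumptions and are not values from the 1000 Genomes or HapMap data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and allele B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # |D'| is D scaled by its maximum possible magnitude; capped against rounding.
    d_prime = min(1.0, abs(d) / d_max) if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case: a 10% allele that always occurs on the background of a 30% allele.
d_prime, r2 = ld_measures(p_ab=0.10, p_a=0.10, p_b=0.30)
```

In this hypothetical case D' is 1 while r2 is only about 0.26, which shows how a rarer variant can be in complete LD with a more common marker (D' = 1) and still have a modest r2.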
Table 1. Surrogate markers for the markers shown herein to be associated with PSA levels. Shown are (1) anchor marker name and the allele correlating with increased PSA levels; (2) the surrogate marker; (3) chromosome and position of the surrogate marker in NCBI Build 36; (4) identity of the surrogate allele predicted to correlate with reduced PSA levels; (5) identity of the surrogate allele predicted to correlate with elevated PSA levels; (6) D' values for the correlation between the anchor and the surrogate; and (7) r2 values for the correlation between the anchor and the surrogate.
Anchor SNP  Surrogate  Position  Decrease Allele  Increase Allele  D'  r2  Seq ID NO of surrogate
rs10788160_1 s.122837469 10-122837469 C A 1 0.21 305
rs10788160_1 rs2130779 10-122869722 G T 0.73 0.21 130
rs10788160_1 s.122876448 10-122876448 G A 0.78 0.29 306
rs10788160_1 s.122901140 10-122901140 C T 1 0.28 307
rs10788160_1 s.122901142 10-122901142 A C 1 0.28 308
rs10788160_1 s.122905335 10-122905335 G A 0.71 0.29 309
rs10788160_1 rs10788149 10-122957160 A G 0.59 0.24 24
rs10788160_1 rs10749408 10-122957516 T C 0.79 0.37 15
rs10788160_1 rs2172071 10-122958020 T C 0.65 0.28 131
rs10788160_1 rs11592107 10-122958954 G A 0.59 0.24 89
rs10788160_1 rs1907218 10-122960206 C T 0.65 0.28 122
rs10788160_1 rs1907220 10-122960913 G A 0.65 0.28 123
rs10788160_1 rs1994655 10-122961236 G T 0.65 0.28 127
rs10788160_1 rs1907221 10-122962417 T C 0.59 0.24 124
rs10788160_1 rs1907225 10-122965623 T C 0.65 0.28 125
rs10788160_1 rs1907226 10-122965736 A G 0.65 0.28 126
rs10788160_1 rs10749409 10-122966556 G C 0.65 0.28 16
rs10788160_1 rs11199835 10-122967147 A G 0.65 0.28 66
rs10788160_1 s.122991926 10-122991926 T C 0.74 0.25 310
rs10788160_1 rs729014 10-122992796 C T 0.88 0.34 274
rs10788160_1 s.122993518 10-122993518 A G 0.83 0.66 311
rs10788160_1 s.122994309 10-122994309 G A 0.83 0.66 312
rs10788160_1 s.122994946 10-122994946 T G 1 0.25 313
rs10788160_1 rs1873450 10-122996264 T G 0.84 0.7 116
rs10788160_1 rs2901290 10-122997016 G A 0.8 0.42 167
rs10788160_1 s.122998594 10-122998594 G A 0.8 0.42 314
rs10788160_1 s.122998678 10-122998678 G T 1 0.21 315
rs10788160_1 s.122998978 10-122998978 A T 0.75 0.27 316
rs10788160_1 rs2201026 10-122998993 T G 0.86 0.47 132
rs10788160_1 rs4237529 10-122999123 A G 0.8 0.42 200
rs10788160_1 s.122999386 10-122999386 A G 0.84 0.7 317
rs10788160_1 rs1873451 10-123000467 T C 0.8 0.42 117
rs10788160_1 rs1873452 10-123000564 T C 0.8 0.42 118
rs10788160_1 rs4752520 10-123001514 C T 0.8 0.42 230
rs10788160_1 rs10886880 10-123003911 T C 0.84 0.7 37
rs10788160_1 rs10749412 10-123007551 A T 0.8 0.42 17
rs10788160_1 s.123008216 10-123008216 G A 0.8 0.42 318
rs10788160_1 rs3925042 10-123009010 C T 0.8 0.42 191
rs10788160_1 rs1125527 10-123009606 G A 0.8 0.42 85
rs10788160_1 rs1125528 10-123009942 T A 0.84 0.7 86
rs10788160_1 rs4319451 10-123010241 A G 1 0.21 205
rs10788160_1 rs10788154 10-123011231 A C 0.8 0.42 25
rs10788160_1 rs7081844 10-123011258 C T 0.8 0.42 265
rs10788160_1 rs7076500 10-123011721 G A 0.8 0.44 262
rs10788160_1 s.123011774 10-123011774 C T 0.8 0.42 319
rs10788160_1 s.123011879 10-123011879 C T 0.8 0.42 320
rs10788160_1 rs11199862 10-123012946 G A 0.84 0.7 67
rs10788160_1 s.123014171 10-123014171 T C 0.77 0.41 321
rs10788160_1 rs12146156 10-123014406 T C 0.94 0.84 99
rs10788160_1 s.123014499 10-123014499 A G 0.94 0.84 322
rs10788160_1 s.123014519 10-123014519 G A 0.89 0.38 323
rs10788160_1 rs12146366 10-123014670 C T 0.94 0.84 100
rs10788160_1 s.123014684 10-123014684 C A 0.87 0.52 324
rs10788160_1 rs7091083 10-123014747 G A 0.87 0.52 269
rs10788160_1 rs7074985 10-123014878 T A 0.87 0.52 259
rs10788160_1 rs7915008 10-123015215 G A 0.94 0.79 285
rs10788160_1 s.123015342 10-123015342 C A 1 0.3 325
rs10788160_1 s.123015365 10-123015365 G A 0.87 0.52 326
rs10788160_1 rs10749413 10-123015655 A T 0.87 0.52 18
rs10788160_1 rs11199866 10-123015727 G A 0.87 0.52 68
rs10788160_1 s.123016003 10-123016003 G A 0.94 0.84 327
rs10788160_1 rs7923130 10-123016492 G A 0.87 0.52 288
rs10788160_1 rs7922901 10-123016509 C G 0.87 0.52 287
rs10788160_1 rs10886882 10-123017023 C T 0.87 0.52 38
rs10788160_1 rs10886883 10-123017171 C G 0.87 0.52 39
rs10788160_1 rs11199867 10-123017394 G T 0.87 0.52 69
rs10788160_1 s.123017698 10-123017698 C T 1 0.44 328
rs10788160_1 s.123018111 10-123018111 G C 0.87 0.52 329
rs10788160_1 rs4393247 10-123018166 G A 0.94 0.84 206
rs10788160_1 s.123018188 10-123018188 C T 0.87 0.52 330
rs10788160_1 rs4489674 10-123018240 A G 0.87 0.52 210
rs10788160_1 rs11199868 10-123018329 T A 0.94 0.84 70
rs10788160_1 s.123018670 10-123018670 G T 0.94 0.84 331
rs10788160_1 s.123019408 10-123019408 T G 0.87 0.49 332
rs10788160_1 s.123019759 10-123019759 C G 0.87 0.52 333
rs10788160_1 rs11199869 10-123020055 A G 0.94 0.84 71
rs10788160_1 s.123020245 10-123020245 G T 1 0.44 334
rs10788160_1 s.123020365 10-123020365 A T 0.87 0.52 335
rs10788160_1 rs10886885 10-123020471 G T 0.94 0.84 40
rs10788160_1 rs10788159 10-123020775 A G 0.94 0.84 26
rs10788160_1 rs10886886 10-123020859 T G 0.94 0.79 41
rs10788160_1 rs11199871 10-123020940 C A 0.94 0.74 72
rs10788160_1 rs11199872 10-123021180 G A 0.94 0.84 73
rs10788160_1 rs12761612 10-123021400 G A 0.94 0.84 106
rs10788160_1 rs4575197 10-123022158 A G 1 0.3 220
rs10788160_1 rs11199874 10-123022509 G A 1 0.95 74
rs10788160_1 rs10886887 10-123023168 C T 1 1 42
rs10788160_1 s.123023625 10-123023625 G T 1 0.95 336
rs10788160_1 s.123023836 10-123023836 T C 1 0.95 337
rs10788160_1 rs4465316 10-123024171 C A 1 0.95 207
rs10788160_1 rs4468286 10-123024381 C A 1 0.95 208
rs10788160_1 rs10886890 10-123027193 A G 1 0.95 43
rs10788160_1 rs10788162 10-123027299 A G 1 0.6 27
rs10788160_1 s.123028135 10-123028135 C A 1 1 338
rs10788160_1 rs12413648 10-123028887 G A 1 1 103
rs10788160_1 s.123029102 10-123029102 T C 1 1 339
rs10788160_1 rs10788163 10-123029792 T G 1 1 28
rs10788160_1 s.123031617 10-123031617 G T 1 1 340
rs10788160_1 s.123031811 10-123031811 A T 1 1 341
rs10788160_1 rs10788164 10-123032835 C T 1 0.63 29
rs10788160_1 rs11598592 10-123033379 G A 1 0.47 91
rs10788160_1 rs10788165 10-123034204 T G 1 0.63 30
rs10788160_1 rs9630106 10-123034373 A G 1 0.47 292
rs10788160_1 rs10886893 10-123034442 T C 1 0.95 44
rs10788160_1 s.123034821 10-123034821 T C 0.95 0.9 342
rs10788160_1 rs11199879 10-123035202 T C 0.95 0.9 75
rs10788160_1 rs11199881 10-123035860 T C 1 0.95 76
rs10788160_1 rs12415826 10-123036368 T C 1 0.95 104
rs10788160_1 rs10788166 10-123036532 A G 1 0.95 31
rs10788160_1 rs10886894 10-123036863 T C 1 0.95 45
rs10788160_1 rs10886895 10-123037303 C A 1 0.95 46
rs10788160_1 rs10886896 10-123037386 C A 1 0.95 47
rs10788160_1 rs10886897 10-123037630 T C 1 0.95 48
rs10788160_1 rs10886898 10-123037681 T G 1 0.95 49
rs10788160_1 rs10886899 10-123037711 G T 1 0.95 50
rs10788160_1 rs10886900 10-123037998 A G 1 0.95 51
rs10788160_1 rs10886901 10-123038120 T C 1 0.95 52
rs10788160_1 rs10886902 10-123039254 T C 1 0.95 53
rs10788160_1 rs10886903 10-123039425 C G 1 0.95 54
rs10788160_1 rs12413088 10-123042718 C T 1 0.95 102
rs10788160_1 rs10788167 10-123044008 T A 1 0.95 32
rs10788160_1 s.123047182 10-123047182 C T 1 0.28 343
rs10788160_1 rs7085073 10-123047258 C T 1 0.28 266
rs10788160_1 rs7071101 10-123047771 G A 1 0.28 257
rs10788160_1 rs12570783 10-123049889 G A 1 0.28 105
rs10788160_1 rs11199884 10-123053164 G A 0.75 0.37 77
rs10788160_1 rs7085506 10-123054129 C G 1 0.28 267
rs10788160_1 rs10886905 10-123057992 T C 0.82 0.41 55
rs10788160_1 rs10736302 10-123059707 T C 0.75 0.37 14
rs10788160_1 s.123061811 10-123061811 C T 1 0.28 344
rs10788160_1 s.123062031 10-123062031 G C 1 0.28 345
rs10788160_1 rs11199886 10-123062077 G T 0.75 0.37 78
rs10788160_1 s.123063327 10-123063327 A T 1 0.28 346
rs10788160_1 s.123063715 10-123063715 G A 0.75 0.37 347
rs10788160_1 rs10886907 10-123063722 G C 0.75 0.37 56
rs10788160_1 s.123064252 10-123064252 C T 0.81 0.37 348
rs10788160_1 s.123064345 10-123064345 G T 0.75 0.37 349
rs10788160_1 s.123064780 10-123064780 C T 0.82 0.41 350
rs10788160_1 s.123064783 10-123064783 T C 0.75 0.37 351
rs10788160_1 s.123066424 10-123066424 T C 0.75 0.37 352
rs10788160_1 s.123066700 10-123066700 T C 0.75 0.37 353
rs10788160_1 rs3981043 10-123066817 A T 1 0.26 192
rs10788160_1 rs11199896 10-123067415 C T 0.81 0.37 79
rs10788160_1 rs11199897 10-123067723 G A 0.75 0.37 80
rs10788160_1 rs11199898 10-123067775 T C 0.82 0.41 81
rs10788160_1 s.123067963 10-123067963 T A 0.75 0.37 354
rs10788160_1 rs11199900 10-123067986 A T 0.75 0.37 82
rs10788160_1 rs11199901 10-123068059 C T 0.75 0.37 83
rs10788160_1 s.123068178 10-123068178 G T 0.73 0.33 355
rs10788160_1 s.123068222 10-123068222 G A 0.75 0.37 356
rs10788160_1 s.123068236 10-123068236 C T 0.9 0.42 357
rs10788160_1 s.123068424 10-123068424 A G 0.73 0.33 358
rs10788160_1 s.123068619 10-123068619 C T 0.82 0.41 359
rs10788160_1 s.123068743 10-123068743 A G 0.9 0.42 360
rs10788160_1 s.123068926 10-123068926 A T 1 0.44 361
rs10788160_1 s.123068997 10-123068997 G A 0.73 0.33 362
rs10788160_1 s.123069012 10-123069012 C T 1 0.27 363
rs10788160_1 s.123069326 10-123069326 G T 0.88 0.34 364
rs10788160_1 s.123069570 10-123069570 C T 0.81 0.37 365
rs10788160_1 s.123069989 10-123069989 T C 0.75 0.37 366
rs10788160_1 s.123070105 10-123070105 C T 0.73 0.33 367
rs10788160_1 s.123071090 10-123071090 G A 0.75 0.37 368
rs10788160_1 s.123071347 10-123071347 G C 1 0.26 369
rs10788160_1 rs4254007 10-123071380 T A 1 0.27 202
rs10788160_1 s.123071495 10-123071495 G A 1 0.27 370
rs10788160_1 s.123071914 10-123071914 G T 1 0.36 371
rs10788160_1 s.123072804 10-123072804 G A 1 0.48 372
rs10788160_1 rs7900630 10-123073094 C T 1 0.27 283
rs10788160_1 s.123074016 10-123074016 T C 0.57 0.26 373
rs10788160_1 rs1896416 10-123074480 G A 0.57 0.26 119
rs10788160_1 s.123074531 10-123074531 C T 0.88 0.34 374
rs10788160_1 s.123074928 10-123074928 C T 0.75 0.37 375
rs10788160_1 s.123076274 10-123076274 T C 1 0.65 376
rs10788160_1 s.123076472 10-123076472 C G 1 0.27 377
rs10788160_1 rs2420925 10-123077176 T C 1 0.27 135
rs10788160_1 s.123077398 10-123077398 A G 1 0.27 378
rs10788160_1 s.123077455 10-123077455 G C 1 0.27 379
rs10788160_1 rs12779205 10-123077742 A T 1 0.65 108
rs10788160_1 rs11199912 10-123078010 G T 1 0.27 84
rs10788160_1 rs4752534 10-123078189 T C 1 0.24 231
rs10788160_1 s.123078389 10-123078389 A T 1 0.28 380
rs10788160_1 rs1896420 10-123078843 C T 1 0.28 121
rs10788160_1 rs1896419 10-123079069 A C 1 0.23 120
rs10788160_1 s.123079199 10-123079199 G A 1 0.28 381
rs10788160_1 s.123081990 10-123081990 T A 1 0.21 382
rs10788160_1 s.123081993 10-123081993 T A 1 0.25 383
rs10788160_1 s.123081998 10-123081998 A G 1 0.32 384
rs10788160_1 s.123201870 10-123201870 T C 1 0.21 385
rs10993994_4 s.51157005 10-51157005 A G 0.8 0.48 459
rs10993994_4 s.51159221 10-51159221 T C 0.8 0.48 460
rs10993994_4 rs35716372 10-51159230 G A 0.65 0.27 177
rs10993994_4 s.51159373 10-51159373 T C 0.8 0.48 461
rs10993994_4 s.51159376 10-51159376 G C 0.8 0.48 462
rs10993994_4 s.51159399 10-51159399 G T 0.8 0.48 463
rs10993994_4 s.51159786 10-51159786 G C 0.8 0.48 464
rs10993994_4 rs4935090 10-51161131 A T 0.8 0.48 232
rs10993994_4 rs12781411 10-51161595 C T 0.8 0.48 109
rs10993994_4 s.51162137 10-51162137 A G 0.8 0.48 465
rs10993994_4 s.51162792 10-51162792 C A 0.8 0.48 466
rs10993994_4 s.51162795 10-51162795 C A 0.8 0.48 467
rs10993994_4 rs11004246 10-51165355 T C 0.8 0.48 58
rs10993994_4 s.51165690 10-51165690 A C 0.79 0.44 468
rs10993994_4 rs11004324 10-51166629 T G 0.8 0.48 59
rs10993994_4 rs2843562 10-51166802 T C 0.8 0.51 165
rs10993994_4 rs11004409 10-51168025 G C 0.95 0.61 60
rs10993994_4 rs11004415 10-51168187 G A 1 0.61 61
rs10993994_4 rs11004422 10-51168342 A G 0.65 0.35 62
rs10993994_4 s.51168415 10-51168415 C T 0.63 0.28 469
rs10993994_4 rs11004435 10-51168499 C A 0.65 0.35 63
rs10993994_4 rs11599333 10-51169661 A C 1 0.61 92
rs10993994_4 s.51170094 10-51170094 T G 1 0.61 470
rs10993994_4 s.51170307 10-51170307 G A 1 0.61 471
rs10993994_4 rs12763717 10-51170880 C G 1 0.61 107
rs10993994_4 rs67289834 10-51171310 C T 1 0.65 251
rs10993994_4 s.51172442 10-51172442 T A 1 0.61 472
rs10993994_4 s.51172558 10-51172558 T G 1 0.61 473
rs10993994_4 rs57858801 10-51172580 A T 1 0.61 244
rs10993994_4 s.51172618 10-51172618 C A 1 0.61 474
rs10993994_4 s.51172808 10-51172808 C G 1 0.61 475
rs10993994_4 s.51173184 10-51173184 A G 1 0.61 476
rs10993994_4 rs7071471 10-51173341 C T 1 0.61 258
rs10993994_4 rs7090326 10-51173381 A T 1 0.61 268
rs10993994_4 s.51173565 10-51173565 C G 1 0.61 477
rs10993994_4 s.51173983 10-51173983 T C 1 0.61 478
rs10993994_4 s.51174391 10-51174391 A G 1 0.61 479
rs10993994_4 s.51174499 10-51174499 A C 0.86 0.63 480
rs10993994_4 s.51174610 10-51174610 C T 0.86 0.63 481
rs10993994_4 s.51174944 10-51174944 G A 1 0.61 482
rs10993994_4 s.51175013 10-51175013 G A 0.73 0.34 483
rs10993994_4 s.51175409 10-51175409 A G 1 0.61 484
rs10993994_4 s.51176290 10-51176290 C T 1 0.61 485
rs10993994_4 s.51176963 10-51176963 T C 1 0.61 486
rs10993994_4 s.51180209 10-51180209 G A 1 0.7 487
rs10993994_4 rs10825652 10-51180767 G A 1 0.7 33
rs10993994_4 s.51180819 10-51180819 C A 1 0.7 488
rs10993994_4 rs2843560 10-51182135 C G 1 0.61 164
rs10993994_4 rs2125770 10-51184830 C T 1 0.61 129
rs10993994_4 rs2611513 10-51185463 T C 1 0.7 144
rs10993994_4 rs2611512 10-51185540 G A 1 0.61 143
rs10993994_4 rs2611509 10-51186258 A G 1 0.7 142
rs10993994_4 s.51186305 10-51186305 T G 1 0.7 489
rs10993994_4 rs2926494 10-51187362 C T 1 0.7 168
rs10993994_4 rs2611508 10-51188053 A T 1 0.7 141
rs10993994_4 rs2611507 10-51188679 C T 0.95 0.69 140
rs10993994_4 s.51188694 10-51188694 C A 1 0.7 490
rs10993994_4 rs2611506 10-51188793 T C 1 0.7 139
rs10993994_4 rs57263518 10-51189160 G A 1 0.7 243
rs10993994_4 s.51189522 10-51189522 A G 0.95 0.69 491
rs10993994_4 rs3101227 10-51190209 A C 1 0.7 170
rs10993994_4 rs2843549 10-51191253 A C 1 0.7 160
rs10993994_4 rs2843550 10-51191458 T C 1 0.7 161
rs10993994_4 rs2249986 10-51191690 G T 1 0.7 133
rs10993994_4 rs2843551 10-51191951 A C 1 0.7 162
rs10993994_4 s.51192126 10-51192126 T C 0.95 0.69 492
rs10993994_4 rs7077830 10-51192282 C G 0.95 0.69 263
rs10993994_4 s.51193219 10-51193219 T A 1 0.73 493
rs10993994_4 rs2843554 10-51193867 T G 1 0.73 163
rs10993994_4 s.51194280 10-51194280 T C 1 0.31 494
rs10993994_4 rs2611489 10-51194895 A G 1 0.73 138
rs10993994_4 rs3123078 10-51194977 T C 1 0.73 171
rs10993994_4 rs4935162 10-51195705 C G 1 0.73 233
rs10993994_4 rs7081532 10-51196099 G A 1 0.7 264
rs10993994_4 rs10826075 10-51197376 C G 0.74 0.54 34
rs10993994_4 rs7896156 10-51199385 G A 1 0.7 282
rs10993994_4 s.51199599 10-51199599 C A 1 0.7 495
rs10993994_4 rs6481329 10-51199752 A G 1 0.7 248
rs10993994_4 rs7910704 10-51199811 T C 1 0.28 284
rs10993994_4 rs4554834 10-51200152 C A 1 0.7 217
rs10993994_4 rs10826125 10-51200511 A G 1 0.7 35
rs10993994_4 rs10826127 10-51200763 A G 1 0.73 36
rs10993994_4 rs4486572 10-51201811 G A 1 0.7 209
rs10993994_4 rs4581397 10-51202373 G A 0.95 0.69 221
rs10993994_4 rs4630240 10-51202534 A G 1 0.32 223
rs10993994_4 rs7920517 10-51202627 A G 1 0.7 286
rs10993994_4 rs4630241 10-51202757 A G 1 0.7 224
rs10993994_4 rs9787697 10-51203382 T C 1 0.7 293
rs10993994_4 rs10763534 10-51204926 T C 1 0.7 19
rs10993994_4 rs10763536 10-51205807 A G 1 0.7 20
rs10993994_4 s.51205998 10-51205998 T C 1 0.7 496
rs10993994_4 rs10763546 10-51206405 G C 1 0.68 21
rs10993994_4 s.51206890 10-51206890 A C 0.74 0.54 497
rs10993994_4 rs4131357 10-51207298 A C 1 0.7 196
rs10993994_4 s.51207437 10-51207437 T C 1 0.7 498
rs10993994_4 s.51207481 10-51207481 A G 1 0.7 499
rs10993994_4 s.51208175 10-51208175 C A 0.85 0.58 500
rs10993994_4 rs11006207 10-51208182 C T 1 0.7 64
rs10993994_4 rs10763576 10-51208819 T A 1 0.7 22
rs10993994_4 s.51208921 10-51208921 T G 1 0.68 501
rs10993994_4 rs11593361 10-51209162 G A 1 0.68 90
rs10993994_4 rs10763588 10-51209768 T G 1 0.7 23
rs10993994_4 rs11006274 10-51210297 C T 1 0.7 65
rs10993994_4 s.51210619 10-51210619 C A 0.74 0.54 502
rs10993994_4 s.51210866 10-51210866 A G 1 0.7 503
rs10993994_4 rs4630243 10-51210873 C T 1 0.7 225
rs10993994_4 rs4512771 10-51210912 A C 1 0.7 211
rs10993994_4 rs4306255 10-51212450 G A 1 0.7 204
rs10993994_4 s.51213076 10-51213076 G T 1 0.68 504
rs10993994_4 rs4631830 10-51213350 T C 0.95 0.69 226
rs10993994_4 rs7075009 10-51214149 G T 1 0.7 260
rs10993994_4 rs7098889 10-51214481 T C 1 0.7 270
rs10993994_4 rs4304716 10-51214593 G A 0.85 0.58 203
rs10993994_4 s.51214689 10-51214689 G A 1 0.29 505
rs10993994_4 s.51214690 10-51214690 C T 1 0.68 506
rs10993994_4 rs7477953 10-51214698 A G 1 0.7 279
rs10993994_4 s.51215034 10-51215034 A G 0.95 0.66 507
rs10993994_4 s.51216121 10-51216121 G A 0.86 0.21 508
rs10993994_4 s.51216342 10-51216342 G A 1 0.81 509
rs10993994_4 rs7075697 10-51217377 G C 0.95 0.66 261
rs10993994_4 s.51219226 10-51219226 G C 0.9 0.65 510
rs10993994_4 s.51219227 10-51219227 G T 1 0.63 511
rs10993994_4 s.51219230 10-51219230 G C 1 0.37 512
rs10993994_4 s.51219320 10-51219320 C T 1 0.63 513
rs10993994_4 s.51221179 10-51221179 T C 1 0.42 514
rs11067228_1 s.113576401 12-113576401 T A 1 0.41 296
rs11067228_1 s.113582477 12-113582477 A G 1 1 297
rs11067228_1 s.113584188 12-113584188 A G 1 0.84 298
rs11067228_1 s.113584539 12-113584539 A G 1 0.3 299
rs11067228_1 s.113585097 12-113585097 C T 1 0.81 300
rs11067228_1 rs12819162 12-113586774 G A 0.82 0.23 110
rs11067228_1 rs11609105 12-113586865 C A 0.91 0.32 93
rs11067228_1 rs514849 12-113588873 A G 0.89 0.24 237
rs11067228_1 rs513061 12-113589060 C T 0.89 0.24 236
rs11067228_1 s.113590733 12-113590733 C A 0.96 0.74 301
rs11067228_1 rs1061657 12-113592519 C T 0.91 0.32 13
rs11067228_1 rs8853 12-113593290 T C 0.96 0.72 290
rs11067228_1 rs3741698 12-113593606 G C 0.91 0.32 186
rs11067228_1 s.113594635 12-113594635 T G 0.92 0.68 302
rs11067228_1 rs567223 12-113594954 G T 0.89 0.76 242
rs11067228_1 rs551510 12-113598419 C T 0.84 0.61 240
rs11067228_1 rs59336 12-113600735 T A 0.8 0.58 245
rs11067228_1 s.113601412 12-113601412 T G 0.83 0.27 303
rs11067228_1 rs515746 12-113603380 G A 0.8 0.58 238
rs11067228_1 rs545076 12-113604286 G A 0.8 0.58 239
rs11067228_1 s.113614584 12-113614584 G C 0.62 0.22 304
rs4430796_1 rs3744763 17-33164998 G A 0.67 0.37 187
rs4430796_1 rs7405776 17-33167135 A G 1 0.78 278
rs4430796_1 rs2005705 17-33170413 A G 1 1 128
rs4430796_1 s.33170591 17-33170591 C T 1 0.63 454
rs4430796_1 rs11263761 17-33171888 G A 1 0.44 87
rs4430796_1 rs4239217 17-33173100 G A 1 0.67 201
rs4430796_1 rs11651755 17-33173953 C T 1 1 95
rs4430796_1 rs10908278 17-33174065 T A 1 1 57
rs4430796_1 s.33174083 17-33174083 C T 1 0.44 455
rs4430796_1 rs11657964 17-33174880 A G 1 0.78 96
rs4430796_1 rs7501939 17-33175269 T C 1 0.75 280
rs4430796_1 rs8064454 17-33175699 A C 1 1 289
rs4430796_1 s.33175746 17-33175746 G T 1 0.75 456
rs4430796_1 s.33176039 17-33176039 G A 1 0.75 457
rs4430796_1 rs7405696 17-33176148 G C 1 0.63 277
rs4430796_1 rs11651052 17-33176494 A G 1 1 94
rs4430796_1 rs11263763 17-33177678 G A 1 0.97 88
rs4430796_1 rs11658063 17-33177985 C G 1 0.78 97
rs4430796_1 rs9913260 17-33180010 A G 1 0.48 294
rs4430796_1 rs3760511 17-33180426 T G 1 0.33 188
rs4430796_1 s.33182344 17-33182344 T C 1 0.33 458
rs17632542_4 s.55554247 19-55554247 G A 1 0.24 515
rs17632542_4 s.55566277 19-55566277 C T 1 0.24 516
rs17632542_4 s.55582344 19-55582344 G C 1 0.24 517
rs17632542_4 rs2546552 19-55588229 T G 1 0.24 136
rs17632542_4 s.55596785 19-55596785 G T 1 0.24 518
rs17632542_4 s.55597645 19-55597645 T A 1 0.24 519
rs17632542_4 s.55598078 19-55598078 C A 1 0.24 520
rs17632542_4 s.55600121 19-55600121 T A 1 0.24 521
rs17632542_4 s.55605246 19-55605246 T G 1 0.24 522
rs17632542_4 s.55606024 19-55606024 C A 1 0.24 523
rs17632542_4 s.55607242 19-55607242 A G 1 0.24 524
rs17632542_4 s.55624341 19-55624341 A C 1 0.24 525
rs17632542_4 s.55630396 19-55630396 C T 1 0.24 526
rs17632542_4 s.55630578 19-55630578 C T 0.72 0.25 527
rs17632542_4 s.55630679 19-55630679 C T 0.72 0.25 528
rs17632542_4 s.55630791 19-55630791 C T 0.72 0.25 529
rs17632542_4 s.55631170 19-55631170 A C 1 0.24 530
rs17632542_4 s.55632347 19-55632347 T A 1 0.24 531
rs17632542_4 s.55632363 19-55632363 T A 1 0.24 532
rs17632542_4 s.55636052 19-55636052 C T 1 0.24 533
rs17632542_4 s.55637350 19-55637350 A C 1 0.24 534
rs17632542_4 s.55640040 19-55640040 C T 1 0.24 535
rs17632542_4 s.55646568 19-55646568 G A 1 0.24 536
rs17632542_4 s.55649132 19-55649132 C T 1 0.24 537
rs17632542_4 s.55650629 19-55650629 C A 1 0.24 538
rs17632542_4 s.55650844 19-55650844 C G 1 0.24 539
rs17632542_4 s.55652397 19-55652397 A G 1 0.24 540
rs17632542_4 s.55653401 19-55653401 C T 1 0.24 541
rs17632542_4 s.55653991 19-55653991 T A 1 0.24 542
rs17632542_4 s.55654907 19-55654907 C A 1 0.24 543
rs17632542_4 s.55657973 19-55657973 A G 1 0.24 544
rs17632542_4 s.55659043 19-55659043 G A 1 0.24 545
rs17632542_4 s.55660011 19-55660011 A G 1 0.24 546
rs17632542_4 s.55660013 19-55660013 C T 1 0.24 547
rs17632542_4 s.55660139 19-55660139 A T 1 0.24 548
rs17632542_4 s.55660143 19-55660143 A T 1 0.24 549
rs17632542_4 s.55661660 19-55661660 T C 1 0.24 550
rs17632542_4 s.55661718 19-55661718 A T 1 0.24 551
rs17632542_4 rs6509476 19-55661773 C A 1 0.24 249
rs17632542_4 s.55664020 19-55664020 C G 1 0.24 552
rs17632542_4 s.55664897 19-55664897 A T 1 0.24 553
rs17632542_4 s.55665723 19-55665723 C G 0.72 0.25 554
rs17632542_4 s.55665726 19-55665726 C G 1 0.24 555
rs17632542_4 s.55672641 19-55672641 T C 1 0.24 556
rs17632542_4 s.55673254 19-55673254 A G 0.72 0.25 557
rs17632542_4 s.55674252 19-55674252 C G 1 0.24 558
rs17632542_4 s.55674254 19-55674254 T A 1 0.24 559
rs17632542_4 s.55674727 19-55674727 A T 1 0.24 560
rs17632542_4 s.55676073 19-55676073 T A 1 0.24 561
rs17632542_4 s.55683393 19-55683393 A G 1 0.24 562
rs17632542_4 s.55687122 19-55687122 T A 1 0.24 563
rs17632542_4 s.55695317 19-55695317 T A 1 0.24 564
rs17632542_4 s.55697027 19-55697027 A C 1 0.24 565
rs17632542_4 s.55701748 19-55701748 A C 0.72 0.25 566
rs17632542_4 rs7257447 19-55702303 A T 1 0.24 273
rs17632542_4 s.55702308 19-55702308 T A 1 0.24 567
rs17632542_4 s.55703568 19-55703568 A T 1 0.24 568
rs17632542_4 s.55706751 19-55706751 A T 1 0.24 569
rs17632542_4 s.55708051 19-55708051 A T 1 0.24 570
rs17632542_4 s.55709067 19-55709067 T A 1 0.24 571
rs17632542_4 s.55709498 19-55709498 G T 1 0.24 572
rs17632542_4 s.55709766 19-55709766 A T 1 0.24 573
rs17632542_4 s.55710030 19-55710030 G C 1 0.24 574
rs17632542_4 s.55710848 19-55710848 A T 1 0.24 575
rs17632542_4 s.55710851 19-55710851 T A 1 0.24 576
rs17632542_4 s.55711749 19-55711749 G A 0.72 0.25 577
rs17632542_4 s.55712802 19-55712802 C G 1 0.24 578
rs17632542_4 s.55713451 19-55713451 G T 1 0.24 579
rs17632542_4 s.55713453 19-55713453 T G 1 0.24 580
rs17632542_4 s.55713458 19-55713458 A C 1 0.24 581
rs17632542_4 s.55713862 19-55713862 A T 1 0.24 582
rs17632542_4 s.55716007 19-55716007 T G 1 0.24 583
rs17632542_4 s.55718272 19-55718272 T A 1 0.24 584
rs17632542_4 s.55723496 19-55723496 T C 0.72 0.25 585
rs17632542_4 s.55724346 19-55724346 C T 1 0.24 586
rs17632542_4 s.55726794 19-55726794 T G 1 0.24 587
rs17632542_4 s.55729556 19-55729556 C A 1 0.24 588
rs17632542_4 s.55729562 19-55729562 T G 1 0.24 589
rs17632542_4 s.55729563 19-55729563 C A 1 0.24 590
rs17632542_4 s.55731588 19-55731588 A G 0.72 0.25 591
rs17632542_4 s.55733658 19-55733658 T G 1 0.24 592
rs17632542_4 s.55741403 19-55741403 G C 1 0.24 593
rs17632542_4 s.55743524 19-55743524 G T 1 0.24 594
rs17632542_4 s.55745833 19-55745833 T A 1 0.24 595
rs17632542_4 s.55746123 19-55746123 C T 1 0.24 596
rs17632542_4 s.55747079 19-55747079 G T 1 0.24 597
rs17632542_4 s.55748269 19-55748269 A T 1 0.24 598
rs17632542_4 s.55748274 19-55748274 C T 1 0.24 599
rs17632542_4 s.55748844 19-55748844 G T 1 0.24 600
rs17632542_4 s.55749193 19-55749193 A G 1 0.24 601
rs17632542_4 s.55752178 19-55752178 C T 1 0.24 602
rs17632542_4 s.55752271 19-55752271 T A 1 0.24 603
rs17632542_4 s.55770158 19-55770158 G A 1 0.24 604
rs17632542_4 rs7247686 19-55770361 C T 1 0.24 272
rs17632542_4 s.55771401 19-55771401 C T 1 0.24 605
rs17632542_4 s.55772266 19-55772266 G C 1 0.24 606
rs17632542_4 s.55775314 19-55775314 A C 1 0.24 607
rs17632542_4 s.55778756 19-55778756 C G 1 0.24 608
rs17632542_4 s.55788661 19-55788661 A G 1 0.24 609
rs17632542_4 s.55790622 19-55790622 C T 1 0.24 610
rs17632542_4 s.55791942 19-55791942 G A 1 0.24 611
rs17632542_4 rs10413426 19-55797671 A G 1 0.24 11
rs17632542_4 s.55798366 19-55798366 T G 1 0.24 612
rs17632542_4 s.55818900 19-55818900 C G 1 0.24 613
rs17632542_4 s.55822129 19-55822129 T C 1 0.24 614
rs17632542_4 s.55825528 19-55825528 A G 1 0.24 615
rs17632542_4 s.55825624 19-55825624 G T 1 0.24 616
rs17632542_4 s.55833489 19-55833489 C T 1 0.24 617
rs17632542_4 s.55833938 19-55833938 A G 1 0.24 618
rs17632542_4 s.55848124 19-55848124 C G 1 0.24 619
rs17632542_4 s.55848125 19-55848125 C G 1 0.24 620
rs17632542_4 s.55849044 19-55849044 G A 1 0.24 621
rs17632542_4 s.55857289 19-55857289 G T 1 0.24 622
rs17632542_4 s.55857585 19-55857585 T A 1 0.24 623
rs17632542_4 s.55861107 19-55861107 T G 1 0.24 624
rs17632542_4 s.55861111 19-55861111 C A 1 0.24 625
rs17632542_4 s.55861196 19-55861196 C T 1 0.24 626
rs17632542_4 s.55862851 19-55862851 C T 1 0.24 627
rs17632542_4 s.55865439 19-55865439 C T 1 0.24 628
rs17632542_4 s.55867208 19-55867208 T A 1 0.24 629
rs17632542_4 s.55867650 19-55867650 T G 1 0.24 630
rs17632542_4 s.55868902 19-55868902 A G 1 0.24 631
rs17632542_4 s.55870429 19-55870429 G C 1 0.24 632
rs17632542_4 rs73598616 19-55873660 T G 1 0.24 276
rs17632542_4 s.55874339 19-55874339 A T 1 0.24 633
rs17632542_4 s.55875249 19-55875249 G C 1 0.24 634
rs17632542_4 s.55875725 19-55875725 A C 1 0.24 635
rs17632542_4 s.55881262 19-55881262 T A 1 0.24 636
rs17632542_4 s.55882788 19-55882788 G T 1 0.24 637
rs17632542_4 s.55883542 19-55883542 T C 1 0.24 638
rs17632542_4 s.55886467 19-55886467 G T 1 0.24 639
rs17632542_4 s.55887498 19-55887498 A T 1 0.24 640
rs17632542_4 s.55889175 19-55889175 A G 1 0.24 641
rs17632542_4 s.55892113 19-55892113 G A 1 0.24 642
rs17632542_4 s.55892618 19-55892618 A T 1 0.24 643
rs17632542_4 s.55892866 19-55892866 A T 1 0.24 644
rs17632542_4 s.55893305 19-55893305 C G 1 0.24 645
rs17632542_4 s.55896443 19-55896443 A G 1 0.24 646
rs17632542_4 s.55896826 19-55896826 T A 1 0.24 647
rs17632542_4 s.55898241 19-55898241 G T 1 0.24 648
rs17632542_4 s.55898245 19-55898245 T A 1 0.24 649
rs17632542_4 s.55899120 19-55899120 C T 1 0.24 650
rs17632542_4 s.55900597 19-55900597 A G 1 0.24 651
rs17632542_4 s.55900764 19-55900764 C A 1 0.24 652
rs17632542_4 s.55912567 19-55912567 C T 1 0.24 653
rs17632542_4 s.55914840 19-55914840 G A 1 0.24 654
rs17632542_4 s.55915776 19-55915776 T G 1 0.24 655
rs17632542_4 s.55936192 19-55936192 G T 1 0.24 656
rs17632542_4 s.55940336 19-55940336 T C 1 0.24 657
rs17632542_4 s.55946316 19-55946316 A G 1 0.24 658
rs17632542_4 s.55949971 19-55949971 G C 1 0.24 659
rs17632542_4 s.55955333 19-55955333 A G 1 0.24 660
rs17632542_4 s.55962188 19-55962188 A T 1 0.24 661
rs17632542_4 s.55963864 19-55963864 A G 1 0.24 662
rs17632542_4 s.55969754 19-55969754 A T 1 0.24 663
rs17632542_4 s.55979135 19-55979135 A T 1 0.24 664
rs17632542_4 rs67367861 19-55987833 T C 1 0.24 252
rs17632542_4 s.55989580 19-55989580 T A 1 0.24 665
rs17632542_4 s.56004001 19-56004001 G A 1 0.24 666
rs17632542_4 s.56006528 19-56006528 C G 1 0.24 667
rs17632542_4 s.56012046 19-56012046 T G 1 0.24 668
rs17632542_4 s.56013739 19-56013739 A G 1 0.24 669
rs17632542_4 rs2411330 19-56015173 C G 1 0.24 134
rs17632542_4 rs3212825 19-56017315 C G 1 0.24 176
rs17632542_4 s.56018053 19-56018053 T G 1 0.24 670
rs17632542_4 s.56019106 19-56019106 A C 1 0.24 671
rs17632542_4 rs7246740 19-56025486 T A 1 0.24 271
rs17632542_4 s.56025860 19-56025860 A G 1 0.24 672
rs17632542_4 s.56026713 19-56026713 C T 1 0.24 673
rs17632542_4 rs55786312 19-56026861 A T 1 0.21 241
rs17632542_4 s.56026881 19-56026881 G A 1 0.24 674
rs17632542_4 s.56026882 19-56026882 G A 1 0.24 675
rs17632542_4 s.56027319 19-56027319 G A 1 0.24 676
rs17632542_4 s.56029265 19-56029265 A C 1 0.24 677
rs17632542_4 s.56029362 19-56029362 T G 1 0.24 678
rs17632542_4 s.56032778 19-56032778 C G 1 0.24 679
rs17632542_4 s.56032963 19-56032963 G T 1 0.24 680
rs17632542_4 s.56032964 19-56032964 T G 1 0.24 681
rs17632542_4 s.56033138 19-56033138 A G 0.82 0.49 682
rs17632542_4 s.56033138 19-56033138 A G 1 0.43 682
rs17632542_4 s.56033664 19-56033664 A T 1 0.21 683
rs17632542_4 s.56033664 19-56033664 A T 1 0.36 683
rs17632542_4 s.56036363 19-56036363 T G 1 0.24 684
rs17632542_4 s.56037076 19-56037076 C T 1 0.36 685
rs17632542_4 s.56037076 19-56037076 C T 1 0.61 685
rs2735839_3 rs2659051 19-56037380 C G 0.61 0.27 145
rs17632542_4 s.56038334 19-56038334 G A 1 0.28 686
rs17632542_4 s.56038334 19-56038334 G A 1 0.48 686
rs17632542_4 s.56039736 19-56039736 G C 1 0.24 687
rs2735839_3 rs266849 19-56040902 G A 0.71 0.34 148
rs17632542_4 s.56042100 19-56042100 G C 1 0.24 688
rs17632542_4 s.56042603 19-56042603 G A 1 0.43 689
rs17632542_4 s.56042603 19-56042603 G A 1 0.74 689
rs17632542_4 rs2659124 19-56046409 A T 0.71 0.32 147
rs17632542_4 rs2659124 19-56046409 A T 0.81 0.6 147
rs17632542_4 s.56046798 19-56046798 T C 1 0.24 690
rs17632542_4 rs266878 19-56050926 G C 0.7 0.26 149
rs17632542_4 rs266878 19-56050926 G C 0.73 0.49 149
rs17632542_4 rs174776 19-56051664 T C 0.7 0.26 113
rs17632542_4 rs174776 19-56051664 T C 0.73 0.49 113
rs17632542_4 s.56052630 19-56052630 C T 0.67 0.24 691
rs17632542_4 s.56052630 19-56052630 C T 1 0.32 691
rs17632542_4 s.56052652 19-56052652 T C 1 0.59 692
rs17632542_4 s.56052652 19-56052652 T C 1 1 692
rs2735839_3 rs17632542 19-56053569 C T 1 0.59 114
rs17632542_4 s.56053983 19-56053983 G C 1 0.24 693
rs17632542_4 s.56054527 19-56054527 G T 1 0.67 694
rs17632542_4 s.56054527 19-56054527 G T 1 0.88 694
rs2735839_3 rs2659122 19-56054838 C T 1 0.33 146
rs17632542_4 rs1058205 19-56055210 C T 1 0.43 12
rs17632542_4 rs1058205 19-56055210 C T 1 0.73 12
rs17632542_4 rs2569735 19-56056081 A G 1 0.54 137
rs17632542_4 rs2569735 19-56056081 A G 1 0.92 137
rs17632542_4 rs2735839 19-56056435 A G 1 0.59 7
Decrease Increase
Anchor SNP Surrogate Position D' r2 NO of
Allele Allele
surrogate rsl7632542_4 rs62113216 19-56056615 A T 1 0.43 247 rsl7632542_4 rs62113216 19-56056615 A T 1 0.74 247 rsl7632542_4 s.56058308 19-56058308 A G 1 0.24 695 rsl7632542_4 s.56058606 19-56058606 T A 1 0.24 696 rsl7632542_4 s.56058688 19-56058688 A T 1 0.24 697 rsl7632542_4 s.56058866 19-56058866 C T 1 0.24 698 rsl7632542_4 s.56060000 19-56060000 C A 1 0.24 699 rsl7632542_4 s.56061277 19-56061277 C G 1 0.24 700 rsl7632542_4 s.56062250 19-56062250 A C 0.52 0.23 701 rsl7632542_4 s.56066550 19-56066550 A T 1 0.24 702 rsl7632542_4 s.56066560 19-56066560 G c 1 0.24 703 rsl7632542_4 s.56066619 19-56066619 T G 1 0.24 704 rsl7632542_4 s.56067024 19-56067024 T C 0.53 0.21 705 rsl7632542_4 s.56067024 19-56067024 T C 0.72 0.4 705 rsl7632542_4 rs73592873 19-56074766 A G 1 0.24 275 rsl7632542_4 s.56076121 19-56076121 C G 1 0.24 706 rsl7632542_4 s.56076122 19-56076122 C G 1 0.24 707 rsl7632542_4 s.56078845 19-56078845 C G 1 0.24 708 rsl7632542_4 s.56085550 19-56085550 C G 1 0.24 709 rsl7632542_4 s.56093594 19-56093594 T G 0.78 0.37 710 rsl7632542_4 s.56472259 19-56472259 A C 1 0.24 711 rs2736098_4 s.1030492 5-1030492 A G 1 0.5 295 rs2736098_4 s.1233724 5-1233724 G C 0.49 0.24 386 rs2736098_4 s.1251946 5-1251946 G C 0.49 0.24 387 rs2736098_4 s.1257345 5-1257345 G A 1 0.5 388 rs2736098_4 s.1258032 5-1258032 A G 0.49 0.24 389 rs401681_2 rs9418 5-1278121 C T 0.52 0.21 291 rs401681_2 s.1282167 5-1282167 C T 0.68 0.22 390 rs401681_2 s.1285240 5-1285240 C T 0.51 0.24 391 rs401681_2 s.1285775 5-1285775 T A 0.53 0.23 392 rs401681_2 s.1287049 5-1287049 G A 0.68 0.22 393 rs2736098_4 s.1292191 5-1292191 T C 1 0.5 394 rs2736098_4 s.1334730 5-1334730 c A 1 0.27 395 rs401681_2 s.1349759 5-1349759 c T 0.63 0.22 396 rs401681_2 s.1350079 5-1350079 c A 1 0.22 397 rs401681_2 rs2736108 5-1350488 c T 0.63 0.22 158 rs401681_2 s.1350854 5-1350854 c T 0.63 0.22 398 rs401681_2 rs2735948 5-1352213 A G 0.78 0.51 156 rs401681_2 rs2735846 5-1352379 C G 0.64 0.24 153 rs401681_2 s.1352392 5-1352392 A G 1 
0.28 399 rs401681_2 s.1353401 5-1353401 T C 0.59 0.34 400 rs401681_2 rs2735946 5-1353429 T G 0.94 0.51 155 rs401681_2 rs2736102 5-1355144 T C 0.94 0.51 157 rs401681_2 rs2853666 5-1355914 G A 0.95 0.68 166 rs401681_2 rs2735945 5-1356901 T C 0.94 0.51 154 rs401681_2 s.1359165 5-1359165 T C 0.96 0.71 401 rs401681_2 rs4530805 5-1359331 T C 0.96 0.71 215 rs401681_2 s.1359765 5-1359765 c G 0.96 0.8 402 rs401681_2 rs61574973 5-1362168 T C 0.96 0.71 246 rs401681_2 s.1362904 5-1362904 G A 0.96 0.9 403 rs401681_2 s.1363152 5-1363152 G A 0.96 0.77 404 rs401681_2 rsl2332579 5-1364198 C T 0.89 0.23 101 rs401681_2 rs6866783 5-1365020 T c 0.96 0.71 253 rs401681_2 s.1365329 5-1365329 T c 1 0.24 405 rs401681_2 rsl3356727 5-1365457 G A 0.96 0.77 112 rs401681_2 rsl3355267 5-1365935 T C 0.96 0.77 111 rs401681_2 s.1366701 5-1366701 A G 0.96 0.74 406 rs401681_2 rsl0078017 5-1367009 C T 0.96 0.77 10 rs401681_2 rs4975615 5-1368343 G A 0.96 0.71 234 Seq ID
Decrease Increase
Anchor SNP Surrogate Position D' r2 NO of
Allele Allele
surrogate rs401681_2 rs4975616 5-1368660 G A 0.96 0.8 235 rs401681_2 rs6554759 5-1370102 G A 1 0.29 250 rs401681_2 rs3816659 5-1370820 A G 1 0.93 190 rs401681_2 rsl801075 5-1370949 C T 1 0.31 115 rs401681_2 rs451360 5-1372680 A c 1 0.28 212 rs401681_2 rs421629 5-1373136 A G 1 1 199 rs401681_2 rs380286 5-1373247 A G 1 1 189 rs401681_2 rs402710 5-1373722 T C 1 0.29 195 rs401681_2 rsl0073340 5-1374873 T C 1 0.29 9 rs401681_2 rs414965 5-1377121 A G 1 0.93 197 rs401681_2 rs421284 5-1378590 C T 1 0.93 198 rs401681_2 rs466502 5-1378767 G A 1 0.97 228 rs401681_2 rs465498 5-1378803 G A 1 0.97 227 rs401681_2 rs452932 5-1383253 C T 1 1 214 rs401681_2 rs452384 5-1383840 C T 1 1 213 rs401681_2 rs370348 5-1384219 G A 1 1 185 rs401681_2 s.1386077 5-1386077 G A 1 0.93 407 rs401681_2 s.1386169 5-1386169 A G 1 0.65 408 rs401681_2 s.1386204 5-1386204 A G 1 0.51 409 rs401681_2 s.1386674 5-1386674 C G 1 0.35 410 rs401681_2 rs457130 5-1389178 T A 1 0.87 219 rs401681_2 rs467095 5-1389221 c T 1 0.9 229 rs401681_2 s.1389243 5-1389243 G A 1 0.97 411 rs401681_2 rs462608 5-1389626 A T 1 0.93 222 rs401681_2 rs456366 5-1390070 C T 1 0.65 218 rs401681_2 s.1390106 5-1390106 A T 1 0.97 412 rs401681_2 s.1390174 5-1390174 C T 1 0.35 413 rs401681_2 rs31487 5-1394101 C G 1 1 172 rs401681_2 s.1395154 5-1395154 C T 1 0.47 414 rs401681_2 rs31489 5-1395714 A c 1 0.93 173 rs401681_2 rs31490 5-1397458 A G 1 1 174 rs401681_2 rs27996 5-1398474 G A 1 0.93 159 rs401681_2 rs27071 5-1399081 C T 1 0.47 152 rs401681_2 rs27070 5-1399303 C G 1 0.9 151 rs401681_2 rs27068 5-1400239 T C 0.93 0.43 150 rs401681_2 s.1401106 5-1401106 c T 0.86 0.56 415 rs401681_2 rs37011 5-1401798 T A 0.92 0.8 184 rs401681_2 s.1402130 5-1402130 c G 1 0.45 416 rs401681_2 s.1402535 5-1402535 G A 0.87 0.64 417 rs401681_2 rs37009 5-1403339 T C 0.93 0.83 183 rs401681_2 rs40182 5-1403397 A G 0.93 0.83 194 rs401681_2 rs37008 5-1404538 A G 0.96 0.9 182 rs401681_2 rs37007 5-1405372 C G 0.93 0.83 181 rs401681_2 s.1407027 5-1407027 G A 1 0.32 418 
rs401681_2 rs40181 5-1407462 T G 0.92 0.8 193 rs2736098_4 s.1407682 5-1407682 T A 1 0.5 419 rs401681_2 rs37006 5-1408058 T C 0.93 0.83 180 rs401681_2 s.1408859 5-1408859 T C 1 0.24 420 rs401681_2 rs37005 5-1409450 T C 0.96 0.9 179 rs401681_2 s.1409771 5-1409771 c A 0.93 0.83 421 rs401681_2 rs37002 5-1409944 T C 0.93 0.83 178 rs401681_2 s.1411822 5-1411822 T C 1 0.22 422 rs401681_2 s.1411901 5-1411901 c T 0.83 0.27 423 rs401681_2 s.1412098 5-1412098 T c 1 0.28 424 rs401681_2 rs31494 5-1414669 T G 1 0.55 175 rs401681_2 s.1418662 5-1418662 c T 1 0.28 425 rs401681_2 s.1419748 5-1419748 A G 1 0.28 426 rs2736098_4 s.1426206 5-1426206 A T 1 0.39 427 rs2736098_4 s.1426336 5-1426336 C T 1 0.5 428 Seq ID
Decrease Increase
Anchor SNP Surrogate Position D' r2 NO of
Allele Allele
surrogate rs2736098_4 s.1428371 5-1428371 C A 1 0.39 429 rs2736098_4 s.1428373 5-1428373 C A 1 0.66 430 rs2736098_4 s.1472454 5-1472454 c T 1 0.5 431 rs2736098_4 s.1518154 5-1518154 A c 1 0.21 432 rs2736098_4 s.1557827 5-1557827 C A 0.49 0.24 433 rs2736098_4 rsll743119 5-1583020 G C 1 0.21 98 rs2736098_4 s.1583465 5-1583465 T A 1 0.5 434 rs2736098_4 rs4551123 5-1589257 A G 1 0.21 216 rs2736098_4 s.1589581 5-1589581 C G 1 0.21 435 rs2736098_4 s.1591616 5-1591616 G C 1 0.24 436 rs2736098_4 s.1607388 5-1607388 C T 1 0.32 437 rs2736098_4 rs6893515 5-1615555 C T 0.49 0.24 255 rs2736098_4 s.1618305 5-1618305 G c 1 0.5 438 rs2736098_4 s.1621550 5-1621550 T c 0.49 0.24 439 rs2736098_4 s.1621551 5-1621551 G A 0.49 0.24 440 rs2736098_4 rs6892057 5-1630411 C G 1 0.5 254 rs2736098_4 s.1638061 5-1638061 T C 1 0.5 441 rs2736098_4 rs6898387 5-1638354 T C 1 0.5 256 rs2736098_4 rs7724451 5-1649038 A G 1 0.5 281 rs2736098_4 rs2937006 5-1662778 G A 1 0.5 169 rs2736098_4 s.1663985 5-1663985 G T 1 0.5 442 rs2736098_4 s.1667254 5-1667254 G A 1 0.5 443 rs2736098_4 s.1668831 5-1668831 C T 1 0.5 444 rs2736098_4 s.1673499 5-1673499 G A 1 0.5 445 rs2736098_4 s.1737379 5-1737379 A G 0.49 0.24 446 rs2736098_4 s.1756873 5-1756873 C A 0.49 0.24 447 rs2736098_4 s.1782909 5-1782909 A G 1 0.5 448 rs2736098_4 s.1788485 5-1788485 G C 1 0.5 449 rs2736098_4 s.1799150 5-1799150 G A 1 0.5 450 rs2736098_4 s.1800043 5-1800043 G T 1 0.5 451 rs2736098_4 s.1804565 5-1804565 G A 1 0.5 452 rs2736098_4 s.1812409 5-1812409 A G 1 0.5 453 rs2736098_4 s.886453 5-886453 A G 1 0.5 712 rs2736098_4 s.887600 5-887600 T C 1 0.5 713 rsl0993994_4 rs2012677 10-51174803 T A 1 0.65 714 rs4430796_l rs757210 17-33170628 A G 0.96 0.61 715 rs4430796_l rs7213769 17-33189279 C G 0.73 0.27 716 rsl0788160_l rslll99892 10-123066171 C T 0.77 0.29 717 rsl0788160_l rsll593067 10-122962348 C T 0.76 0.20 718 rsll067228_l rsl2820376 12-113587344 G A 0.91 0.24 719 rsl7632542_4 rs273622 19-56486259 G A 1 0.27 720 rs401681_2 rs2736098 5-1347086 
G A 0.94 0.39 721 rs2736098_l rs2735845 5-1353584 G C 0.71 0.26 722 rs4430796_l rsl016990 17-33163028 G C 0.56 0.21 723 rs2736098_l rs31484 5-1390906 T A 0.94 0.39 724 rs401681_2 rs31484 5-1390906 T A 1 1.00 724
Suitable markers in linkage disequilibrium with any one of rs401681, rs2736098, rs10788160, rs10993994, rs11067228, rs4430796, rs2735839 and rs17632542 may, for example, be selected using the data provided in Table 1.
In one embodiment, suitable markers in linkage disequilibrium with rs401681 are selected from the group consisting of rs2736098, rs31484, rs4635969, rs9418, s.1282167, s.1285240, s.1285775, s.1287049, s.1349759, s.1350079, rs2736108, s.1350854, rs2735948, rs2735846, s.1352392, s.1353401, rs2735946, rs2736102, rs2853666, rs2735945, s.1359165, rs4530805, s.1359765, rs61574973, s.1362904, s.1363152, rs12332579, rs6866783, s.1365329, rs13356727, rs13355267, s.1366701, rs10078017, rs4975615, rs4975616, rs6554759, rs3816659, rs1801075, rs451360, rs421629, rs380286, rs402710, rs10073340, rs414965, rs421284, rs466502, rs465498, rs452932, rs452384, rs370348, s.1386077, s.1386169, s.1386204, s.1386674, rs457130, rs467095, s.1389243, rs462608, rs456366, s.1390106, s.1390174, rs31487, s.1395154, rs31489, rs31490, rs27996, rs27071, rs27070, rs27068, s.1401106, rs37011, s.1402130, s.1402535, rs37009, rs40182, rs37008, rs37007, s.1407027, rs40181, rs37006, s.1408859, rs37005, s.1409771, rs37002, s.1411822, s.1411901, s.1412098, rs31494, s.1418662, and s.1419748.
In one embodiment, suitable markers in linkage disequilibrium with rs2736098 are selected from the group consisting of rs2735845, rs31484, rs401681, s.1030492, s.1233724, s.1251946, s.1257345, s.1258032, s.1292191, s.1334730, s.1407682, s.1426206, s.1426336, s.1428371, s.1428373, s.1472454, s.1518154, s.1557827, rs11743119, s.1583465, rs4551123, s.1589581, s.1591616, s.1607388, rs6893515, s.1618305, s.1621550, s.1621551, rs6892057, s.1638061, rs6898387, rs7724451, rs2937006, s.1663985, s.1667254, s.1668831, s.1673499, s.1737379, s.1756873, s.1782909, s.1788485, s.1799150, s.1800043, s.1804565, s.1812409, s.886453, and s.887600.
In one embodiment, suitable markers in linkage disequilibrium with rs10788160 are selected from the group consisting of rs11199892, rs11593067, s.122837469, rs2130779, s.122876448, s.122901140, s.122901142, s.122905335, rs10788149, rs10749408, rs2172071, rs11592107, rs1907218, rs1907220, rs1994655, rs1907221, rs1907225, rs1907226, rs10749409, rs11199835, s.122991926, rs729014, s.122993518, s.122994309, s.122994946, rs1873450, rs2901290, s.122998594, s.122998678, s.122998978, rs2201026, rs4237529, s.122999386, rs1873451, rs1873452, rs4752520, rs10886880, rs10749412, s.123008216, rs3925042, rs1125527, rs1125528, rs4319451, rs10788154, rs7081844, rs7076500, s.123011774, s.123011879, rs11199862, s.123014171, rs12146156, s.123014499, s.123014519, rs12146366, s.123014684, rs7091083, rs7074985, rs7915008, s.123015342, s.123015365, rs10749413, rs11199866, s.123016003, rs7923130, rs7922901, rs10886882, rs10886883, rs11199867, s.123017698, s.123018111, rs4393247, s.123018188, rs4489674, rs11199868, s.123018670, s.123019408, s.123019759, rs11199869, s.123020245, s.123020365, rs10886885, rs10788159, rs10886886, rs11199871, rs11199872, rs12761612, rs4575197, rs11199874, rs10886887, s.123023625, s.123023836, rs4465316, rs4468286, rs10886890, rs10788162, s.123028135, rs12413648, s.123029102, rs10788163, s.123031617, s.123031811, rs10788164, rs11598592, rs10788165, rs9630106, rs10886893, s.123034821, rs11199879, rs11199881, rs12415826, rs10788166, rs10886894, rs10886895, rs10886896, rs10886897, rs10886898, rs10886899, rs10886900, rs10886901, rs10886902, rs10886903, rs12413088, rs10788167, s.123047182, rs7085073, rs7071101, rs12570783, rs11199884, rs7085506, rs10886905, rs10736302, s.123061811, s.123062031, rs11199886, s.123063327, s.123063715, rs10886907, s.123064252, s.123064345, s.123064780, s.123064783, s.123066424, s.123066700, rs3981043, rs11199896, rs11199897, rs11199898, s.123067963, rs11199900, rs11199901, s.123068178, s.123068222, s.123068236, s.123068424, s.123068619, s.123068743, s.123068926, s.123068997, s.123069012, s.123069326, s.123069570, s.123069989, s.123070105, s.123071090, s.123071347, rs4254007, s.123071495, s.123071914, s.123072804, rs7900630, s.123074016, rs1896416, s.123074531, s.123074928, s.123076274, s.123076472, rs2420925, s.123077398, s.123077455, rs12779205, rs11199912, rs4752534, s.123078389, rs1896420, rs1896419, s.123079199, s.123081990, s.123081993, s.123081998, and s.123201870.
In one embodiment, suitable markers in linkage disequilibrium with rs10993994 are selected from the group consisting of s.51157005, s.51159221, rs35716372, s.51159373, s.51159376, s.51159399, s.51159786, rs4935090, rs12781411, s.51162137, s.51162792, s.51162795, rs11004246, s.51165690, rs11004324, rs2843562, rs11004409, rs11004415, rs11004422, s.51168415, rs11004435, rs11599333, s.51170094, s.51170307, rs12763717, rs67289834, s.51172442, s.51172558, rs57858801, s.51172618, s.51172808, s.51173184, rs7071471, rs7090326, s.51173565, s.51173983, s.51174391, s.51174499, s.51174610, s.51174944, s.51175013, s.51175409, s.51176290, s.51176963, s.51180209, rs10825652, s.51180819, rs2843560, rs2125770, rs2611513, rs2611512, rs2611509, s.51186305, rs2926494, rs2611508, rs2611507, s.51188694, rs2611506, rs57263518, s.51189522, rs3101227, rs2843549, rs2843550, rs2249986, rs2843551, s.51192126, rs7077830, s.51193219, rs2843554, s.51194280, rs2611489, rs3123078, rs4935162, rs7081532, rs10826075, rs7896156, s.51199599, rs6481329, rs7910704, rs4554834, rs10826125, rs10826127, rs4486572, rs4581397, rs4630240, rs7920517, rs4630241, rs9787697, rs10763534, rs10763536, s.51205998, rs10763546, s.51206890, rs4131357, s.51207437, s.51207481, s.51208175, rs11006207, rs10763576, s.51208921, rs11593361, rs10763588, rs11006274, s.51210619, s.51210866, rs4630243, rs4512771, rs4306255, s.51213076, rs4631830, rs7075009, rs7098889, rs4304716, s.51214689, s.51214690, rs7477953, s.51215034, s.51216121, s.51216342, rs7075697, s.51219226, s.51219227, s.51219230, s.51219320, s.51221179, and rs2012677.
In one embodiment, suitable markers in linkage disequilibrium with rs11067228 are selected from the group consisting of rs12820376, s.113576401, s.113582477, s.113584188, s.113584539, s.113585097, rs12819162, rs11609105, rs514849, rs513061, s.113590733, rs1061657, rs8853, rs3741698, s.113594635, rs567223, rs551510, rs59336, s.113601412, rs515746, rs545076, and s.113614584.
In one embodiment, suitable markers in linkage disequilibrium with rs4430796 are selected from the group consisting of rs757210, rs7213769, rs1016990, rs17626423, rs3744763, rs7405776, rs2005705, s.33170591, rs11263761, rs4239217, rs11651755, rs10908278, s.33174083, rs11657964, rs7501939, rs8064454, s.33175746, s.33176039, rs7405696, rs11651052, rs11263763, rs11658063, rs9913260, rs3760511, and s.33182344.
In one embodiment, suitable markers in linkage disequilibrium with rs2735839 are selected from the group consisting of rs2659051, rs266849, rs17632542, and rs2659122.
In one embodiment, suitable markers in linkage disequilibrium with rs17632542 are selected from the group consisting of rs273622, s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s.55637350, s.55640040, s.55646568, s.55649132, s.55650629, s.55650844, s.55652397, s.55653401, s.55653991, s.55654907, s.55657973, s.55659043, s.55660011, s.55660013, s.55660139, s.55660143, s.55661660, s.55661718, rs6509476, s.55664020, s.55664897, s.55665723, s.55665726, s.55672641, s.55673254, s.55674252, s.55674254, s.55674727, s.55676073, s.55683393, s.55687122, s.55695317, s.55697027, s.55701748, rs7257447, s.55702308, s.55703568, s.55706751, s.55708051, s.55709067, s.55709498, s.55709766, s.55710030, s.55710848, s.55710851, s.55711749, s.55712802, s.55713451, s.55713453, s.55713458, s.55713862, s.55716007, s.55718272, s.55723496, s.55724346, s.55726794, s.55729556, s.55729562, s.55729563, s.55731588, s.55733658, s.55741403, s.55743524, s.55745833, s.55746123, s.55747079, s.55748269, s.55748274, s.55748844, s.55749193, s.55752178, s.55752271, s.55770158, rs7247686, s.55771401, s.55772266, s.55775314, s.55778756, s.55788661, s.55790622, s.55791942, rs10413426, s.55798366, s.55818900, s.55822129, s.55825528, s.55825624, s.55833489, s.55833938, s.55848124, s.55848125, s.55849044, s.55857289, s.55857585, s.55861107, s.55861111, s.55861196, s.55862851, s.55865439, s.55867208, s.55867650, s.55868902, s.55870429, rs73598616, s.55874339, s.55875249, s.55875725, s.55881262, s.55882788, s.55883542, s.55886467, s.55887498, s.55889175, s.55892113, s.55892618, s.55892866, s.55893305, s.55896443, s.55896826, s.55898241, s.55898245, s.55899120, s.55900597, s.55900764, s.55912567, s.55914840, s.55915776, s.55936192, s.55940336, s.55946316, s.55949971, s.55955333, s.55962188, s.55963864, s.55969754, s.55979135, rs67367861, s.55989580, s.56004001, s.56006528, s.56012046, s.56013739, rs2411330, rs3212825, s.56018053, s.56019106, rs7246740, s.56025860, s.56026713, rs55786312, s.56026881, s.56026882, s.56027319, s.56029265, s.56029362, s.56032778, s.56032963, s.56032964, s.56033138, s.56033664, s.56036363, s.56037076, s.56038334, s.56039736, s.56042100, s.56042603, rs2659124, s.56046798, rs266878, rs174776, s.56052630, s.56052652, s.56053983, s.56054527, rs1058205, rs2569735, rs2735839, rs62113216, s.56058308, s.56058606, s.56058688, s.56058866, s.56060000, s.56061277, s.56062250, s.56066550, s.56066560, s.56066619, s.56067024, rs73592873, s.56076121, s.56076122, s.56078845, s.56085550, s.56093594, and s.56472259.
The skilled person will appreciate that using the LD data provided in Table 1, suitable surrogate markers may be selected based on suitable cutoff values for the LD measures r2 and D'.
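The selection by cutoff described above can be sketched in a few lines. This is a minimal illustration, not a method stated in the document: the record type, the function name `select_surrogates` and the default threshold of 0.8 are assumptions, and the five rows are transcribed from the rs401681 portion of Table 1 with the allele suffix dropped and only the D' and r2 columns kept.

```python
from typing import NamedTuple


class LDRecord(NamedTuple):
    anchor: str     # anchor SNP
    surrogate: str  # surrogate marker
    d_prime: float  # D' between anchor and surrogate
    r2: float       # r2 between anchor and surrogate


# A few rows transcribed from Table 1 (anchor rs401681).
TABLE_1_SAMPLE = [
    LDRecord("rs401681", "rs9418", 0.52, 0.21),
    LDRecord("rs401681", "rs2853666", 0.95, 0.68),
    LDRecord("rs401681", "rs421629", 1.0, 1.0),
    LDRecord("rs401681", "rs3816659", 1.0, 0.93),
    LDRecord("rs401681", "rs402710", 1.0, 0.29),
]


def select_surrogates(records, anchor, min_r2=0.8, min_d_prime=0.0):
    """Return surrogates for `anchor` that meet both LD cutoffs.

    The default cutoffs are illustrative assumptions only.
    """
    return [
        rec.surrogate
        for rec in records
        if rec.anchor == anchor
        and rec.r2 >= min_r2
        and rec.d_prime >= min_d_prime
    ]


strong = select_surrogates(TABLE_1_SAMPLE, "rs401681", min_r2=0.8)
```

With the sample rows, a cutoff of r2 >= 0.8 keeps rs421629 and rs3816659, while lowering the r2 cutoff to 0.2 and requiring D' >= 0.9 also admits rs2853666 and rs402710.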
Detecting polymorphic markers
Alleles for SNP markers as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site. The allele codes for SNPs used herein are as follows: 1=A, 2=C, 3=G, 4=T. Since human DNA is double-stranded, the person skilled in the art will realise that by assaying or reading the opposite DNA strand, the complementary allele can in each case be measured. Thus, for a polymorphic site (polymorphic marker) characterized by an A/G polymorphism, the methodology employed to detect the marker may be designed to specifically detect the presence of one or both of the two possible bases, i.e. A and G. Alternatively, by designing an assay to detect the complementary strand on the DNA template, the presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of risk estimates), identical results would be obtained from measurement of either DNA strand (+ strand or - strand).
A haplotype refers to a single-stranded segment of DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus. In a certain embodiment, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles, each allele corresponding to a specific polymorphic marker along the segment. Haplotypes can comprise a combination of various polymorphic markers, e.g., SNPs and microsatellites, having particular alleles at the polymorphic sites. The haplotypes thus comprise a combination of alleles at various genetic markers.
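As a small illustration of the allele coding and of reading the opposite strand, the following sketch decodes a label into a marker name and base and maps a base to its complement. It assumes that the numeric suffix on labels such as rs401681_2 in Table 1 is the allele code given above; the function names are hypothetical.

```python
# Allele codes as stated in the text: 1=A, 2=C, 3=G, 4=T.
ALLELE_CODES = {1: "A", 2: "C", 3: "G", 4: "T"}

# Watson-Crick complements, used when the opposite strand is assayed.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode_marker_allele(label):
    """Split a label such as 'rs401681_2' into ('rs401681', 'C').

    Assumes the suffix after the last underscore is the allele code.
    """
    name, code = label.rsplit("_", 1)
    return name, ALLELE_CODES[int(code)]


def opposite_strand(allele):
    """Return the base observed for the same allele on the other strand."""
    return COMPLEMENT[allele]


# An A/G polymorphism read on the opposite strand appears as T/C.
opposite_of_a_g = [opposite_strand(base) for base in ("A", "G")]
```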
It is possible to impute or predict genotypes for un-genotyped relatives of genotyped individuals. For every un-genotyped case, it is possible to calculate the probability of the genotypes of its relatives given its four possible phased genotypes. In practice it may be preferable to include only the genotypes of the case's parents, children, siblings, half-siblings (and the half-sibling's parents), grand-parents, grand-children (and the grand-children's parents) and spouses. It will be assumed that the individuals in the small sub-pedigrees created around each case are not related through any path not included in the pedigree. It is also assumed that alleles that are not transmitted to the case have the same frequency - the population allele frequency. Let us consider a SNP marker with the alleles A and G. The probability of the genotypes of the case's relatives can then be computed by:
$$\Pr(\text{genotypes of relatives};\, \Theta) = \sum_{h \in \{AA,\, AG,\, GA,\, GG\}} \Pr(h;\, \Theta)\, \Pr(\text{genotypes of relatives} \mid h),$$
where Θ denotes the A allele's frequency in the cases. Assuming the genotypes of each set of relatives are independent, this allows us to write down a likelihood function for Θ:
$$L(\Theta) = \prod_{i} \Pr(\text{genotypes of relatives of case } i;\, \Theta). \quad (*)$$
This assumption of independence is usually not correct. Accounting for the dependence between individuals is a difficult and potentially prohibitively expensive computational task. The likelihood function in (*) may be thought of as a pseudolikelihood approximation of the full likelihood function for Θ which properly accounts for all dependencies. In general, the genotyped cases and controls in a case-control association study are not independent, and applying the case-control method to related cases and controls is an analogous approximation. The method of genomic control (Devlin, B. et al., Nat Genet 36, 1129-30; author reply 1131 (2004)) has proven to be successful at adjusting case-control test statistics for relatedness. We therefore apply the method of genomic control to account for the dependence between the terms in our pseudolikelihood and produce a valid test statistic.
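For orientation, one common form of genomic control rescales chi-square test statistics by an inflation factor estimated from their median. The sketch below shows that form only; the text does not say how the inflation factor is estimated here, so the median-based estimate, the 1-degree-of-freedom assumption, the floor at 1.0 and the function name are all assumptions.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics).

    Assumes each statistic is chi-square with 1 df under the null and
    estimates the inflation factor from the median (an assumption).
    """
    inflation = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Common convention: never inflate statistics when the factor is < 1.
    inflation = max(inflation, 1.0)
    return inflation, [stat / inflation for stat in chi2_stats]


# Hypothetical statistics whose median is twice the null median.
inflation, adjusted = genomic_control([0.2, 2 * CHI2_1DF_MEDIAN, 3.0])
```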
Fisher's information can be used to estimate the effective sample size of the part of the pseudolikelihood due to un-genotyped cases. Breaking the total Fisher information, I, into the part due to genotyped cases, Ig, and the part due to ungenotyped cases, Iu, so that I = Ig + Iu, and denoting the number of genotyped cases with N, the effective sample size due to the ungenotyped cases is estimated by (Iu/Ig)N. It is also possible to impute genotypes for markers with no genotype data. For example, using the IMPUTE software (Marchini, J. et al. Nat Genet 39:906-13 (2007)) and the HapMap (NCBI Build 36 (db126b)) CEU data as reference (Frazer, K.A., et al. Nature 449:851-61 (2007)) it is possible to impute ungenotyped markers. This can be useful for extending genotype coverage, if the CEU dataset has been genotyped.
Analyzing multiple markers
A genetic variant associated with a disease or a trait such as PSA quantity can be used alone to predict the risk of the disease for a given genotype. For a biallelic marker, such as a SNP, there are 3 possible genotypes: homozygote for the at risk variant, heterozygote, and non-carrier of the at risk variant. Risk associated with variants at multiple loci can be used to estimate overall risk. For multiple SNP variants, there are k possible genotypes, k = 3^n x 2^p, where n is the number of autosomal loci and p the number of gonosomal (sex chromosomal) loci. Overall risk assessment calculations for a plurality of risk variants usually assume that the relative risks of different genetic variants multiply, i.e. the overall risk (e.g., RR or OR) associated with a particular genotype combination is the product of the risk values for the genotype at each locus. If the risk presented is the relative risk for a person, or a specific genotype for a person, compared to a reference population with matched gender and ethnicity, then the combined risk is the product of the locus specific risk values and also corresponds to an overall risk estimate compared with the population. If the risk for a person is based on a comparison to non-carriers of the at risk allele, then the combined risk corresponds to an estimate that compares the person with a given combination of genotypes at all loci to a group of individuals who do not carry risk variants at any of those loci. The group of non-carriers of any at risk variant has the lowest estimated risk and has a combined risk compared with itself (i.e., non-carriers) of 1.0, but has an overall risk, compared with the population, of less than 1.0. It should be noted that the group of non-carriers can potentially be very small, especially for a large number of loci, and in that case, its relevance is correspondingly small.
The multiplicative model is a parsimonious model that usually fits the data of complex traits reasonably well. Deviations from multiplicativity have been rarely described in the context of common variants for common diseases, and if reported are usually only suggestive, since very large sample sizes are usually required to be able to demonstrate statistical interactions between loci.
By way of an example, let us consider a case of eight variants that have been associated with risk of prostate cancer (Gudmundsson, J., et al., Nat Genet 39:631-7 (2007); Gudmundsson, J., et al., Nat Genet 39:977-83 (2007); Yeager, M., et al., Nat Genet 39:645-49 (2007); Amundadottir, L., et al., Nat Genet 38:652-8 (2006); Haiman, C.A., et al., Nat Genet 39:638-44 (2007)). Seven of these loci are on autosomes, and the remaining locus is on chromosome X. The total number of theoretical genotypic combinations is then 3^7 x 2^1 = 4374. Some of those genotypic classes are very rare, but are still possible, and should be considered for overall risk assessment. It is likely that the multiplicative model applied in the case of multiple genetic variants will also be valid in conjunction with non-genetic risk variants, assuming that the genetic variant does not clearly correlate with the "environmental" factor. In other words, genetic and non-genetic at-risk variants can be assessed under the multiplicative model to estimate combined risk, assuming that the non-genetic and genetic risk factors do not interact.
Using the same quantitative approach, the combined or overall effect of any plurality of variants associated with PSA quantity and prostate cancer risk, as described herein, may be assessed.
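The count of genotype classes in this example can be checked directly. The sketch below computes 3 to the power of the number of autosomal loci times 2 to the power of the number of sex-chromosomal loci, and also enumerates the classes explicitly; the function names and the genotype labels ("aa", "ab", "bb" for autosomal loci, "a", "b" for a hemizygous X-linked locus in males) are illustrative assumptions.

```python
from itertools import product


def genotype_combinations(n_autosomal, n_sex_linked):
    """Number of genotype classes: three per autosomal locus, two per sex-linked locus."""
    return 3 ** n_autosomal * 2 ** n_sex_linked


def enumerate_genotypes(n_autosomal, n_sex_linked):
    """Yield every genotype combination as a tuple, one entry per locus."""
    autosomal = [("aa", "ab", "bb")] * n_autosomal
    sex_linked = [("a", "b")] * n_sex_linked
    return product(*autosomal, *sex_linked)


# Seven autosomal loci and one locus on chromosome X, as in the example.
total = genotype_combinations(7, 1)
enumerated = sum(1 for _ in enumerate_genotypes(7, 1))
```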
Risk assessment and Diagnostics
Within any given population, there is an absolute risk of developing a disease or trait, defined as the chance of a person developing the specific disease or trait over a specified time period. For example, a woman's lifetime absolute risk of breast cancer is one in nine. That is to say, one woman in every nine will develop breast cancer at some point in her life. Risk is typically measured by looking at very large numbers of people, rather than at a particular individual. Risk is often presented in terms of Absolute Risk (AR) and Relative Risk (RR). Relative Risk is used to compare risks associated with two variants or the risks of two different groups of people. For example, it can be used to compare a group of people with a certain genotype with another group having a different genotype. For a disease, a relative risk of 2 means that one group has twice the chance of developing a disease as the other group. The risk presented is usually the relative risk for a person, or a specific genotype of a person, compared to the population with matched gender and ethnicity. Risks of two individuals of the same gender and ethnicity could be compared in a simple manner. For example, if, compared to the population, the first individual has relative risk 1.5 and the second has relative risk 0.5, then the risk of the first individual compared to the second individual is 1.5/0.5 = 3.
Risk Calculations
The creation of a model to calculate the overall genetic risk involves two steps: i) conversion of odds-ratios for a single genetic variant into relative risk and ii) combination of risk from multiple variants in different genetic loci into a single relative risk value.
Deriving risk from odds-ratios
Most gene discovery studies for complex diseases that have been published to date in authoritative journals have employed a case-control design because of their retrospective setup. These studies sample and genotype a selected set of cases (people who have the specified disease condition) and control individuals. The interest is in genetic variants (alleles) whose frequencies in cases and controls differ significantly.
The results are typically reported as odds ratios, that is, the ratio between the fraction (probability) with the risk variant (carriers) versus the non-risk variant (non-carriers) in the groups of affected versus the controls, i.e. expressed in terms of probabilities conditional on the affection status:
OR = (Pr(c|A)/Pr(nc|A)) / (Pr(c| C)/Pr(nc| C))
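The odds ratio above can be computed from a two-by-two table of carrier counts. The following sketch is illustrative only: the function name and the counts are hypothetical and do not come from the document. Dividing each count by its group total gives the conditional probabilities in the formula, and the group totals cancel in each odds.

```python
def odds_ratio(carriers_cases, noncarriers_cases,
               carriers_controls, noncarriers_controls):
    """Odds of carrying the risk variant in cases over the same odds in controls."""
    odds_in_cases = carriers_cases / noncarriers_cases
    odds_in_controls = carriers_controls / noncarriers_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts: 60 of 100 cases and 50 of 100 controls carry the variant.
example_or = odds_ratio(60, 40, 50, 50)
```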
Sometimes, however, it is the absolute risk for the disease that we are interested in, i.e. the fraction of those individuals carrying the risk variant who get the disease or, in other words, the probability of getting the disease. This number cannot be directly measured in case-control studies, in part because the ratio of cases versus controls is typically not the same as that in the general population. However, under certain assumptions, we can estimate the risk from the odds ratio.
It is well known that under the rare disease assumption, the relative risk of a disease can be approximated by the odds ratio. This assumption may however not hold for many common diseases. Still, it turns out that the risk of one genotype variant relative to another can be estimated from the odds ratio expressed above. The calculation is particularly simple under the assumption of random population controls where the controls are random samples from the same population as the cases, including affected people rather than being strictly unaffected individuals. To increase sample size and power, many of the large genome-wide association and replication studies use controls that were neither age-matched with the cases, nor were they carefully scrutinized to ensure that they did not have the disease at the time of the study.
Hence, while not exact, such controls often approximate a random sample from the general population. It is noted that this assumption is rarely expected to be satisfied exactly, but the risk estimates are usually robust to moderate deviations from this assumption.
Calculations show that for the dominant and the recessive models, where we have a risk variant carrier, "c", and a non-carrier, "nc", the odds ratio of individuals is the same as the risk ratio between these variants:
OR = Pr(A| c)/Pr(A| nc) = r
And likewise for the multiplicative model, where the risk is the product of the risk associated with the two allele copies, the allelic odds ratio equals the risk factor:
OR = Pr(A| aa)/Pr(A| ab) = Pr(A| ab)/Pr(A| bb) = r
Here "a" denotes the risk allele and "b" the non-risk allele. The factor "r" is therefore the relative risk between the allele types. For many of the studies published in the last few years, reporting common variants associated with complex diseases, the multiplicative model has been found to summarize the effect adequately and most often provide a fit to the data superior to alternative models such as the dominant and recessive models.
The risk relative to the average population risk
It is most convenient to represent the risk of a genetic variant relative to the average population since it makes it easier to communicate the lifetime risk for developing the disease compared with the baseline population risk. For example, in the multiplicative model we can calculate the relative population risk for variant "aa" as:
$$RR(aa) = \frac{\Pr(A \mid aa)}{\Pr(A)} = \frac{\Pr(A \mid aa)/\Pr(A \mid bb)}{\Pr(A)/\Pr(A \mid bb)} = \frac{r^2}{\Pr(aa)\, r^2 + \Pr(ab)\, r + \Pr(bb)} = \frac{r^2}{p^2 r^2 + 2pq\, r + q^2} = \frac{r^2}{R}$$
Here "p" and "q" are the allele frequencies of "a" and "b" respectively. Likewise, we get that RR(ab) = r/R and RR(bb) = 1/R. The allele frequency estimates may be obtained from the publications that report the odds-ratios and from the HapMap database. Note that in the case where we do not know the genotypes of an individual, the relative genetic risk for that test or marker is simply equal to one.
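A short numerical sketch of these population-relative risks follows. It assumes Hardy-Weinberg genotype frequencies (p squared, 2pq, q squared), as the formula above implies; the function name and the example values r = 1.5 and p = 0.2 are hypothetical.

```python
def population_relative_risks(r, p):
    """Relative risk of each genotype versus the population average.

    r is the per-allele risk factor under the multiplicative model and
    p the frequency of risk allele "a"; q = 1 - p is the frequency of "b".
    """
    q = 1.0 - p
    # R is the population-average risk relative to the "bb" genotype.
    big_r = p * p * r * r + 2 * p * q * r + q * q
    return {"aa": r * r / big_r, "ab": r / big_r, "bb": 1.0 / big_r}


# Hypothetical values: per-allele risk 1.5, risk-allele frequency 0.2.
rr = population_relative_risks(1.5, 0.2)

# Weighting each genotype's risk by its frequency recovers a mean of 1.
mean_risk = 0.04 * rr["aa"] + 0.32 * rr["ab"] + 0.64 * rr["bb"]
```

Here R works out to 1.21, so the homozygous carrier has roughly 1.86 times the population risk and the non-carrier roughly 0.83 times.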
Combining the risk from multiple markers
When genotypes of many SNP variants are used to estimate the risk for an individual, a multiplicative model for risk can generally be assumed. This means that the combined genetic risk relative to the population is calculated as the product of the corresponding estimates for individual markers, e.g. for two markers g1 and g2:
RR(g1,g2) = RR(g1)RR(g2)
The underlying assumption is that the risk factors occur and behave independently, i.e. that the joint conditional probabilities can be represented as products:
Pr(A| g1,g2) = Pr(A| g1)Pr(A| g2)/Pr(A) and Pr(g1,g2) = Pr(g1)Pr(g2)
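The product rule for the combined relative risk follows directly from the first of these relations. As a short derivation, dividing both sides by Pr(A) and using the definition RR(g) = Pr(A| g)/Pr(A) from the preceding section gives:

```latex
\[
RR(g_1, g_2) = \frac{\Pr(A \mid g_1, g_2)}{\Pr(A)}
= \frac{\Pr(A \mid g_1)}{\Pr(A)} \cdot \frac{\Pr(A \mid g_2)}{\Pr(A)}
= RR(g_1)\, RR(g_2)
\]
```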
Obvious violations of this assumption are markers that are closely spaced on the genome, i.e. in linkage disequilibrium, such that the concurrence of two or more risk alleles is correlated. In such cases, we can use so-called haplotype modeling, where the odds-ratios are defined for all allele combinations of the correlated SNPs.
As in most situations where a statistical model is utilized, the model applied is not expected to be exactly true since it is not based on an underlying bio-physical model. However, the multiplicative model has so far been found to fit the data adequately, i.e. no significant deviations are detected for many common diseases for which many risk variants have been discovered.
As an example, consider an individual who has the following genotypes at 4 hypothetical markers associated with a particular disease, shown along with the risk relative to the population at each marker:
Marker    Genotype    Calculated risk
M1        CC          1.03
M2        GG          1.30
M3        AG          0.88
M4        TT          1.54
Combined, the overall risk relative to the population for this individual is: 1.03 x 1.30 x 0.88 x 1.54 = 1.81.
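The same arithmetic can be reproduced with a minimal Python sketch. The function name is hypothetical; the marker values are the four hypothetical markers of the example above, and a marker with unknown genotype is given a relative risk of one, as stated earlier for untyped markers.

```python
from math import prod
from typing import Mapping, Optional


def combined_relative_risk(marker_risks: Mapping[str, Optional[float]]) -> float:
    """Multiply per-marker risks relative to the population.

    Assumes the markers behave independently (no linkage disequilibrium
    between them). A value of None means the genotype is unknown, in
    which case the marker contributes a relative risk of 1.
    """
    return prod(1.0 if rr is None else rr for rr in marker_risks.values())


# The four hypothetical markers from the example above.
risks = {"M1": 1.03, "M2": 1.30, "M3": 0.88, "M4": 1.54}
combined = combined_relative_risk(risks)
print(f"{combined:.2f}")  # prints 1.81
```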
Risk assessment of prostate cancer
As described herein, certain polymorphic markers and haplotypes comprising such markers are found to be useful for risk assessment of prostate cancer. Certain markers have also been found to be useful for correcting PSA quantity to establish a corrected PSA quantity based on the genotype of individuals at particular polymorphic markers. Markers in linkage disequilibrium with any such marker are, by necessity, also useful in such applications. This fact is obvious to the skilled person, who thus knows that surrogate markers may be suitably selected to detect the effect of any particular anchor marker. The stronger the linkage disequilibrium to the anchor marker, the better the surrogate, and thus the more similar the results obtained by detecting the surrogate will be to those of the anchor marker. Markers with values of r^2 equal to 1 are perfect surrogates for the anchor marker, i.e. genotypes for the surrogate marker perfectly predict genotypes for the anchor marker. Markers with values of r^2 smaller than 1 can also be useful surrogates, although they are expected to give rise to observed effects that are smaller than for the anchor marker. Alternatively, such surrogate markers may represent variants with effects (e.g., OR, RR for prostate cancer, or effect on PSA levels) as high as or possibly even higher than that of the anchor marker. In this scenario, the anchor variant identified may not be the functional variant itself, but is in this instance in linkage disequilibrium with the true functional variant. The functional variant may be a SNP, but may also for example be a tandem repeat, such as a minisatellite or a microsatellite, a transposable element (e.g., an Alu element), or a structural alteration, such as a deletion, insertion or inversion (sometimes also called copy number variations, or CNVs). The present invention encompasses the assessment of such surrogate markers for the markers as disclosed herein.
Such markers are annotated, mapped and listed in public databases, as is well known to the skilled person, or can alternatively be readily identified by sequencing a genomic region or a part of the region identified by the markers of the present invention in a group of individuals, and identifying polymorphisms in the resulting group of sequences. As a consequence, the person skilled in the art can readily and without undue experimentation identify and genotype surrogate markers in linkage disequilibrium with the markers described herein.
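To illustrate how the strength of a surrogate can be quantified, the following Python sketch computes the squared correlation coefficient between the alleles at two biallelic markers from haplotype and allele frequencies, using the standard population-genetic definition of this linkage disequilibrium measure. The function name and the example frequencies are hypothetical and serve only as an illustration.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the anchor
          marker and allele B at the candidate surrogate marker.
    p_a:  frequency of allele A at the anchor marker.
    p_b:  frequency of allele B at the candidate surrogate marker.
    """
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: the two alleles always occur together,
# so the surrogate is a perfect proxy and the result is close to 1.
perfect = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical frequencies: the alleles occur independently,
# so the result is close to 0.
independent = ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3)
```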
Detection of nucleic acid sequence as described herein can in certain embodiments be practiced by assessing a sample comprising genomic DNA from an individual for the presence of certain variants described herein to be associated with PSA levels and risk of prostate cancer. Such assessment typically includes steps that detect the presence or absence of at least one allele of at least one polymorphic marker, using methods well known to the skilled person and further described herein, and based on the outcome of such assessment, determine whether the individual from whom the sample is derived is at increased or decreased risk (i.e., increased or decreased susceptibility) of prostate cancer, or determine a corrected PSA value based on the outcome. Obtaining nucleic acid sequence data can comprise nucleic acid sequence at a single nucleotide position, which is sufficient to identify alleles at SNPs. The nucleic acid sequence data can also comprise sequence at any other number of nucleotide positions, in particular for genetic markers that comprise multiple nucleotide positions, and can be anywhere from two to hundreds of thousands, possibly even millions, of nucleotides (in particular, in the case of copy number variations (CNVs)).
In certain embodiments, the invention can be practiced utilizing a dataset comprising information about the genotype status of at least one polymorphic marker. In other words, a dataset containing information about particular polymorphic markers, for example in the form of genotype counts at a certain polymorphic marker, or a plurality of markers (e.g., an indication of the presence or absence of certain at-risk alleles, or the presence or absence of certain alleles predictive of increased or decreased PSA quantity), or actual genotypes for one or more markers, can be queried for the presence or absence of certain alleles.
It should be apparent to the skilled person that the methods described herein for determining corrected PSA quantity and methods of assessing prostate cancer susceptibility may be performed using multiple markers. Thus, any one, or a combination of the markers described herein may be used. In certain embodiments, the use of additional polymorphic markers useful in the method is contemplated. Methods known in the art and described herein may be used to determine the overall effect of such multiple markers.
Study population
The Icelandic population is a Caucasian population of Northern European ancestry. A large number of studies reporting results of genetic linkage and association in the Icelandic population have been published in the last few years. Many of those studies show replication of variants, originally identified in the Icelandic population as being associated with a particular disease, in other populations (Sulem, P., et al. Nat Genet May 17 2009 (Epub ahead of print); Rafnar, T., et al. Nat Genet 41:221-7 (2009); Gretarsdottir, S., et al. Ann Neurol 64:402-9 (2008); Stacey, S.N., et al. Nat Genet 40:1313-18 (2008); Gudbjartsson, D.F., et al. Nat Genet 40:886-91 (2008); Styrkarsdottir, U., et al. N Engl J Med 358:2355-65 (2008); Thorgeirsson, T., et al. Nature 452:638-42 (2008); Gudmundsson, J., et al. Nat Genet. 40:281-3 (2008); Stacey, S.N., et al., Nat Genet. 39:865-69 (2007); Helgadottir, A., et al., Science 316:1491-93 (2007); Steinthorsdottir, V., et al., Nat Genet. 39:770-75 (2007); Gudmundsson, J., et al., Nat Genet. 39:631-37 (2007); Frayling, TM, Nature Reviews Genet 8:657-662 (2007); Amundadottir, L.T., et al., Nat Genet. 38:652-58 (2006); Grant, S.F., et al., Nat Genet. 38:320-23 (2006)). Thus, genetic findings in the Icelandic population have in general been replicated in other populations, including populations from Africa and Asia.
By way of example, prostate cancer risk variants on Chromosome 8q24 (rs1447295 and rs16901979), Chromosome 17q12 (rs4430796), Chromosome 17q24.3 (rs1859962), Chromosome 2p15 (rs2710646), Chromosome 11q13 (rs10896450) and Chromosome Xp11.22 (rs5945572), all of which had originally been identified in samples from the Icelandic population, have been confirmed as risk variants of prostate cancer in many other populations.
It is thus believed that the markers described herein to be associated with PSA quantity and prostate cancer risk will show similar association in other human populations. Particular embodiments comprising individual human populations are therefore also contemplated and within the scope of the invention. Such embodiments relate to human individuals that are from one or more human populations including, but not limited to, Caucasian populations, European populations, American populations, Eurasian populations, Asian populations, Central/South Asian populations, East Asian populations, Middle Eastern populations, African populations, Hispanic populations, and Oceanian populations.
In certain embodiments, the invention relates to markers and/or haplotypes identified in specific populations, as described in the above. The person skilled in the art will appreciate that linkage disequilibrium (LD) may vary across human populations. This is due to different population history of different human populations as well as differential selective pressures that may have led to differences in LD in specific genomic regions. It is also well known to the person skilled in the art that certain markers, e.g. SNP markers, have different population frequency in different populations, or are polymorphic in one population but not in another. The person skilled in the art will however apply available methods and methods described herein to practice the present invention in any given human population. For example, selecting markers in LD with an anchor marker may in certain embodiments be done using Caucasian samples. In general, however, markers in LD with an anchor marker may be suitably selected using LD determined in a particular population that is intended for study. For example, for applying the present invention in the Chinese population, it may be suitable to select markers in LD with a particular anchor marker (e.g., any of the markers shown herein to be predictive of PSA quantity in humans) based on LD measures determined in samples from the Chinese population. Such selection of markers is well known to the skilled person, and can be done using data from the public domain, for example data from the HapMap project (http://www.hapmap.org), utilizing methods known in the art.
As a consequence, certain embodiments of the invention pertain to markers that are in linkage disequilibrium with a marker selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, wherein linkage disequilibrium is determined in samples from the same human population as the individual being studied. In certain embodiments, the individual is Caucasian and the population is a Caucasian population. The population may also suitably be a European population, for example in cases where the individual is European or of European origin. Certain other embodiments relate to populations with a European origin.
Nucleic acids and polypeptides
The nucleic acids and polypeptides described herein can be used in methods and kits of the present invention. An "isolated" nucleic acid molecule, as used herein, is one that is separated from nucleic acids that normally flank the gene or nucleotide sequence (as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (e.g., as in an RNA library). For example, an isolated nucleic acid of the invention can be substantially isolated with respect to the complex cellular milieu in which it naturally occurs, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material can be purified to essential homogeneity, for example as determined by polyacrylamide gel electrophoresis (PAGE) or column chromatography (e.g., HPLC). An isolated nucleic acid molecule of the invention can comprise at least about 50%, at least about 80% or at least about 90% (on a molar basis) of all macromolecular species present. With regard to genomic DNA, the term "isolated" also can refer to nucleic acid molecules that are separated from the chromosome with which the genomic DNA is naturally associated. For example, the isolated nucleic acid molecule can contain less than about 250 kb, 200 kb, 150 kb, 100 kb, 75 kb, 50 kb, 25 kb, 10 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of the nucleotides that flank the nucleic acid molecule in the genomic DNA of the cell from which the nucleic acid molecule is derived.
The nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered isolated. Thus, recombinant DNA contained in a vector is included in the definition of "isolated" as used herein. Also, isolated nucleic acid molecules include recombinant DNA molecules in heterologous host cells or heterologous organisms, as well as partially or substantially purified DNA molecules in solution. "Isolated" nucleic acid molecules also encompass in vivo and in vitro RNA transcripts of the DNA molecules of the present invention. An isolated nucleic acid molecule or nucleotide sequence can include a nucleic acid molecule or nucleotide sequence that is synthesized chemically or by recombinant means. Such isolated nucleotide sequences are useful, for example, in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ hybridization with chromosomes), or for detecting expression of the gene in tissue (e.g., human tissue), such as by Northern blot analysis or other hybridization techniques.
The invention also pertains to nucleic acid molecules that hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein (e.g., nucleic acid molecules that specifically hybridize to a nucleotide sequence containing a polymorphic site associated with a marker or haplotype described herein). Such nucleic acid molecules can be detected and/or isolated by allele- or sequence-specific hybridization (e.g., under high stringency conditions). Stringency conditions and methods for nucleic acid hybridizations are well known to the skilled person (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., John Wiley & Sons, (1998), and Kraus, M. and Aaronson, S., Methods Enzymol., 200:546-556 (1991), the entire teachings of which are incorporated by reference herein).
The percent identity of two nucleotide or amino acid sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first sequence). The nucleotides or amino acids at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions/total # of positions x 100). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%, of the length of the reference sequence. The actual comparison of the two sequences can be accomplished by well-known methods, for example, using a mathematical algorithm. A non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. See the website on the World Wide Web at ncbi.nlm.nih.gov. In one embodiment, parameters for sequence comparison can be set at score=100, wordlength=12, or can be varied (e.g., W=5 or W=20). Another example of an algorithm is BLAT (Kent, W.J. Genome Res. 12:656-64 (2002)). Other examples include the algorithm of Myers and Miller, CABIOS (1989), ADVANCE and ADAM as described in Torellis, A. and Robotti, C., Comput. Appl. Biosci. 10:3-5 (1994); and FASTA described in Pearson, W. and Lipman, D., Proc. Natl. Acad. Sci. USA, 85:2444-48 (1988). In another embodiment, the percent identity between two amino acid sequences can be accomplished using the GAP program in the GCG software package (Accelrys, Cambridge, UK).
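The percent identity formula given above can be illustrated with a short Python sketch. It assumes that the two sequences have already been aligned by one of the programs mentioned (so that they have equal length, with "-" marking gaps) and that the total number of positions is taken to be the alignment length; both the function name and these conventions are assumptions made for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Both strings must have the same length; "-" denotes a gap.
    A position counts as identical only if both sequences carry the
    same non-gap residue at that position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # % identity = # of identical positions / total # of positions x 100
    return 100.0 * identical / len(aligned_a)


# Hypothetical alignment: 4 identical positions out of 6.
example = percent_identity("ACGT-A", "ACCTTA")
```

Other conventions for the denominator exist (for example, the length of the shorter sequence), so the result should always be reported together with the convention used.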
The present invention also provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid that comprises, or consists of, the nucleotide sequence of any one of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene, or a nucleotide sequence comprising, or consisting of, the complement of the nucleotide sequence of any one of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene. In certain embodiments, the nucleotide sequence comprises at least one polymorphic allele contained in the markers described herein. The nucleic acid fragments of the invention are at least about 15, at least about 18, 20, 23 or 25 nucleotides, and can be 30, 40, 50, 100, 200, 500, 1000, 10,000 or more nucleotides in length. In a specific embodiment, the nucleic acid fragments are 15-500 nucleotides in length.
The nucleic acid fragments of the invention are used as probes or primers in assays such as those described herein. "Probes" or "primers" are oligonucleotides that hybridize in a base-specific manner to a complementary strand of a nucleic acid molecule. In addition to DNA and RNA, such probes and primers include polypeptide nucleic acids (PNA), as described in Nielsen, P. et al., Science 254:1497-1500 (1991). A probe or primer comprises a region of nucleotide sequence that hybridizes to at least about 15, typically about 20-25, and in certain embodiments about 40, 50 or 75, consecutive nucleotides of a nucleic acid molecule. In one embodiment, the probe or primer comprises at least one allele of at least one polymorphic marker or at least one haplotype described herein, or the complement thereof. In particular embodiments, a probe or primer can comprise 100 or fewer nucleotides; for example, in certain embodiments from 6 to 50 nucleotides, or, for example, from 12 to 30 nucleotides. In other embodiments, the probe or primer is at least 70% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. In another embodiment, the probe or primer is capable of selectively hybridizing to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. Often, the probe or primer further comprises a label, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, an epitope label.
The nucleic acid molecules of the invention, such as those described above, can be identified and isolated using standard molecular biology techniques well known to the skilled person. The amplified DNA can be labeled (e.g., radiolabeled, fluorescently labeled) and used as a probe for screening a cDNA library derived from human cells. The cDNA can be derived from mRNA and contained in a suitable vector. Corresponding clones can be isolated, DNA obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods to identify the correct reading frame encoding a polypeptide of the appropriate molecular weight. Using these or similar methods, the polypeptide and the DNA encoding the polypeptide can be isolated, sequenced and further characterized.
Kits
Kits useful in the methods of the invention comprise components useful in any of the methods described herein, including for example, primers for nucleic acid amplification, hybridization probes, restriction enzymes (e.g., for RFLP analysis), allele-specific oligonucleotides, antibodies useful for detecting PSA, e.g. antibodies that bind to PSA epitopes, antibodies that bind to an altered PSA polypeptide (e.g., antibodies that bind to PSA epitopes that comprise an I179T variation) or to a non-altered (native) polypeptide, means for analyzing the nucleic acid sequence of a nucleic acid, etc. The kits can, for example, include necessary buffers, nucleic acid primers for amplifying nucleic acids of the invention, and reagents for allele-specific detection of the fragments amplified using such primers and necessary enzymes (e.g., DNA polymerase).
Additionally, kits can provide reagents for assays to be used in combination with the methods of the present invention, e.g., reagents for use with other diagnostic assays. For example, in certain embodiments, kits provide reagents for performing a PSA assay.
In one embodiment, the invention pertains to a kit for assaying a sample from a subject to detect the presence or absence of certain alleles at certain polymorphic markers in a subject, wherein the kit comprises reagents necessary for selectively detecting at least one allele of at least one polymorphism as described herein in the genome of the individual. In a particular embodiment, the reagents comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising at least one polymorphism of the present invention. In another embodiment, the reagents comprise at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from a subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes at least one polymorphism that is useful in the methods described herein. For example, in certain embodiments, the polymorphism is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith. In one embodiment the fragment is at least 20 base pairs in size. Such oligonucleotides or nucleic acids (e.g., oligonucleotide primers) can be designed using portions of the nucleic acid sequence flanking polymorphisms (e.g., SNPs or microsatellites) that are associated with PSA levels, as described herein. In another embodiment, the kit comprises one or more labeled nucleic acids capable of allele-specific detection of one or more specific polymorphic markers, and reagents for detection of the label. Suitable labels include, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, an epitope label.
In particular embodiments, the polymorphic marker or haplotype to be detected by the reagents of the kit comprises one or more markers, two or more markers, three or more markers, four or more markers, five or more markers, six or more markers, seven or more markers, eight or more markers, nine or more markers, or ten or more markers.
In a further aspect of the present invention, a pack (kit) is provided, the pack comprising (i) reagents for determining PSA levels in humans, and (ii) reagents for determining sequence information about at least one polymorphic marker, wherein the at least one polymorphic marker is correlated with PSA quantity in humans. In certain embodiments, the reagents for determining sequence information comprise reagents for determining the presence or absence of at least one allele of at least one polymorphic marker.
In certain embodiments, the kit further comprises a set of instructions for using the reagents comprising the kit. In certain embodiments, the kit further comprises instructions for interpreting results obtained by using reagents in the kit. For example, the instructions in one embodiment comprise instructions for determining corrected PSA levels based on (a) uncorrected PSA levels obtained using reagents provided in the kit and (b) sequence information obtained using reagents provided in the kit. In another embodiment, the kit contains a data sheet providing information on corrected PSA values based on results on uncorrected PSA values and sequence information about at least one polymorphic marker obtained using the reagents provided in the kit.
Antibodies
The invention also provides antibodies which bind to an epitope comprising either a variant amino acid sequence (e.g., comprising an amino acid substitution) encoded by a variant allele or the reference amino acid sequence encoded by the corresponding non-variant or wild-type allele. The term "antibody" as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain antigen-binding sites that specifically bind an antigen. A molecule that specifically binds to a polypeptide of the invention is a molecule that binds to that polypeptide or a fragment thereof, but does not substantially bind other molecules in a sample, e.g., a biological sample, which naturally contains the polypeptide. Examples of immunologically active portions of immunoglobulin molecules include F(ab) and F(ab')2 fragments which can be generated by treating the antibody with an enzyme such as pepsin. The invention provides polyclonal and monoclonal antibodies that bind to a polypeptide of the invention. The term "monoclonal antibody" or "monoclonal antibody composition", as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope of a polypeptide of the invention. A monoclonal antibody composition thus typically displays a single binding affinity for a particular polypeptide of the invention with which it immunoreacts.
Polyclonal antibodies can be prepared as described above by immunizing a suitable subject with a desired immunogen, e.g., polypeptide of the invention or a fragment thereof. The antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme linked immunosorbent assay (ELISA) using immobilized polypeptide. If desired, the antibody molecules directed against the polypeptide can be isolated from the mammal (e.g., from the blood) and further purified by well-known techniques, such as protein A chromatography to obtain the IgG fraction. At an appropriate time after immunization, e.g., when the antibody titers are highest, antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein, Nature 256:495-497 (1975), the human B cell hybridoma technique (Kozbor et al., Immunol. Today 4:72 (1983)), the EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., 1985, pp. 77-96) or trioma techniques. The technology for producing hybridomas is well known (see generally Current Protocols in Immunology (1994) Coligan et al., (eds.) John Wiley & Sons, Inc., New York, NY). Briefly, an immortal cell line (typically a myeloma) is fused to lymphocytes (typically splenocytes) from a mammal immunized with an immunogen as described above, and the culture supernatants of the resulting hybridoma cells are screened to identify a hybridoma producing a monoclonal antibody that binds a polypeptide of the invention.
Any of the many well known protocols used for fusing lymphocytes and immortalized cell lines can be applied for the purpose of generating a monoclonal antibody to a polypeptide of the invention (see, e.g., Current Protocols in Immunology, supra; Galfre et al., Nature 266:550-52 (1977); R.H. Kenneth, in Monoclonal Antibodies: A New Dimension In Biological Analyses, Plenum Publishing Corp., New York, New York (1980); and Lerner, Yale J. Biol. Med. 54:387-402 (1981)). Moreover, the ordinarily skilled worker will appreciate that there are many variations of such methods that also would be useful.
As an alternative to preparing monoclonal antibody-secreting hybridomas, a monoclonal antibody to a polypeptide of the invention can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide to thereby isolate immunoglobulin library members that bind the polypeptide. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening an antibody display library can be found in, for example, U.S. Patent No. 5,223,409; PCT Publication No. WO 92/18619; PCT Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No. WO 90/02809; Fuchs et al., Bio/Technology 9:1370-1372 (1991); Hay et al., Hum. Antibod. Hybridomas 3:81-85 (1992); Huse et al., Science 246:1275-1281 (1989); and Griffiths et al., EMBO J. 12:725-734 (1993).
Additionally, recombinant antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.
In general, antibodies of the invention (e.g., a monoclonal antibody) can be used to isolate a polypeptide of the invention by standard techniques, such as affinity chromatography or immunoprecipitation. A polypeptide-specific antibody can facilitate the purification of natural polypeptide from cells and of recombinantly produced polypeptide expressed in host cells.
Moreover, an antibody specific for a polypeptide of the invention can be used to detect the polypeptide (e.g., in a cellular lysate, cell supernatant, or tissue sample) in order to evaluate the abundance and pattern of expression of the polypeptide. Antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. The antibody can be coupled to a detectable substance to facilitate its detection. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin, and examples of suitable radioactive material include 125I, 131I, 35S or 3H.
Antibodies may also be useful in pharmacogenomic analysis. In such embodiments, antibodies against variant proteins encoded by nucleic acids according to the invention, such as variant proteins that are encoded by nucleic acids that contain at least one polymorphic marker of the invention, can be used to identify individuals that require modified treatment modalities.
Antibodies can furthermore be useful for assessing expression of variant proteins in disease states, such as in active stages of a disease, or in an individual with a predisposition to a disease related to the function of the protein, in particular prostate cancer. In certain embodiments, antibodies are useful for assessing PSA quantity in humans. Antibodies specific for a variant protein of the present invention can be used to screen for the presence of the variant protein, for example to screen for a predisposition to prostate cancer as indicated by the presence of the variant protein. In one embodiment, the variant protein is an I179T variant of the KLK3 protein.
Antibodies can be used in other methods. Thus, antibodies are useful as diagnostic tools for evaluating proteins, such as variant proteins of the invention, in conjunction with analysis by electrophoretic mobility, isoelectric point, tryptic or other protease digest, or for use in other physical assays known to those skilled in the art. Antibodies may also be used in tissue typing . In one such embodiment, a specific variant protein has been correlated with expression in a specific tissue type, and antibodies specific for the variant protein can then be used to identify the specific tissue type.
Subcellular localization of proteins, including variant proteins, can also be determined using antibodies, and can be applied to assess aberrant subcellular localization of the protein in cells in various tissues. Such use can be applied in genetic testing, but also in monitoring a particular treatment modality. In the case where treatment is aimed at correcting the expression level or presence of the variant protein or aberrant tissue distribution or developmental expression of the variant protein, antibodies specific for the variant protein or fragments thereof can be used to monitor therapeutic efficacy.
Antibodies are further useful for inhibiting variant protein function, for example by blocking the binding of a variant protein to a binding molecule or partner. Such uses can also be applied in a therapeutic context in which treatment involves inhibiting a variant protein's function. An antibody can, for example, be used to block or competitively inhibit binding, thereby modulating (i.e., agonizing or antagonizing) the activity of the protein. Antibodies can be prepared against specific protein fragments containing sites required for specific function or against an intact protein that is associated with a cell or cell membrane. For administration in vivo, an antibody may be linked with an additional therapeutic payload, such as a radionuclide, an enzyme, an immunogenic epitope, or a cytotoxic agent, including bacterial toxins (such as diphtheria toxin) or plant toxins (such as ricin). The in vivo half-life of an antibody or a fragment thereof may be increased by pegylation through conjugation to polyethylene glycol.
The present invention further relates to kits for using antibodies in the methods described herein . This includes, but is not limited to, kits for detecting the quantity of protein in a sample, and kits for detecting the presence of a variant protein in a sample. One preferred embodiment comprises antibodies such as a labelled or labelable antibody and a compound or agent for detecting PSA in a biological sample and/or means for determining the quantity of PSA protein in the sample, as well as instructions for use of the kit.
Antisense
The nucleic acids and/or variants described herein, or nucleic acids comprising their complementary sequence, may be used as antisense constructs to control gene expression in cells, tissues or organs. The methodology associated with antisense techniques is well known to the skilled artisan, and is for example described and reviewed in Antisense Drug Technology: Principles, Strategies, and Applications, Crooke, ed., Marcel Dekker Inc., New York (2001). In general, antisense agents (antisense oligonucleotides) are single-stranded oligonucleotides (RNA or DNA) that are capable of binding to a complementary nucleotide segment. By binding the appropriate target sequence, an RNA-RNA, DNA-DNA or RNA-DNA duplex is formed. The antisense oligonucleotides are complementary to the sense or coding strand of a gene. It is also possible to form a triple helix, where the antisense oligonucleotide binds to duplex DNA.
Several classes of antisense oligonucleotide are known to those skilled in the art, including cleavers and blockers. The former bind to target RNA sites and activate intracellular nucleases (e.g., RNase H or RNase L), which cleave the target RNA. Blockers bind to target RNA and inhibit protein translation by steric hindrance of the ribosomes. Examples of blockers include nucleic acids, morpholino compounds, locked nucleic acids and methylphosphonates (Thompson, Drug Discovery Today, 7: 912-917 (2002)). Antisense oligonucleotides are useful directly as therapeutic agents, and are also useful for determining and validating gene function, for example by gene knock-out or gene knock-down experiments. Antisense technology is further described in Lavery et al., Curr. Opin. Drug Discov. Devel. 6: 561-569 (2003), Stephens et al., Curr. Opin. Mol. Ther. 5: 118-122 (2003), Kurreck, Eur. J. Biochem. 270: 1628-44 (2003), Dias et al., Mol. Cancer Ther. 1: 347-55 (2002), Chen, Methods Mol. Med. 75: 621-636 (2003), Wang et al., Curr. Cancer Drug Targets 1: 177-96 (2001), and Bennett, Antisense Nucleic Acid Drug Dev. 12: 215-24 (2002).
In certain embodiments, the antisense agent is an oligonucleotide that is capable of binding to a particular nucleotide segment. In certain embodiments, the nucleotide segment comprises a fragment of a gene selected from the group consisting of the KLK3 gene, the HNF1B gene, the FGFR2 gene, the TBX3 gene, the MSMB gene and the TERT gene. In certain other embodiments, the antisense nucleotide is capable of binding to a nucleotide segment as set forth in SEQ ID NO: 1-728. Antisense nucleotides can be from 5-500 nucleotides in length, including 5-200 nucleotides, 5-100 nucleotides, 10-50 nucleotides, and 10-30 nucleotides. In certain preferred embodiments, the antisense nucleotides are from 14-50 nucleotides in length, including 14-40 nucleotides and 14-30 nucleotides.
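The complementarity requirement and the preferred 14-50 nucleotide length range can be illustrated with a minimal sketch. The function name and the example target are hypothetical; the example sequence is an invented placeholder and is not one of the sequences of SEQ ID NO: 1-728.

```python
# Sketch only: derives a DNA antisense oligonucleotide by reverse complement.
# The helper name and the example sequence are assumptions for illustration.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(target_segment: str, min_len: int = 14, max_len: int = 50) -> str:
    """Return the oligonucleotide complementary to target_segment, written 5'->3'.

    The default length bounds mirror the preferred 14-50 nucleotide range.
    """
    seq = target_segment.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    if not min_len <= len(seq) <= max_len:
        raise ValueError(f"target length {len(seq)} outside {min_len}-{max_len} nt")
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    placeholder_target = "ATGGCCTTAGCTAGGCTA"  # invented 18-nt example
    print(antisense_oligo(placeholder_target))
```

The sketch covers base pairing and length only; chemistry, secondary structure and target accessibility are outside its scope.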
The variants described herein can also be used for the selection and design of antisense reagents that are specific for particular variants. Using information about the variants described herein, antisense oligonucleotides or other antisense molecules that specifically target mRNA molecules that contain one or more variants of the invention can be designed. In this manner, expression of mRNA molecules that contain one or more variants of the present invention (i.e., certain marker alleles and/or haplotypes) can be inhibited or blocked. In one embodiment, the antisense molecules are designed to specifically bind a particular allelic form (i.e., one or several variants (alleles and/or haplotypes)) of the target nucleic acid, thereby inhibiting translation of a product originating from this specific allele or haplotype, while not binding other or alternate variants at the specific polymorphic sites of the target nucleic acid molecule. As antisense molecules can be used to inactivate mRNA so as to inhibit gene expression, and thus protein expression, the molecules can be used for disease treatment. The methodology can involve cleavage by means of ribozymes containing nucleotide sequences complementary to one or more regions in the mRNA that attenuate the ability of the mRNA to be translated. Such mRNA regions include, for example, protein-coding regions, in particular protein-coding regions corresponding to catalytic activity, substrate and/or ligand binding sites, or other functional domains of a protein.
The phenomenon of RNA interference (RNAi) has been actively studied for the last decade, since its original discovery in C. elegans (Fire et al., Nature 391 : 806-11 (1998)), and in recent years its potential use in treatment of human disease has been actively pursued (reviewed in Kim & Rossi, Nature Rev. Genet. 8: 173-204 (2007)). RNA interference (RNAi), also called gene silencing, is based on using double-stranded RNA molecules (dsRNA) to turn off specific genes. In the cell, cytoplasmic double-stranded RNA molecules (dsRNA) are processed by cellular complexes into small interfering RNA (siRNA) . The siRNA guide the targeting of a protein-RNA complex to specific sites on a target mRNA, leading to cleavage of the mRNA (Thompson, Drug Discovery Today, 7 : 912-917 (2002)) . The siRNA molecules are typically about 20, 21, 22 or 23 nucleotides in length . Thus, one aspect of the invention relates to isolated nucleic acid molecules, and the use of those molecules for RNA interference, i.e. as small interfering RNA molecules (siRNA) . In one embodiment, the isolated nucleic acid molecules are 18-26 nucleotides in length, preferably 19-25 nucleotides in length, more preferably 20-24 nucleotides in length, and more preferably 21, 22 or 23 nucleotides in length .
Another pathway for RNAi-mediated gene silencing originates in endogenously encoded primary microRNA (pri-miRNA) transcripts, which are processed in the cell to generate precursor miRNA (pre-miRNA) . These miRNA molecules are exported from the nucleus to the cytoplasm, where they undergo processing to generate mature miRNA molecules (miRNA), which direct translational inhibition by recognizing target sites in the 3' untranslated regions of mRNAs, and subsequent mRNA degradation by processing P-bodies (reviewed in Kim & Rossi, Nature Rev. Genet. 8: 173-204 (2007)) .
Clinical applications of RNAi include the incorporation of synthetic siRNA duplexes, which preferably are approximately 20-23 nucleotides in size, and preferably have 3' overlaps of 2 nucleotides. Knockdown of gene expression is established by sequence-specific design for the target mRNA. Several commercial sites for optimal design and synthesis of such molecules are known to those skilled in the art.
Other applications provide longer siRNA molecules (typically 25-30 nucleotides in length, preferably about 27 nucleotides), as well as small hairpin RNAs (shRNAs; typically about 29 nucleotides in length). The latter are naturally expressed, as described in Amarzguioui et al. (FEBS Lett. 579: 5974-81 (2005)). Chemically synthetic siRNAs and shRNAs are substrates for in vivo processing, and in some cases provide more potent gene-silencing than shorter designs (Kim et al., Nature Biotechnol. 23: 222-226 (2005); Siolas et al., Nature Biotechnol. 23: 227-231 (2005)). In general siRNAs provide for transient silencing of gene expression, because their intracellular concentration is diluted by subsequent cell divisions. By contrast, expressed shRNAs mediate long-term, stable knockdown of target transcripts, for as long as transcription of the shRNA takes place (Marques et al., Nature Biotechnol. 23: 559-565 (2006); Brummelkamp et al., Science 296: 550-553 (2002)).
Since RNAi molecules, including siRNA, miRNA and shRNA, act in a sequence-dependent manner, the variants presented herein can be used to design RNAi reagents that recognize specific nucleic acid molecules comprising specific alleles and/or haplotypes (e.g., the alleles and/or haplotypes of the present invention), while not recognizing nucleic acid molecules comprising other alleles or haplotypes. These RNAi reagents can thus recognize and destroy the target nucleic acid molecules. As with antisense reagents, RNAi reagents can be useful as therapeutic agents (i.e., for turning off disease-associated genes or disease-associated gene variants), but may also be useful for characterizing and validating gene function (e.g., by gene knock-out or gene knockdown experiments) .
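Because recognition is sequence dependent, an allele-specific reagent must span the polymorphic site. The following sketch enumerates candidate guide strands whose target window covers a chosen allele. It considers complementarity only; the function names, the 0-based indexing convention and the example transcript are assumptions made for illustration.

```python
# Sketch only: enumerates guide-strand candidates by complementarity alone.
# Function names and the example sequence are hypothetical.
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def rna_reverse_complement(dna: str) -> str:
    """Return the RNA strand complementary to a DNA-alphabet sequence, 5'->3'."""
    return dna.upper().translate(_DNA_TO_RNA_COMPLEMENT)[::-1]


def allele_specific_guides(transcript: str, variant_index: int, allele: str,
                           length: int = 21) -> list[tuple[int, str]]:
    """List (window_start, guide) pairs for every window that covers the variant.

    transcript is written in the DNA alphabet; variant_index is 0-based;
    allele is the single base to target at that position.
    """
    if not 18 <= length <= 26:
        raise ValueError("length outside the 18-26 nt range given in the text")
    if not 0 <= variant_index < len(transcript):
        raise ValueError("variant_index outside the transcript")
    if len(allele) != 1 or allele.upper() not in "ACGT":
        raise ValueError("allele must be a single base")
    # Place the targeted allele at the polymorphic position.
    seq = transcript.upper()
    seq = seq[:variant_index] + allele.upper() + seq[variant_index + 1:]
    first = max(0, variant_index - length + 1)
    last = min(variant_index, len(seq) - length)
    return [(start, rna_reverse_complement(seq[start:start + length]))
            for start in range(first, last + 1)]


if __name__ == "__main__":
    placeholder = "A" * 15 + "T" + "A" * 14  # invented 30-nt transcript
    for start, guide in allele_specific_guides(placeholder, 15, "G"):
        print(start, guide)
```

A guide produced this way pairs perfectly with the chosen allele and carries a single mismatch against the alternate allele; whether that mismatch is sufficient for discrimination would have to be established experimentally.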
Delivery of RNAi may be performed by a range of methodologies known to those skilled in the art. Methods utilizing non-viral delivery include cholesterol, stable nucleic acid-lipid particle (SNALP), heavy-chain antibody fragment (Fab), aptamers and nanoparticles. Viral delivery methods include use of lentivirus, adenovirus and adeno-associated virus. The siRNA molecules are in some embodiments chemically modified to increase their stability. This can include modifications at the 2' position of the ribose, including 2'-O-methylpurines and 2'-fluoropyrimidines, which provide resistance to RNase activity. Other chemical modifications are possible and known to those skilled in the art.
Prognostic methods
In addition to the utilities described above, the polymorphic markers of the invention are useful in determining prognosis of human individuals. Accurate pretreatment staging is important for prostate cancer treatment. Serum PSA levels correlate with aggressiveness of disease. Thus, individuals with serum PSA levels less than 10 ng/mL are most likely to respond to local therapy. Further, the PSA velocity (change in levels per year) is an independent predictor of mortality following treatment.
Given the important contribution of genetic factors to PSA levels, it would be valuable to use corrected values of PSA quantity to assess prognosis. The invention therefore provides a method for determining the prognosis of an individual diagnosed with prostate cancer, the method comprising (i) detecting an uncorrected PSA quantity in a first biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of the prognosis of the individual. In one embodiment, a corrected PSA quantity of 10 ng/mL or greater is indicative of a worse prognosis. In one embodiment, the method further comprises determining corrected PSA velocity by repeating steps (i) - (iii) using a first sample and/or a second sample taken at a different time than the first set of first and/or second sample, and calculating a corrected PSA velocity based on the corrected PSA quantity determined for samples obtained at different times.
In preferred embodiments, the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
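Steps (i) to (iii) of the prognostic method can be sketched as follows. The sketch assumes a multiplicative model in which each copy of a PSA-raising allele scales the measured value by a fixed per-allele factor, so that the correction divides by the product of those factors. That model, the function names and the numeric effect sizes are assumptions for illustration; the effect sizes are invented placeholders and are not values reported in this document.

```python
from math import prod

# PLACEHOLDER per-allele multiplicative effects on PSA quantity.
# Invented for illustration; not estimates from this document.
ASSUMED_ALLELE_EFFECT = {
    "rs17632542": 1.20,
    "rs2735839": 1.10,
    "rs4430796": 1.05,
}


def corrected_psa(uncorrected_ng_ml: float,
                  effect_allele_counts: dict[str, int],
                  effects: dict[str, float] = ASSUMED_ALLELE_EFFECT) -> float:
    """Divide the measured PSA by the combined genetic effect (assumed model)."""
    for marker, count in effect_allele_counts.items():
        if marker not in effects:
            raise KeyError(f"no effect size available for {marker}")
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {marker} must be 0, 1 or 2")
    factor = prod(effects[m] ** n for m, n in effect_allele_counts.items())
    return uncorrected_ng_ml / factor


def worse_prognosis(corrected_ng_ml: float, threshold_ng_ml: float = 10.0) -> bool:
    """Apply the 10 ng/mL criterion of the text to a corrected value."""
    return corrected_ng_ml >= threshold_ng_ml


if __name__ == "__main__":
    value = corrected_psa(12.1, {"rs2735839": 2})
    print(round(value, 2), worse_prognosis(value))
```

With real data, the effect sizes would come from the association results for the genotyped markers rather than from the placeholder table.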
Methods of assessing recurrence risk
PSA quantity is a useful tool for assessing recurrence risk in individuals who have undergone treatment for prostate cancer. Following treatment, PSA levels should decrease and remain at a low and steady level over time. Detection of increased PSA levels in individuals who have undergone treatment is thus an indication of disease recurrence.
Applying a correction of uncorrected PSA quantity, as described herein, is useful for this purpose. This is particularly important if a particular PSA threshold is used as a guidance that an individual is experiencing, or is at risk for, disease recurrence.
Therefore, the invention in a further aspect provides a method of assessing recurrence risk of prostate cancer in a human individual who has undergone treatment for prostate cancer, the method comprising (i) detecting an uncorrected PSA quantity in a first biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of recurrence risk of the individual. In certain embodiments, a corrected PSA quantity above a certain threshold is indicative of recurrence in the individual . In certain embodiments, a corrected PSA quantity of 0.5 or greater is indicative of recurrence in the individual. In one embodiment, a corrected PSA quantity of 1.0 or greater is indicative of recurrence in the individual. In another embodiment, a corrected PSA quantity of 2.0 or greater is indicative of recurrence in the individual. In another embodiment, a corrected PSA quantity of 3.0 or greater is indicative of recurrence in the individual. In another embodiment, a corrected PSA quantity of 4.0 or greater is indicative of recurrence in the individual.
In certain embodiments, the method further comprises determining corrected PSA velocity by repeating steps (i) - (iii) using a first sample and/or a second sample taken at a different time than the first set of first and/or second sample, and calculating a corrected PSA velocity based on the corrected PSA quantity determined for samples obtained at said different times.
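The threshold test and the velocity calculation can be sketched as below, taking corrected PSA values as input. The function names are hypothetical, the velocity is computed simply between the earliest and latest measurement, and the thresholds are used in whatever unit the corrected values carry, since the text states the thresholds without units.

```python
from datetime import date


def psa_velocity(measurements: list[tuple[date, float]]) -> float:
    """Change in corrected PSA per year between the earliest and latest sample."""
    if len(measurements) < 2:
        raise ValueError("at least two dated measurements are required")
    ordered = sorted(measurements)
    (first_day, first_value), (last_day, last_value) = ordered[0], ordered[-1]
    years = (last_day - first_day).days / 365.25
    if years <= 0:
        raise ValueError("samples must be taken at different times")
    return (last_value - first_value) / years


def indicates_recurrence(corrected_value: float, threshold: float = 0.5) -> bool:
    """True when the corrected PSA quantity meets or exceeds the chosen threshold."""
    return corrected_value >= threshold


if __name__ == "__main__":
    series = [(date(2010, 1, 1), 0.2), (date(2011, 1, 1), 0.7)]
    print(psa_velocity(series))
    # The text lists 0.5, 1.0, 2.0, 3.0 and 4.0 as alternative thresholds.
    for threshold in (0.5, 1.0, 2.0, 3.0, 4.0):
        print(threshold, indicates_recurrence(series[-1][1], threshold))
```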
The at least one polymorphic marker is suitably selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
Computer-Implemented aspects
As understood by those of ordinary skill in the art, the methods and information described herein may be implemented, in all or in part, as computer executable instructions on known computer readable media. For example, the methods described herein may be implemented in hardware. Alternatively, the method may be implemented in software stored in, for example, one or more memories or other computer readable medium and implemented on one or more processors. As is known, the processors may be associated with one or more controllers, calculation units and/or other units of a computer system, or implemented in firmware as desired. If implemented in software, the routines may be stored in any computer readable memory such as in RAM, ROM, flash memory, a magnetic disk, a laser disk, or other storage medium, as is also known.
Likewise, this software may be delivered to a computing device via any known delivery method including, for example, over a communication channel such as a telephone line, the Internet, a wireless connection, etc., or via a transportable medium, such as a computer readable disk, flash drive, etc.
More generally, and as understood by those of ordinary skill in the art, the various steps described above may be implemented as various blocks, operations, tools, modules and techniques which, in turn, may be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, etc. may be implemented in, for example, a custom integrated circuit (IC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), etc.
When implemented in software, the software may be stored in any known computer readable medium such as on a magnetic disk, an optical disk, or other storage medium, in a RAM or ROM or flash memory of a computer, processor, hard disk drive, optical disk drive, tape drive, etc. Likewise, the software may be delivered to a user or a computing system via any known delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism.
Fig. 1 illustrates an example of a suitable computing system environment 100 on which a system for the steps of the claimed method and apparatus may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the method or apparatus of the claims. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
The steps of the claimed method and system are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The steps of the claimed method and system may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In both integrated and distributed computing environments, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to Fig . 1, an exemplary system for implementing the steps of the claimed method and system includes a general purpose computing device in the form of a computer 110.
Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
Combinations of any of the above should also be included within the scope of computer readable media. The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, Fig. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Fig. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
The drives and their associated computer storage media discussed above and illustrated in Fig. 1 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In Fig. 1, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Fig . 1. The logical connections depicted in Fig . 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, Fig. 1 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Although the foregoing text sets forth a detailed description of numerous different embodiments of the invention, it should be understood that the scope of the invention is defined by the words of the claims set forth at the end of this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment of the invention because describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims defining the invention.
While the risk evaluation system and method, and other elements, have been described as preferably being implemented in software, they may be implemented in hardware, firmware, etc., and may be implemented by any other processor. Thus, the elements described herein may be implemented in a standard multi-purpose CPU or on specifically designed hardware or firmware such as an application-specific integrated circuit (ASIC) or other hard-wired device as desired, including, but not limited to, the computer 110 of Fig . 1. When implemented in software, the software routine may be stored in any computer readable memory such as on a magnetic disk, a laser disk, or other storage medium, in a RAM or ROM of a computer or processor, in any database, etc. Likewise, this software may be delivered to a user or a diagnostic system via any known or desired delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism or over a communication channel such as a telephone line, the internet, wireless communication, etc. (which are viewed as being the same as or interchangeable with providing such software via a transportable storage medium) .
Thus, many modifications and variations may be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present invention . Thus, it should be understood that the methods and apparatus described herein are illustrative only and are not limiting upon the scope of the invention.
In one embodiment, the invention provides an apparatus for determining corrected PSA quantity in a human individual, comprising (a) a processor; and (b) a computer readable memory having computer executable instructions adapted to be executed on the processor, wherein said instructions comprise steps of (i) obtaining data representing uncorrected PSA quantity in a biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the genome of the human individual, wherein different alleles of the at least one polymorphic marker are predictive of different PSA quantity in humans; and (iii) determining a corrected PSA quantity based on the sequence data about the at least one polymorphic marker. In one embodiment, at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans, and at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans.
Also provided is a computer-readable medium having computer executable instructions for determining corrected values of PSA quantity, the computer readable medium comprising (i) data indicative of uncorrected values of PSA quantity for at least one human individual; (ii) data comprising sequence data about at least one polymorphic marker in the genome of the at least one human individual, wherein said at least one polymorphic marker is predictive of PSA quantity in humans; and (iii) a routine stored on the computer readable medium and adapted to be executed by a processor to determine corrected PSA values for the at least one human individual.
Another aspect of the invention is a system that is capable of carrying out a part or all of a method of the invention, or carrying out a variation of a method of the invention as described herein in greater detail. Exemplary systems include, as one or more components, computing systems, environments, and/or configurations that may be suitable for use with the methods and include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. In some variations, a system of the invention includes one or more machines used for analysis of biological material (e.g., genetic material), as described herein. In some variations, this analysis of the biological material involves a chemical analysis and/or a nucleic acid amplification.
With reference to Fig. 4, an exemplary system of the invention, which may be used to implement one or more steps of methods of the invention, includes a computing device in the form of a computer 110. Components shown in dashed outline are not technically part of the computer 110, but are used to illustrate the exemplary embodiment of Fig. 4. Components of computer 110 may include, but are not limited to, a processor 120, a system memory 130, a memory/graphics interface 121, also known as a Northbridge chip, and an I/O interface 122, also known as a Southbridge chip. The system memory 130 and a graphics processor 190 may be coupled to the memory/graphics interface 121. A monitor 191 or other graphic output device may be coupled to the graphics processor 190.
A series of system busses may couple various system components including a high speed system bus 123 between the processor 120, the memory/graphics interface 121 and the I/O interface 122, a front-side bus 124 between the memory/graphics interface 121 and the system memory 130, and an advanced graphics processing (AGP) bus 125 between the memory/graphics interface 121 and the graphics processor 190. The system bus 123 may be any of several types of bus structures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus and the Enhanced ISA (EISA) bus. As system architectures evolve, other bus architectures and chip sets may be used, but these generally follow this pattern. For example, companies such as Intel and AMD support the Intel Hub Architecture (IHA) and the Hypertransport™ architecture, respectively.
The computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical medium which can be used to store the desired information and which can be accessed by computer 110.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. The system ROM 131 may contain permanent system data 143, such as identifying and manufacturing information . In some embodiments, a basic input/output system (BIOS) may also be stored in system ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 120. By way of example, and not limitation, Fig . 4 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
The I/O interface 122 may couple the system bus 123 with a number of other busses 126, 127 and 128 that couple a variety of internal and external devices to the computer 110. A serial peripheral interface (SPI) bus 126 may connect to a basic input/output system (BIOS) memory 133 containing the basic routines that help to transfer information between elements within computer 110, such as during start-up.
A super input/output chip 160 may be used to connect to a number of 'legacy' peripherals, such as floppy disk 152, keyboard/mouse 162, and printer 196, as examples. The super I/O chip 160 may be connected to the I/O interface 122 with a bus 127, such as a low pin count (LPC) bus, in some embodiments. Various embodiments of the super I/O chip 160 are widely available in the commercial marketplace.
In one embodiment, bus 128 may be a Peripheral Component Interconnect (PCI) bus, or a variation thereof, and may be used to connect higher speed peripherals to the I/O interface 122. A PCI bus may also be known as a Mezzanine bus. Variations of the PCI bus include the Peripheral Component Interconnect-Express (PCI-E) and the Peripheral Component Interconnect-Extended (PCI-X) busses, the former having a serial interface and the latter being a backward compatible parallel interface. In other embodiments, bus 128 may be an advanced technology attachment (ATA) bus, in the form of a serial ATA bus (SATA) or parallel ATA (PATA).
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media . By way of example only, Fig. 4 illustrates a hard disk drive 140 that reads from or writes to non-removable, nonvolatile magnetic media. The hard disk drive 140 may be a conventional hard disk drive.
Removable media, such as a universal serial bus (USB) memory 153, firewire (IEEE 1394), or CD/DVD drive 156 may be connected to the PCI bus 128 directly or through an interface 150. A storage media 154 may be coupled through interface 150. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
The drives and their associated computer storage media discussed above and illustrated in Fig. 4 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In Fig. 4, for example, hard disk drive 140 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a mouse/keyboard 162 or other input device combination. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processor 120 through one of the I/O interface busses, such as the SPI 126, the LPC 127, or the PCI 128, but other busses may be used. In some embodiments, other devices may be coupled to parallel ports, infrared interfaces, game ports, and the like (not depicted), via the super I/O chip 160.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180 via a network interface controller (NIC) 170. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connection between the NIC 170 and the remote computer 180 depicted in Fig . 4 may include a local area network (LAN), a wide area network (WAN), or both, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. The remote computer 180 may also represent a web server supporting interactive sessions with the computer 110, or in the specific case of location-based applications may be a location server or an application server.
In some embodiments, the network interface may use a modem (not depicted) when a broadband connection is not available or is not used. It will be appreciated that the network connection shown is exemplary and other means of establishing a communications link between the computers may be used.
In some variations, the invention is a system for determining corrected PSA levels in a human subject. For example, in one variation, the system includes tools for performing at least one step, preferably two or more steps, and in some aspects all steps of a method of the invention, where the tools are operably linked to each other. Operable linkage describes a linkage through which components can function with each other to perform their purpose.
In some variations, a system of the invention is a system for determining corrected PSA levels in a human subject, and comprises:
(a) at least one processor;
(b) at least one computer-readable medium;
(c) a susceptibility database operatively coupled to a computer-readable medium of the system and containing population information correlating the presence or absence of one or more alleles of at least one polymorphic marker with PSA levels in a population of humans;
(d) a measurement tool that receives an input about the human subject and generates information from the input about (i) uncorrected PSA levels in the human subject and (ii) the presence or absence of at least one allele of at least one polymorphic marker in the human subject that is correlated with PSA levels in humans; and
(e) an analysis tool or routine that:
(i) is operatively coupled to the susceptibility database and the information generated by the measurement tool,
(ii) is stored on a computer-readable medium of the system,
(iii) is adapted to be executed on a processor of the system, to compare the information about the human subject with the population information in the susceptibility database and generate a conclusion with respect to corrected PSA levels for the human subject.
In certain embodiments, the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith. Exemplary processors (processing units) include all variety of microprocessors and other processing units used in computing devices. Exemplary computer-readable media are described above. When two or more components of the system involve a processor or a computer-readable medium, the system generally can be created where a single processor and/or computer readable medium is dedicated to a single component of the system; or where two or more functions share a single processor and/or share a single computer readable medium, such that the system contains as few as one processor and/or one computer readable medium. In some variations, it is advantageous to use multiple processors or media, for example, where it is convenient to have components of the system at different locations. For instance, some components of a system may be located at a testing laboratory dedicated to laboratory or data analysis, whereas other components, including components (optional) for supplying input information or obtaining an output communication, may be located at a medical treatment or counseling facility (e.g., doctor's office, health clinic, HMO, pharmacist, geneticist, hospital) and/or at the home or business of the human subject (patient) for whom the testing service is performed.
Referring to Figure 5, an exemplary system includes a susceptibility database 208 that is operatively coupled to a computer-readable medium of the system and that contains population information correlating the presence or absence of one or more alleles associated with PSA levels in a population of humans, for example alleles of the polymorphic markers rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542.
In a simple variation, the susceptibility database 208 contains data relating to the correlation between a particular marker allele and PSA levels in humans. The correlation may suitably be expressed as a percentage or fractional increase for a particular marker allele. For SNPs, the alternate allele, by necessity, will then be correlated with decreased PSA levels by the same percentage or fraction. Such data provides an indication as to the genetic contribution to observed PSA levels for the subject having the allele in question. In another variation, the susceptibility database includes similar data with respect to two or more polymorphic markers, thus providing information about the contribution of two or more markers to PSA levels. In still another variation, the susceptibility database includes additional quantitative personal, medical, or genetic information about the individuals in the database diagnosed with prostate cancer or those who are free of prostate cancer. Such information includes, but is not limited to, information about parameters such as age, sex, ethnicity, race, medical history, weight, diabetes status, blood pressure, family history of prostate cancer, smoking history, and alcohol use in humans and impact of the at least one parameter on susceptibility to prostate cancer and/or PSA levels. The information also can include information about other genetic risk factors for prostate cancer. These more robust susceptibility databases can be used by an analysis routine 210 to calculate combined corrected PSA levels and/or risk of prostate cancer, utilizing information about polymorphic markers as described herein and information about other genetic risk factors.
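By way of illustration only, the per-allele representation described above may be sketched as follows. The sketch assumes a multiplicative model in which each copy of the effect allele scales the expected PSA level by (1 + fractional increase); the marker identifier is taken from the text, whereas the effect size of 0.10, the table EFFECTS and the function names are hypothetical and are not values or routines disclosed herein.

```python
from typing import Dict

# Hypothetical per-allele fractional increases in PSA (illustrative values only).
EFFECTS: Dict[str, float] = {"rs2735839": 0.10}


def genotype_factor(fractional_increase: float, copies: int) -> float:
    """Expected PSA multiplier for carrying `copies` (0, 1 or 2) effect alleles,
    relative to a non-carrier, assuming a multiplicative per-allele model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2 for a diploid genotype")
    return (1.0 + fractional_increase) ** copies


def corrected_psa(measured: float, allele_counts: Dict[str, int],
                  effects: Dict[str, float] = EFFECTS) -> float:
    """Divide out the expected genetic contribution of each listed marker."""
    value = measured
    for marker, copies in allele_counts.items():
        value /= genotype_factor(effects[marker], copies)
    return value
```

Under these assumptions, a measured value of 4.84 ng/ml in a subject carrying two copies of the hypothetical effect allele would be corrected to approximately 4.0 ng/ml. Other models, for example ones that are additive or that are centered on a population mean, may equally be used.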
In addition to the susceptibility database 208, the system further includes a measurement tool 206 programmed to receive an input 204 from or about the human subject and generate an output that contains information about the presence or absence of the at least one allele of at least one polymorphic marker. (The input 204 is not part of the system per se but is illustrated in the schematic Figure 5.) Thus, the input 204 will contain a specimen or contain data from which the presence or absence of the at least one allele can be directly read, or analytically determined. In a simple variation, the input contains annotated information about genotypes or allele counts for at least one polymorphic marker in the genome of the human subject, in which case no further processing by the measurement tool 206 is required, except possibly transformation of the relevant information about the presence/absence of the allele into a format compatible for use by the analysis routine 210 of the system.
In another variation, the input 204 from the human subject contains data that is unannotated or insufficiently annotated with respect to particular polymorphic markers, requiring analysis by the measurement tool 206. For example, the input can be genetic sequence of a chromosomal region or chromosome on which the particular polymorphic markers of interest reside, or whole genome sequence information, or unannotated information from a gene chip analysis of variable loci in the human subject's genome. In such variations of the invention, the measurement tool 206 comprises a tool, preferably stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to receive a data input about a subject and determine information about the presence or absence of the at least one allele of at least one polymorphic marker in a human subject from the data. For example, the measurement tool 206 contains instructions, preferably executable on a processor of the system, for analyzing the unannotated input data and determining the presence or absence of at least one allele of interest in the human subject. Where the input data is genomic sequence information, the measurement tool optionally comprises a sequence analysis tool stored on a computer readable medium of the system and executable by a processor of the system with instructions for determining the presence or absence of the at least one allele from the genomic sequence information.
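As a non-limiting sketch of such a sequence analysis tool, the following routine determines the presence or absence of a single-nucleotide allele from sequence information. It assumes, purely for illustration, that the input has already been reduced to two phased haplotype strings aligned to a common coordinate system and that the marker position is known as a 0-based offset; the function names and the input format are hypothetical. A practical tool would typically operate on aligned reads or variant calls.

```python
from typing import Tuple


def allele_count(haplotypes: Tuple[str, str], position: int, allele: str) -> int:
    """Count copies (0, 1 or 2) of `allele` at a 0-based `position`
    in a pair of aligned haplotype sequences."""
    if len(allele) != 1:
        raise ValueError("this sketch handles single-nucleotide alleles only")
    target = allele.upper()
    return sum(1 for hap in haplotypes if hap[position].upper() == target)


def allele_present(haplotypes: Tuple[str, str], position: int, allele: str) -> bool:
    """Report presence (True) or absence (False) of the allele of interest."""
    return allele_count(haplotypes, position, allele) > 0
```

The allele count returned by such a routine is in a form that may be passed directly to an analysis routine such as routine 210.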
In yet another variation, the input 204 from the human subject comprises a biological sample, such as a fluid (e.g., blood) or tissue sample, that contains genetic material that can be analyzed to determine the presence or absence of the allele of interest. In this variation, an exemplary measurement tool 206 includes laboratory equipment for processing and analyzing the sample to determine the presence or absence (or identity) of the allele(s) in the human subject. For instance, in one variation, the measurement tool includes: an oligonucleotide microarray (e.g., "gene chip") containing a plurality of oligonucleotide probes attached to a solid support; a detector for measuring interaction between nucleic acid obtained from or amplified from the biological sample and one or more oligonucleotides on the oligonucleotide microarray to generate detection data; and an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one allele of interest based on the detection data.
In another variation, the input 204 from the human subject comprises a biological sample that is suitable for determining PSA levels, such as a fluid (e.g., blood) or tissue sample that can be analyzed to determine uncorrected PSA levels. In this variation the exemplary measurement tool 206 includes laboratory equipment and reagents for processing and analyzing the sample to determine uncorrected PSA levels in the human subject. For example, the reagents may comprise an antibody assay for determining PSA levels.
To provide another example, in some variations the measurement tool 206 includes: a nucleotide sequencer (e.g., an automated DNA sequencer) that is capable of determining nucleotide sequence information from nucleic acid obtained from or amplified from the biological sample; and an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one allele associated with PSA levels, based on the nucleotide sequence information.
In some variations, the measurement tool 206 further includes additional equipment and/or chemical reagents for processing the biological sample to purify and/or amplify nucleic acid of the human subject for further analysis using a sequencer, gene chip, or other analytical equipment. In further variations, the measurement tool 206 further includes additional equipment and/or chemical reagents for processing the biological sample to purify protein of the human subject for determining PSA levels using appropriate analytical equipment.
The exemplary system further includes an analysis tool or routine 210 that: is operatively coupled to the susceptibility database 208 and operatively coupled to the measurement tool 206, is stored on a computer-readable medium of the system, is adapted to be executed on a processor of the system to compare the information about the human subject with the population information in the susceptibility database 208 and generate a conclusion with respect to corrected PSA levels for the human subject. In simple terms, the analysis tool 210 looks at the alleles identified by the measurement tool 206 for the human subject, and compares this information to the susceptibility database 208, to determine corrected PSA levels for the subject. The susceptibility can be based on the single parameter (the identity of one or more marker alleles), or can involve a calculation based on multiple genetic markers and/or other genetic and non-genetic data, as described above, that is collected and included as part of the input 204 from the human subject, and that also is stored in the susceptibility database 208 with respect to a population of other humans. Generally speaking, each parameter of interest is weighted to provide a conclusion with respect to susceptibility to PSA levels.
In some variations of the invention, the system as just described further includes a communication tool 212. For example, the communication tool is operatively connected to the analysis routine 210 and comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to: generate a communication containing the conclusion; and to transmit the communication to the human subject 200 or the medical practitioner 202, and/or enable the subject or medical practitioner to access the communication. (The subject and medical practitioner are depicted in the schematic Fig. 2, but are not part of the system per se, though they may be considered users of the system.) The communication tool 212 provides an interface for communicating to the subject, or to a medical practitioner for the subject (e.g., doctor, nurse, genetic counselor), the conclusion generated by the analysis tool 210 with respect to corrected PSA levels for the subject. Usually, if the communication is obtained by or delivered to the medical practitioner 202, the medical practitioner will share the communication with the human subject 200 and/or counsel the human subject about the medical significance of the communication. In some variations, the communication is provided in a tangible form, such as a printed report or report stored on a computer readable medium such as a flash drive or optical disk. In some variations, the communication is provided electronically with an output that is visible on a video display or audio output (e.g., speaker). In some variations, the communication is transmitted to the subject or the medical practitioner, e.g., electronically or through the mail. In some variations, the system is designed to permit the subject or medical practitioner to access the communication, e.g., by telephone or computer.
For instance, the system may include software residing on a memory and executed by a processor of a computer used by the human subject or the medical practitioner, with which the subject or practitioner can access the communication, preferably securely, over the internet or other network connection . In some variations of the system, this computer will be located remotely from other components of the system, e.g., at a location of the human subject's or medical practitioner's choosing.
In some variations of the invention, the system as described (including embodiments with or without the communication tool) further includes components that add a treatment or prophylaxis utility to the system. For instance, value is added to a determination of corrected PSA levels and/or susceptibility to prostate cancer when a medical practitioner can prescribe or administer a standard of care that can reduce susceptibility to the cancer; and/or delay onset of the cancer; and/or increase the likelihood of detecting the cancer at an early stage, to facilitate early treatment when the cancer has not spread and is most curable. Exemplary lifestyle change protocols include loss of weight, increase in exercise, cessation of unhealthy behaviors such as smoking, and change of diet. Exemplary medicinal and surgical intervention protocols include administration of pharmaceutical agents for prophylaxis; and surgery, including in extreme cases surgery to remove a tissue or organ before it has become cancerous. Exemplary diagnostic protocols include non-invasive and invasive imaging; monitoring metabolic biomarkers; and biopsy screening .
For example, in some variations, the system further includes a medical protocol database 214 operatively connected to a computer-readable medium of the system and containing information correlating the presence or absence of the at least one marker allele of interest and medical protocols for human subjects at risk for prostate cancer. Such medical protocols include any variety of medicines, lifestyle changes, diagnostic tests, increased frequencies of diagnostic tests, and the like that are designed to achieve one of the aforementioned goals. The information correlating marker alleles with protocols could include, for example, information about PSA levels and the success with which the cancer is avoided or delayed, or success with which the cancer is detected early and treated, if a subject has particular corrected PSA levels and follows a protocol. The system of this embodiment further includes a medical protocol tool or routine 216, operatively connected to the medical protocol database 214 and to the analysis tool or routine 210. The medical protocol tool or routine 216 preferably is stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to: (i) compare (or correlate) the conclusion that is obtained from the analysis routine 210 (with respect to corrected PSA levels for the subject) and the medical protocol database 214, and (ii) generate a protocol report with respect to the probability that one or more medical protocols in the medical protocol database will achieve one or more of the goals of reducing susceptibility to prostate cancer; delaying onset of prostate cancer; and increasing the likelihood of detecting the cancer at an early stage to facilitate early treatment. 
The probability can be based on empirical evidence collected from a population of humans and expressed either in absolute terms (e.g ., compared to making no intervention), or expressed in relative terms, to highlight the comparative or additive benefits of two or more protocols.
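The two ways of expressing the probability may be illustrated with the following sketch. It assumes, for illustration only, that the absolute expression is the difference between the probability of achieving a goal under a protocol and under no intervention, and that the relative expression is the ratio of the probabilities under two protocols; the function names and the example probabilities used in the test values are hypothetical.

```python
def _check_probability(p: float) -> None:
    if not 0.0 <= p <= 1.0:
        raise ValueError("probabilities must lie between 0 and 1")


def absolute_benefit(p_protocol: float, p_no_intervention: float) -> float:
    """Difference in the probability of achieving a goal, versus no intervention."""
    _check_probability(p_protocol)
    _check_probability(p_no_intervention)
    return p_protocol - p_no_intervention


def relative_benefit(p_protocol_a: float, p_protocol_b: float) -> float:
    """Ratio of the probabilities of achieving a goal under two protocols."""
    _check_probability(p_protocol_a)
    _check_probability(p_protocol_b)
    if p_protocol_b == 0.0:
        raise ValueError("the comparison protocol must have non-zero probability")
    return p_protocol_a / p_protocol_b
```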
Some variations of the system just described include the communication tool 212. In some examples, the communication tool generates a communication that includes the protocol report in addition to, or instead of, the conclusion with respect to susceptibility.
Information about marker allele status not only can provide useful information about identifying or quantifying PSA levels and/or determine susceptibility to prostate cancer; it can also provide useful information about possible causative factors for a human subject identified with a cancer, and useful information about therapies for the cancer patient. In some variations, systems of the invention are useful for these purposes.
For instance, in some variations the invention is a system for assessing or selecting a treatment protocol for a subject diagnosed with a cancer. An exemplary system, schematically depicted in Figure 6, comprises:
(a) at least one processor;
(b) at least one computer-readable medium;
(c) a medical treatment database 308 operatively connected to a computer-readable medium of the system and containing information correlating values of corrected PSA levels and efficacy of treatment regimens for prostate cancer;
(d) a measurement tool 306 to receive an input (304, depicted in Fig . 3 but not part of the system per se) about a human subject and generate information from the input 304 about genetically corrected PSA levels in humans; and
(e) a medical protocol routine or tool 310 operatively coupled to the medical treatment database 308 and the measurement tool 306, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the information with respect to corrected PSA levels for the human subject, and generate a conclusion with respect to at least one of: (i) the probability that one or more medical treatments will be efficacious for treatment of the prostate cancer for the patient; and
(ii) which of two or more medical treatments for the cancer will be more efficacious for the patient.
Preferably, such a system further includes a communication tool 312 operatively connected to the medical protocol tool or routine 310 for communicating the conclusion to the subject 300, or to a medical practitioner for the subject 302 (both depicted in the schematic of Fig. 3, but not part of the system per se). An exemplary communication tool comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to generate a communication containing the conclusion; and transmit the communication to the subject or the medical practitioner, or enable the subject or medical practitioner to access the communication.
Preferably, the markers useful in the computer-implemented functions described herein are selected from the group consisting of rs7193343, rs7618072, rs10077199, rs10490066, rs10516002, rs10519674, rs1394796, rs2935888, rs4560443, rs6010770 and rs7733337, and markers in linkage disequilibrium therewith.
The present invention will now be exemplified by the following non-limiting examples.
EXAMPLE 1
A genome-wide association study (GWAS) to search for sequence variants affecting population variation in PSA levels was performed, and the effects of PSA variants on subsequent prostate cancer diagnoses were investigated.
RESULTS
Sequence variants associated with PSA levels
We performed a GWAS on PSA levels, adjusted for age and laboratory center, in Icelandic men not diagnosed with prostate cancer according to data from the nation-wide Icelandic Cancer Registry (ICR) until the end of 2008. These men had also not undergone transurethral resection of the prostate (TURP), based on records from the Landspitali-National Hospital where 90% of all TURP procedures in the country are performed. In total, we had access to PSA measurements from 4,620 individuals genotyped on Illumina chips, containing either the 317K or the 370K HumanHap SNP panel. The analysis was augmented with data from 9,218 Icelanders with PSA measurements whose genetic information could be partially inferred from genotyped relatives (in-silico genotyping), using a previously described method (21-23). With respect to statistical power, this augmentation is equivalent to an additional 2,918 individuals on average (for details about the populations see Table 2). After quality control, 304,070 SNPs were available for the GWAS. Since the mean of the χ² values was below 1 (χ² = 0.91), we did not apply any genomic control correction.
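The decision not to apply genomic control can be illustrated with the following sketch. It assumes 1-degree-of-freedom chi-square statistics, whose expected mean under the null hypothesis is 1, and estimates the inflation factor as the mean of the statistics; the function names and the input values used to exercise them are hypothetical and are not data from the study. Many analyses estimate the inflation factor from the median statistic instead.

```python
from statistics import mean
from typing import List, Sequence


def inflation_factor(chi2_values: Sequence[float]) -> float:
    """Mean of the 1-df chi-square statistics (about 1 when there is no inflation)."""
    return mean(chi2_values)


def apply_genomic_control(chi2_values: Sequence[float]) -> List[float]:
    """Divide the statistics by the inflation factor only when it exceeds 1;
    otherwise return them unchanged."""
    lam = inflation_factor(chi2_values)
    if lam <= 1.0:
        return list(chi2_values)
    return [value / lam for value in chi2_values]
```

With a mean of 0.91, as reported above, such a routine would leave the test statistics unchanged.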
We selected all association signals with P < 1×10⁻⁵ for further analysis. This represented 12 SNPs at 6 different loci, of which four loci reached genome-wide significance after accounting for the number of tests performed (P < 1.64×10⁻⁷ = 0.05/304,070) (Table 3a). The genome-wide significant association signals were in or near genes at the following loci: KLK3 on 19q13.33; HNF1B on 17q12; FGFR2 on 10q26.12; and TBX3 on 12q24.21. The two suggestive association signals were at 10q11.23 near the MSMB gene and at 5p15.33 near the TERT gene (Table 3a).
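The genome-wide significance threshold quoted above follows from dividing the nominal significance level by the number of SNPs tested, which can be checked with the following short sketch (the function name is hypothetical):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after correcting for `n_tests` tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 304_070)
print(f"{threshold:.2e}")  # prints 1.64e-07
```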
To further investigate each of the six loci, we imputed genotypes based on data for 2.5M SNPs from the HapMap CEU individuals for all SNPs present within a window of 500 Kb centered on the most significant SNP. Based on this analysis, we identified three additional SNPs, rs2736098-A at 5p15.33, rs4430796-A at 17q12 and rs17632542-T at 19q13.33, that had a stronger association effect on PSA levels than any SNP present on the 317K chip (Table 3b).
In an attempt to follow up the observed associations with PSA levels in the Icelandic discovery group, we genotyped the most significant SNP at each of the six loci in an additional 1,919 Icelandic men with PSA level measurements and not diagnosed with prostate cancer, and in 454 men from the UK with PSA levels below 3 ng/ml and not diagnosed with prostate cancer. All UK participants in the present study came from the ProtecT trial (24). After combining significance levels from Iceland and the UK, at least one SNP at each locus reached genome-wide significance (Table 4).
For the strongest variant at each locus, the allele frequency was comparable in the Icelandic and UK populations, with frequencies ranging from 24% to 93% (Table 4), and the observed effect on the PSA level ranges from 7% to 39% per allele in the Icelandic samples and from 5% to 102% per allele in the UK samples (see Table 4, and Table 5 for the genotype effect of the variants). The strongest overall association effect observed in the present study is for two SNPs, rs2735839 and rs17632542, located near or in the PSA coding gene KLK3 (Table 4), of which rs2735839-G (and highly correlated markers) has previously been reported to associate with PSA levels (18-20, 26). The two SNPs are moderately correlated with each other (D' = 1 and r² = 0.48 in UK; r² = 0.56 in Iceland; r² = 0.56 in HapMap CEU phase 3). When we adjusted the results for each SNP, using the other SNP as a covariate and only including individuals genotyped for both markers, results for rs17632542 remained significant after adjusting for rs2735839 (P = 5.51×10⁻⁸), whereas rs2735839 was marginally significant after adjusting for rs17632542 (P = 0.043). This suggests that the signal from rs2735839 is subsumed by rs17632542. The SNP rs17632542 is a missense mutation (an amino acid change denoted as I179T) in KLK3. This amino acid alteration is defined as either neutral or deleterious by different online protein structure algorithms (see Table 6). A deleterious mutation could conceivably destabilize the protein, affecting circulating PSA levels. Alternatively, the mutation might affect the antigenicity of the protein and thereby influence its detectability in PSA tests. For the 10q11 (MSMB) and 17q12 (HNF1B) PSA loci, the alleles identified here, i.e. rs10993994-T and rs4430796-A, are the same as those previously reported to associate with PSA levels (25) as well as with prostate cancer risk (25, 27).
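The relationship between D' and r² for the two KLK3 SNPs can be illustrated with a small calculation. The sketch below uses the UK allele frequencies from Table 4 (0.93 for rs17632542-T and 0.86 for rs2735839-G). The T-G haplotype frequency of 0.86 is an assumption made here to represent complete LD (D' = 1), in which the rarer minor allele occurs only together with the other minor allele; it is not a value reported in the source.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic markers.

    p_a, p_b: frequencies of allele A at marker 1 and allele B at marker 2.
    p_ab:     frequency of the A-B haplotype.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

p_t = 0.93   # rs17632542-T, UK (Table 4)
p_g = 0.86   # rs2735839-G, UK (Table 4)
p_tg = 0.86  # assumed T-G haplotype frequency under complete LD

d_prime, r2 = ld_stats(p_t, p_g, p_tg)
print(round(d_prime, 3), round(r2, 3))
```

Under these assumptions r² comes out at about 0.46, close to the reported 0.48, which shows how two markers can have D' = 1 and still be only moderately correlated when their allele frequencies differ.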
At the novel PSA locus on 10q26, two variants, rs10788160-A and rs12413088-T, were genome-wide significant and had similar effects on PSA levels. The two variants are located within an LD-region not known to contain any genes, 324 and 305 Kb centromeric to the start of the FGFR2 gene, respectively. The two variants are highly correlated (r² = 0.85 in Iceland and r² = 0.83 in the UK) and neither remains significant after adjusting for the other. Since the effects of the two variants cannot be distinguished from each other, we elected to focus on rs10788160-A in subsequent investigations. Sequence variants at the FGFR2 locus (rs1219648 and its surrogates) have been reported to predispose to breast cancer (28-30). The PSA variant, rs10788160, is in very low linkage disequilibrium with the variant conferring risk of breast cancer (D' = 0.15, r² = 0.01 between rs1219648 and rs10788160 in Iceland). No association was detected between rs10788160 and breast cancer in a case-control study in Iceland (OR = 0.97, P = 0.36), or between rs1219648 and PSA levels in the GWAS of PSA (P = 0.46). Hence, the variants at the FGFR2 locus conferring risk of breast cancer and variation in PSA levels seem to be distinct.
The most significant variant on 12q24, the second novel PSA locus, is rs11067228-A. This SNP is located in an LD-block that contains the gene TBX3, in which mutations have been found to cause the ulnar-mammary syndrome (OMIM #181450) but which has not previously been shown to affect PSA levels.
At the third novel PSA locus, 5p15 near the TERT gene, two sequence variants, rs401681-C and rs2736098-A, were demonstrated to have a comparable effect on PSA levels. They are moderately correlated (D' = 0.93 and r² = 0.39 between rs401681 and rs2736098 according to HapMap CEU Phase 2), and because the effects of the variants cannot be distinguished from each other, we elected to focus on rs2736098-A in subsequent analyses.
We estimated the fraction of the total variance in the level of PSA explained by combining the effect from the best marker at each of the six loci (rs2736098, rs10993994, rs10788160, rs11067228, rs4430796 and rs17632542). The fraction accounted for is estimated to be 4.2% in Iceland and 11.8% in the UK. In both populations, the missense mutation in the KLK3 gene, rs17632542, accounts for half of the fraction of variance explained.
The PSA variants and predisposition to prostate cancer
Variants at four of the six loci discussed above (KLK3, TERT, MSMB and HNF1B) have previously been reported to associate with risk of prostate cancer, although at different degrees of significance (18, 22, 25-27, 31) and some even with conflicting evidence (19). Due to the potential confounding effects of PSA levels and prostate cancer, we examined if the PSA SNPs identified in this study also associate with prostate cancer. Based on a combined analysis of over 5,325 prostate cancer cases and 41,417 controls from Iceland, the Netherlands, Spain, Romania and the US, we replicated the four loci previously reported to predispose to prostate cancer, each with a similar effect as described before (ORs ranging from 1.10 to 1.21; see Table 7).
Interestingly, in our data the missense variant in KLK3, rs17632542, shows a stronger association with prostate cancer than the strongest previously reported variant at this locus, rs2735839 (OR = 1.39 and 1.19 for rs17632542-T and rs2735839-G, respectively; see Table 7). In contrast, we found that neither of the variants at two of the three new PSA loci (FGFR2 and TBX3) associate significantly with prostate cancer (P = 0.27 and 0.54; OR = 0.97 and 1.01, for rs10788160-A and rs11067228, respectively).
We next examined if any of the six loci associated with PSA levels have an effect on age at diagnosis or aggressiveness of prostate cancer among patients in the 6 study groups, coming from Iceland, the Netherlands, Spain, Romania, the US and the UK. Only the missense mutation in KLK3, rs17632542, is significantly associated with age at diagnosis; for each allele of rs17632542-T, which associates with higher PSA levels, the age at diagnosis was estimated to decrease by ~9 months (0.71 year decrease, P = 0.016; see Table 8). When performing a case-only analysis, we observe that for the missense mutation in KLK3, rs17632542-T, the allele conferring risk of prostate cancer is significantly less frequent (OR = 0.78, P = 0.0099) among cases with more aggressive prostate cancer (Gleason score > 6, and/or T3 or higher, and/or node positive, and/or with metastatic disease) compared to cases with less aggressive prostate cancer (Gleason score < 7, and T2 or lower). This is in agreement with findings previously reported for the correlated variant at this locus, rs2735839 (32, 33). For none of the other five variants was a significant effect on the aggressiveness of the disease detected.
As discussed above, there has been some controversy in the literature about whether the predisposition to prostate cancer observed for the previously reported KLK3 variant (rs2735839) is mainly due to its strong effect on PSA levels and therefore driven by the increasing frequency of PSA testing in the last decades (19, 20). In order to test for this, we stratified our Icelandic study group into cases diagnosed before 1992, a time when the majority of patients were diagnosed without undergoing PSA testing, and cases diagnosed from 1992 to 2008, a period in which PSA testing has become increasingly frequent. We used in-silico genotyping based on familial imputation to augment the effective sample size of the group of cases, and we used 34,124 Icelanders not known to have prostate cancer as controls. Our results for rs2735839-G show that the association effect observed for the total case study group (OR = 1.15 (95% CI 1.04-1.27), P = 0.007) is confined to the group of cases diagnosed 1992 or later (OR = 1.17 (95% CI 1.06-1.29), P = 0.002), whereas cases diagnosed before 1992 have no increased risk (OR = 0.97 (95% CI 0.83-1.13), P = 0.7). These results support the notion that the prostate cancer risk reported for the KLK3 locus is driven by the increasing frequency of PSA testing and subsequent biopsies over the last few decades. In contrast, the results for the other three PSA loci that associate with increased risk of prostate cancer (TERT, HNF1B and MSMB) are not substantially different for the two case subgroups, diagnosed before or after 1992. As expected, no effect on prostate cancer risk was observed in either group of cases for the FGFR2 and TBX3 SNPs.
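The odds ratios, confidence intervals and P-values quoted for rs2735839-G are linked through the standard error of the log odds ratio. The following sketch recovers an approximate two-sided P-value from a reported OR and its 95% CI, assuming a normal approximation on the log scale; the function name `p_from_or_ci` is illustrative and the calculation is not taken from the study's own methods.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Return (se, z, p) implied by an odds ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale (Wald interval), so that
    se = (ln(upper) - ln(lower)) / (2 * z_crit).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return se, z, p

reported = {
    "all cases": (1.15, 1.04, 1.27),
    "diagnosed 1992 or later": (1.17, 1.06, 1.29),
    "diagnosed before 1992": (0.97, 0.83, 1.13),
}
for label, (or_, lo, hi) in reported.items():
    se, z, p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: z = {z:.2f}, P = {p:.3f}")
```

The P-values recovered this way are close to the reported 0.007, 0.002 and 0.7; small differences are expected because the published OR and CI limits are rounded.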
Effect of prostate cancer risk variants on PSA levels
Due to the effect of prostate cancer on the level of PSA and the increased probability of being diagnosed with prostate cancer, given an increase in PSA levels, we assessed the effect on PSA levels of the 47 sequence variants conferring risk of prostate cancer reported to date (see Table 9) (selected SNPs based on the NIH Catalog of Published Genome-Wide Association Studies; http://www.genome.gov/26525384#1). Some loci have more than one reported SNP. According to our results, there is a clear tendency for the allele associated with prostate cancer risk also to be associated with high levels of PSA (see Table 9). This is comparable to results previously reported by Wiklund et al. (20). For the vast majority of the loci (N=41), their effect on PSA level is weak (well below 0.1 standard unit) and likely reflects undiagnosed prostate cancer cases in the PSA study group (also suggested by Wiklund et al. 2008 (20)). Exceptions are the variants at the KLK3 (rs2735839 and rs17632542), HNF1B (rs4430796), MSMB (rs10993994) and TERT (rs2736098) loci, the loci of genome-wide significance in our PSA GWA study. Variants at two other loci, 11q13 (rs11228565) and 8q24 (rs16901979), also have greater effects on PSA levels, but the effects did not reach genome-wide significance levels. These six loci can roughly be divided into two groups: those with a moderate effect on the PSA levels compared to their effect on prostate cancer risk (8q24, 11q13, 10q11 and 17q12) and those comprised of variants that have a relatively strong PSA effect compared to their effect on prostate cancer risk (i.e. variants at KLK3 on 19q13.33 and TERT on 5p15).
Sequence variants and benign prostatic hyperplasia
Benign prostatic hyperplasia (BPH) can affect PSA levels. In order to determine if any of the PSA variants discussed above are associated with BPH, we used a set of 33,779 Icelandic controls and 2,312 Icelandic men with BPH, defined as individuals either diagnosed after undergoing TURP or men over the age of 50 repeatedly using drugs in the G04C group of the ATC classification (e.g. Tamsulosin, Finasteride and Dutasteride) between the years 2003 and 2009 (see Methods). Except for rs2736098-T on 5p15, which showed a nominally significant association (P = 0.048, OR = 1.08), no association was observed between BPH and any of the remaining five PSA variants, given the number of tests performed. Hence, BPH is unlikely to account for a significant fraction of the observed association with PSA levels for the variants discussed here.
PSA sequence variants and prostate biopsies
When screening for prostate cancer, a PSA level above a certain cutoff value is considered an indication for performing a needle biopsy. We wanted to assess if the variants that associate with increased PSA levels also make men more prone to undergo a biopsy of the prostate. In our study group of 2,300 Icelandic men who underwent a prostate biopsy between 1998 and 2008, we observed a higher frequency of the allele increasing PSA levels in those undergoing biopsies than in population controls for all six variants (1.04 ≤ OR < 1.46; all SNPs have P < 0.05 except rs11067228 on 12q24, which has P = 0.25; see Table 10). Among the 2,300 individuals who had undergone a biopsy, cancer had been diagnosed in close to 50% (a positive biopsy). When restricting the analysis to individuals with biopsy but no detectable prostate cancer (negative biopsy) and comparing them to population controls, similar or even stronger results were observed (1.03 < OR < 1.82; all SNPs have P < 0.05 except rs10993994 near MSMB, which has P = 0.48; see Table 11). From the UK study group, we had access to a group of approximately 1,400 men who had undergone a biopsy. Of those, about one third was diagnosed with prostate cancer. Using the Icelandic and the UK study groups of men who had been biopsied, we compared the frequency of the PSA variants in positive and negative biopsies. Of the six loci, we found that for the three PSA variants not primarily associated with prostate cancer risk (KLK3, FGFR2 and TBX3), the PSA increasing allele was significantly less frequent among men with a positive biopsy than in men with a negative biopsy (rs10788160-A near FGFR2 has OR = 0.79 and P = 5.4×10⁻⁶, rs11067228-A near TBX3 has OR = 0.87 and P = 0.0034, rs17632542-T in KLK3 has OR = 0.77 and P = 0.013; see Table 12). The results for these three variants demonstrate that the alleles associated with increased PSA level increase the probability that a normal prostate is biopsied.
DISCUSSION
In this study, we identified 6 loci that associate with PSA levels with genome-wide significance. Variants at three of these loci had previously been shown to associate with PSA levels, whereas three of the loci, at 10q26, 5p15 and 12q24, are novel. Unlike the variants previously reported to associate with PSA levels, two of the novel loci, i.e. 12q24 and 10q26, do not associate with prostate cancer risk, and the third locus, at 5p15, has only a moderate effect on prostate cancer. Furthermore, we have shown that two of these variants (rs10788160-A on 10q26 and rs11067228-A on 12q24), together with the KLK3 variant, are associated with a greater probability of having a normal prostate biopsied. Hence, these new markers primarily predict the outcome of the PSA-based prostate cancer screening process, i.e. the decision of performing a biopsy or not, and the outcome of the biopsy, rather than predisposition to prostate cancer. In our study we showed that a missense mutation, rs17632542-T, in the KLK3 gene on 19q13.33 is associated with higher PSA levels. This variant has a stronger effect on PSA than the variant rs2735839, previously reported at this locus. The KLK3 variant was also found to predispose to prostate cancer, but the association effect was confined to the group of cases primarily diagnosed after the introduction of the PSA test. Furthermore, the association with prostate cancer at the KLK3 locus was shown to be predominantly with the less aggressive form of the disease. We have also shown that, given biopsy, the variant rs17632542-T is associated with greater probability of not being diagnosed with cancer. Together, these results suggest that the reported association with prostate cancer at the KLK3 locus is mainly driven by its effect on PSA levels and the increasing frequency of PSA testing in men.
REFERENCES
1. Jemal, A., et al. Cancer statistics, 2009. CA Cancer J Clin, 59: 225-49, 2009.
2. Barry, M. J. Screening for prostate cancer--the controversy that refuses to die. N Engl J Med, 360: 1351-4, 2009.
3. Nam, R. K., et al. Utility of incorporating genetic variants for the early detection of prostate cancer. Clin Cancer Res, 15: 1787-93, 2009.
4. Thompson, I. M., et al. Assessing prostate cancer risk: results from the Prostate Cancer Prevention Trial. J Natl Cancer Inst, 98: 529-34, 2006.
5. Bradford, T. J., et al. Molecular markers of prostate cancer. Urol Oncol, 24: 538-51, 2006.
6. Vickers, A. J., et al. Prostate-Specific Antigen Velocity for Early Detection of Prostate Cancer: Result from a Large, Representative, Population-based Cohort. Eur Urol, 2009.
7. Schroder, F. H., et al. Screening and prostate-cancer mortality in a randomized European study. N Engl J Med, 360: 1320-8, 2009.
8. Andriole, G. L., et al. Mortality results from a randomized prostate-cancer screening trial. N Engl J Med, 360: 1310-9, 2009.
9. van Leeuwen, P. J., et al. Prostate cancer mortality in screen and clinically detected prostate cancer: estimating the screening benefit. Eur J Cancer, 46: 377-83.
10. Hugosson, J., et al. Mortality results from the Goteborg randomised population-based prostate-cancer screening trial. Lancet Oncol.
11. Neal, D. E. PSA testing for prostate cancer improves survival-but can we do better? Lancet Oncol, 2010.
12. Thompson, I. M., et al. Operating characteristics of prostate-specific antigen in men with an initial PSA level of 3.0 ng/ml or lower. Jama, 294: 66-70, 2005.
13. Oesterling, J. E., et al. Serum prostate-specific antigen in a community-based population of healthy men. Establishment of age-specific reference ranges. Jama, 270: 860-4, 1993.
14. DeAntoni, E. P., et al. Age- and race-specific reference ranges for prostate-specific antigen from a large community-based study. Urology, 48: 234-9, 1996.
15. Emilsson, V., et al. Genetics of gene expression and its effect on disease. Nature, 452: 423-8, 2008.
16. Bansal, A., et al. Heritability of prostate-specific antigen and relationship with zonal prostate volumes in aging twins. J Clin Endocrinol Metab, 85: 1272-6, 2000.
17. Pilia, G., et al. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet, 2: e132, 2006.
18. Eeles, R. A., et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet, 40: 316-21, 2008.
19. Ahn, J., et al. Variation in KLK genes, prostate-specific antigen and risk of prostate cancer. Nat Genet, 40: 1032-4; author reply 1035-6, 2008.
20. Wiklund, F., et al. Association of reported prostate cancer risk alleles with PSA levels among men without a diagnosis of prostate cancer. Prostate, 69: 419-27, 2009.
21. Gudbjartsson, D. F., et al. Many sequence variants affecting diversity of adult human height. Nat Genet, 40: 609-15, 2008.
22. Rafnar, T., et al. Sequence variants at the TERT-CLPTM1L locus associate with many cancer types. Nat Genet, 41: 221-7, 2009.
23. Gudmundsson, J., et al. Common variants on 9q22.33 and 14q13.3 predispose to thyroid cancer in European populations. Nat Genet, 41: 460-4, 2009.
24. Moore, A. L., et al. Population-based prostate-specific antigen testing in the UK leads to a stage migration of prostate cancer. BJU Int, 104: 1592-8, 2009.
25. Thomas, G., et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet, 40: 310-5, 2008.
26. Pal, P., et al. Tagging SNPs in the kallikrein genes 3 and 2 on 19q13 and their associations with prostate cancer in men of European origin. Hum Genet, 122: 251-9, 2007.
27. Gudmundsson, J., et al. Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet, 39: 977-83, 2007.
28. Hunter, D. J., et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet, 39: 870-4, 2007.
29. Easton, D. F., et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature, 447: 1087-93, 2007.
30. Stacey, S. N., et al. Common variants on chromosome 5p12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet, 40: 703-6, 2008.
31. Kote-Jarai, Z., et al. Multiple novel prostate cancer predisposition loci confirmed by an international study: the PRACTICAL Consortium. Cancer Epidemiol Biomarkers Prev, 17: 2052-61, 2008.
32. Xu, J., et al. Association of prostate cancer risk variants with clinicopathologic characteristics of the disease. Clin Cancer Res, 14: 5819-24, 2008.
33. Kader, A. K., et al. Individual and cumulative effect of prostate cancer risk-associated variants on clinicopathologic variables in 5,895 prostate cancer patients. Prostate, 69: 1195-205, 2009.
34. Gulcher, J. R., et al. Protection of privacy by third-party encryption in genetic research in Iceland. Eur J Hum Genet, 8: 739-42, 2000.
35. Gretarsdottir, S., et al. The gene encoding phosphodiesterase 4D confers risk of ischemic stroke. Nat Genet, 35: 131-8, 2003.
Table 2. Characteristics of men with PSA measurements in Iceland and UK used in the analysis
| Study group | Sub-classification | Individuals (n) | Mean age (years) at PSA (s.d.) | Mean number of PSA measurements | Median PSA value (ng/ml) (1st quartile, 3rd quartile) | Recruitment period |
|---|---|---|---|---|---|---|
| Iceland | Chip-genotyped individuals | 4,620 | 66 (12) | 2.8 | 1.69 (0.87, 3.6) | 1994-2009 |
| Iceland | Used for in-silico genotyping | 9,218 | 60 (13) | 2.1 | 1.50 (0.80, 3.2) | 1994-2009 |
| Iceland | Single track assay genotyping | 1,919 | 63 (12) | 2.8 | 2.90 (0.73, 6.3) | 1994-2009 |
| Iceland | Total | 15,757 | | | | |
| UK | All with single track assay genotyping: | | | | | |
| UK | PSA below 3 ng/ml | 454 | 63 (5) | 1 | 1.50 (0.70, 2.20) | 1999-2007 |
| UK | PSA from 3-10 ng/ml and biopsy negative | 960 | 62 (5) | 1 | 4.10 (3.50, 5.07) | 1999-2007 |
| UK | PSA >3 ng/ml and biopsy positive | 523 | 63 (5) | 1 | 6.00 (3.90, 14.0) | 1999-2007 |
| UK | Total | 1,937 | | | | |
Shown are the relevant characteristics for the Icelandic and United Kingdom (UK) study groups: the number (n) of individuals in each sub-group, the mean age (years) at the first PSA level measurement and the standard deviation (s.d.), the mean number of PSA measurements for each sub-study group, the median PSA value (ng/ml) and the recruitment period.
Table 3. Association results from the GWAS on PSA levels in Iceland
a. Results for SNPs present on the Illumina 317K SNP chip
| SNP | Allele | Locus | Closest gene | Position (bp) | Individuals (n) | Allele frequency | Association effect (%) | P-value |
|---|---|---|---|---|---|---|---|---|
| rs401681 | C | 5p15.33 | TERT | 1,375,087 | 7,508 | 0.55 | 6.9 | 5.7E-06 |
| rs10993994 | T | 10q11.23 | MSMB | 51,219,502 | 7,507 | 0.39 | 7.2 | 5.8E-06 |
| rs10788160 | A | 10q26.12 | FGFR2 | 123,023,539 | 7,322 | 0.31 | 9.2 | 1.1E-07 |
| rs12413088 | T | 10q26.12 | FGFR2 | 123,042,718 | 7,656 | 0.28 | 8.0 | 3.0E-06 |
| rs11067228 | A | 12q24.21 | TBX3 | 113,578,643 | 7,564 | 0.56 | 8.3 | 1.5E-07 |
| rs3744763 | C | 17q12 | HNF1B | 33,164,998 | 7,392 | 0.60 | 8.4 | 6.5E-08 |
| rs7501939 | C | 17q12 | HNF1B | 33,175,269 | 7,432 | 0.58 | 7.9 | 5.3E-07 |
| rs266849 | A | 19q13.33 | KLK3 | 56,040,902 | 7,643 | 0.83 | 16.1 | 1.2E-13 |
| rs266870 | T | 19q13.33 | KLK3 | 56,043,746 | 7,583 | 0.51 | 9.7 | 1.3E-09 |
| rs1058205 | T | 19q13.33 | KLK3 | 56,055,210 | 7,575 | 0.82 | 19.4 | 5.4E-20 |
| rs2735839 | G | 19q13.33 | KLK3 | 56,056,435 | 7,533 | 0.87 | 22.5 | 1.8E-21 |
| rs1506684 | T | 19q13.33 | KLK3 | 56,063,231 | 7,487 | 0.58 | 9.3 | 1.9E-09 |

b. Imputed results for SNPs not present on the Illumina 317K SNP chip
| SNP | Allele | Locus | Closest gene | Position (bp) | Individuals (n) | Allele frequency | Association effect (%) | P-value |
|---|---|---|---|---|---|---|---|---|
| rs2736098 | A | 5p15.33 | TERT | 1,347,086 | 4,506 | 0.33 | 11.5 | 8.8E-07 |
| rs4430796 | A | 17q12 | HNF1B | 33,172,153 | 4,506 | 0.52 | 11.3 | 3.8E-09 |
| rs17632542 | T | 19q13.33 | KLK3 | 56,053,569 | 4,506 | 0.91 | 35.7 | 1.6E-18 |

Part a) of the table: shown are genome-wide association results for SNPs with P < 1E-05, the number of individuals (n) with PSA measurement and either genotyped using the Illumina 317K chip (on average 4,599 men) or by the in-silico genotyping method (on average 2,918 men), the allele associated with increased PSA levels, the association effect per allele and the two-sided P-value.
Part b) of the table: shown are association results for the three SNPs that showed a stronger effect than the chip-genotyped SNPs. The imputation analysis was based on 2.5M HapMap SNPs, testing all SNPs within a window of 500 Kb for all six loci shown in section a) of this table.
Table 4. Association results for SNPs and PSA levels, based on samples from Iceland and UK.
| SNP (SEQ ID NO) | Allele | Chr | Position (bp) | Iceland P-value | Iceland Freq. | Iceland Total (n) | Iceland increase per allele (%) | UK P-value | UK Freq. | UK Total (n) | UK increase per allele (%) | Combined P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs401681 (1) | C | 5 | 1,375,087 | 1.88E-09 | 0.55 | 9,049 | 7.0 | 0.002 | 0.53 | 451 | 19.0 | 1.20E-10 |
| rs2736098* (2) | A | 5 | 1,347,086 | 5.10E-10 | 0.33 | 6,347 | 10.5 | 0.021 | 0.27 | 450 | 14.8 | 2.84E-10 |
| rs10788160 (3) | A | 10 | 123,023,539 | 8.88E-14 | 0.31 | 8,686 | 10.2 | 0.0012 | 0.24 | 453 | 22.9 | 4.50E-15 |
| rs10993994 (4) | T | 10 | 51,219,502 | 9.25E-14 | 0.39 | 8,870 | 9.2 | 0.46 | 0.38 | 453 | 5.4 | 6.66E-13 |
| rs11067228 (5) | A | 12 | 113,578,643 | 1.09E-11 | 0.56 | 8,882 | 8.3 | 0.074 | 0.56 | 441 | 9.2 | 1.93E-11 |
| rs4430796* (6) | A | 17 | 33,172,153 | 1.40E-11 | 0.52 | 6,222 | 9.4 | 0.21 | 0.50 | 449 | 6.3 | 5.60E-11 |
| rs2735839 (7) | G | 19 | 56,056,435 | 4.84E-43 | 0.87 | 8,869 | 25.4 | 1.18E-06 | 0.86 | 445 | 49.7 | 6.26E-47 |
| rs17632542* (8) | T | 19 | 56,053,569 | 9.00E-40 | 0.91 | 6,078 | 39.1 | 2.66E-09 | 0.93 | 435 | 102.2 | 3.05E-46 |

Shown are results for alleles that associate with increased (%) levels of PSA. Results for SNPs present on the Illumina chips are based on genotypes from chip (~50%), in-silico genotyping using family imputation (~30%), and single track assay genotyping (~20%).
* These SNPs (rs2736098, rs4430796, and rs17632542) are not on the Illumina chips used in the present study and results are based on genotypes from HapMap SNP imputation (~70%) and single track assay (~30%) genotyping.
Table 5. Estimates from Iceland and UK on the relative genotype effect for SNPs associated with PSA levels
a. Results for the Icelandic study group
| SNP | Allele | Chr | Position (bp) | Allelic frequency | Relative allelic effect | XX frequency | XX relative gt-effect | OX frequency | OX relative gt-effect | OO frequency | OO relative gt-effect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2736098 | A | 5 | 1,347,086 | 0.33 | 1.11 | 0.11 | 1.14 | 0.44 | 1.03 | 0.45 | 0.93 |
| rs401681 | C | 5 | 1,375,087 | 0.55 | 1.07 | 0.30 | 1.06 | 0.50 | 0.99 | 0.20 | 0.93 |
| rs10993994 | T | 10 | 51,219,502 | 0.39 | 1.09 | 0.15 | 1.11 | 0.47 | 1.02 | 0.38 | 0.93 |
| rs10788160 | A | 10 | 123,023,539 | 0.31 | 1.10 | 0.10 | 1.14 | 0.43 | 1.04 | 0.48 | 0.94 |
| rs11067228 | A | 12 | 113,578,643 | 0.56 | 1.08 | 0.31 | 1.07 | 0.49 | 0.99 | 0.20 | 0.91 |
| rs4430796 | A | 17 | 33,172,153 | 0.52 | 1.09 | 0.27 | 1.09 | 0.50 | 0.99 | 0.23 | 0.91 |
| rs17632542 | T | 19 | 56,053,569 | 0.91 | 1.39 | 0.82 | 1.05 | 0.17 | 0.76 | 0.01 | 0.54 |
| rs2735839 | G | 19 | 56,056,435 | 0.87 | 1.25 | 0.75 | 1.06 | 0.23 | 0.84 | 0.02 | 0.67 |

b. Results for the UK study group
| SNP | Allele | Chr | Position (bp) | Allelic frequency | Relative allelic effect | XX frequency | XX relative gt-effect | OX frequency | OX relative gt-effect | OO frequency | OO relative gt-effect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2736098 | A | 5 | 1,347,086 | 0.27 | 1.15 | 0.07 | 1.22 | 0.39 | 1.06 | 0.53 | 0.92 |
| rs401681 | C | 5 | 1,375,087 | 0.53 | 1.19 | 0.29 | 1.17 | 0.50 | 0.98 | 0.22 | 0.82 |
| rs10993994 | T | 10 | 51,219,502 | 0.38 | 1.05 | 0.14 | 1.07 | 0.47 | 1.01 | 0.39 | 0.96 |
| rs10788160 | A | 10 | 123,023,539 | 0.24 | 1.23 | 0.06 | 1.36 | 0.37 | 1.10 | 0.57 | 0.90 |
| rs11067228 | A | 12 | 113,578,643 | 0.56 | 1.09 | 0.31 | 1.08 | 0.49 | 0.99 | 0.20 | 0.90 |
| rs4430796 | A | 17 | 33,172,153 | 0.50 | 1.06 | 0.25 | 1.06 | 0.50 | 1.00 | 0.25 | 0.94 |
| rs17632542 | T | 19 | 56,053,569 | 0.93 | 2.02 | 0.86 | 1.08 | 0.14 | 0.53 | 0.01 | 0.26 |
| rs2735839 | G | 19 | 56,056,435 | 0.86 | 1.50 | 0.73 | 1.10 | 0.25 | 0.74 | 0.02 | 0.49 |

Shown are the SNPs and their alleles associated with increasing PSA levels and the genotype (gt) frequency and the relative genotype (gt) effect on PSA levels, compared to the average of the population under study: for homozygous (XX), heterozygous (OX), and non-carriers (OO) of the allele associated with elevated PSA levels.
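The relative genotype effects in Table 5 can be approximately reproduced from the allele frequency and the per-allele effect alone. The sketch below assumes a multiplicative per-allele model, Hardy-Weinberg genotype frequencies, and scaling by the population mean effect; these are assumptions made here for illustration, since the table legend only states that effects are relative to the population average. The function name `genotype_table` is hypothetical.

```python
def genotype_table(freq, allelic_effect):
    """Genotype frequencies and relative effects for OO, OX and XX.

    freq:           frequency of the PSA-increasing allele (X).
    allelic_effect: multiplicative effect per copy of X.
    Effects are divided by the population mean effect, which under
    Hardy-Weinberg proportions equals (q + p * r) ** 2.
    """
    q = 1.0 - freq
    mean_effect = (q + freq * allelic_effect) ** 2
    genotypes = (("OO", 0, q * q), ("OX", 1, 2 * freq * q), ("XX", 2, freq * freq))
    return {
        label: (gt_freq, allelic_effect ** copies / mean_effect)
        for label, copies, gt_freq in genotypes
    }

# rs17632542-T in Iceland: frequency 0.91, relative allelic effect 1.39.
for label, (gt_freq, effect) in genotype_table(0.91, 1.39).items():
    print(f"{label}: frequency {gt_freq:.2f}, relative effect {effect:.2f}")
```

For rs17632542-T in Iceland this gives relative effects close to the tabulated 0.54 (OO), 0.76 (OX) and 1.05 (XX); the agreement is approximate because the published inputs are rounded.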
Table 6: Bioinformatic analysis of the KLK3 missense variant rs17632542 (I179T)
Amino acid variation: Nonsynonymous (I179T); change from medium size and hydrophobic (I) to medium size and polar (T)
| Prediction Tool | Analysis Type | Prediction Results |
|---|---|---|
| PhastCons_44way (a) | Conservation | not conserved |
| F-Score (b) | Structure / Conservation | 0.75 |
| Panther subPSEC (c) | Structure / Conservation | -6.28 |
| Panther Pdeleterious (c) | Structure / Conservation | Probability of being deleterious = 97% |
| PolyPhen (d) | Structure / Conservation | benign |
| LS-SNP (e) | Structure / Conservation | deleterious |
| SNPeffect (f) | Structure / Conservation | deleterious |
| SNPs3D (g) | Structure / Conservation | deleterious |
| ESEfinder (h) | Exonic splicing enhancer | changed |
| ESRSearch (i) | Exonic splicing enhancer | changed |
| PESX (j) | Exonic splicing enhancer | changed |
| RESCUE ESE (k) | Exonic splicing enhancer | not changed |

(a) Carries out multiple alignments of 44 vertebrate species and returns measures of evolutionary conservation using a phylogenetic hidden Markov model (phylo-HMM). Siepel A, et al., Genome Res 15:1034-1050, 2005.
(b) Uses the F-SNP database (http://compbio.cs.queensu.ca/F-SNP/) to provide integrated information about the functional effects of SNPs obtained from 16 different bioinformatic tools and databases. Functional effects are predicted and indicated at the splicing, transcriptional, translational and post-translational levels.
(c) Panther estimates the likelihood of a particular nsSNP to cause a functional impact on the protein. It calculates a subPSEC (substitution position-specific evolutionary conservation) score based on an alignment of evolutionarily related proteins. It then calculates Pdeleterious, the probability that a given variant will have a deleterious effect on protein function, such that a subPSEC score of -3 corresponds to a Pdeleterious of 0.5. Brunham LR, et al. PLoS Genet 1(6) 2005: e83. doi:10.1371/journal.pgen.0010083.
(d) PolyPhen predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Ramensky, V, et al. Nucleic Acids Res 30(17): 3894-900, 2002.
(e) Disease-associated nsSNPs are predicted by a support vector machine (SVM) trained on OMIM amino-acid variants and putatively neutral nsSNPs from dbSNP. Karchin R, et al. Bioinformatics 21(12): 2814-20, 2005.
(f) The SNPeffect database uses sequence- and structure-based bioinformatics tools to predict the effect of non-synonymous SNPs on the molecular phenotype of proteins. Reumers J, et al., Bioinformatics 22: 2183-2185, 2006.
(g) SNPs3D assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. Peng Y and John M, J Mol Biol. 356(5): 1263-74, 2006.
(h) ESEfinder uses position weighted matrices to predict putative human exonic splicing enhancers (ESEs). Cartegni L, et al., Nucleic Acids Res 31(13): 3568-3571, 2003.
(i) ESRSearch uses the evolutionary conservation of wobble positions between human and mouse orthologous exons and the analysis of the overabundance of sequence motifs, compared with their random expectation, given by their codon relative frequency, to predict ESEs. Goren A, et al., Mol Cell. 22(6): 769-81, 2006.
(j) PESX compares the frequency of all 65536 8-mers in internal non-coding exons against their adjacent pseudo exons and in internal non-coding exons against 5'UTR of intronless genes to predict ESEs. Zhang XH and Chasin LA, Genes Dev 18(11): 1241-1250, 2004.
(k) Specific hexanucleotide sequences were identified as candidate ESEs on the basis that they have both significantly higher frequency of occurrence in exons than in introns and also significantly higher frequency in exons with weak (non-consensus) splice sites than in exons with strong (consensus) splice sites. Fairbrother WG, et al., Science 297(5583): 1007-13, 2002.
Table 7. Association of the six PSA SNPs with prostate cancer in Iceland, The Netherlands, Spain, Romania, and the US
a. Combined association results from a case-control association analysis in five study populations
                                                                    Frequency
SNP         Allele  Chr  Position (bp)  Cases (n)  Controls (n)  Cases  Controls  OR    P-value  Phet
rs2736098   A       5    1,347,086      5,009      41,334        0.30   0.29      1.11  3.5E-04  0.28
rs10993994  T       10   51,219,502     5,077      41,168        0.45   0.40      1.21  7.7E-15  0.0066
rs10788160  A       10   123,023,539    5,317      41,417        0.25   0.25      0.97  2.7E-01  0.65
rs11067228  A       12   113,578,643    5,325      41,383        0.55   0.54      1.01  5.4E-01  0.16
rs4430796   A       17   33,172,153     5,162      41,320        0.55   0.51      1.20  3.2E-13  0.29
rs17632542  T       19   56,053,569     5,284      40,522        0.95   0.93      1.39  1.8E-10  0.052
rs2735839   G       19   56,056,435     5,080      41,120        0.88   0.86      1.19  1.1E-06  0.89
b. Odds ratio and P-value for each study population from a case-control association analysis of prostate cancer
SNP         OR ICE  P ICE    OR NL  P NL     OR US  P US     OR ROM  P ROM    OR SPA  P SPA
rs2736098   1.08    7.5E-02  1.17   1.2E-02  1.13   3.8E-02  0.83    2.0E-01  1.15    1.2E-01
rs10993994  1.11    2.1E-03  1.20   1.2E-03  1.40   2.4E-10  1.17    2.8E-01  1.32    2.6E-04
rs10788160  0.96    3.1E-01  0.98   7.5E-01  1.04   5.1E-01  0.92    6.3E-01  0.90    1.7E-01
rs11067228  0.96    2.4E-01  1.01   8.5E-01  1.09   1.1E-01  0.98    9.5E-01  1.12    8.4E-02
rs4430796   1.17    3.2E-05  1.26   5.0E-05  1.26   9.0E-06  1.30    5.9E-02  1.07    3.2E-01
rs17632542  1.23    3.0E-03  1.61   1.8E-04  1.52   5.1E-04  1.16    6.1E-01  2.01    1.2E-04
rs2735839   1.15    6.6E-03  1.25   4.0E-03  1.22   1.1E-02  1.09    6.9E-01  1.23    1.0E-01
Shown are: the allele associated with increased PSA levels, the number of cases and controls (n), the allele frequency in cases and controls, the odds ratio (OR) and the two-sided P-value. For the combined study populations the OR and P-values were estimated using the Mantel-Haenszel model. Abbreviations for study populations are: Iceland (ICE), the Netherlands (NL), Chicago USA (US), Romania (ROM), and Spain (SPA).
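As an illustration of how a combined OR can be obtained across study populations, the following minimal sketch computes the Mantel-Haenszel pooled odds ratio from per-population 2x2 allele-count tables. The function name and the allele counts are hypothetical and are not taken from Table 7; the sketch shows the estimator only, not the P-value or heterogeneity test.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over strata (here: study populations).

    Each stratum is a tuple (a, b, c, d):
      a = tested-allele count in cases,    b = other-allele count in cases,
      c = tested-allele count in controls, d = other-allele count in controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator


# Hypothetical allele counts for two populations (illustration only).
example_strata = [(90, 110, 800, 1200), (45, 55, 400, 600)]
pooled_or = mantel_haenszel_or(example_strata)
print(f"pooled OR = {pooled_or:.3f}")
```

Because both hypothetical strata have the same within-stratum odds ratio (27/22), the pooled estimate equals that value.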
Table 8. Effect of the allele conferring elevated PSA levels on age at diagnosis among 6,406 patients from six European ancestry study populations
            Allele increasing               Age effect  95% CI
SNP         PSA levels         Chromosome   (year)      (year)          P-value  Phet    I2
rs2736098   A                  5            -0.23       (-0.51, 0.06)   0.13     0.0037  71.4
rs10993994  T                  10           0.19        (-0.08, 0.45)   0.17     0.76    0
rs10788160  A                  10           0.01        (-0.10, 0.11)   0.96     0.6     0
rs11067228  A                  12           -0.10       (-0.36, 0.17)   0.48     0.86    0
rs4430796   A                  17           -0.15       (-0.41, 0.11)   0.27     0.51    0
rs17632542  T                  19           -0.71       (-1.29, -0.13)  0.016    0.2     31.3
Of the six PSA-associated SNPs, only the missense mutation in KLK3, rs17632542-T, is significantly associated with age at prostate cancer diagnosis. The T allele of rs17632542, which associates with higher PSA levels, is associated with a decrease in age at diagnosis of about 9 months for each allele carried (-0.71 years).
Study populations:
Chicago, the US: 1578 patients
The Netherlands: 1088 patients
Iceland: 2258 patients
Romania: 309 patients
Spain: 656 patients
United Kingdom: 517 patients
Table 9. Association of the 47 previously reported prostate cancer risk SNPs with PSA levels and prostate cancer in Iceland.
                                       PSA                                     Prostate cancer
SNP         Allele  Chr  Position (bp)  P-value   Effect (s.u.)  n      Freq.  P-value   OR    Cases (n)  Controls (n)
rs1465618   C       2    43 07 53       4.50E-01  -0.01794       4,470  0.807  1.42E-01  0.94  1,757      36,145
rs1465618   T       2    43 07 53       4.50E-01  0.017935       4,470  0.193  1.42E-01  1.06  1,757      36,145
rs721048    A       2    62,985,235     5.58E-01  -0.0137        4,506  0.201  5.16E-04  1.16  1,763      36,400
rs721048    G       2    62,985,235     5.58E-01  0.013701       4,506  0.799  5.16E-04  0.87  1,763      36,400
rs2710646   A       2    62,988,383     6.23E-01  -0.0116        4,461  0.196  3.13E-04  1.16  1,745      36,061
rs2710646   C       2    62,988,383     6.23E-01  0.011599       4,461  0.804  3.13E-04  0.86  1,745      36,061
rs12621278  A       2    173,019,799    1.08E-01  0.065471       4,506  0.942  1.08E-02  1.22  1,763      36,400
rs12621278  G       2    173,019,799    1.08E-01  -0.06547       4,506  0.058  1.08E-02  0.82  1,763      36,400
rs2660753   C       3    87,193,364     8.78E-01  -0.0049        4,503  0.903  4.23E-02  0.89  1,761      36,349
rs2660753   T       3    87,193,364     8.78E-01  0.004899       4,503  0.097  4.23E-02  1.12  1,761      36,349
rs10934853  A       3    129,521,063    1.70E-02  0.050924       4,481  0.269  3.53E-03  1.12  1,754      36,151
rs10934853  C       3    129,521,063    1.70E-02  -0.05092       4,481  0.731  3.53E-03  0.89  1,754      36,151
rs12500426  A       4    95,733,632     3.60E-01  -0.01745       4,502  0.402  1.59E-01  1.05  1,762      36,356
rs12500426  C       4    95,733,632     3.60E-01  0.017452       4,502  0.598  1.59E-01  0.95  1,762      36,356
rs17021918  C       4    95,781,900     9.50E-01  0.001227       4,506  0.639  7.05E-01  1.01  1,763      36,400
rs17021918  T       4    95,781,900     9.50E-01  -0.00123       4,506  0.361  7.05E-01  0.99  1,763      36,400
rs7679673   A       4    106,280,983    5.18E-01  0.012612       4,506  0.363  7.92E-03  0.91  1,763      36,400
rs7679673   C       4    106,280,983    5.18E-01  -0.01261       4,506  0.637  7.92E-03  1.10  1,763      36,400
rs2736098   C       5    1,347,086      8.80E-07  -0.12272       4,506  0.657  7.51E-02  0.92  1,763      36,400
rs2736098   T       5    1,347,086      8.80E-07  0.122718       4,506  0.343  7.51E-02  1.08  1,763      36,400
rs401681    C       5    1,375,087      7.46E-04  0.063589       4,502  0.545  5.33E-02  1.07  1,762      36,375
rs401681    T       5    1,375,087      7.46E-04  -0.06359       4,502  0.455  5.33E-02  0.94  1,762      36,375
rs9364554   C       6    160,753,654    2.67E-01  -0.02253       4,504  0.694  8.84E-02  0.94  1,761      36,376
rs9364554   T       6    160,753,654    2.67E-01  0.022532       4,504  0.306  8.84E-02  1.07  1,761      36,376
rs12155172  A       7    20,961,016     4.86E-02  0.042607       4,501  0.255  5.89E-01  1.02  1,762      36,360
rs12155172  G       7    20,961,016     4.86E-02  -0.04261       4,501  0.745  5.89E-01  0.98  1,762      36,360
rs10486567  A       7    27,943,088     1.81E-01  -0.02948       4,505  0.235  4.88E-03  0.89  1,762      36,379
rs10486567  G       7    27,943,088     1.81E-01  0.029482       4,505  0.765  4.88E-03  1.12  1,762      36,379
rs6465657   C       7    97,654,263     6.91E-01  -0.00752       4,503  0.423  2.40E-01  1.04  1,762      36,319
rs6465657   T       7    97,654,263     6.91E-01  0.007524       4,503  0.577  2.40E-01  0.96  1,762      36,319
rs2928679   A       8    23,494,920     2.04E-01  0.023671       4,503  0.464  6.81E-02  1.06  1,761      36,364
rs2928679   G       8    23,494,920     2.04E-01  -0.02367       4,503  0.536  6.81E-02  0.94  1,761      36,364
rs1512268   C       8    23,582,408     1.02E-05  -0.08698       4,506  0.660  1.99E-03  0.90  1,763      36,400
rs1512268   T       8    23,582,408     1.02E-05  0.08698        4,506  0.340  1.99E-03  1.12  1,763      36,400
rs12543663  A       8    127,993,841    5.50E-01  0.012596       4,506  0.696  8.19E-04  0.88  1,763      36,400
rs12543663  C       8    127,993,841    5.50E-01  -0.0126        4,506  0.304  8.19E-04  1.14  1,763      36,400
rs13252298  A       8    128,164,338    3.50E-01  0.019375       4,506  0.704  5.32E-05  1.17  1,763      36,400
rs13252298  G       8    128,164,338    3.50E-01  -0.01938       4,506  0.296  5.32E-05  0.85  1,763      36,400
rs16901979  A       8    128,194,098    8.11E-04  0.18569        4,506  0.032  3.54E-17  1.92  1,763      36,400
rs16901979  C       8    128,194,098    8.11E-04  -0.18569       4,506  0.968  3.54E-17  0.52  1,763      36,400
rs445114    C       8    128,392,363    1.27E-02  -0.04946       4,503  0.327  2.08E-06  0.84  1,761      36,366
rs445114    T       8    128,392,363    1.27E-02  0.049464       4,503  0.673  2.08E-06  1.20  1,761      36,366
rs6983267   G       8    128,482,487    8.32E-02  0.032849       4,492  0.542  9.40E-04  1.12  1,759      36,219
rs6983267   T       8    128,482,487    8.32E-02  -0.03285       4,492  0.458  9.40E-04  0.89  1,759      36,219
rs1447295   A       8    128,554,220    9.74E-03  0.078536       4,504  0.105  1.33E-20  1.57  1,762      36,389
rs1447295   C       8    128,554,220    9.74E-03  -0.07854       4,504  0.895  1.33E-20  0.64  1,762      36,389
rs1571801   G       9    123,467,194    4.72E-02  -0.04147       4,489  0.724  7.26E-02  1.07  1,758      36,234
rs1571801   T       9    123,467,194    4.72E-02  0.041468       4,489  0.276  7.26E-02  0.93  1,758      36,234
rs7920517   A       10   51,202,627     3.21E-04  -0.06796       4,506  0.575  1.16E-03  0.89  1,763      36,400
rs7920517   G       10   51,202,627     3.21E-04  0.067959       4,506  0.425  1.16E-03  1.12  1,763      36,400
rs10993994  C       10   51,219,502     8.66E-06  -0.0854        4,505  0.617  2.07E-03  0.90  1,763      36,384
rs10993994  T       10   51,219,502     8.66E-06  0.085404       4,505  0.383  2.07E-03  1.11  1,763      36,384
rs4962416   C       10   126,686,862    5.99E-01  0.011722       4,506  0.227  8.97E-01  1.01  1,763      36,400
rs4962416   T       10   126,686,862    5.99E-01  -0.01172       4,506  0.773  8.97E-01  0.99  1,763      36,400
rs7127900   A       11   2,190,150      2.76E-01  0.027159       4,506  0.175  2.22E-03  1.15  1,763      36,400
rs7127900   G       11   2,190,150      2.76E-01  -0.02716       4,506  0.825  2.22E-03  0.87  1,763      36,400
rs12418451  A       11   68,691,995     1.64E-01  0.029052       4,506  0.289  6.68E-05  1.16  1,763      36,400
rs12418451  G       11   68,691,995     1.64E-01  -0.02905       4,506  0.711  6.68E-05  0.86  1,763      36,400
rs11228565  A       11   68,735,156     1.01E-02  0.081594       4,506  0.130  4.38E-05  1.25  1,763      36,400
rs11228565  G       11   68,735,156     1.01E-02  -0.08159       4,506  0.870  4.38E-05  0.80  1,763      36,400
rs10896449  A       11   68,751,243     5.51E-01  -0.01151       4,506  0.543  1.92E-04  0.88  1,763      36,400
rs10896449  G       11   68,751,243     5.51E-01  0.011507       4,506  0.457  1.92E-04  1.14  1,763      36,400
rs10896450  A       11   68,764,690     5.30E-01  -0.01188       4,505  0.536  2.55E-04  0.88  1,762      36,381
rs10896450  G       11   68,764,690     5.30E-01  0.011884       4,505  0.464  2.55E-04  1.13  1,762      36,381
rs902774    A       12   51,560,171     2.20E-01  0.029519       4,506  0.193  3.95E-01  1.04  1,763      36,386
rs902774    G       12   51,560,171     2.20E-01  -0.02952       4,506  0.807  3.95E-01  0.96  1,763      36,386
rs10778826  A       12   80,626,985     1.23E-01  0.029397       4,500  0.427  6.78E-02  0.94  1,762      36,363
rs10778826  G       12   80,626,985     1.23E-01  -0.0294        4,500  0.573  6.78E-02  1.07  1,762      36,363
rs11861609  C       16   81,942,167     4.40E-01  -0.01551       4,506  0.625  1.58E-01  0.95  1,763      36,400
rs11861609  G       16   81,942,167     4.40E-01  0.015513       4,506  0.375  1.58E-01  1.05  1,763      36,400
rs4782780   C       16   81,960,548     2.82E-01  0.021353       4,506  0.383  1.53E-01  1.05  1,763      36,400
rs4782780   T       16   81,960,548     2.82E-01  -0.02135       4,506  0.617  1.53E-01  0.95  1,763      36,400
rs4054823   C       17   13,565,749     4.60E-01  -0.01574       4,506  0.448  3.18E-02  0.92  1,763      36,400
rs4054823   T       17   13,565,749     4.60E-01  0.015739       4,506  0.552  3.18E-02  1.09  1,763      36,400
rs11649743  A       17   33,149,092     7.95E-01  -0.00682       4,506  0.220  5.20E-02  0.91  1,763      36,400
rs11649743  G       17   33,149,092     7.95E-01  0.006823       4,506  0.780  5.20E-02  1.10  1,763      36,400
rs4430796   A       17   33,172,153     3.85E-09  0.116905       4,506  0.525  3.17E-05  1.17  1,763      36,400
rs4430796   G       17   33,172,153     3.85E-09  -0.11691       4,506  0.475  3.17E-05  0.86  1,763      36,400
rs1859962   G       17   66,620,348     6.81E-01  0.007882       4,506  0.451  2.01E-04  1.14  1,763      36,400
rs1859962   T       17   66,620,348     6.81E-01  -0.00788       4,506  0.549  2.01E-04  0.88  1,763      36,400
rs8102476   C       19   43,427,453     5.27E-02  0.03643        4,495  0.488  8.72E-04  1.12  1,754      36,238
rs8102476   T       19   43,427,453     5.27E-02  -0.03643       4,495  0.512  8.72E-04  0.89  1,754      36,238
rs887391    C       19   46,677,464     3.77E-01  -0.02005       4,504  0.219  8.30E-01  0.99  1,762      36,320
rs887391    T       19   46,677,464     3.77E-01  0.020054       4,504  0.781  8.30E-01  1.01  1,762      36,320
rs2659056   C       19   56,027,755     6.98E-04  0.085854       4,506  0.344  2.16E-01  1.06  1,763      36,400
rs2659056   T       19   56,027,755     6.98E-04  -0.08585       4,506  0.656  2.16E-01  0.94  1,763      36,400
rs266849    A       19   56,040,902     6.32E-10  0.155396       4,496  0.834  3.66E-02  1.10  1,761      36,282
rs266849    G       19   56,040,902     6.32E-10  -0.1554        4,496  0.166  3.66E-02  0.91  1,761      36,282
rs2735839   A       19   56,056,435     5.39E-17  -0.22886       4,504  0.136  6.60E-03  0.87  1,763      36,364
rs2735839   G       19   56,056,435     5.39E-17  0.22886        4,504  0.864  6.60E-03  1.15  1,763      36,364
rs9623117   C       22   38,782,065     5.24E-01  0.014766       4,502  0.204  9.46E-01  1.00  1,762      36,381
rs9623117   T       22   38,782,065     5.24E-01  -0.01477       4,502  0.796  9.46E-01  1.00  1,762      36,381
rs5759167   G       22   41,830,156     2.57E-01  -0.02523       4,506  0.514  1.96E-02  1.10  1,763      36,400
rs5759167   T       22   41,830,156     2.57E-01  0.02523        4,506  0.486  1.96E-02  0.91  1,763      36,400
Shown are association results for 47 SNPs reported to be associated with prostate cancer by various GWAS. Our selection of SNPs is based on the NIH Catalog of Published Genome-Wide Association Studies; http://www.genome.gov/26525384#1. Shown are association results for PSA levels: two-sided P-values, the association effect in standardized units (s.u.) (see Methods), number (n) of individuals with PSA level measurements, and the allele frequency (freq.). Shown are association results for prostate cancer in Iceland: the two-sided P-value, the odds ratio (OR) and the number (n) of patients with prostate cancer (cases) and controls.
Table 10. Association of the PSA variants with having undergone a biopsy of the prostate among Icelandic men
                                                       Individuals  Individuals  Individuals   Individuals not
                                                       with         not with     with biopsy,  with biopsy,
SNP         Allele  Chr  Position (bp)  P-value  OR    biopsy (n)   biopsy (n)   allele freq.  allele freq.     Comment
rs2736098   A       5    1,347,086      8.5E-03  1.11  2,216        41,323       0.35          0.34             $
rs401681    C       5    1,375,087      2.4E-03  1.09  2,513        41,509       0.57          0.55             #
rs10993994  T       10   51,219,502     4.5E-02  1.06  2,342        39,737       0.40          0.39             #
rs10788160  A       10   123,023,539    2.5E-02  1.08  2,302        37,835       0.33          0.31             #
rs11067228  A       12   113,578,643    2.5E-01  1.04  2,347        39,340       0.57          0.56             #
rs4430796   A       17   33,172,153     1.2E-04  1.13  2,338        39,621       0.55          0.53             $
rs17632542  T       19   56,053,569     4.2E-09  1.46  2,325        38,265       0.94          0.91             $
rs2735839   G       19   56,056,435     3.5E-05  1.21  2,368        39,551       0.89          0.86             #
Shown are: the allele associated with increased PSA levels, the number of individuals (n) that have undergone a biopsy of the prostate, the number of individuals (controls) not known to have undergone a biopsy of the prostate, the allele frequency (freq.) in each group of individuals, the odds ratio (OR), and the two-sided P-value.
# For those SNPs, the average number of persons with in-silico derived genotypes is 332; the remaining individuals were directly genotyped using the Illumina chip or single track SNP assays.
$ For those SNPs, 1,484 persons with biopsy and 36,369 persons not known to have a biopsy had their genotypes imputed based on the 2.5 million HapMap SNP data set or were genotyped using a single track SNP assay. The analyses were done separately for the different genotyping methods and the results combined using the Mantel-Haenszel model.
Table 11. Association of the PSA variants with having a negative prostate biopsy outcome among Icelandic men
a. Results for SNPs and individuals genotyped with Illumina SNP chip
                                                                                 Frequency
                                                       Men with                  Men with
                                                       negative     Controls     negative
SNP         Allele  Chr  Position (bp)  P-value  OR    biopsy (n)   (n)          biopsy    Controls
rs10788160  A       10   123,023,539    4.2E-04  1.17  1,133        37,835       0.34      0.31
rs10993994  T       10   51,219,502     0.48     1.03  1,143        39,737       0.39      0.39
rs11067228  A       12   113,578,643    5.8E-03  1.12  1,151        39,340       0.59      0.56
rs2735839   G       19   56,056,435     6.7E-06  1.35  1,137        39,551       0.90      0.86
rs401681    C       5    1,375,087      0.037    1.09  1,169        41,509       0.57      0.55
b. Results for SNPs and individuals either imputed or genotyped using a Centaurus single track assay
                                                       Imputed genotypes                           Single track assay genotypes
                                                       Men with              Frequency             Men with              Frequency
                                                       negative    Controls  Men with              negative    Controls  Men with
SNP         Allele  Chr  Position (bp)  P-value  OR    biopsy (n)  (n)       neg. biopsy  Controls biopsy (n)  (n)       neg. biopsy  Controls
rs2736098   A       5    1,347,086      0.025    1.13  488         36,369    0.36         0.35     492         4,954     0.32         0.28
rs4430796   A       17   33,172,153     9.0E-03  1.14  488         36,369    0.56         0.53     491         3,252     0.54         0.51
rs17632542  T       19   56,053,569     6.1E-09  1.82  488         36,369    0.94         0.91     480         1,896     0.96         0.91
Association results in Iceland for PSA SNPs in men that have had a prostate biopsy but have not been diagnosed with prostate cancer (a negative biopsy) compared with Icelandic controls that have not undergone a biopsy and are not known to have prostate cancer. Shown are: the allele associated with increased PSA levels, the number (n) of individuals that have undergone a biopsy of the prostate but were not diagnosed with prostate cancer (a negative biopsy), the number (n) of controls not known to have undergone a biopsy of the prostate and not known to have been diagnosed with prostate cancer, the allele frequency in each of the groups, the odds ratio (OR), and the two-sided P-value. In the upper part of the table are results for individuals that were genotyped using the Illumina genotyping SNP chip. In the lower part of the table are the combined results for individuals either genotyped using the Centaurus single track SNP assay or individuals that had their genotypes imputed based on the 2.5 million HapMap SNP data set.
Table 12. Association results for PSA SNPs and outcome from a biopsy of the prostate, combined results for Iceland and UK
            Allele                          Persons with  Persons with  Persons with  Persons with
            increasing                      positive      positive      negative      negative
SNP         PSA levels  Chr  Position (bp)  biopsy (n)    biopsy, freq. biopsy (n)    biopsy, freq.  OR    95% CI       P-value
rs2736098   A           5    1,347,086      1,718         0.34          1,907         0.32           1.04  (0.94,1.16)  0.47      0.082
rs10993994  T           10   51,219,502     1,696         0.41          2,082         0.40           1.05  (0.96,1.15)  0.31      0.82
rs10788160  A           10   123,023,539    1,679         0.28          2,084         0.32           0.79  (0.71,0.87)  5.40E-06  0.092
rs11067228  A           12   113,578,643    1,706         0.55          2,106         0.59           0.87  (0.79,0.95)  0.0034    0.51
rs4430796   A           17   33,172,153     1,858         0.55          1,919         0.53           1.03  (0.97,1.10)  0.37      0.067
rs17632542  T           19   56,053,569     1,873         0.93          1,924         0.95           0.77  (0.63,0.95)  0.013     0.56
rs2735839   G           19   56,056,435     1,743         0.88          2,091         0.89           0.85  (0.74,0.98)  0.026     0.44
Shown are the results from a combined analysis of the Icelandic and UK study groups, the number of individuals (n) that have undergone a biopsy of the prostate and have been diagnosed with cancer of the prostate (positive biopsy; maximum number of individuals with genotypes used in the analysis is 1,870, of those 1,354 are from Iceland and 516 from the UK), the number of individuals (n) that have undergone a biopsy of the prostate and have not been diagnosed with cancer of the prostate (negative biopsy; maximum number of individuals with genotypes used in the analysis is 2,124, of those 1,169 are from Iceland and 955 from the UK), the allele associated with increased PSA levels and the allelic frequency (freq.), the odds ratio (OR), and the two-sided P-value. The OR and P-values were estimated using the Mantel-Haenszel model.
EXAMPLE 2
In order to summarize the overall effect on PSA levels, we combined the effects of the PSA variants, assuming a multiplicative model, independently for the Icelandic and UK study populations. We chose to include in the analysis only the four sequence variants located near TERT, FGFR2, TBX3 and KLK3 (rs2736098, rs10788160, rs11067228, and rs17632542, respectively) that are primarily associated with PSA levels. The variants at the MSMB and HNF1B loci were not included, since we consider them to be associated primarily with prostate cancer. Based on results from Iceland, for the top 5% of the genetic PSA level distribution the measured PSA levels are estimated to be increased by 23% to 47% compared to the population average. Similarly, for the bottom 5% of the genetic PSA level distribution, the measured PSA levels are estimated to be decreased by 30% to 56% compared to the population average. In the UK study population the estimated relative effects on PSA levels are even greater; the range of increase is 40% to 92% for the top 5% of the distribution with the greatest genotypic effect compared to the population average, whereas for the bottom 5% of the distribution, the range of decrease is 53% to 80% compared to the population average.
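One way to picture the multiplicative model is the following minimal sketch, in which each copy of a PSA-increasing allele multiplies the expected PSA level by a per-allele factor and the result is expressed relative to the population average. The per-allele factors and allele frequencies below are hypothetical placeholders, and the normalization assumes Hardy-Weinberg proportions and independence between variants; neither assumption is stated in the text above.

```python
from math import prod


def relative_genetic_effect(allele_counts, per_allele_factor, allele_freq):
    """Genetic PSA effect of one individual relative to the population average.

    allele_counts[i]     : copies (0, 1 or 2) of the PSA-increasing allele of variant i
    per_allele_factor[i] : multiplicative effect per allele copy (hypothetical)
    allele_freq[i]       : population frequency of that allele (hypothetical)
    """
    individual = prod(f ** g for f, g in zip(per_allele_factor, allele_counts))
    # Under Hardy-Weinberg proportions, E[f**G] = (1 - p + p*f)**2 for G ~ Binomial(2, p).
    population = prod((1 - p + p * f) ** 2 for f, p in zip(per_allele_factor, allele_freq))
    return individual / population


# Hypothetical values for four variants (illustration only).
factors = [1.05, 1.08, 1.06, 1.30]
freqs = [0.30, 0.25, 0.55, 0.93]
high = relative_genetic_effect([2, 2, 2, 2], factors, freqs)
low = relative_genetic_effect([0, 0, 0, 0], factors, freqs)
print(f"all increasing alleles: {high:.2f}x, none: {low:.2f}x the population average")
```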
To demonstrate how the genetic effect of the four PSA sequence variants influences individual PSA levels, we calculated a personalized PSA cutoff value corresponding to the commonly used cutoff of 4 ng/ml. This was done by multiplying the value of 4 ng/ml by the estimated relative genetic effect for the PSA SNPs. For individuals with the highest (top 5% of the distribution) genotypic effect, the personalized PSA cutoff value increased from 4 ng/ml to cutoff values between 4.9 and 5.9 ng/ml based on the estimates from Iceland, and to cutoff values between 5.6 and 7.7 ng/ml based on the UK estimates. For the bottom 5% of the genetic relative effect distribution, the personalized PSA cutoff values move from 4 ng/ml to cutoff values between 1.7 and 2.8 ng/ml according to the Icelandic estimates, and to cutoff values between 0.8 and 1.9 ng/ml according to the UK estimates (see Fig. 2). These data demonstrate that for a substantial fraction of men undergoing PSA-based prostate cancer screening, the personalized PSA cutoff value is shifted following correction for the effect of the PSA sequence variants. If applied clinically, men would be reclassified with respect to whether or not they should undergo a biopsy.
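The cutoff adjustment described above is a single multiplication, sketched below. The function name is illustrative. The inputs are the rounded percentage ranges quoted in the text, so the outputs only approximate the quoted cutoff ranges (for example, the 56% decrease yields about 1.76 ng/ml rather than the quoted 1.7 ng/ml).

```python
STANDARD_CUTOFF_NG_ML = 4.0  # commonly used PSA cutoff


def personalized_cutoff(relative_change, cutoff=STANDARD_CUTOFF_NG_ML):
    """Scale the cutoff by a relative genetic effect given as a fractional
    change versus the population average (+0.23 means 23% higher)."""
    return cutoff * (1.0 + relative_change)


# Relative changes quoted in the text for the top and bottom 5% of the distribution.
ranges = {
    "Iceland, top 5%": (0.23, 0.47),
    "UK, top 5%": (0.40, 0.92),
    "Iceland, bottom 5%": (-0.30, -0.56),
    "UK, bottom 5%": (-0.53, -0.80),
}
for label, (first, second) in ranges.items():
    a, b = personalized_cutoff(first), personalized_cutoff(second)
    print(f"{label}: {min(a, b):.2f} to {max(a, b):.2f} ng/ml")
```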
Our results from estimating the combined relative effect of the 4 variants primarily associated with PSA levels demonstrate a considerable variation in PSA levels between individuals based on their genotypes of these 4 variants. By applying the combined genetic effect on commonly used PSA cutoff values, a personalized PSA cutoff value can be obtained. Thus our data indicate that for a substantial fraction of men undergoing PSA-based prostate cancer screening, the personalized PSA cutoff value (for the decision of doing a biopsy or not) is shifted and hence men would be reclassified with respect to whether or not they should undergo a biopsy. This reclassification is likely to affect both the sensitivity and the specificity of the PSA test, and thereby, also the long term outcome of the patients since early diagnosis is the most powerful way to improve the patient's prognosis. For a screening test as important and widely used as the PSA test, having a better way to interpret the measured PSA level is likely to improve substantially the clinical performance of the test.
EXAMPLE 3
MATERIALS AND METHODS
Study subjects
Icelandic study population. Results from PSA testing were collected from the three clinical laboratories performing the great majority of all PSA measurements in Iceland. The series of data spanned a period of 15 years (from 1994 to 2009). In total we had information about PSA values from 15,757 individuals. These men had not been diagnosed with prostate cancer according to the nation-wide Icelandic Cancer Registry (ICR), and had not undergone TURP between 1983 and 2008, based on a list from the Landspitali-University Hospital where 90% of all TURP procedures in the country are performed.
Icelandic men diagnosed with prostate cancer were identified based on a nationwide list from the ICR that contained all 4,732 Icelandic prostate cancer patients diagnosed from January 1, 1955, to December 31, 2008. The Icelandic prostate cancer sample collection included 2,289 patients (diagnosed from December 1974 to December 2008) who were recruited from November 2000 until June 2009. A total of 2,249 patients were included in the study, all of whom had genotypes from a genome wide SNP genotyping effort, using the Infinium II assay method and the Sentrix HumanHap300 BeadChip (Illumina, San Diego, CA, USA) or a Centaurus single SNP genotyping assay (see Supplementary Materials). The mean age at diagnosis for the consenting patients is 70.7 years (ranging from 40 to 96 years), while the mean age at diagnosis is 73 years for all prostate cancer patients in the ICR. The median time from diagnosis to blood sampling is 2 years (range 0 to 26 years). In the present study, for all populations, aggressive prostate cancer is defined as: Gleason >7 and/or T3 or higher and/or node positive and/or metastatic disease, while the less aggressive disease is defined as Gleason <7 and T2 or lower. The Icelandic men diagnosed with benign hyperplasia of the prostate (BPH) were identified based on a list of men undergoing TURP between 1983 and 2008 at the Landspitali-National Hospital in Iceland.
The 35,470 controls (15,359 men (43.3%) and 20,111 women (56.7%)) used in this study consisted of individuals recruited through different genetic research projects at deCODE. The individuals have been diagnosed with common diseases of the cardiovascular system (e.g. stroke or myocardial infarction), psychiatric and neurological diseases (e.g. schizophrenia, bipolar disorder), endocrine and autoimmune system (e.g. type 2 diabetes, asthma), malignant diseases other than prostate cancer as well as individuals randomly selected from the Icelandic genealogical database. No single disease project represented more than 6% of the total number of controls. The controls had a mean age of 84 years and the range was from 8 to 105 years. The controls were absent from the nation-wide list of prostate cancer patients according to the ICR. The DNA for both the Icelandic cases and controls was isolated from whole blood using standard methods.
The study was approved by the Data Protection Commission of Iceland and the National Bioethics Committee of Iceland. Written informed consent was obtained from all patients and controls. Personal identifiers associated with medical information and blood samples were encrypted with a third-party encryption system as previously described (Gulcher, J.R., et al. Eur J. Hum Genet 8: 739-42 (2000)).
UK study population. In the 'Prostate Testing for Cancer and Treatment' trial (ProtecT), men aged 50-69 years were contacted and provided with information about the uncertainty surrounding PSA testing, detection and radical treatment of early prostate cancer, and offered an appointment for counseling and PSA testing. Recruitment took place at nine sites in the UK; 94,427 men agreed to be tested (50% of men contacted) and 8,807 (~9%) had a raised PSA level. Of those with raised PSA levels, 2,022 (23%) were diagnosed with prostate cancer; 229 men (~12%) had locally advanced (T3 or T4) or metastatic cancers, the rest having clinically localized (T1c or T2) disease. Men with a PSA level of > 20 ng/mL were excluded from the trial. Those with locally confined cancers (mostly T1c, but some T2a and T2b) and with PSA levels of < 20 ng/mL were offered randomization into a three-arm trial of treatment (random assignment between active monitoring, radical prostatectomy or radical radiotherapy). Participants will be followed up for > 10 years. Study participants found to have locally advanced (> T3) or distantly advanced disease were not eligible for the ProtecT treatment trial, and were referred for routine UK National Health Service care. Ethical approval for the ProtecT study was obtained from Trent Multi-Centre Research Ethics Committee.
From the ProtecT trial study group, the following numbers of samples were selected for the present study: 524 men with PSA values >3 ng/ml and diagnosed with prostate cancer after undergoing a needle biopsy (average age at diagnosis is 63.0 years), 960 men with PSA values between 3 ng/ml and 10 ng/ml but not diagnosed with prostate cancer after undergoing a needle biopsy (average age at PSA measurement is 62.4 years), and 454 men with PSA values < 3 ng/ml (average age at PSA measurement is 62.7 years).
Dutch study population. The total number of Dutch prostate cancer cases used in this study was 1,100. The Dutch study population consisted of two recruitment-sets of prostate cancer cases; Group-A was comprised of 360 hospital-based cases recruited from January 1999 to June 2006 at the Urology Outpatient Clinic of the Radboud University Nijmegen Medical Centre (RUNMC); Group-B consisted of 707 cases recruited from June 2006 to December 2006 through a population-based cancer registry held by the Comprehensive Cancer Centre IKO. Both groups were of self-reported European descent. The average age at diagnosis for patients in Group-A was 63 years (median 63 years; range 43 to 83 years). The average age at diagnosis for patients in Group-B was 65 years (median 66 years; range 43 to 75 years). The 2,021 control individuals (1,004 men and 1,017 women) were cancer free and were matched for age with the cases. They were recruited within a project entitled "The Nijmegen Biomedical Study", in the Netherlands. This is a population-based survey conducted by the Department of Epidemiology and Biostatistics and the Department of Clinical Chemistry of RUNMC, in which 9,371 individuals participated from a total of 22,500 age and sex stratified, randomly selected inhabitants of Nijmegen. Control individuals from the Nijmegen Biomedical Study were invited to participate in a study on gene-environment interactions in multifactorial diseases, such as cancer. All the 2,021 participants in the present study are of self-reported European descent and were fully informed about the goals and the procedures of the study. The study protocol was approved by the Institutional Review Board of Radboud University and all study subjects gave written informed consent.
Spanish study population. The Spanish study population used in this study consisted of 618 prostate cancer cases. The cases were recruited from the Oncology Department of Zaragoza Hospital in Zaragoza, Spain, from June 2005 to September 2007. All patients were of self-reported European descent. Clinical information including age at onset, grade and stage was obtained from medical records. The average age at diagnosis for the patients was 69 years (median 70 years) and the range was from 44 to 83 years. The 1,605 Spanish control individuals (737 men and 868 women) were approached at the University Hospital in Zaragoza, and the men were prostate cancer free at the time of recruitment. Study protocols were approved by the Institutional Review Board of Zaragoza University Hospital. All subjects gave written informed consent.
Chicago study population. The Chicago study population used consisted of 1,560 prostate cancer cases. The cases were recruited from the Pathology Core of Northwestern University's Prostate Cancer Specialized Program of Research Excellence (SPORE) from May 2002 to May 2009. The average age at diagnosis for the patients was 60 years (median 59 years) and the range was from 39 to 87 years. The 1,172 European American controls (781 men and 391 women) were recruited as healthy control subjects for genetic studies at the University of Chicago and Northwestern University Medical School, Chicago, US. All individuals from Chicago included in this report were of self-reported European descent. Study protocols were approved by the Institutional Review Boards of Northwestern University and the University of Chicago. All subjects gave written informed consent.
Romanian study population. The Romanian study population used in this study consisted of 362 prostate cancer cases. The cases were recruited from the Urology Clinic "Theodor Burghele" of The University of Medicine and Pharmacy "Carol Davila" Bucharest, Romania, from May 2008 to November 2009. All patients were of self-reported European descent. Clinical information including age at onset, grade and stage was obtained from medical records at the hospital. The average age at diagnosis for the cases was 70 years (median 71 years) and the range was from 46 to 89 years. The 182 Romanian controls were recruited at the General Surgery Clinic "St. Mary" and at the Urology Clinic "Theodor Burghele" of The University of Medicine and Pharmacy "Carol Davila" Bucharest, Romania. The average age for controls was 60 years (median 62 years) with a range from 19 to 87 years. The controls were cancer free at the time of recruitment. PSA values were tested for men. Study protocols were approved by the National Ethical Board of the Romanian Medical Doctors Association in Romania. All subjects gave written informed consent.
Genotyping
As a part of ongoing research projects at deCODE, 38,541 Icelandic individuals have been successfully genotyped with either the Infinium HumanHap300 or the 370K SNP chip (Illumina, San Diego, CA, USA), containing haplotype tagging SNPs derived from phase I of the International HapMap project. After quality control, 304,070 SNPs were available for the GWAS of PSA levels. Any samples with a call rate below 98% were excluded from the analysis. Single SNP genotyping of the PSA follow-up samples from Iceland and the UK and the prostate cancer case-control groups from The Netherlands, Spain, Romania, and Chicago was carried out by deCODE Genetics in Reykjavik, Iceland, applying the Centaurus (Nanogen) platform. The quality of each Centaurus SNP assay was evaluated by genotyping each assay in the CEU and/or YRI HapMap samples and comparing the results with the HapMap publicly released data. Assays with > 1.5% mismatch rate were not used and a linkage disequilibrium (LD) test was used for markers known to be in LD.
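The two quality-control thresholds stated above (sample call rate of at least 98%, assay mismatch rate of at most 1.5% against HapMap reference genotypes) can be expressed as simple filters. This is a sketch with hypothetical function names and counts; integer arithmetic is used so that values exactly at a threshold are classified without floating-point rounding.

```python
def sample_passes_call_rate(called_genotypes, total_snps):
    """Keep a sample unless its call rate is below 98%."""
    return called_genotypes * 100 >= 98 * total_snps


def assay_passes_mismatch(mismatches, compared_genotypes):
    """Keep an assay unless its mismatch rate versus reference genotypes exceeds 1.5%."""
    return mismatches * 1000 <= 15 * compared_genotypes


# Hypothetical counts (illustration only).
print(sample_passes_call_rate(299_000, 304_070))  # call rate ~98.3%
print(sample_passes_call_rate(297_000, 304_070))  # call rate ~97.7%
print(assay_passes_mismatch(1, 90))               # mismatch ~1.1%
print(assay_passes_mismatch(2, 90))               # mismatch ~2.2%
```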
Association testing of quantitative traits
PSA level
Two populations were used to study PSA levels: Iceland and the UK. To study PSA levels among unaffected men in Iceland, we excluded subjects who had been diagnosed with prostate cancer as recorded by the ICR (between 1955 and 2008) or were known to have undergone TURP between 1983 and 2008. PSA levels were corrected for age at measurement for each center separately, using a generalized additive model with a smooth component on age. The PSA levels were also standardized to a normal distribution using a quantile standardization. Most subjects had more than two PSA measurements. Hence, we used the mean of the adjusted and standardized PSA values for each individual.
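The quantile standardization described above can be illustrated with a rank-based inverse normal transform. The sketch below is a minimal Python illustration under stated assumptions: the rank offset of 0.5 and the averaging of tied ranks are choices made here, not details given in the text, and `quantile_standardize` is a hypothetical name. The age adjustment with a generalized additive model is not shown.

```python
from statistics import NormalDist


def quantile_standardize(values):
    """Map values to standard-normal quantiles by rank (tied values share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical age-adjusted PSA values for five measurements.
z_scores = quantile_standardize([1.2, 0.8, 3.5, 2.0, 0.8])
```

The per-individual value used in the analysis would then be the mean of that individual's standardized measurements.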
For each SNP, a classical linear regression, using the genotype as an additive covariate and PSA as the response, was fitted to test for association. In addition to testing the standardized value, we also performed an analysis using log-transformed values, which we then back-transformed to report the effect under a multiplicative model. We report significance levels based on the standardized values and the association effect based on both the standardized value and the multiplicative model.
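As an illustration of the multiplicative-model estimate, the following minimal sketch regresses log-transformed PSA on the allele count (0, 1 or 2) by ordinary least squares and back-transforms the slope into a fold change per allele. The function name, the use of the natural logarithm and the example values are assumptions for illustration; covariate adjustment is omitted.

```python
import math


def per_allele_fold_change(allele_counts, psa_values):
    """OLS slope of log(PSA) on allele count, back-transformed to a fold change per allele."""
    n = len(allele_counts)
    log_psa = [math.log(v) for v in psa_values]
    mean_g = sum(allele_counts) / n
    mean_y = sum(log_psa) / n
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(allele_counts, log_psa))
    beta = sxy / sxx  # additive effect on the log scale
    return math.exp(beta)  # multiplicative effect on the original scale


# Hypothetical data in which each extra allele multiplies PSA by 1.1.
genotypes = [0, 1, 2, 0, 1, 2]
psa = [1.0, 1.1, 1.21, 2.0, 2.2, 2.42]
effect = per_allele_fold_change(genotypes, psa)
```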
PSA measurements exist for many more Icelandic individuals than those who have been genotyped using an Illumina SNP chip. We used the available genotype information on the relatives of individuals who had not been genotyped in order to extract more information on association from our data (in-silico genotyping). In total we had access to PSA levels of 4,620 individuals genotyped on Illumina chips, all containing the 317K HumanHap SNP panel. The analysis was augmented with data from 9,218 Icelanders with PSA measurements whose genetic information could be partially inferred from genotyped relatives who belong to the set of the 38,541 chip-typed Icelanders. This augmentation is equivalent to an additional 2,918 individuals. We have previously applied this method to the analysis of height, and details can be found in a recent publication (Gudbjartsson, D.F. et al. Nat Genet. 40:609-15 (2008)). After the initial scan, we followed up the top markers using 1,919 men genotyped with a Centaurus single-track assay. Our final analysis included all genotype data, derived from chip, single-track, and in-silico genotyping. To study PSA levels in the UK samples, we used 454 men from the ProtecT trial with a single PSA measurement between 0 and 3 ng/ml, directly genotyped with a Centaurus single-track assay. Measurements were standardized and adjusted for age at measurement and center.
To calculate a combined significance for Iceland and the UK, we performed a two-degree-of-freedom test on the sum of the individual χ² values. To model the genotypic effect of SNPs on PSA level in each population, we use the estimated allelic effect based on the multiplicative model within each locus (see above) and assume Hardy-Weinberg equilibrium. When combining the effect of multiple SNPs, we assume linkage equilibrium between loci and use a multiplicative model. When performing a case-only analysis among prostate cancer patients of the six populations to study the association between SNPs and age at diagnosis, we use a linear regression with age at diagnosis as the response and the allele count as an additive covariate.
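The combined test can be sketched as follows, assuming that each population contributes a one-degree-of-freedom χ² statistic (an assumption here; the text states only that the sum is tested with two degrees of freedom). For two degrees of freedom the χ² survival function has the closed form exp(-x/2), so no statistics library is needed. The function name and the input values are hypothetical.

```python
import math


def combined_p_two_df(chi2_iceland, chi2_uk):
    """P value for the sum of two chi-square statistics, referred to chi-square with 2 df."""
    total = chi2_iceland + chi2_uk
    # Survival function of the chi-square distribution with 2 df: P(X > x) = exp(-x / 2).
    return math.exp(-total / 2.0)


# Hypothetical per-population statistics for one SNP.
p_combined = combined_p_two_df(12.5, 4.3)
```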
Association testing of binary traits
For case-control association analysis, for example when comparing prostate cancer cases, benign prostatic hyperplasia cases or biopsied individuals to population controls, and for within-group comparisons (aggressive vs. non-aggressive, biopsy positive vs. biopsy negative), we used a standard likelihood ratio statistic, implemented in the NEMO software, to calculate two-sided P values for each individual allele, assuming a multiplicative model for risk (Gretarsdottir, S. et al. Nat Genet 35:131-8 (2003)). Combined significance levels were calculated using a Mantel-Haenszel model. Heterogeneity was examined using a likelihood ratio test, comparing the null hypothesis of the effect being the same in all populations to the alternative hypothesis of each population having a different effect.
Finemapping of the six PSA associated loci
To investigate the top six loci from the GWAS further, we analyzed the association of imputed genotypes based on HapMap CEU for a window of 500 kb centered on the most significant SNP at each locus. For the individuals directly genotyped on chip, SNP imputation was based on the Phase II CEU HapMap samples and was done using IMPUTE. Association testing was performed using a logistic regression with the allele count as a covariate. For a given locus, we performed multivariate analysis using genotypes from different SNPs as covariates and the standardized and corrected PSA value as the response, to adjust the association of one SNP for the other SNP.
EXAMPLE 4
We investigated the observed correlation of surrogate markers with PSA levels. For this purpose, genotypes for surrogates of the markers rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542 were imputed based on the 1000 Genomes data set (http://www.1000genomes.org). All the surrogates were selected using a cutoff of r² > 0.2 (see Table 1). Results are shown in Table 13. As can be seen, all the surrogate markers are significantly associated with PSA levels, showing that these markers can all be useful for assessing the effect of genetic variants on PSA levels.
Table 13. Association of surrogate markers with PSA levels. Genotypes were imputed in the Icelandic sample set using data from the 1000 Genomes project. Shown are marker identity, chromosome, position of marker in NCBI Build 36, alleles, minor allele frequency in controls, number of imputed cases, predicted effect (in fraction of standard deviation of the distribution), P-value of the association, information content, identities of alleles predicted to be associated with decreased and increased PSA levels, respectively, and the SEQ ID NO for the marker.
Figure imgf000118_0001
Figure imgf000119_0001
Figure imgf000120_0001
Figure imgf000121_0001
Figure imgf000122_0001
EXAMPLE 5
We assessed what fraction of 12,779 PSA measurements from 4,569 Icelandic men would be reclassified, with respect to a given PSA cut-off value, after correcting them for four PSA sequence variants located near TERT, FGFR2, TBX3 and KLK3 (rs2736098, rs10788160, rs11067228, and rs17632542, respectively). For a PSA cut-off value of 4 ng/ml, 6.0% of the men had at least one PSA measurement reclassified; 3.0% moved from below to above the cut-off value and 3.0% moved in the opposite direction. The results for a cut-off value of 3 ng/ml were similar: 6.9% of the men had at least one PSA measurement reclassified; 3.1% moved from below to above the cut-off value and 3.8% moved in the opposite direction (Table 14). If applied clinically, these men would be reclassified with respect to whether or not they should undergo a biopsy.
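The reclassification count can be sketched as below. This minimal illustration counts paired measurements that cross a cut-off after correction; it works per measurement rather than per person, and the treatment of a value exactly at the cut-off as "above" is an assumption made here. The function name and the example values are hypothetical.

```python
def count_reclassified(measured, corrected, cutoff):
    """Return (moved_up, moved_down) for paired PSA values relative to a cut-off."""
    # Below the cut-off before correction, at or above it afterwards.
    moved_up = sum(1 for m, c in zip(measured, corrected) if m < cutoff <= c)
    # At or above the cut-off before correction, below it afterwards.
    moved_down = sum(1 for m, c in zip(measured, corrected) if c < cutoff <= m)
    return moved_up, moved_down


# Hypothetical measured and genetically corrected PSA values (ng/ml).
measured = [3.8, 4.2, 2.0, 5.0]
corrected = [4.1, 3.9, 2.1, 5.5]
up, down = count_reclassified(measured, corrected, cutoff=4.0)
```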
Table 14. Reclassification after genetic correction of PSA levels
Figure imgf000123_0001
Figure imgf000123_0002
Shown are the number of measurements (n = 12,779) from 4,569 Icelandic men before and after genetic correction, using combined estimates for the four PSA variants (rs2736098, rs10788160, rs11067228, and rs17632542) discussed in the main text. a) Number of measurements that are reclassified with respect to a PSA cut-off value of 3 ng/ml; 143 unique persons (3.1% of the 4,569) have at least one measurement that is below 3 ng/ml before correction and above 3 ng/ml after correction, and 172 unique persons (3.8% of the 4,569) have at least one measurement that is above 3 ng/ml before correction and below 3 ng/ml after correction. b) Number of measurements that are reclassified with respect to a PSA cut-off value of 4 ng/ml; 135 unique persons (3.0% of the 4,569) have at least one measurement that is below 4 ng/ml before correction and above 4 ng/ml after correction, and 138 unique persons (3.0% of the 4,569) have at least one measurement that is above 4 ng/ml before correction and below 4 ng/ml after correction.
EXAMPLE 6
Discriminatory power of biopsy outcome models
We calculated the area under the receiver-operating-characteristic curve (AUC) to assess the discriminatory power of four models on the outcome of performing a biopsy of the prostate. The four models included the following data: model-1) PSA levels, model-2) the combined prostate cancer risk estimates of 23 established sequence variants, model-3) genetic correction of PSA values based on the sequence variants at the four PSA loci (5p15, 10q26, 12q24 and 19q13.33) discussed above, model-4) the PSA levels corrected for sequence variants and the combined risk estimates of the 23 prostate cancer risk variants. In the analyses of the models, we used 415 Icelandic and 1,291 British men with information on biopsy outcome (i.e. biopsy positive or biopsy negative) and PSA levels, as well as genotypes for 23 established prostate cancer variants and the PSA variants reported above.
Biopsy outcome risk models
Iceland
To assess biopsy outcome risk models, we selected Icelandic men who had a biopsy report and had been chip genotyped. In addition, we required that the individual have an available PSA measurement in the six months preceding the biopsy and that the individual had not undergone TURP prior to the biopsy. For individuals with multiple biopsies with only negative outcomes (i.e., no cancer detected), we used the first available event. For individuals with multiple biopsies including one with a positive outcome (i.e., cancer detected), we used that event. In total, 415 individuals fulfilled these criteria, 194 of whom had a negative biopsy and 221 a positive biopsy. The median PSA level among the 194 biopsy-negative men was 8.85 (1st quartile = 6.28, 3rd quartile = 13.35). The median PSA level among the 221 biopsy-positive men was 14.00 (1st quartile = 8.90, 3rd quartile = 25.20).
UK
To assess biopsy outcome risk models, we selected men from the ProtecT trial in the UK who had a biopsy report and had been genotyped using a Centaurus single-track assay. We selected men with a PSA between 3 and 10. In total, 1,291 individuals fulfilled these criteria, 948 of whom had a negative biopsy and 343 a positive biopsy. The median PSA level among the 948 biopsy-negative men was 4.10 (1st quartile = 3.50, 3rd quartile = 5.10). The median PSA level among the 343 biopsy-positive men was 4.50 (1st quartile = 3.60, 3rd quartile = 6.23).
Variables in the models
The variables included in the models are (1) PSA value, (2) prostate cancer multi-marker genetic risk prediction and (3) PSA with genetic correction. To calculate the prostate cancer multi-marker genetic risk prediction for each individual, we use published estimates of the allelic frequencies and effects of 23 markers associated with prostate cancer (list of SNPs: rs10086908, rs10486567, rs10896450, rs10934853, rs10993994, rs12621278, rs1447295, rs1512268, rs16901979, rs16902104, rs1859962, rs2660753, rs2710646, rs4430796, rs445114, rs5759167, rs5945572, rs6465657, rs6983267, rs7127900, rs7679673, rs8102476, rs9364554). We then calculate the corresponding relative risk for each genotype under the assumption of a multiplicative model at each locus and combine the relative risks for each individual assuming a multiplicative model between loci.
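The multi-marker risk calculation can be sketched as follows. In this minimal illustration, the genotype relative risk at each locus is expressed relative to the population average, which is computed here under Hardy-Weinberg equilibrium; that normalization is an assumption of the sketch, as are the function names, the marker labels and the numerical values, none of which are published estimates.

```python
def genotype_relative_risk(risk_allele_count, allelic_rr, risk_allele_freq):
    """Relative risk of a genotype versus the population average at one locus."""
    p = risk_allele_freq
    # Under Hardy-Weinberg equilibrium and a multiplicative model, the
    # population-average risk (relative to the non-carrier) is (1 - p + p * rr)^2.
    population_average = (1.0 - p + p * allelic_rr) ** 2
    return allelic_rr ** risk_allele_count / population_average


def combined_relative_risk(genotypes, markers):
    """Multiply per-locus relative risks, assuming a multiplicative model between loci."""
    risk = 1.0
    for marker, (allelic_rr, freq) in markers.items():
        risk *= genotype_relative_risk(genotypes[marker], allelic_rr, freq)
    return risk


# Hypothetical markers: name -> (allelic relative risk, risk allele frequency).
markers = {"marker_1": (1.2, 0.5), "marker_2": (1.5, 0.2)}
# Hypothetical individual: number of risk alleles carried at each marker.
individual = {"marker_1": 2, "marker_2": 0}
risk = combined_relative_risk(individual, markers)
```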
To assess a PSA level after genetic correction, we divide the measured PSA level by the predicted combined genetic relative effect. In Iceland and the UK separately, we calculated the combined genetic effect using the genotypic effects for each SNP as estimated in each population (see Table S3) and combined them assuming a multiplicative model. We selected four markers that predominantly affect PSA (rs10788160, rs11067228, rs17632542, and rs2736098), excluding the MSMB and HNF1B loci, for which we suspect that the association is primarily to prostate cancer.
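The correction step can be sketched as below. This minimal illustration expresses each genotype's effect relative to the population geometric mean under Hardy-Weinberg equilibrium, where the expected allele count is twice the allele frequency; that reference point is an assumption made here, since the text states only that the measured value is divided by the combined relative effect. The function name, marker labels and effect sizes are hypothetical.

```python
def corrected_psa(measured_psa, genotypes, effects):
    """Divide a measured PSA value by the combined genetic relative effect."""
    combined_effect = 1.0
    for marker, (per_allele_effect, allele_freq) in effects.items():
        # Assumed reference: population geometric mean, i.e. the effect at the
        # expected allele count 2 * allele_freq under Hardy-Weinberg equilibrium.
        excess_alleles = genotypes[marker] - 2.0 * allele_freq
        combined_effect *= per_allele_effect ** excess_alleles
    return measured_psa / combined_effect


# Hypothetical markers: name -> (fold change in PSA per allele, allele frequency).
effects = {"marker_a": (1.1, 0.5), "marker_b": (0.9, 0.25)}
# Hypothetical individual carrying two copies at marker_a and none at marker_b.
value = corrected_psa(4.4, {"marker_a": 2, "marker_b": 0}, effects)
```

An individual carrying more PSA-raising alleles than average has the measured value scaled down, and the reverse holds for an individual carrying fewer.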
We fit four logistic regression models, one for each of the three variables described above (PSA value, prostate cancer genetic risk prediction and PSA value with genetic correction) and one combining the prostate cancer genetic risk prediction and PSA with genetic correction.
We use ROC curves and calculate the area under the curve (AUC) to assess the discriminative ability of each model. Each point in the ROC curve shows the effect of a rule for turning a risk estimate into a prediction of the biopsy outcome.
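The AUC can be computed without tracing the curve explicitly, because it equals the probability that a randomly chosen biopsy-positive individual receives a higher risk estimate than a randomly chosen biopsy-negative individual, with ties counted as one half. The following minimal sketch uses that pairwise definition; the function name and the example scores are hypothetical.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    wins = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model risk estimates for biopsy-positive and biopsy-negative men.
positive = [0.62, 0.48, 0.71]
negative = [0.35, 0.52, 0.40]
area = auc(positive, negative)
```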
Results
The model with genetic correction of PSA levels (model-3) has an AUC of 70.9% and 58.5% in Iceland and UK, respectively (Fig. 3) . When compared to model-1, which has an AUC of 70.4% and 57.1% in Iceland and UK, respectively, the inclusion of PSA levels corrected for sequence variants (model-3) increases the discriminatory power by 0.5 and 1.4 percentage points in Iceland and UK, respectively. However, of the four models assessed, model-4 has the greatest discriminatory power; with an AUC of 73.2% and 63.6% in Iceland and UK, respectively.
Compared to model-1, the AUC of model-4 is higher by 2.8 and 6.5 percentage points in Iceland and the UK, respectively. Hence, the greatest gain in discriminatory power is achieved by including both the 23 prostate cancer risk variants and the genetic correction of PSA levels. However, in order to better assess the effect of the PSA and prostate cancer risk variants on PSA-based biopsies, this type of modeling would have to be done in a population where biopsies are done systematically, irrespective of individual PSA levels, similar to what was done in the PCPT study (3). Nevertheless, the results indicate that genetic correction of PSA levels leads to improved specificity of the models.

Claims

1. A method of determining corrected PSA quantity in a human individual, the method comprising :
(a) Obtaining data identifying an uncorrected PSA quantity in a first biological sample from the human individual;
(b) Analyzing sequence data about at least one polymorphic marker from the first biological sample or a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker.
2. The method of claim 1, wherein analyzing sequence data comprises determining the presence or absence of at least one allele of the at least one polymorphic marker.
3. The method of claim 1 or claim 2, wherein analyzing sequence data comprises determining the identity of both alleles of the at least one polymorphic marker in the genome of the individual.
4. The method of any one of the preceding claims, wherein the sequence data is nucleic acid sequence data obtained from the first biological sample or a second biological sample containing nucleic acid from the human individual .
5. The method of claim 4, wherein the nucleic acid sequence data is obtained using a method that comprises at least one procedure selected from:
(i) amplification of nucleic acid from the first or second biological sample;
(ii) hybridization assay using a nucleic acid probe and nucleic acid from the first or second biological sample;
(iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of nucleic acid from the first or second biological sample; and
(iv) high-throughput sequencing.
6. The method of any one of the preceding claims, wherein the sequence data is obtained from a preexisting record.
7. The method of any one of the preceding claims, wherein the data identifying an uncorrected PSA quantity is determined in a blood sample from the individual.
8. The method of claim 7, wherein the determination is performed using an antibody test for PSA.
8. The method of any one of the preceding claims, wherein at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans.
9. The method of claim 8, wherein the determining of corrected PSA quantity comprises adjusting uncorrected PSA quantity based on the predicted effect of the at least one allele on PSA quantity in humans.
10. The method of any one of the preceding claims, wherein the at least one polymorphic marker is a biallelic marker.
11. The method of any one of the preceding claims, wherein the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
12. The method of any one of the preceding claims, wherein determination of the presence of an allele selected from the group consisting of the C allele of rs401681, the A allele of rs2736098, the A allele of rs10788160, the T allele of rs10993994, the A allele of rs11067228, the A allele of rs4430796, the G allele of rs2735839 and the T allele of rs17632542 is indicative of elevated PSA quantity in the individual.
13. The method of any one of the preceding claims, wherein determination of the presence of an allele selected from the group consisting of the T allele of rs401681, the G allele of rs2736098, the G allele of rs10788160, the C allele of rs10993994, the G allele of rs11067228, the G allele of rs4430796, the A allele of rs2735839 and the C allele of rs17632542 is indicative of reduced PSA quantity in the individual.
14. The method of claim 11, wherein markers in linkage disequilibrium with rs2736098 are selected from the group consisting of rs2735845, rs31484, rs401681, s.1030492, s.1233724, s.1251946, s.1257345, s.1258032, s.1292191, s.1334730, s.1407682, s.1426206, s.1426336, s.1428371, s.1428373, s.1472454, s.1518154, s.1557827, rs11743119, s.1583465, rs4551123, s.1589581, s.1591616, s.1607388, rs6893515, s.1618305, s.1621550, s.1621551, rs6892057, s.1638061, rs6898387, rs7724451, rs2937006, s.1663985, s.1667254, s.1668831, s.1673499, s.1737379, s.1756873, s.1782909, s.1788485, s.1799150, s.1800043, s.1804565, s.1812409, s.886453 and s.887600.
15. The method of claim 11, wherein markers in linkage disequilibrium with rs10788160 are selected from the group consisting of rs11199892, rs11593067, s.122837469, rs2130779, s.122876448, s.122901140, s.122901142, s.122905335, rs10788149, rs10749408, rs2172071, rs11592107, rs1907218, rs1907220, rs1994655, rs1907221, rs1907225, rs1907226, rs10749409, rs11199835, s.122991926, rs729014, s.122993518, s.122994309, s.122994946, rs1873450, rs2901290, s.122998594, s.122998678, s.122998978, rs2201026, rs4237529, s.122999386, rs1873451, rs1873452, rs4752520, rs10886880, rs10749412, s.123008216, rs3925042, rs1125527, rs1125528, rs4319451, rs10788154, rs7081844, rs7076500, s.123011774, s.123011879, rs11199862, s.123014171, rs12146156, s.123014499, s.123014519, rs12146366, s.123014684, rs7091083, rs7074985, rs7915008, s.123015342, s.123015365, rs10749413, rs11199866, s.123016003, rs7923130, rs7922901, rs10886882, rs10886883, rs11199867, s.123017698, s.123018111, rs4393247, s.123018188, rs4489674, rs11199868, s.123018670, s.123019408, s.123019759, rs11199869, s.123020245, s.123020365, rs10886885, rs10788159, rs10886886, rs11199871, rs11199872, rs12761612, rs4575197, rs11199874, rs10886887, s.123023625, s.123023836, rs4465316, rs4468286, rs10886890, rs10788162, s.123028135, rs12413648, s.123029102, rs10788163, s.123031617, s.123031811, rs10788164, rs11598592, rs10788165, rs9630106, rs10886893, s.123034821, rs11199879, rs11199881, rs12415826, rs10788166, rs10886894, rs10886895, rs10886896, rs10886897, rs10886898, rs10886899, rs10886900, rs10886901, rs10886902, rs10886903, rs12413088, rs10788167, s.123047182, rs7085073, rs7071101, rs12570783, rs11199884, rs7085506, rs10886905, rs10736302, s.123061811, s.123062031, rs11199886, s.123063327, s.123063715, rs10886907, s.123064252, s.123064345, s.123064780, s.123064783, s.123066424, s.123066700, rs3981043, rs11199896, rs11199897, rs11199898, s.123067963, rs11199900, rs11199901, s.123068178, s.123068222, s.123068236, s.123068424, s.123068619, s.123068743, s.123068926, s.123068997, s.123069012, s.123069326, s.123069570, s.123069989, s.123070105, s.123071090, s.123071347, rs4254007, s.123071495, s.123071914, s.123072804, rs7900630, s.123074016, rs1896416, s.123074531, s.123074928, s.123076274, s.123076472, rs2420925, s.123077398, s.123077455, rs12779205, rs11199912, rs4752534, s.123078389, rs1896420, rs1896419, s.123079199, s.123081990, s.123081993, s.123081998 and s.123201870.
16. The method according to claim 11, wherein markers in linkage disequilibrium with rs11067228 are selected from the group consisting of rs12820376, s.113576401, s.113582477, s.113584188, s.113584539, s.113585097, rs12819162, rs11609105, rs514849, rs513061, s.113590733, rs1061657, rs8853, rs3741698, s.113594635, rs567223, rs551510, rs59336, s.113601412, rs515746, rs545076 and s.113614584.
17. The method according to claim 11, wherein markers in linkage disequilibrium with rs10993994 are selected from the group consisting of s.51157005, s.51159221, rs35716372, s.51159373, s.51159376, s.51159399, s.51159786, rs4935090, rs12781411, s.51162137, s.51162792, s.51162795, rs11004246, s.51165690, rs11004324, rs2843562, rs11004409, rs11004415, rs11004422, s.51168415, rs11004435, rs11599333, s.51170094, s.51170307, rs12763717, rs67289834, s.51172442, s.51172558, rs57858801, s.51172618, s.51172808, s.51173184, rs7071471, rs7090326, s.51173565, s.51173983, s.51174391, s.51174499, s.51174610, s.51174944, s.51175013, s.51175409, s.51176290, s.51176963, s.51180209, rs10825652, s.51180819, rs2843560, rs2125770, rs2611513, rs2611512, rs2611509, s.51186305, rs2926494, rs2611508, rs2611507, s.51188694, rs2611506, rs57263518, s.51189522, rs3101227, rs2843549, rs2843550, rs2249986, rs2843551, s.51192126, rs7077830, s.51193219, rs2843554, s.51194280, rs2611489, rs3123078, rs4935162, rs7081532, rs10826075, rs7896156, s.51199599, rs6481329, rs7910704, rs4554834, rs10826125, rs10826127, rs4486572, rs4581397, rs4630240, rs7920517, rs4630241, rs9787697, rs10763534, rs10763536, s.51205998, rs10763546, s.51206890, rs4131357, s.51207437, s.51207481, s.51208175, rs11006207, rs10763576, s.51208921, rs11593361, rs10763588, rs11006274, s.51210619, s.51210866, rs4630243, rs4512771, rs4306255, s.51213076, rs4631830, rs7075009, rs7098889, rs4304716, s.51214689, s.51214690, rs7477953, s.51215034, s.51216121, s.51216342, rs7075697, s.51219226, s.51219227, s.51219230, s.51219320, s.51221179 and rs2012677.
18. The method according to claim 11, wherein markers in linkage disequilibrium with rs4430796 are selected from the group consisting of rs757210, rs7213769, rs1016990, rs17626423, rs3744763, rs7405776, rs2005705, s.33170591, rs11263761, rs4239217, rs11651755, rs10908278, s.33174083, rs11657964, rs7501939, rs8064454, s.33175746, s.33176039, rs7405696, rs11651052, rs11263763, rs11658063, rs9913260, rs3760511 and s.33182344.
19. The method according to claim 11, wherein markers in linkage disequilibrium with rs17632542 are selected from the group consisting of rs273622, s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s.55637350, s.55640040, s.55646568, s.55649132, s.55650629, s.55650844, s.55652397, s.55653401, s.55653991, s.55654907, s.55657973, s.55659043, s.55660011, s.55660013, s.55660139, s.55660143, s.55661660, s.55661718, rs6509476, s.55664020, s.55664897, s.55665723, s.55665726, s.55672641, s.55673254, s.55674252, s.55674254, s.55674727, s.55676073, s.55683393, s.55687122, s.55695317, s.55697027, s.55701748, rs7257447, s.55702308, s.55703568, s.55706751, s.55708051, s.55709067, s.55709498, s.55709766, s.55710030, s.55710848, s.55710851, s.55711749, s.55712802, s.55713451, s.55713453, s.55713458, s.55713862, s.55716007, s.55718272, s.55723496, s.55724346, s.55726794, s.55729556, s.55729562, s.55729563, s.55731588, s.55733658, s.55741403, s.55743524, s.55745833, s.55746123, s.55747079, s.55748269, s.55748274, s.55748844, s.55749193, s.55752178, s.55752271, s.55770158, rs7247686, s.55771401, s.55772266, s.55775314, s.55778756, s.55788661, s.55790622, s.55791942, rs10413426, s.55798366, s.55818900, s.55822129, s.55825528, s.55825624, s.55833489, s.55833938, s.55848124, s.55848125, s.55849044, s.55857289, s.55857585, s.55861107, s.55861111, s.55861196, s.55862851, s.55865439, s.55867208, s.55867650, s.55868902, s.55870429, rs73598616, s.55874339, s.55875249, s.55875725, s.55881262, s.55882788, s.55883542, s.55886467, s.55887498, s.55889175, s.55892113, s.55892618, s.55892866, s.55893305, s.55896443, s.55896826, s.55898241, s.55898245, s.55899120, s.55900597, s.55900764, s.55912567, s.55914840, s.55915776, s.55936192, s.55940336, s.55946316, s.55949971, s.55955333, s.55962188, s.55963864, s.55969754, s.55979135, rs67367861, s.55989580, s.56004001, s.56006528, s.56012046, s.56013739, rs2411330, rs3212825, s.56018053, s.56019106, rs7246740, s.56025860, s.56026713, rs55786312, s.56026881, s.56026882, s.56027319, s.56029265, s.56029362, s.56032778, s.56032963, s.56032964, s.56033138, s.56033664, s.56036363, s.56037076, s.56038334, s.56039736, s.56042100, s.56042603, rs2659124, s.56046798, rs266878, rs174776, s.56052630, s.56052652, s.56053983, s.56054527, rs1058205, rs2569735, rs2735839, rs62113216, s.56058308, s.56058606, s.56058688, s.56058866, s.56060000, s.56061277, s.56062250, s.56066550, s.56066560, s.56066619, s.56067024, rs73592873, s.56076121, s.56076122, s.56078845, s.56085550, s.56093594 and s.56472259.
20. The method of any one of the preceding claims, wherein the uncorrected PSA quantity is determined using an assay that comprises at least one antibody selective for PSA.
21. The method of any one of the preceding claims, further comprising a step of preparing a report containing results from the determination of corrected PSA quantity, wherein said report is written in a computer readable medium, printed on paper, or displayed on a visual display.
22. A method of diagnosis of prostate cancer in a human individual, the method comprising :
(a) Detecting an uncorrected PSA quantity in a first biological sample from the human individual;
(b) Obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans;
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker;
(d) Determining whether the corrected PSA quantity is greater than normal PSA quantity in humans;
(e) Performing a further diagnostic evaluation procedure selected from the group consisting of rectal ultrasound imaging and prostate biopsy on the individual if the corrected PSA quantity is determined to be greater than normal PSA quantity in humans; wherein determination of a positive outcome of the ultrasound imaging or prostate biopsy is indicative of prostate cancer in the individual.
23. The method of claim 22, wherein the obtaining sequence data comprises determining the presence or absence of at least one allele of the at least one polymorphic marker.
24. The method of claim 22 or claim 23, wherein the obtaining sequencing data comprises determining the identity of both alleles of the at least one polymorphic marker in the genome of the individual .
25. The method of any one of the claims 22 to 24, wherein the sequence data is nucleic acid sequence data obtained from a first biological sample containing nucleic acid from the human individual .
26. The method of claim 22, wherein the nucleic acid sequence data is obtained using a method that comprises at least one procedure selected from :
(i) amplification of nucleic acid from the biological sample;
(ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample;
(iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of nucleic acid from the biological sample; and
(iv) high-throughput sequencing.
27. The method of any one of the claims 22 to 26, wherein the sequence data is obtained from a preexisting record.
28. The method of any one of the claims 22 to 26, wherein PSA quantity is determined in a blood sample from the individual.
29. The method of any one of the claims 22 to 28, wherein normal PSA quantity in humans is less than 1.0 ng/mL, less than 1.5 ng/mL, less than 2.0 ng/mL, less than 2.5 ng/mL, less than 3.0 ng/mL, less than 3.5 ng/mL, less than 4.0 ng/mL, less than 5.0 ng/mL or less than 10 ng/mL of serum.
30. The method of any one of the claims 22 to 29, wherein at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans, and wherein at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans.
31. The method of claim 30, wherein the determining of corrected PSA quantity comprises adjusting uncorrected PSA quantity based on the predicted effect of the at least one allele on PSA quantity in humans.
32. The method of any one of the claims 22 to 31, wherein the at least one polymorphic marker is a biallelic marker, and wherein one allele of the at least one marker correlates with elevated PSA levels in humans and the other allele correlates with reduced PSA levels in humans.
33. The method of any one of the claims 22 to 32, wherein the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
34. The method of any one of the claims 22 to 33, wherein determination of the presence of an allele selected from the group consisting of the C allele of rs401681, the A allele of rs2736098, the A allele of rs10788160, the T allele of rs10993994, the A allele of rs11067228, the A allele of rs4430796, the G allele of rs2735839 and the T allele of rs17632542 is indicative of elevated PSA quantity in the individual.
35. The method of any one of the claims 22 to 34, wherein determination of the presence of an allele selected from the group consisting of the T allele of rs401681, the G allele of rs2736098, the G allele of rs10788160, the C allele of rs10993994, the G allele of rs11067228, the G allele of rs4430796, the A allele of rs2735839 and the C allele of rs17632542 is indicative of reduced PSA quantity in the individual.
36. The method of claim 33, wherein markers in linkage disequilibrium with rs2736098 are selected from the group consisting of rs2735845, rs31484, rs401681, s.1030492, s.1233724, s.1251946, s.1257345, s.1258032, s.1292191, s.1334730, s.1407682, s.1426206, s.1426336, s.1428371, s.1428373, s.1472454, s.1518154, s.1557827, rs11743119, s.1583465, rs4551123, s.1589581, s.1591616, s.1607388, rs6893515, s.1618305, s.1621550, s.1621551, rs6892057, s.1638061, rs6898387, rs7724451, rs2937006, s.1663985, s.1667254, s.1668831, s.1673499, s.1737379, s.1756873, s.1782909, s.1788485, s.1799150, s.1800043, s.1804565, s.1812409, s.886453 and s.887600.
37. The method of claim 33, wherein markers in linkage disequilibrium with rs10788160 are selected from the group consisting of rs11199892, rs11593067, s.122837469, rs2130779, s.122876448, s.122901140, s.122901142, s.122905335, rs10788149, rs10749408, rs2172071, rs11592107, rs1907218, rs1907220, rs1994655, rs1907221, rs1907225, rs1907226, rs10749409, rs11199835, s.122991926, rs729014, s.122993518, s.122994309, s.122994946, rs1873450, rs2901290, s.122998594, s.122998678, s.122998978, rs2201026, rs4237529, s.122999386, rs1873451, rs1873452, rs4752520, rs10886880, rs10749412, s.123008216, rs3925042, rs1125527, rs1125528, rs4319451, rs10788154, rs7081844, rs7076500, s.123011774, s.123011879, rs11199862, s.123014171, rs12146156, s.123014499, s.123014519, rs12146366, s.123014684, rs7091083, rs7074985, rs7915008, s.123015342, s.123015365, rs10749413, rs11199866, s.123016003, rs7923130, rs7922901, rs10886882, rs10886883, rs11199867, s.123017698, s.123018111, rs4393247, s.123018188, rs4489674, rs11199868, s.123018670, s.123019408, s.123019759, rs11199869, s.123020245, s.123020365, rs10886885, rs10788159, rs10886886, rs11199871, rs11199872, rs12761612, rs4575197, rs11199874, rs10886887, s.123023625, s.123023836, rs4465316, rs4468286, rs10886890, rs10788162, s.123028135, rs12413648, s.123029102, rs10788163, s.123031617, s.123031811, rs10788164, rs11598592, rs10788165, rs9630106, rs10886893, s.123034821, rs11199879, rs11199881, rs12415826, rs10788166, rs10886894, rs10886895, rs10886896, rs10886897, rs10886898, rs10886899, rs10886900, rs10886901, rs10886902, rs10886903, rs12413088, rs10788167, s.123047182, rs7085073, rs7071101, rs12570783, rs11199884, rs7085506, rs10886905, rs10736302, s.123061811, s.123062031, rs11199886, s.123063327, s.123063715, rs10886907, s.123064252, s.123064345, s.123064780, s.123064783, s.123066424, s.123066700, rs3981043, rs11199896, rs11199897, rs11199898, s.123067963, rs11199900, rs11199901, s.123068178, s.123068222, s.123068236, s.123068424, s.123068619, s.123068743, s.123068926, s.123068997, s.123069012, s.123069326, s.123069570, s.123069989, s.123070105, s.123071090, s.123071347, rs4254007, s.123071495, s.123071914, s.123072804, rs7900630, s.123074016, rs1896416, s.123074531, s.123074928, s.123076274, s.123076472, rs2420925, s.123077398, s.123077455, rs12779205, rs11199912, rs4752534, s.123078389, rs1896420, rs1896419, s.123079199, s.123081990, s.123081993, s.123081998 and s.123201870.
38. The method according to claim 33, wherein markers in linkage disequilibrium with rs11067228 are selected from the group consisting of rs12820376, s.113576401, s.113582477, s.113584188, s.113584539, s.113585097, rs12819162, rs11609105, rs514849, rs513061, s.113590733, rs1061657, rs8853, rs3741698, s.113594635, rs567223, rs551510, rs59336, s.113601412, rs515746, rs545076 and s.113614584.
39. The method according to claim 33, wherein markers in linkage disequilibrium with
rsl0993994 are selected from the group consisting of s.51157005, s.51159221, rs35716372, s.51159373, s.51159376, s.51159399, s.51159786, rs4935090, rsl2781411, s.51162137, s.51162792, s.51162795, rsll004246, s.51165690, rsl l004324, rs2843562, rsl l004409, rsl l004415, rsl l004422, s.51168415, rsl l004435, rsl l599333, s.51170094, s.51170307, rsl2763717, rs67289834, s.51172442, s.51172558, rs57858801, s.51172618, s.51172808, s.51173184, rs7071471, rs7090326, s.51173565, s.51173983, s.51174391, s.51174499, s.51174610, s.51174944, s.51175013, s.51175409, s.51176290, s.51176963, s.51180209, rsl0825652, s.51180819, rs2843560, rs2125770, rs2611513, rs2611512, rs2611509, s.51186305, rs2926494, rs2611508, rs2611507, s.51188694, rs2611506, rs57263518, s.51189522, rs3101227, rs2843549, rs2843550, rs2249986, rs2843551, s.51192126, rs7077830, s.51193219, rs2843554, s.51194280, rs2611489, rs3123078, rs4935162, rs7081532, rsl0826075, rs7896156, s.51199599, rs6481329, rs7910704, rs4554834, rsl0826125, rsl0826127, rs4486572, rs4581397, rs4630240, rs7920517, rs4630241, rs9787697, rsl0763534, rsl0763536, s.51205998, rsl0763546, s.51206890, rs4131357, s.51207437, s.51207481, s.51208175, rsl l006207, rsl0763576, s.51208921, rsl l593361, rsl0763588, rsl l006274, s.51210619, s.51210866, rs4630243, rs4512771, rs4306255, s.51213076, rs4631830, rs7075009, rs7098889, rs4304716, s.51214689, s.51214690, rs7477953, s.51215034, s.51216121, s.51216342, rs7075697, s.51219226, s.51219227, s.51219230, s.51219320, s.51221179 and rs2012677.
40. The method according to claim 33, wherein markers in linkage disequilibrium with rs4430796 are selected from the group consisting of rs757210, rs7213769, rs1016990, rs17626423, rs3744763, rs7405776, rs2005705, s.33170591, rs11263761, rs4239217, rs11651755, rs10908278, s.33174083, rs11657964, rs7501939, rs8064454, s.33175746, s.33176039, rs7405696, rs11651052, rs11263763, rs11658063, rs9913260, rs3760511 and s.33182344.
41. The method according to claim 33, wherein markers in linkage disequilibrium with
rsl7632542 are selected from the group consisting of rs273622, s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052 s.55637350 s.55640040, s.55646568 s.55649132 s.55650629, s.55650844 s.55652397 s.55653401, s.55653991 s.55654907 s.55657973, s.55659043 s.55660011 s.55660013, s.55660139 s.55660143 s.55661660, s.55661718 rs6509476, s.55664020, s .55664897, s.55665723, s.55665726, s.55672641 s.55673254 s.55674252, s.55674254 s.55674727 s.55676073, s.55683393 s.55687122 s.55695317, s.55697027 s.55701748 rs7257447, s.55702308 s.55703568 s.55706751, s.55708051 s.55709067 s.55709498, s.55709766 s.55710030 s.55710848, s.55710851 s.55711749 s.55712802, s.55713451 s.55713453 s.55713458, s.55713862 s.55716007 s.55718272, s.55723496 s.55724346 s.55726794, s.55729556 s.55729562 s.55729563, s.55731588 s.55733658 s.55741403, s.55743524 s.55745833 s.55746123, s.55747079 s.55748269 s.55748274, s.55748844 s.55749193 s.55752178, s.55752271 s.55770158 rs7247686, s .55771401, s.55772266, s.55775314, s.55778756 s.55788661 s.55790622, s.55791942 rsl0413426 s.55798366, s.55818900 s.55822129 s.55825528, s.55825624 s.55833489 s.55833938, s.55848124 s.55848125 s.55849044, s.55857289 s.55857585 s.55861107, s.55861111 s.55861196 s.55862851, s.55865439 s.55867208 s.55867650, s.55868902 s.55870429 rs73598616, s.55874339 s.55875249 s.55875725, s.55881262 s.55882788 s.55883542, s.55886467 s.55887498 s.55889175, s.55892113 s.55892618 s.55892866, s.55893305 s.55896443 s.55896826, s.55898241 s.55898245 s.55899120, s.55900597 s.55900764 s.55912567, s.55914840 s.55915776 s.55936192, s.55940336 s.55946316 s.55949971, s.55955333 s.55962188 s.55963864, s.55969754 s.55979135 rs67367861, s.55989580 s.56004001 s.56006528, s.56012046 s.56013739 rs2411330, rs3212825, s.56018053 s.56019106 rs7246740, s .56025860, s.56026713, 
rs55786312, s.56026881 s.56026882 s.56027319, s.56029265 s.56029362 s.56032778, s.56032963 s.56032964 s.56033138, s.56033138 s.56033664 s.56033664, s.56036363 s.56037076 s.56037076, s.56038334 s.56038334 s.56039736, s.56042100 s.56042603 s.56042603, rs2659124, rs2659124, s.56046798, rs266878, rs266878, rs 174776, rs 174776, s.56052630, s.56052630, s.56052652, s.56052652, s.56053983 s.56054527 s.56054527, rsl058205, rsl058205, rs2569735, rs2569735, rs2735839, rs62113216, rs62113216, s.56058308, s.56058606, s.56058688, s.56058866 , s.56060000 s.56061277, s.56062250, s.56066550, s.56066560, s.56066619 s.56067024 s.56067024, rs73592873, s.56076121, s.56076122, s.56078845 s.56085550 s.56093594 and s.56472259.
42. The method of any one of the claims 22 to 41, further comprising a step of preparing a report containing results from the determination of corrected PSA quantity, wherein said report is written in a computer readable medium, printed on paper, or displayed on a visual display.
43. A method of diagnosis of prostate cancer, the method comprising:
analyzing corrected PSA quantity of a human individual, wherein said corrected PSA quantity is obtained by a method as set forth in any one of the claims 1 to 20; wherein if the corrected PSA quantity of the human individual is determined to be greater than normal PSA quantity in humans, a further diagnostic evaluation selected from the group consisting of rectal ultrasound imaging and prostate biopsy is performed; and wherein determination of a positive outcome of the further diagnostic evaluation is indicative of prostate cancer in the individual.
44. The method according to claim 43, wherein normal PSA quantity in humans is determined in individuals not diagnosed with prostate cancer.
45. The method according to claim 43 or claim 44, wherein normal PSA quantity in humans is less than 1.0ng/mL, less than 1.5ng/mL, less than 2.0ng/mL, less than 2.5ng/mL, less than 3.0ng/mL, less than 3.5ng/mL, less than 4.0ng/mL, less than 5.0ng/mL or less than 10ng/mL serum.
46. A method of determining a susceptibility to prostate cancer, the method comprising: analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker selected from the group consisting of rs17632542, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to prostate cancer in humans, and determining a susceptibility to prostate cancer from the nucleic acid sequence data.
47. The method of claim 46, wherein the nucleic acid sequence data is obtained from a biological sample containing nucleic acid from the human individual.
48. The method of claim 47, wherein the nucleic acid sequence data is obtained using a method that comprises at least one procedure selected from:
(i) amplification of nucleic acid from the biological sample;
(ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample;
(iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of the biological sample; and
(iv) high-throughput sequencing.
49. The method of claim 46, wherein the nucleic acid sequence data is obtained from a preexisting record.
50. The method of claim 49, wherein the preexisting record comprises a genotype dataset.
51. The method of any one of the claims 46 to 50, further comprising a step of preparing a report containing results from the determination, wherein said report is written in a computer readable medium, printed on paper, or displayed on a visual display.
52. The method of any one of the claims 46 to 51, wherein the analyzing comprises determining the presence or absence of at least one at-risk allele of the polymorphic marker for prostate cancer.
53. The method of any one of the claims 46 to 52, wherein the determining comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to the condition.
54. The method of any one of the claims 46 to 53, wherein determination of the presence of the T allele of rs17632542 is indicative of an increased susceptibility of prostate cancer for the human individual.
55. The method according to claim 46, wherein markers in linkage disequilibrium with
rsl7632542 are selected from the group consisting of rs273622, s.55554247, s.55566277, s.55582344, rs2546552, s.55596785, s.55597645, s.55598078, s.55600121, s.55605246, s.55606024, s.55607242, s.55624341, s.55630396, s.55630578, s.55630679, s.55630791, s.55631170, s.55632347, s.55632363, s.55636052, s.55637350, s.55640040, s.55646568, s.55649132, s.55650629, s.55650844, s.55652397, s.55653401, s.55653991, s.55654907, s.55657973, s.55659043, s.55660011, s.55660013, s.55660139, s.55660143, s.55661660, s.55661718, rs6509476, s.55664020, s.55664897, s.55665723, s.55665726, s.55672641, s.55673254, s.55674252, s.55674254, s.55674727, s.55676073, s.55683393, s.55687122, s.55695317, s.55697027, s.55701748, rs7257447, s.55702308, s.55703568, s.55706751, s.55708051, s.55709067, s.55709498, s.55709766, s.55710030, s.55710848, s.55710851, s.55711749, s.55712802, s.55713451, s.55713453, s.55713458, s.55713862, s.55716007, s.55718272, s.55723496, s.55724346, s.55726794, s.55729556, s.55729562, s.55729563, s.55731588, s.55733658, s.55741403, s.55743524, s.55745833, s.55746123, s.55747079, s.55748269, s.55748274, s.55748844, s.55749193, s.55752178, s.55752271, s.55770158, rs7247686, s.55771401, s.55772266, s.55775314, s.55778756, s.55788661, s.55790622, s.55791942, rsl0413426, s.55798366, s.55818900, s.55822129, s.55825528, s.55825624, s.55833489, s.55833938, s.55848124, s.55848125, s.55849044, s.55857289, s.55857585, s.55861107, s.55861111, s.55861196, s.55862851, s.55865439, s.55867208, s.55867650, s.55868902, s.55870429, rs73598616, s.55874339, s.55875249, s.55875725, s.55881262, s.55882788, s.55883542, s.55886467, s.55887498, s.55889175, s.55892113, s.55892618, s.55892866, s.55893305, s.55896443, s.55896826, s.55898241, s.55898245, s.55899120, s.55900597, s.55900764, s.55912567, s.55914840, s.55915776, s.55936192, s.55940336, s.55946316, s.55949971, s.55955333, s.55962188, s.55963864, s.55969754, s.55979135, rs67367861, s.55989580, s.56004001, s.56006528, s.56012046, 
s.56013739, rs2411330, rs3212825, s.56018053, s.56019106, rs7246740, s.56025860, s.56026713, rs55786312, s.56026881, s.56026882, s.56027319, s.56029265, s.56029362, s.56032778, s.56032963, s.56032964, s.56033138, s.56033138, s.56033664, s.56033664, s.56036363, s.56037076, s.56037076, s.56038334, s.56038334, s.56039736, s.56042100, s.56042603, s.56042603, rs2659124, rs2659124, s.56046798, rs266878, rs266878, rsl74776, rsl74776, s.56052630, s.56052630, s.56052652, s.56052652, s.56053983, s.56054527, s.56054527, rsl058205, rsl058205, rs2569735, rs2569735, rs2735839, rs62113216, rs62113216, s.56058308, s.56058606, s.56058688, s.56058866, s.56060000, s.56061277, s.56062250, s.56066550, s.56066560, s.56066619, s.56067024, s.56067024, rs73592873, s.56076121, s.56076122, s.56078845, s.56085550, s.56093594 and s.56472259.
56. The method of any one of the claims 46 to 55, further comprising reporting the susceptibility to at least one entity selected from the group consisting of the individual, a guardian of the individual, a genetic service provider, a physician, a medical organization, and a medical insurer.
57. A method for identifying a human individual who is a candidate for further diagnostic evaluation for prostate cancer, the method comprising the steps of: a) obtaining data representing uncorrected values of PSA quantity in the individual; b) determining, in the genome of the human individual, the allelic identity of at least one allele of at least one polymorphic marker, wherein different alleles of the at least one marker are associated with different levels of PSA quantity in humans, and wherein the at least one marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith; c) determining a corrected PSA quantity in the individual based on the allelic identity of the at least one polymorphic marker; and d) identifying the subject as a subject who is a candidate for further diagnostic evaluation for prostate cancer if said corrected PSA quantity is greater than values of normal PSA quantity in humans.
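Steps a) to d) of claim 57 can be illustrated with a short sketch. The claim does not state how the correction is computed, so the sketch assumes a multiplicative model in which each copy of a PSA-raising allele scales the expected PSA quantity by a fixed factor and the correction divides the measured value by the combined factor. The effect sizes, the function names and the default 4.0 ng/mL threshold (one of the values listed in claim 60) are illustrative assumptions only, not values taken from the claims.

```python
# Hypothetical per-allele multiplicative effects on PSA quantity.
# These numbers are placeholders for illustration, NOT values from the source.
ILLUSTRATIVE_EFFECTS = {
    "rs17632542": ("T", 1.10),
    "rs10993994": ("T", 1.05),
}


def corrected_psa(uncorrected_ng_ml, genotypes, effects=ILLUSTRATIVE_EFFECTS):
    """Divide the measured PSA by the assumed combined genetic factor.

    genotypes maps a marker name to a two-letter genotype string such as "TC".
    Markers without a genotype call are skipped (factor of 1).
    """
    factor = 1.0
    for marker, (allele, effect) in effects.items():
        genotype = genotypes.get(marker)
        if genotype is None:
            continue
        # One multiplication per copy of the PSA-raising allele (0, 1 or 2).
        factor *= effect ** genotype.count(allele)
    return uncorrected_ng_ml / factor


def is_candidate(uncorrected_ng_ml, genotypes, normal_threshold_ng_ml=4.0):
    """Step d): flag the individual if corrected PSA exceeds the threshold."""
    return corrected_psa(uncorrected_ng_ml, genotypes) > normal_threshold_ng_ml
```

Under these assumed effects, a measured value of 4.4 ng/mL in a carrier of two T alleles of rs17632542 falls below the 4.0 ng/mL threshold after correction, while the same measured value in a non-carrier stays above it.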
58. The method of claim 57, wherein the further diagnostic evaluation is selected from the group consisting of rectal ultrasound imaging and prostate biopsy.
59. The method of claim 57 or claim 58, wherein said uncorrected PSA quantity is determined in a blood sample from the individual.
60. The method of any one of the claims 57 to 59, wherein values of normal PSA quantity in human serum are less than 1.0ng/mL, less than 1.5ng/mL, less than 2.0ng/mL, less than 2.5ng/mL, less than 3.0ng/mL, less than 3.5ng/mL, less than 4.0ng/mL, less than 5.0ng/mL or less than 10ng/mL.
61. A method of treatment of prostate cancer, the method comprising:
(i) Obtaining data identifying a PSA quantity from a human individual that has been corrected for genetic variability;
(ii) determining whether said corrected PSA quantity exceeds values of normal PSA quantity in humans;
(iii) performing a prostate biopsy of the individual if the corrected PSA quantity in the individual exceeds values of normal PSA quantity in humans; wherein if the individual is determined to have prostate cancer based on outcome of the prostate biopsy, at least one treatment module selected from the group consisting of surgery, radiation therapy, proton therapy, hormonal therapy and chemotherapy is administered to the individual.
62. The method of claim 61, wherein said corrected PSA quantity is determined by
(a) Obtaining data identifying an uncorrected PSA quantity in a first biological sample from the human individual;
(b) Analyzing sequence data about at least one polymorphic marker from the first biological sample or a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and
(c) Determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker.
63. The method of claim 62, wherein different alleles of the at least one polymorphic marker are associated with different levels of PSA quantity in humans, and wherein the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
64. An apparatus for determining corrected PSA quantity in a human individual, comprising: a processor; a computer readable memory having computer executable instructions adapted to be executed on the processor, wherein said instructions comprise steps of:
(i) obtaining data representing uncorrected PSA quantity in a biological sample from the human individual;
(ii) obtaining sequence data about at least one polymorphic marker in the genome of the human individual, wherein different alleles of the at least one polymorphic marker are predictive of different PSA quantity in humans;
(iii) determining a corrected PSA quantity based on the sequence data about the at least one polymorphic marker.
65. The apparatus of claim 64, wherein at least one allele of the at least one marker is predictive of an increased quantity of PSA in humans, and wherein at least one other allele of the at least one marker is predictive of a decreased quantity of PSA in humans.
66. The apparatus of claim 64 or claim 65, wherein the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
67. The apparatus of any one of the claims 64 to 66, wherein determination of the presence of an allele selected from the group consisting of the C allele of rs401681, the A allele of rs2736098, the A allele of rs10788160, the T allele of rs10993994, the A allele of rs11067228, the A allele of rs4430796, the G allele of rs2735839 and the T allele of rs17632542 is indicative of elevated PSA quantity in the individual.
68. The apparatus of any one of the claims 64 to 66, wherein determination of the presence of an allele selected from the group consisting of the T allele of rs401681, the G allele of rs2736098, the G allele of rs10788160, the C allele of rs10993994, the G allele of rs11067228, the G allele of rs4430796, the A allele of rs2735839 and the C allele of rs17632542 is indicative of reduced PSA quantity in the individual.
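The allele assignments recited in claims 67 and 68 can be collected into a lookup table, as in the sketch below. The marker and allele pairs are transcribed from those two claims, with marker identifiers written in standard dbSNP form; the table layout and the function name `allele_direction` are hypothetical and serve only as an illustration.

```python
# marker -> (allele indicative of elevated PSA, allele indicative of reduced PSA),
# transcribed from claims 67 and 68.
PSA_ALLELES = {
    "rs401681": ("C", "T"),
    "rs2736098": ("A", "G"),
    "rs10788160": ("A", "G"),
    "rs10993994": ("T", "C"),
    "rs11067228": ("A", "G"),
    "rs4430796": ("A", "G"),
    "rs2735839": ("G", "A"),
    "rs17632542": ("T", "C"),
}


def allele_direction(marker, allele):
    """Return 'elevated', 'reduced' or 'unknown' for an observed allele."""
    if marker not in PSA_ALLELES:
        return "unknown"
    elevated, reduced = PSA_ALLELES[marker]
    if allele == elevated:
        return "elevated"
    if allele == reduced:
        return "reduced"
    return "unknown"
```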
69. A computer-readable medium having computer executable instructions for determining corrected values of PSA quantity, the computer readable medium comprising: data indicative of uncorrected values of PSA quantity for at least one human individual; data comprising sequence data about at least one polymorphic marker in the genome of the at least one human individual, wherein said at least one polymorphic marker is predictive of PSA quantity in humans; and a routine stored on the computer readable medium and adapted to be executed by a processor to determine corrected PSA values for the at least one human individual.
70. The computer-readable medium according to claim 69, wherein the at least one polymorphic marker is selected from the group consisting of rs7193343, rs7618072, rs10077199, rs10490066, rs10516002, rs10519674, rs1394796, rs2935888, rs4560443, rs6010770 and rs7733337, and markers in linkage disequilibrium therewith.
71. A method of assessing recurrence risk of prostate cancer in a human individual who has undergone treatment for prostate cancer, the method comprising (i) detecting an uncorrected PSA quantity in a first biological sample from the human individual; (ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of recurrence risk of the individual.
72. A method for determining the prognosis of an individual diagnosed with prostate cancer, the method comprising
(i) detecting an uncorrected PSA quantity in a first biological sample from the human individual;
(ii) obtaining sequence data about at least one polymorphic marker in the first biological sample or in a second biological sample from the human individual, wherein the at least one polymorphic marker is correlated with PSA quantity in humans; and (iii) determining a corrected PSA quantity in the human individual based on the sequence data about the at least one polymorphic marker; wherein the corrected PSA quantity is indicative of the prognosis of the individual.
73. The method of claim 72, wherein the method further comprises determining corrected PSA velocity by repeating steps (i) - (iii) at least once, using a first sample and/or a second sample taken at a different time than the first of said first and/or second sample, and calculating a corrected PSA velocity based on the corrected PSA quantity determined for samples obtained at the different times.
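The corrected PSA velocity of claim 73 can be sketched as a rate of change between corrected PSA quantities obtained at different times. The claim does not fix a formula; the sketch below assumes the simplest reading, the difference between the latest and earliest corrected values divided by the elapsed time in years. The function name and the ng/mL-per-year unit are assumptions made for illustration.

```python
from datetime import date


def corrected_psa_velocity(measurements):
    """Return the change in corrected PSA per year (ng/mL/year).

    measurements is a list of (sampling_date, corrected_psa_ng_ml) pairs.
    Only the earliest and latest samples are used in this simple sketch.
    """
    if len(measurements) < 2:
        raise ValueError("at least two measurements are required")
    ordered = sorted(measurements, key=lambda pair: pair[0])
    (first_date, first_value), (last_date, last_value) = ordered[0], ordered[-1]
    elapsed_years = (last_date - first_date).days / 365.25
    if elapsed_years <= 0:
        raise ValueError("samples must be taken at different times")
    return (last_value - first_value) / elapsed_years


# Usage: two corrected values taken one calendar year apart.
velocity = corrected_psa_velocity([
    (date(2010, 1, 1), 2.0),
    (date(2011, 1, 1), 3.0),
])
```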
74. A kit for determining PSA levels in a human individual, the kit comprising
(a) reagents necessary for determining the quantity of PSA in a blood sample from the individual; and
(b) instructions for correcting the PSA quantity determined in (a) based on the genetic composition of the individual.
75. The kit of claim 74, wherein the reagents for determining PSA quantity comprise at least one antibody selective for PSA.
76. The kit of claim 74 or 75, wherein the kit further comprises reagents for determining the identity of at least one allele of at least one polymorphic marker in the genome of the individual.
77. The kit of any one of the claims 74 - 76, wherein said instructions for correcting PSA quantity comprise instructions for correcting PSA quantity based on the genotype of the individual for at least one polymorphic marker selected from the group consisting of rs7193343, rs7618072, rs10077199, rs10490066, rs10516002, rs10519674, rs1394796, rs2935888, rs4560443, rs6010770 and rs7733337, and markers in linkage disequilibrium therewith.
78. A system for determining corrected PSA levels in a human subject, the system comprising:
at least one processor;
at least one computer-readable medium;
a susceptibility database operatively coupled to a computer-readable medium of the system and containing population information correlating the presence or absence of one or more alleles of at least one polymorphic marker with PSA levels in a population of humans; a measurement tool that receives an input about the human subject and generates information from the input about:
uncorrected PSA levels in the human subject, and
the presence or absence of at least one allele of at least one polymorphic marker in the human subject that is correlated with PSA levels in humans; and an analysis tool that:
is operatively coupled to the susceptibility database and the measurement tool,
is stored on a computer-readable medium of the system,
is adapted to be executed on a processor of the system, to compare the information about the human subject with the population information in the susceptibility database and generate a conclusion with respect to corrected PSA levels for the human subject.
79. The system according to claim 78, further including:
a communication tool operatively coupled to the analysis tool, stored on a computer-readable medium of the system and adapted to be executed on a processor of the system to communicate to the subject, or to a medical practitioner for the subject, the conclusion with respect to corrected PSA levels for the subject.
80. The system according to claim 78 or claim 79, wherein the at least one polymorphic marker is selected from the group consisting of rs401681, rs2736098, rs10788160, rs11067228, rs10993994, rs4430796, rs2735839 and rs17632542, and markers in linkage disequilibrium therewith.
81. The system according to any one of claims 78 to 80, wherein the measurement tool comprises a tool stored on a computer-readable medium of the system and adapted to be executed by a processor of the system to receive a data input about a subject and determine information about the presence or absence of the at least one marker allele in a human subject from the data.
82. The system according to claim 81, wherein the data is genomic sequence information, and the measurement tool comprises a sequence analysis tool stored on a computer readable medium of the system and adapted to be executed by a processor of the system to determine the presence or absence of the at least one marker allele from the genomic sequence information.
83. The system according to any one of claims 78 to 82, wherein the input about the human subject is a biological sample from the human subject, and wherein the measurement tool comprises a tool to identify the presence or absence of the at least one marker allele in the biological sample, thereby generating information about the presence or absence of the at least one marker allele in a human subject.
84. The system according to claim 83, wherein the input about the human subject further comprises a second biological sample from the human subject, and wherein the measurement tool comprises a tool to identify uncorrected PSA levels in the human subject, thereby generating information about uncorrected PSA levels in the human subject.
85. The system according to claim 83 or claim 84, wherein the measurement tool includes:
an oligonucleotide microarray containing a plurality of oligonucleotide probes attached to a solid support;
a detector for measuring interaction between nucleic acid obtained from or amplified from the biological sample and one or more oligonucleotides on the oligonucleotide microarray to generate detection data; and
an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one marker allele based on the detection data.
86. The system according to any one of the claims 83 to 85, wherein the measurement tool includes:
a nucleotide sequencer capable of determining nucleotide sequence information from nucleic acid obtained from or amplified from the biological sample; and
an analysis tool stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to determine the presence or absence of the at least one marker allele based on the nucleotide sequence information.
87. The system according to any one of claims 78 to 86, further comprising:
a medical protocol database operatively connected to a computer-readable medium of the system and containing information correlating the presence or absence of the at least one marker allele and medical protocols for human subjects at risk for prostate cancer; and
a medical protocol routine, operatively connected to the medical protocol database and the analysis routine, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the conclusion from the analysis routine with respect to susceptibility to prostate cancer for the subject and the medical protocol database, and generate a protocol report with respect to the probability that one or more medical protocols in the database will:
reduce susceptibility to prostate cancer; or delay onset of prostate cancer; or
increase the likelihood of detecting prostate cancer at an early stage to facilitate early treatment.
88. The system according to any one of claims 79 to 87, wherein the communication tool is operatively connected to the analysis routine and comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to:
generate a communication containing the conclusion; and
transmit the communication to the subject or the medical practitioner, or enable the subject or medical practitioner to access the communication.
89. The system according to claim 88, wherein the communication expresses a susceptibility to prostate cancer in terms of odds ratio or relative risk or lifetime risk.
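The odds ratio and relative risk named in claim 89 follow their standard definitions for a two-by-two table of allele carrier status against disease status. The sketch below is a generic illustration of those definitions; the function names and the example counts are hypothetical and do not come from the source, and the relative risk is only meaningful when the counts come from a cohort rather than a case-control sample.

```python
def odds_ratio(carriers_affected, carriers_unaffected,
               noncarriers_affected, noncarriers_unaffected):
    """Odds of disease among carriers divided by odds among non-carriers."""
    return (carriers_affected * noncarriers_unaffected) / (
        carriers_unaffected * noncarriers_affected
    )


def relative_risk(carriers_affected, carriers_unaffected,
                  noncarriers_affected, noncarriers_unaffected):
    """Risk of disease among carriers divided by risk among non-carriers."""
    risk_carriers = carriers_affected / (carriers_affected + carriers_unaffected)
    risk_noncarriers = noncarriers_affected / (
        noncarriers_affected + noncarriers_unaffected
    )
    return risk_carriers / risk_noncarriers


# Hypothetical counts, for illustration only.
example_or = odds_ratio(20, 80, 10, 90)
example_rr = relative_risk(20, 80, 10, 90)
```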
90. The system according to claim 88 or 89, wherein the communication further includes the protocol report.
91. The system according to any one of claims 78 to 90, wherein the susceptibility database further includes information about at least one parameter selected from the group consisting of age, sex, ethnicity, race, medical history, weight, diabetes status, blood pressure, family history of the cancer, and smoking history in humans and impact of the at least one parameter on susceptibility to prostate cancer.
92. A system for assessing or selecting a treatment protocol for a subject diagnosed with, or at risk for, prostate cancer, comprising :
at least one processor;
at least one computer-readable medium;
a medical treatment database operatively connected to a computer-readable medium of the system and containing information correlating values of corrected PSA levels and efficacy of treatment regimens for prostate cancer;
a measurement tool to receive an input about the human subject and generate information from the input about genetically corrected PSA levels in humans; and a medical protocol tool operatively coupled to the medical treatment database and the measurement tool, stored on a computer-readable medium of the system, and adapted to be executed on a processor of the system, to compare the information with respect to the corrected PSA levels for the subject and the medical treatment database, and generate a conclusion with respect to at least one of: the probability that one or more medical treatments will be efficacious for treatment of prostate cancer for the patient; and
which of two or more medical treatments for the cancer will be more efficacious for the patient.
93. The system according to claim 92, wherein the measurement tool comprises a tool stored on a computer-readable medium of the system and adapted to be executed by a processor of the system to receive a data input about a subject and determine information about the presence or absence of at least one allele of at least one polymorphic marker in a human subject from the data, and wherein the tool further comprises a data processing function that determines corrected PSA values for the human subject based on the presence or absence of the at least one allele.
94. The system according to claim 92, wherein the measurement tool comprises a tool stored on a computer-readable medium of the system and adapted to be executed by a processor of the system to receive a data input about a subject and determine information about the presence or absence of at least one allele of at least one polymorphic marker in a human subject from the data, and wherein the measurement tool further comprises a tool stored on a computer-readable medium of the system and adapted to be executed by a processor of the system to receive a data input about corrected PSA levels in a human subject from the data.
95. The system according to claim 93 or claim 94, wherein the data is genomic sequence information, and the measurement tool comprises a sequence analysis tool stored on a computer readable medium of the system and adapted to be executed by a processor of the system to determine the presence or absence of the at least one marker allele from the genomic sequence information.
96. The system according to claim 92, wherein the input about the human subject is a biological sample from the human subject, and wherein the measurement tool comprises a tool to identify the presence or absence of the at least one marker allele in the biological sample, thereby generating information about the presence or absence of the at least one marker allele in a human subject.
97. The system according to any one of claims 92 to 96, further comprising a communication tool operatively connected to the medical protocol routine for communicating the conclusion to the subject, or to a medical practitioner for the subject.
98. The system according to claim 97, wherein the communication tool comprises a routine stored on a computer-readable medium of the system and adapted to be executed on a processor of the system, to:
generate a communication containing the conclusion; and transmit the communication to the subject or the medical practitioner, or enable the subject or medical practitioner to access the communication.
99. The method, apparatus or medium according to any one of the preceding claims, wherein linkage disequilibrium between markers is characterized by particular numerical values of the linkage disequilibrium measures r² and/or |D'|.
100. The method, apparatus or medium according to any of the preceding claims, wherein linkage disequilibrium between markers is characterized by values of r² of at least 0.2.
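The linkage disequilibrium measures r² and |D'| referred to in claims 99 and 100 have standard population-genetics definitions in terms of haplotype and allele frequencies. The sketch below computes both for a pair of biallelic markers and applies the 0.2 threshold of claim 100. The function names and the example frequencies are hypothetical, and the claims themselves do not prescribe this implementation.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (r_squared, abs_d_prime) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator <= 0:
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    r_squared = d * d / denominator
    # Lewontin's normalisation: D is divided by its maximum possible magnitude.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, abs_d_prime


def in_ld(p_ab, p_a, p_b, min_r_squared=0.2):
    """Apply the r-squared threshold of claim 100 (0.2 by default)."""
    r_squared, _ = ld_measures(p_ab, p_a, p_b)
    return r_squared >= min_r_squared
```

For example, two markers whose alleles A and B each have frequency 0.5 and always occur on the same haplotype (haplotype frequency 0.5) give r² = 1 and |D'| = 1, whereas a haplotype frequency of 0.25 corresponds to no disequilibrium.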
PCT/IS2011/050012 2010-08-30 2011-08-30 Sequence variants associated with prostate specific antigen levels WO2012029080A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IS8924 2010-08-30
IS8924 2010-08-30
IS50002 2010-12-13
IS050002 2010-12-13

Publications (1)

Publication Number Publication Date
WO2012029080A1 (en) 2012-03-08

Family

ID=45772231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IS2011/050012 WO2012029080A1 (en) 2010-08-30 2011-08-30 Sequence variants associated with prostate specific antigen levels

Country Status (1)

Country Link
WO (1) WO2012029080A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9418203B2 (en) 2013-03-15 2016-08-16 Cypher Genomics, Inc. Systems and methods for genomic variant annotation
US9600627B2 (en) 2011-10-31 2017-03-21 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US9618474B2 (en) 2014-12-18 2017-04-11 Edico Genome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9859394B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9857328B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US10006910B2 (en) 2014-12-18 2018-06-26 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10020300B2 (en) 2014-12-18 2018-07-10 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
WO2018141828A1 (en) * 2017-02-01 2018-08-09 Phadia Ab Method for indicating a presence or non-presence of prostate cancer in individuals with particular characteristics
US10235496B2 (en) 2013-03-15 2019-03-19 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US10429342B2 (en) 2014-12-18 2019-10-01 Edico Genome Corporation Chemically-sensitive field effect transistor
US10811539B2 (en) 2016-05-16 2020-10-20 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US11342048B2 (en) 2013-03-15 2022-05-24 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
WO2022263033A1 (en) * 2021-06-15 2022-12-22 A3P Biomedical Ab Methods of determining time interval for further diagnostics in prostate cancer
US11761962B2 (en) 2014-03-28 2023-09-19 Opko Diagnostics, Llc Compositions and methods related to diagnosis of prostate cancer
US11921115B2 (en) 2015-03-27 2024-03-05 Opko Diagnostics, Llc Prostate antigen standards and uses thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008010084A2 (en) * 2006-07-12 2008-01-24 Progenika Biopharma S.A. Method of prognosing recurrence of prostate cancer
WO2009056862A2 (en) * 2007-11-02 2009-05-07 Cancer Research Technology Ltd Prostate cancer susceptibility screening
WO2010018601A2 (en) * 2008-08-15 2010-02-18 Decode Genetics Ehf Genetic variants predictive of cancer risk
US20100041037A1 (en) * 2007-02-07 2010-02-18 Julius Gudmundsson Genetic variants contributing to risk of prostate cancer
US20100129799A1 (en) * 2006-10-27 2010-05-27 Decode Genetics Ehf. Cancer susceptibility variants on chr8q24.21

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008010084A2 (en) * 2006-07-12 2008-01-24 Progenika Biopharma S.A. Method of prognosing recurrence of prostate cancer
US20100129799A1 (en) * 2006-10-27 2010-05-27 Decode Genetics Ehf. Cancer susceptibility variants on chr8q24.21
US20100041037A1 (en) * 2007-02-07 2010-02-18 Julius Gudmundsson Genetic variants contributing to risk of prostate cancer
WO2009056862A2 (en) * 2007-11-02 2009-05-07 Cancer Research Technology Ltd Prostate cancer susceptibility screening
WO2010018601A2 (en) * 2008-08-15 2010-02-18 Decode Genetics Ehf Genetic variants predictive of cancer risk

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GUDMUNDSSON ET AL.: "Genetic Correction of PSA Values Using Sequence Variants Associated with PSA Levels", SCI TRANSL MED, vol. 2, no. 62, 15 December 2010 (2010-12-15), pages 1 - 8 *
HUANG ET AL.: "Prognostic Significance of Prostate Cancer Susceptibility Variants on Prostate-Specific Antigen Recurrence after Radical Prostatectomy", CANCER EPIDEMIOL BIOMARKERS PREV, vol. 18, no. 11, 2009, pages 3068 - 3074 *
PALSDOTTIR: "Einskirnisbreytileikar og tjaning a KLK3 geninu i blodruhalskirtelskrabbameini", RITGERD TIL DIPLOMAPROFS, HASKOLI ISLANDS, LAEKNADEILD, NAMSBRAUT I GEISLA- OG LIFEINDAFRAEDI, HEILBRIGDISVISINDASVID, May 2010 (2010-05-01), pages 1 - 39 *
WIKLUND ET AL.: "Association of Reported Prostate Cancer Risk Alleles With PSA Levels Among Men Without a Diagnosis of Prostate Cancer", THE PROSTATE, vol. 69, 2009, pages 419 - 427 *
XU ET AL.: "Polymorphisms at the Microseminoprotein-beta Locus Associated with Physiologic Variation in beta-Microseminoprotein and Prostate-Specific Antigen Levels", CANCER EPIDEMIOL BIOMARKERS PREV, vol. 19, no. 11, 8 August 2010 (2010-08-08), pages 3068 - 3074 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600627B2 (en) 2011-10-31 2017-03-21 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US9773091B2 (en) 2011-10-31 2017-09-26 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US10204208B2 (en) 2013-03-15 2019-02-12 Cypher Genomics, Inc. Systems and methods for genomic variant annotation
US11342048B2 (en) 2013-03-15 2022-05-24 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US9418203B2 (en) 2013-03-15 2016-08-16 Cypher Genomics, Inc. Systems and methods for genomic variant annotation
US10235496B2 (en) 2013-03-15 2019-03-19 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
US11761962B2 (en) 2014-03-28 2023-09-19 Opko Diagnostics, Llc Compositions and methods related to diagnosis of prostate cancer
US9857328B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US10494670B2 (en) 2014-12-18 2019-12-03 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10020300B2 (en) 2014-12-18 2018-07-10 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10006910B2 (en) 2014-12-18 2018-06-26 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10429342B2 (en) 2014-12-18 2019-10-01 Edico Genome Corporation Chemically-sensitive field effect transistor
US10429381B2 (en) 2014-12-18 2019-10-01 Agilome, Inc. Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US9618474B2 (en) 2014-12-18 2017-04-11 Edico Genome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9859394B2 (en) 2014-12-18 2018-01-02 Agilome, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10607989B2 (en) 2014-12-18 2020-03-31 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US11921115B2 (en) 2015-03-27 2024-03-05 Opko Diagnostics, Llc Prostate antigen standards and uses thereof
US10811539B2 (en) 2016-05-16 2020-10-20 Nanomedical Diagnostics, Inc. Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
JP2020505928A (en) * 2017-02-01 2020-02-27 Phadia AB Method for indicating the presence or absence of prostate cancer in an individual with certain characteristics
WO2018141828A1 (en) * 2017-02-01 2018-08-09 Phadia Ab Method for indicating a presence or non-presence of prostate cancer in individuals with particular characteristics
JP7138112B2 (en) 2017-02-01 2022-09-15 Phadia AB Methods for indicating the presence or absence of prostate cancer in individuals with certain characteristics
CN110382718A (en) * 2017-02-01 2019-10-25 Phadia AB Method for indicating the presence or absence of prostate cancer in individuals with particular characteristics
WO2022263033A1 (en) * 2021-06-15 2022-12-22 A3P Biomedical Ab Methods of determining time interval for further diagnostics in prostate cancer

Similar Documents

Publication Publication Date Title
EP2663656B1 (en) Genetic variants as markers for use in urinary bladder cancer risk assessment
WO2012029080A1 (en) Sequence variants associated with prostate specific antigen levels
US20170191134A1 (en) Sequence Variants Associated with Prostate Specific Antigen Levels
AU2008256219B2 (en) Genetic variants on Chr 5p12 and 10q26 as markers for use in breast cancer risk assessment, diagnosis, prognosis and treatment
US8951735B2 (en) Genetic variants for breast cancer risk assessment
WO2013035114A1 (en) Tp53 genetic variants predictive of cancer
EP2247755B1 (en) Susceptibility variants for lung cancer
WO2013088457A1 (en) Genetic variants useful for risk assessment of thyroid cancer
CA2729931A1 (en) Genetic variants predictive of cancer risk in humans
US20140329719A1 (en) Genetic variants for predicting risk of breast cancer
US20110020320A1 (en) Genetic Variants Contributing to Risk of Prostate Cancer
AU2009269541A1 (en) Genetic variants as markers for use in urinary bladder cancer risk assessment, diagnosis, prognosis and treatment
WO2014074942A1 (en) Risk variants of Alzheimer's disease
WO2013065072A1 (en) Risk variants of prostate cancer
EP2681337B1 (en) Brip1 variants associated with risk for cancer
WO2010131268A1 (en) Genetic variants for basal cell carcinoma, squamous cell carcinoma and cutaneous melanoma
WO2011104730A1 (en) Genetic variants predictive of lung cancer risk
WO2011095999A1 (en) Genetic variants for predicting risk of breast cancer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11821224

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11821224

Country of ref document: EP

Kind code of ref document: A1