WO2009011911A2

WO2009011911A2 - Methods of detecting prostate cancer

Info

Publication number: WO2009011911A2
Application number: PCT/US2008/008798
Authority: WO
Inventors: Jun Luo; William Isaacs; Thomas Dunn; Rong Hu
Original assignee: The Johns Hopkins University
Priority date: 2007-07-18
Filing date: 2008-07-18
Publication date: 2009-01-22
Also published as: WO2009011911A3

Abstract

Methods of detecting the presence or absence of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, identifying a subject at risk for developing prostate cancer, determining the prognosis of a subject with prostate cancer and kits for use in identifying a subject at risk for developing prostate cancer are disclosed.

Description

METHODS OF DETECTING PROSTATE CANCER

RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application No. 60/959,954, filed July 18, 2007, which is incorporated herein by reference in its entirety.

BACKGROUND

The incidence of prostate cancer has increased in recent years. On January 1, 2005, in the United States there were approximately 2,106,499 men alive who had a history of cancer of the prostate. This includes any person alive on January 1, 2005 who had been diagnosed with cancer of the prostate at any point prior to January 1, 2005 and includes persons with active disease and those who are cured of their disease.

It is estimated that 186,320 men will be diagnosed with and 28,660 men will die of cancer of the prostate in 2008 (http://seer.cancer.gov/csr/1975_2005/results_single/sect_01_table.01.pdf). From 2001-2005, the median age at diagnosis for cancer of the prostate was 68 years of age, and in the US from 2001-2005, the median age at death for cancer of the prostate was 80 years of age (http://seer.cancer.gov/csr/1975_2005/results_single/sect_01_table.13_2pgs.pdf). The incidence of prostate cancer has been found to be population specific. The age-adjusted death rate was 26.7 per 100,000 men per year, based on patients who died in 2001-2005 in the US. Among White men, the rate was 24.6 per 100,000 men; among Black men the rate was 59.4 per 100,000 men; among Asian/Pacific Islanders, the rate was 11.0 per 100,000 men; among American Indian/Alaskan Natives, the rate was 21.1 per 100,000 men; and among Hispanics, the rate was 20.6 per 100,000 men.

According to the National Cancer Institute, based on rates from 2003-2005, 15.78% of men born today will be diagnosed with cancer of the prostate at some time during their lifetime. This number can also be expressed as 1 in 6 men being diagnosed with cancer of the prostate during their lifetime. These statistics are called the lifetime risk of developing cancer. Lifetime risk may also be discussed in terms of the probability of developing or of dying from cancer. Based on cancer rates from 2003 to 2005, it was estimated that men had about a 44 percent chance of developing cancer in their lifetimes, while women had about a 37 percent chance of developing cancer. When looking at the probability of developing cancer of the prostate between two age groups, for example, 8.04% of men will develop cancer of the prostate between their 50th and 70th birthdays (information available on the world wide web at seer.cancer.gov/statfacts/html/prost.html).

Prostate cancer is a latent disease. Many men carry prostate cancer cells without overt signs of disease. The progression of the disease usually goes from a well-defined mass within the prostate to a breakdown and invasion of the lateral margins of the prostate, followed by metastasis to regional lymph nodes, and metastasis to the bone marrow. Cancer metastasis to bone is common and often associated with uncontrollable pain. Autopsies of individuals dying of other causes show prostate cancer cells in 30% of men at age 50 and in 60% of men at age 80. Furthermore, prostate cancer can take up to 10 years to kill a patient after the initial diagnosis.

Current microarray technologies and the sequencing of the human genome have significantly enhanced the potential for investigations in all fields and particularly in the area of cancer research. High-throughput gene expression profiling technologies offer an opportunity to uncover critical molecular events in the development and progression of various cancers and can be used to design improved prognostic testing and effective treatment strategies. High-density tissue microarrays (TMA) are useful for profiling protein expression in a large number of samples (Rubin M.A. et al., Am J Surg Pathol. 2002 Mar;26(3):312-9), and previous transcriptome analyses in various malignancies have provided valuable information for the assessment of patient group classifications such as subgroups of patients that are likely to respond to a particular therapy (Sondak, V. K. Adjuvant therapy for melanoma. Cancer J. 7 Suppl 1, S24-7 (2001)). Particularly, in prostate cancer, microarray analysis may provide a useful way to examine a large number of clinical samples for putative prostate cancer biomarkers.

Prostate cancer (PCA) is typically diagnosed with biopsy examination following a digital rectal exam and/or prostate specific antigen (PSA) screening. An elevated serum PSA level can indicate the presence of PCA. PSA is used as a screening marker for prostate cancer because it is secreted only by prostate cells. A healthy prostate will release a stable amount into the circulation, typically below 4 nanograms per milliliter (a serum PSA reading of "4" or less), whereas cancer cells release escalating amounts that correspond with the severity of the cancer. A level between 4 and 10 may raise a doctor's suspicion that a patient has prostate cancer, while amounts above 50 may show that the tumor has spread elsewhere in the body. When PSA or digital tests indicate a strong likelihood that cancer is present, a transrectal ultrasound (TRUS) may be used to map the prostate and show any suspicious areas. Biopsies of various sectors of the prostate are used to determine if prostate cancer is present.

Treatment options depend on the stage, grade, and other clinical variables of the cancer. Men with a 10-year life expectancy or less who have a low Gleason number and whose tumor has not spread beyond the prostate are often treated with watchful waiting (no treatment). Treatment options for more aggressive cancers include surgical treatments such as radical prostatectomy (RP), in which the prostate is completely removed (often with nerve sparing techniques to preserve potency and urinary functions), and radiation, applied through an external beam that directs the dose to the prostate from outside the body or via low-dose radioactive seeds that are implanted within the prostate to kill cancer cells locally. For more aggressive prostate cancers, anti-androgen hormone therapy is also used, alone or in conjunction with surgery or radiation. Hormone therapy uses luteinizing hormone-releasing hormone (LH-RH) analogs, which block the pituitary from producing hormones that stimulate testicular testosterone production, or surgical removal of the testis, alone or in combination with chemicals (anti-androgens) that block androgenic signaling.

While surgical and hormonal treatments are often effective for localized PCA, advanced disease remains essentially incurable. Androgen ablation is the most common therapy for advanced PCA, leading to massive apoptosis of androgen-dependent malignant cells and temporary tumor regression. In most cases, however, the tumor reemerges with a vengeance and can proliferate independent of androgen signals.

The advent of prostate specific antigen (PSA) screening has led to earlier detection of PCA and significantly reduced PCA-associated fatalities. However, the impact of PSA screening on cancer-specific mortality is still unknown pending the results of prospective randomized screening studies (Etzioni et al., J. Natl. Cancer Inst., 91:1033, 1999; Maattanen et al., Br. J. Cancer 79:1210, 1999; Schroder et al., J. Natl. Cancer Inst., 90:1817, 1998). A major limitation of the serum PSA test is a lack of prostate cancer sensitivity and specificity, especially in the intermediate range of PSA detection (4-10 ng/ml). Elevated serum PSA levels are often detected in patients with non-malignant conditions such as benign prostatic hyperplasia (BPH) and prostatitis, and provide little information about the aggressiveness of the cancer detected. Coincident with increased serum PSA testing, there has been a dramatic increase in the number of prostate needle biopsies performed (Jacobsen et al., JAMA 274:1445, 1995). This has resulted in a surge of equivocal prostate needle biopsies (Epstein and Potter, J. Urol., 166:402, 2001). Thus, a need remains for additional serum and tissue biomarkers or additional methods to detect a patient at risk for prostate cancer. There also remains a need for determining those at risk for or susceptible to prostate cancer, early-stage prostate cancer prognosis, and early intervention.

SUMMARY

The present invention provides, for the first time, a novel structural variation of the (Phosphoserine phosphatase-like) PSPHL locus that is tightly linked to gene expression and demonstrates unusual patterns of population differentiation. Given the potential importance of genomic variations in the differential risk for diseases, the invention provides an association of the variation within the PSPHL locus with prostate cancer in the African American population.

The findings presented herein may have an important impact on the design of clinical trials focused upon the prevention of prostate cancer in subject populations, for example in high-risk individuals, on the implementation of community based outreach programs aimed at early screening and timely treatment during the window of curability, or on individualized treatment of subjects with advanced diseases.

In a first aspect, the invention provides methods of detecting the presence or absence of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, wherein the presence or absence of the nucleic acid segment in the gene locus indicates an altered risk of cancer.

In preferred embodiments, the cancer is prostate cancer.

In one embodiment, the presence or absence of the nucleic acid segment in the PSPHL gene locus is detected in an African American subject. In another embodiment, the absence of the nucleic acid segment indicates an increased risk of prostate cancer in the African American subject.

In a further embodiment, the nucleic acid segment comprises 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

In one embodiment, the nucleic acid segment comprises SEQ ID NO: 2. In one embodiment, the nucleic acid segment comprises SEQ ID NO: 13.

In one embodiment, the nucleic acid segment comprises SEQ ID NO: 14.

In one embodiment, the nucleic acid segment comprises SEQ ID NO: 15.

In a further embodiment, the presence of the insertion allele of the PSPHL gene locus is correlated with the expression of the PSPHL gene product. In a related embodiment, the absence of the insertion allele of the PSPHL gene locus is correlated with the absence of the PSPHL gene product.

In another embodiment, the deletion allele is associated with the expression of a set of genes. In one embodiment, the subject is homozygous for a deletion in the PSPHL gene locus. In another embodiment, the subject is heterozygous for a deletion in the PSPHL gene locus. In a related embodiment, the homozygous deletion allele is associated with the expression of a set of genes. In another embodiment, the heterozygous deletion allele is associated with the expression of a set of genes. In one embodiment, the expression of the PSPHL gene product is associated with the expression of a set of genes.

In another aspect, the invention features a method of determining the ancestry of a subject comprising detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a sample subject population, wherein the presence or absence of the variation indicates the ancestry of the subject.

In one embodiment, the presence or absence of a nucleic acid segment is indicative of African, e.g., African American or European, e.g., European American, ancestry. In another embodiment, the absence of the nucleic acid segment identifies the population as an African American subject. In one embodiment of any one of the above aspects, the method further comprises selecting subjects with an increased risk of developing prostate cancer. In a related embodiment, the method comprises obtaining a sample from the subjects.

In another aspect, the invention features a biomarker for prostate cancer in an African American subject comprising an insertion in the PSPHL gene locus, wherein the presence of the biomarker is correlated with a decreased risk of prostate cancer.

In one embodiment, the insertion encodes a nucleic acid comprising 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

In another embodiment, the insertion encodes a nucleic acid comprising SEQ ID NO: 2.

In another related embodiment, the absence of the biomarker is correlated with an increased risk of prostate cancer in the African American subject.

In still another embodiment, the presence of the insertion in the PSPHL gene locus is correlated with the expression of the PSPHL gene product. In one embodiment, the insertion allele is associated with the expression of a set of genes.

In another aspect, the invention features a method of identifying a subject at risk for developing prostate cancer comprising detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject to determine the genotype of the subject, wherein the absence of the nucleic acid segment in the gene locus indicates an increased risk of prostate cancer.

In yet another aspect, the invention features a method of determining the prognosis of a patient with prostate cancer comprising: detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject, wherein the absence of the variation determines the prognosis of a patient with prostate cancer.

In one embodiment, the prognosis determines the course of treatment.

In another embodiment of any one of the above aspects, the subject is homozygous for a deletion in the PSPHL gene locus. In still another embodiment of any one of the above aspects, the subject is heterozygous for a deletion in the PSPHL gene locus.

In a further embodiment of any one of the above aspects, the subject is selected from an African American population.

In another embodiment of any one of the above aspects, the absence of the nucleic acid segment indicates an increased risk of, or risk of recurrence of, prostate cancer. In still another embodiment of any one of the above aspects, the nucleic acid comprises 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

In another related embodiment of any one of the above aspects, the nucleic acid comprises SEQ ID NO: 2. In another embodiment of any one of the above aspects, the nucleic acid comprises SEQ ID NO: 13, SEQ ID NO: 14 or SEQ ID NO: 15.

In one embodiment of any one of the above aspects, the presence of the insertion allele of the PSPHL gene locus is correlated with the expression of the PSPHL gene product. In another embodiment of any one of the above aspects, the absence of the insertion allele of the PSPHL gene locus is correlated with the absence of the PSPHL gene product. In a related embodiment of any one of the above aspects, the homozygous deletion allele is associated with the expression of a set of genes. In another related embodiment of any one of the above aspects, the heterozygous deletion allele is associated with the expression of a set of genes.

In another embodiment of any one of the above aspects, the presence or absence of a nucleic acid segment in the PSPHL gene locus is determined using a polymerase chain reaction (PCR) assay. In a related embodiment, the PCR assay is a multiplexed PCR assay. In a further related embodiment, the PCR is carried out using primers comprising the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4 and primers comprising the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6. In another related embodiment, the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4 amplify a 133 base pair fragment of the insertion sequence in exon 1 of the PSPHL gene. In still another related embodiment, the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6 generate an amplicon only if the insertion sequence is absent.
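The two-primer-pair readout described above can be sketched as a simple decision rule. This is a minimal illustration, not the assay protocol itself: the function name and the genotype labels are hypothetical, and it assumes a diploid locus where the SEQ ID NO: 3/4 pair yields a band only when the insertion sequence is present and the SEQ ID NO: 5/6 pair yields a band only when it is absent.

```python
def call_psphl_genotype(insertion_band: bool, null_band: bool) -> str:
    """Interpret a multiplexed PCR result for one subject.

    insertion_band: amplicon seen from the SEQ ID NO: 3/4 primer pair
                    (133 bp fragment of the insertion sequence in exon 1).
    null_band:      amplicon seen from the SEQ ID NO: 5/6 primer pair
                    (generated only if the insertion sequence is absent).
    The returned labels are illustrative names, not terms from the source.
    """
    if insertion_band and null_band:
        return "heterozygous deletion"   # one allele of each kind
    if insertion_band:
        return "homozygous insertion"    # no allele lacks the insertion
    if null_band:
        return "homozygous deletion"     # no allele carries the insertion
    return "no call"                     # neither product: failed reaction


if __name__ == "__main__":
    for bands in [(True, True), (True, False), (False, True), (False, False)]:
        print(bands, "->", call_psphl_genotype(*bands))
```

Returning an explicit "no call" when neither product appears keeps a failed reaction from being read as a genotype.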

In another embodiment of any one of the above aspects, the subject has previously been treated for prostate cancer.

In still another embodiment of any one of the above aspects, the measurement is performed after surgery or therapy to treat prostate cancer.

In another aspect, the invention features an antibody to detect PSPHL protein in cells and tissues with PSPHL genotypes. In one embodiment, the antibody is polyclonal. In another embodiment, the antibody is monoclonal. In a related embodiment, the polyclonal antibody is directed to the 72AA antigen of prostate cells corresponding to SEQ ID NO: 7.

In another aspect, the invention features a kit for use in identifying a subject at risk for developing prostate cancer comprising primers directed to amplify a 133 base pair sequence of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1, and instructions for use.

In one embodiment, the primers comprise the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4.

In another embodiment, the primers comprise the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6.

In another aspect, the invention features a kit comprising primers comprising the nucleic acid sequences set forth as SEQ ID NO: 3 and SEQ ID NO: 4, and instructions for use.

In another aspect, the invention features a kit comprising primers comprising the nucleic acid sequences set forth as SEQ ID NO: 5 and SEQ ID NO: 6, and instructions for use.

In another further aspect, the invention features a kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 17, and instructions for use. In still another aspect, the invention features a kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 18, and instructions for use.

In still another further aspect, the invention features a kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 19, and instructions for use. In one embodiment of any one of the above aspects, the kits further comprise instructions for use in PCR assay. In one embodiment, the PCR is multiplexed PCR.

In another aspect, the invention features a kit for use in identifying a subject at risk for developing prostate cancer comprising: an antibody directed to a PSPHL antigen, and instructions for use. In another aspect, the invention features a kit comprising an antibody directed to a PSPHL antigen.

In one embodiment of any one of the above aspects, the antibody is monoclonal.

In one embodiment of any one of the above aspects, the antibody is polyclonal. In a related embodiment, the polyclonal antibody is used to detect the 72AA antigen. In another related embodiment, the polyclonal antibody is directed to a sequence encoded by SEQ ID NO: 7.

The above features and advantages of the present invention will be apparent from or are set forth in more detail in the accompanying drawings, which are incorporated in and form a part of this specification, and the following Detailed Description, which together serve to explain by way of example the principles of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a graph showing that PSPHL gene expression is higher in African American prostate cancer tissues when compared to European American prostate tissues. The graph shows expression microarray analysis of prostate cancer tissues by race. Surgical prostate cancer tissues from 12 European American cases were compared to those from 8 African American cases. In this "volcano" plot, each gene is represented by a dot, positioned by fold expression change (x axis) and p value (y axis), which were calculated by comparing the expression values between the two racial groups. Genes represented by red dots demonstrated expression changes greater than 2 fold and p<0.05. Using these cut-off criteria (2 fold change and p<0.05), six genes are under-expressed in African American prostate tissue, and only one gene, PSPHL, is over-expressed in African American prostate tissues.
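The cut-off described for the volcano plot (greater than 2 fold change and p<0.05) can be illustrated with a short sketch. The function names, the use of log2 for the x coordinate, and the example gene values are assumptions made for illustration only; none of the numbers below come from the microarray data.

```python
import math

FOLD_CUTOFF = 2.0  # "expression changes greater than 2 fold"
P_CUTOFF = 0.05    # "p<0.05"


def volcano_point(fold_change: float, p_value: float) -> tuple:
    """Return (x, y) plot coordinates: log2 fold change and -log10 p value."""
    return math.log2(fold_change), -math.log10(p_value)


def classify_gene(fold_change: float, p_value: float) -> str:
    """Apply both cut-offs; fold_change is the ratio of group means (> 0)."""
    log2_fc = math.log2(fold_change)
    significant = p_value < P_CUTOFF
    if significant and log2_fc > math.log2(FOLD_CUTOFF):
        return "over-expressed"
    if significant and log2_fc < -math.log2(FOLD_CUTOFF):
        return "under-expressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical (fold change, p value) pairs, not values from the study.
    hypothetical = {"gene_A": (4.0, 0.01), "gene_B": (0.25, 0.01), "gene_C": (4.0, 0.20)}
    for name, (fc, p) in hypothetical.items():
        print(name, classify_gene(fc, p), volcano_point(fc, p))
```

Working in log2 space makes the 2 fold criterion symmetric, so a ratio of 0.5 is treated the same way as a ratio of 2.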

Figure 2 shows the nucleotide sequence of the PSPHL mRNA (GenBank accession number AJ001612 represented by SEQ ID NO: 1) with the nucleotide numbers on the left. The positions of the P1 (underlined, SEQ ID NO: 12) and P2 (double underlined, SEQ ID NO: 13) primer pairs are shown.

Figure 3 shows RT-PCR validation of PSPHL expression in the same tissue samples used in expression microarray analysis, with the addition of one more African American tumor sample. The PSPHL gene was expressed in 3 of 12 European American prostate cancers and 7 of the 9 African American prostate cancers. Primer sets used in the assays are indicated in the parentheses to the right and annotated below. Note that primer set 2 amplified two products, because the primers spanned exon 3, which is alternatively spliced during gene transcription (see Figure 6). GAPDH is a control for RT-PCR. P1: CTGGGAGAACCGGAAGAATAACAT (forward), corresponds to nt 414-438 of the AJ001612 sequence; CCAATATTCACTGAAGGCTGCCGA (reverse), corresponds to nt 760-783 of the AJ001612 sequence. P2: CGGTCATCAGTGAAGAAGGAATCGGA (forward), corresponds to nt 304-329 of the AJ001612 sequence; TGGCGTTATCCTTGACTTGTTGCC (reverse), corresponds to nt 362-385 of the AJ001612 sequence.

Figure 4 shows RT-PCR validation of PSPHL expression in paired normal (N) and cancer tissues (T) from 8 European American prostate cancer patients. The 8 cases are independent of cases used in Figure 1 and 2. Primer sets used were identical to those used in Figure 2. Note the high concordance of PSPHL gene expression between the paired normal and tumor tissues from the same patient.

Figure 5 shows RT-PCR validation of PSPHL expression in paired normal (N) and cancer tissues (T) from 7 African American prostate cancer patients. Five of the 7 cases are independent of cases used in Figure 1 and 2, while two cases (1081, 1134) overlap with those used in Figure 2. Primer sets used were identical to those used in Figure 2. Note again the high concordance of PSPHL gene expression between the paired normal and tumor tissues from the same patient.

Figure 6 shows the nucleotide sequence of the alternative PSPHL mRNA. The nucleotide sequence of an alternative PSPHL mRNA (GenBank accession number BC065228) is shown with the nucleotide numbers on the left. The position of the P2 primer pair is shown double underlined. This sequence has an insertion of 122 bp (shown in capitals and bold) compared to the sequence of AJ001612 (Figure 2), resulting in a PCR product that is 122 bp longer.

Figure 7 shows comparison of the human AJ001612 and BC065228 mRNA sequences with the predicted gene from the Chimpanzee genome chromosome 7 contig (GenBank Accession No. NW_001237953.1).

Figure 8 shows the nucleotide sequence of the AAIns sequence in the PSPHL mRNA. The nucleotide sequence of the PSPHL mRNA (GenBank accession number AJ001612) is shown with the nucleotide numbers on the left. The sequence of a 326 bp portion of the African American specific insertion (AAIns) that overlaps with the PSPHL mRNA sequence is shown in bold italics. The position of the P3 primer pair is shown heavy underlined; the pair amplifies a product of 133 bp.

Figures 9A and B are schematic drawings. Figure 9A shows the predicted gene structure of PSPHL. Solid boxes represent the exons and the lines represent the introns. The sizes and positions of the introns and exons in the PSPHL gene are predicted based on the assumption that the human PSPHL and Chimpanzee PSPHL genes are similar to each other. Figure 9B shows the two spliced variants of PSPHL mRNA. Numbers above the bars represent the nucleotide position of the exon boundary in the respective clones. Note that the 5' end of the exon 1 sequence of BC065228 was not complete, and the last exon in BC065228 was shortened when compared to AJ001612, possibly due to alternative cleavage during mRNA synthesis.

Figure 10 shows that expression of the PSPHL gene is determined by the presence of AAIns. The Figure shows the results of PCR. A total of 11 cases for which expression status of PSPHL was known were examined for the presence of AAIns in genomic DNA isolated from seminal vesicles. The 8 PSPHL mRNA positive cases (362, 731, 994, 1081, 1115, 1166, 1665, 1863) were also positive for the presence of AAIns, as detected by the primer set P3. Three cases that were known to be negative for PSPHL expression (554, 1134, 1957), plus 5 European American genomic DNA samples randomly picked from the case-control cohort (413, 414, 436, 441, 483), were all negative for AAIns (with the exception of a faint band for 1134, which may be caused by contamination). Three cDNA samples were used as positive (731, 1863) and negative controls (1957). The P3 primers: TCAGCTAAAGTGGCTGTTGGGTGT (forward), nt 25-48 in the AJ001612 sequence; AAGCTTCTGCGCTACCTTGCGAT (reverse), nt 135-157 in the AJ001612 sequence.
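The stated P3 coordinates can be checked against the 133 bp product size with simple arithmetic. The sketch below assumes 1-based, inclusive nucleotide numbering on the AJ001612 sequence; the helper names are hypothetical.

```python
# P3 primers as (sequence, first nt, last nt) on the AJ001612 sequence.
P3_FORWARD = ("TCAGCTAAAGTGGCTGTTGGGTGT", 25, 48)
P3_REVERSE = ("AAGCTTCTGCGCTACCTTGCGAT", 135, 157)


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate span."""
    return end - start + 1


def primer_matches_span(seq: str, start: int, end: int) -> bool:
    """True when the primer length equals the length of its stated span."""
    return len(seq) == span_length(start, end)


def amplicon_length(forward, reverse) -> int:
    """Product size from the first forward-primer nt to the last reverse-primer nt."""
    return span_length(forward[1], reverse[2])


if __name__ == "__main__":
    print(primer_matches_span(*P3_FORWARD))         # True (24 nt)
    print(primer_matches_span(*P3_REVERSE))         # True (23 nt)
    print(amplicon_length(P3_FORWARD, P3_REVERSE))  # 133
```

Under the inclusive-numbering assumption the span from nt 25 to nt 157 is 133 nucleotides, which agrees with the 133 bp product reported for the P3 primer pair.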

Figure 11 is a graph showing AAIns is associated with Cancer Risk in European Americans. The graph shows percentages of individuals (expressed as a fraction) with positive AAIns in European American controls (1), prostate cancer cases with Gleason 6 and below (2), and prostate cancer cases with Gleason 7 and above (3).

Figure 12 shows the results of multiplexed PCR using primer sets P3 and P4 to genotype the AAIns locus. The P3 primer pair, as described in Figures 7 and 9, amplified the allele with the AAIns sequence, while the P4 primer pair, as designed below, amplified the allele without the sequence. The validity of the assay is confirmed by the use of mixed genomic DNA samples known to have two copies of the AAIns sequence (1665, 1863), and zero copies of the AAIns sequence (1957, 1704). The P4 primer sequence was designed against a sequence 5' of the AAIns sequence, as determined by alignment of the Chimpanzee and human genome assemblies as of 2006, and based on the prediction of the breakpoints in the chromosome 7 human reference sequence. P4 primers: AGTCTTGCTATCTTGCCCAGGCTGAT (forward), nt 5419659-5419684 in chromosome 7 human reference assembly; GTAGAGACTGGGTTTCACCATGTTGG (reverse), nt 5421321-5421346 in chromosome 7 human reference assembly. Lanes and corresponding samples are as follows: 1. Mixture of genomic DNA from sample 1665 and 1704; 2. Mixture of genomic DNA from sample 1665 and 1957; 3. Mixture of genomic DNA from sample 1863 and 1704; 4. Mixture of genomic DNA from sample 1863 and 1957; 5. Mixture of genomic DNA from sample 1704 and 1957; 6. Mixture of genomic DNA from sample 1665 and 1863; 7. No DNA template; 8. No DNA template.

Figure 13 is a schematic diagram of AAIns in relation to the assembled Human genome (HG) and Chimpanzee genome (CG) chromosome 7 sequences (not to scale). The human WGS and Trace sequences presented in Exhibit F and H were used to obtain the partial assembly of AAIns, and their positions and IDs are marked. Coordinates for the assembled genome were marked below the line; positions and sizes of the exons and introns in both genomes and AAIns were marked above their respective spaced-out positions. The 5' breakpoint is chr7 55798228 in the assembled human genome, and is separated by 9 bp (GTGCGTCTA) from the 3' breakpoint at chr7 55798238 in the assembled human genome. The positions of the primer sets (P3 and P4) used to genotype the AAIns locus are marked with red vertical bars as shown. The P3 primer set amplifies sequences in exon 1, which is part of AAIns, while the P4 primer set amplifies the PSPHL-null allele. A black triangle marks the site of a ~9 kb viral insertion specific to the Chimpanzee genome, which resulted in size differences in intron 3 between the two genomes.
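For the multiplexed P3/P4 assay of Figure 12, the band pattern one would expect in each lane can be derived from the stated AAIns copy numbers of the mixed samples. The sketch below is an illustration under two assumptions: each sample is diploid at the locus, and a primer pair gives a visible product whenever at least one matching allele is present in the mixture. It computes expectations only and does not reproduce the gel result.

```python
# AAIns copy number per sample, as stated for the validation mixtures.
AAINS_COPIES = {"1665": 2, "1863": 2, "1957": 0, "1704": 0}

# Lane number -> samples mixed in that lane (empty tuple = no DNA template).
LANES = {
    1: ("1665", "1704"),
    2: ("1665", "1957"),
    3: ("1863", "1704"),
    4: ("1863", "1957"),
    5: ("1704", "1957"),
    6: ("1665", "1863"),
    7: (),
    8: (),
}


def expected_bands(samples) -> tuple:
    """Return (P3 band expected, P4 band expected) for a mixture of samples."""
    p3 = any(AAINS_COPIES[s] > 0 for s in samples)  # some allele carries AAIns
    p4 = any(AAINS_COPIES[s] < 2 for s in samples)  # some allele lacks AAIns
    return p3, p4


if __name__ == "__main__":
    for lane, samples in LANES.items():
        p3, p4 = expected_bands(samples)
        print(f"lane {lane}: P3={'+' if p3 else '-'} P4={'+' if p4 else '-'}")
```

Under these assumptions, lanes 1 to 4 mimic a heterozygous sample, lane 5 a homozygous deletion, lane 6 a homozygous insertion, and lanes 7 and 8 serve as no-template controls.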

Figure 14 shows the alignment of AJ001612 with the human genome. Matching bases in AJ001612 and human genomic sequences are colored blue and capitalized. Light blue bases mark the boundaries of gaps in either sequence. The mRNA sequence (query sequence) is presented first, followed by the genomic sequence. Each sequence is marked by the nucleotide position (for mRNA) or coordinates (genome sequence based on the March 2006 assembly).

Figure 15 shows the alignment of AJ001612 with the assembled Chimpanzee genome. Matching bases in AJ001612 and Chimpanzee genomic sequences are colored blue and capitalized. Light blue nucleotides mark the boundaries of predicted exons in either sequence. The mRNA sequence (query sequence) is presented first, followed by the genomic sequence. Each sequence is marked by the nucleotide position (for mRNA) or coordinates (genome sequence based on the March 2006 assembly).

Figure 16 shows the alignment of BC065228 with the assembled Chimpanzee genome. Matching bases in BC065228 and Chimpanzee genomic sequences are colored blue and capitalized. Light blue nucleotides mark the boundaries of predicted exons in either sequence. The mRNA sequence (query sequence) is presented first, followed by the genomic sequence. Each sequence is marked by the nucleotide position (for mRNA) or coordinates (genome sequence based on the March 2006 assembly). Note that the exon 1 sequence is not complete in BC065228.

Figure 17 shows partial assembly of AAIns based on WGS and Trace Archive. In the Figure, the 4 exon sequences in BC065228 (SEQ ID NO: 8, Figure 16) were used to query the human whole genome shotgun (WGS) and Trace Archive. These queries identified 4 genomic sequences - Contig0, gnl|ti|226793227, gi|148184701|, gi|68978189| - each of which contains at least one exon sequence (red underlined; see Figure 13).

Figure 18 shows identification of the AAIns 3' breakpoint. The first 4 kb of >gi|68978189|gb|AADB02010625.1| Homo sapiens chromosome 7 CRA_219000002701389 whole genome shotgun (WGS) sequence was aligned with the assembled human genome. The matched sequence is blue capitalized and the light blue bold nucleotides border a small region of mismatch. Exon 3 in the WGS sequence is marked with red underline. The nucleotide position (for WGS sequence) and coordinates (for human chr7 genome sequence) are marked to the right. Results of the alignment are ordered by the query sequence followed by the genome sequence. Therefore, the 3' breakpoint of AAIns (upstream black) is where the region of homology (blue) with the assembled human genome sequence ends.

Figure 19 shows identification of the AAIns 5' breakpoint. First, a 100 bp non-AAIns sequence upstream of the AAIns 3' breakpoint from the assembled human genome (1, see Exhibit G above) was used to query the Trace Archive to obtain gnl|ti|1656600323 (2). The overlapping sequence is underlined. Next, the Trace sequence gnl|ti|1656600323 (2) was used to query the assembled human genome sequence (3). The overlap between the gnl|ti|1656600323 sequence and the human genome sequence is shown. Therefore, the 5' breakpoint of AAIns (downstream black) is where the region of homology with the assembled human genome sequence ends. Finally, the Trace sequence gnl|ti|1656600323 (2) was used to query the assembled Chimpanzee genome. The overlap between the gnl|ti|1656600323 sequence and the Chimpanzee genome sequence upstream of the 5' AAIns breakpoint is shown in blue bold, and the sequence downstream of the 5' AAIns breakpoint is shown in red italics. This confirms that the gnl|ti|1656600323 sequence contains the human AAIns sequence.

Figure 20 shows validation of the breakpoint position. A 200bp assembled human genome sequence flanking the breakpoint (5' at chr7 55798228, 3' at chr7 55798238) was aligned with the assembled Chimpanzee genome. The two matched sequences on the Chimpanzee genome spanned 52765 bp on chromosome 7. The bold nucleotides mark the boundaries of the AAIns insertion. The 52765 bp insertion includes exons 1, 2, and 3 (underlined) of the Chimpanzee PSPHL gene. Nucleotide positions are marked to the right. Again, the query sequence is followed by the Chimpanzee genome sequence.

Figure 21 A and B shows differential expression of PSPHL detected in two independent sample sets using different array platforms. Panel A is a "Volcano" plot of expression data derived from the Agilent platform. Average expression ratios for each gene from the comparison of African American vs. European American prostate tumors were plotted on the X axis in log scale, and -log10 of p values comparing the two groups were plotted on the Y axis. Differentially expressed genes (red) were defined by fold expression greater than 2 and p value less than 0.05. Panel B is a heatmap for PSPHL expression ratios detected using cDNA microarrays in paired normal and tumor prostate tissues. Expression ratios of each of the 52 (13x4) samples were derived from the comparison of test sample vs a common BPH reference, and represented by red color if overexpressed relative to BPH. Paired N/T samples from each of the 26 cases were aligned vertically as shown. AA: African American; EA: European American; N: normal; T: tumor.

Figure 22 shows two alternatively spliced PSPHL transcripts. The ORF sequences were marked as color matched bars above numbered variant sequence positions, and the translated amino acid sequences were similarly color coded according to their respective coding sequences. The GenBank sequence associated with BC065228 is incomplete at the 5' end, but no other putative start sites were identified from the additional sequence we revealed from RACE analysis. Variant 1 encodes SEQ ID NO: 9. Variant 2 encodes SEQ ID NO: 10.
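The selection criterion stated for Figure 21, Panel A (fold expression greater than 2 and p value less than 0.05) can be illustrated with a minimal sketch. The function names are hypothetical, the base-2 logarithm for the X axis is an assumption (the text says only "log scale"), and applying the fold cutoff in both directions is likewise an assumption made for illustration.

```python
import math


def volcano_point(ratio: float, p_value: float) -> tuple:
    """Return (x, y) for a volcano plot.

    x is the log of the expression ratio (base 2 is assumed here; the
    description says only "log scale"), and y is -log10 of the p value.
    """
    return math.log2(ratio), -math.log10(p_value)


def is_differentially_expressed(ratio: float, p_value: float,
                                fold_cutoff: float = 2.0,
                                p_cutoff: float = 0.05) -> bool:
    """Apply the stated criterion: fold > 2 and p < 0.05.

    Assumption: a ratio below 1 is converted to its reciprocal so that
    under-expression is judged by the same fold cutoff.
    """
    fold = ratio if ratio >= 1.0 else 1.0 / ratio
    return fold > fold_cutoff and p_value < p_cutoff
```

Under these assumptions, a gene with a ratio of 4 and a p value of 0.001 would be flagged, while a gene with a ratio of 1.5 would not, regardless of its p value.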

Figure 23 A and B. Panel A shows the results of RT-PCR. The RT-PCR shows concordance between DNA and RNA in matched cases. Primers were designed to amplify a short stretch of DNA sequence within nt 50-200 of AJ001612, later confirmed to be within a single exon (see Figure 5), and correspond to the PSPHL mRNA sequences absent in the reference genome. Primers used to examine gene expression by RT-PCR spanned the exons, as revealed after the exon structure was later defined, excluding the possibility of false positive RNA detection due to DNA contamination. Primers for GAPDH were similarly designed to detect DNA within a single exon, and to detect the transcript sequences spanning two exons. Panel B shows the results of RT-PCR, together with the PSPHL gene structure, indel break points, and the genotyping assay. The top is a schematic diagram of the complete PSPHL gene structure, with 4 exons sized at 212bp, 113bp, 122bp, and 502bp. Positions for TRACE and WGS sequences used to partially assemble the insertion allele were marked with light blue lines above the insertion allele track. The deletion allele sequence positions were defined by the reference genome. Sizes of the DNA segments, when available, are indicated by numbers shown above the tracks. The bottom shows representative genotyping results in African Americans and European Americans. The genomic positions of primer sets (P3 and P4) used to genotype the PSPHL locus were marked with red vertical bars as shown in A. Ins/Ins genotype: P3 signal only; Ins/Del genotype: both P3 and P4 signals; Del/Del genotype: P4 signal only.
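The genotype assignment rule stated for the P3 and P4 primer sets can be written as a short sketch. The function name and the boolean inputs are hypothetical, and the "no call" result for a sample with neither signal is an assumption, since the description does not address that case.

```python
def call_psphl_genotype(p3_signal: bool, p4_signal: bool) -> str:
    """Map presence of P3 and P4 amplification signals to a genotype.

    Rule taken from the Figure 23 description:
      P3 only   -> Ins/Ins
      P3 and P4 -> Ins/Del
      P4 only   -> Del/Del
    Assumption: neither signal is reported as "no call" (for example,
    a failed reaction); the description does not cover this case.
    """
    if p3_signal and p4_signal:
        return "Ins/Del"
    if p3_signal:
        return "Ins/Ins"
    if p4_signal:
        return "Del/Del"
    return "no call"
```

How a gel band or amplification curve is converted to a present/absent signal is left outside this sketch, since the description does not specify a threshold.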

Figure 24 is a graph that shows a summary of genotyping results in 3 populations. AA: African American; EA: European American.

Figure 25 is a graph that shows PSPHL genotype in cases and controls in the African American population.

Figure 26 is a Western blot that shows detection of the polyclonal anti 72AA antibody.

Other aspects of the invention are described infra.

DETAILED DESCRIPTION

Genetic variation refers to heritable DNA level differences that exist in all living organisms. There are 3 billion chemical base pairs that make up human DNA. One of the most common types of genetic variation is the single nucleotide polymorphism (SNP). Each SNP accounts for only one base pair difference; however, there are millions of SNPs, which account for an average genetic difference between humans of about 0.08% of the 3 billion chemical base pairs in the human genome. Structural variation is another type of genetic variation, in which each variant involves at least 1000 such chemical base pairs. These longer stretches of DNA can be deleted, duplicated, or inserted, leading to a change in DNA dosage (therefore also termed copy number variation). They can also be inverted in orientation or translocated to a different location in the genome. In recent years, knowledge regarding this type of variation has increased, following the improvement of genomic technologies such as microarrays. These variations, by definition, are differences that can be inherited, unlike alterations such as the ERG-TMPRSS2 fusion that has been discovered in prostate cancer, which are not heritable. Genomic variants of all sizes and types can contribute to genetic disease.

From twin studies, the understanding has emerged that there is a strong genetic basis for human prostate cancer. Many geneticists have made use of genetic variations such as SNPs to identify the risk factors that would elevate the chance of developing prostate cancer if inherited. Indeed, they have been very successful. However, these studies are not without limitations. One such limitation is that the gene or genes that contribute to this elevated risk have not been definitively identified. Yet another limitation is that corresponding studies in African American men, the highest risk group, have not been conducted at a similar scale or with similar intensity.

While commercial tests, some costing a few hundred dollars, are being developed to assess prostate cancer risk, with the hope of benefiting humankind through early diagnosis and timely treatment to reduce prostate cancer mortality, it remains to be established whether these tests are equally effective in identifying high risk African American men, already a high risk group for developing prostate cancer. Further, the use of structural variation for identification of prostate cancer genes is a research area that has not been explored. Recently, the present inventors have discovered a novel genomic variation that has the potential to address the above-mentioned aspects of prostate cancer biology and research. Described herein is a locus on chromosome 7, termed PSPHL, that harbors a segment of DNA that can be either present or missing from the human genome. When it is present, the PSPHL gene is expressed in the prostate and may function through the expressed products. When it is absent, the gene is not expressed in the prostate or any other tissues in the body because there is no genetic code to start with. Furthermore, it has been found that this segment of DNA is present in ~96% of healthy African Americans but deleted in most healthy Americans of European descent.

Genetic differences between humans are shaped by the common ancestry of humans in Africa some 50,000 years ago. Rare forms of genomic variation that seem to be skewed in their distribution among different populations are, in certain cases, more commonly associated with medically relevant traits. Many such genetic differences have been analyzed in detail. One clear example is the genes that confer resistance to malaria. Another example that may be applied to human prostate cancer is the use of ancestry informative markers to identify prostate cancer genes in admixture studies.

The present invention provides methods of detecting the presence or absence of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, wherein the presence or absence of the nucleic acid segment in the gene locus indicates an altered risk of cancer.

In certain embodiments, the invention provides biomarkers for prostate cancer in an African American subject comprising an insertion in the PSPHL gene locus, wherein the presence of the biomarker is correlated with a decreased risk of prostate cancer. Accordingly, the present invention presents a biomarker or biomarkers that are differentially present in samples of prostate cancer subjects and control subjects, or in subjects of different populations, or in subjects at different stages of cancer progression, e.g. prostate cancer progression, and the application of this discovery in methods and kits for determining the presence of prostate cancer. These biomarkers are found in samples from prostate cancer subjects at levels that are different than the levels in samples from subjects in whom prostate cancer is undetectable. Accordingly, the amount of the biomarker, or of one or more biomarkers, found in a test sample compared to a control, or the presence or absence of one or more markers in the test sample, provides useful information regarding the cancer status of the subject.

In accordance with one embodiment of the invention, a set of genes whose expression correlates with expression of the PSPHL gene product was identified. Further, in preferred embodiments, the homozygous deletion allele is associated with the expression of a set of genes. In other embodiments, the heterozygous deletion allele is associated with the expression of a set of genes. These genes may also be useful to determine disease onset/progression, to determine prognosis, to determine risk of recurrence and to determine course of therapy in subjects having routine health screenings or routine prostate cancer screenings, in those suspected of having prostate cancer, in those with a known risk of prostate cancer, and in those previously treated for prostate cancer. The genes described herein are also useful as novel therapeutic targets.

In one embodiment, the absence of the nucleic acid segment in the PSPHL gene product indicates an increased risk of prostate cancer in an African American subject. The absence of the nucleic acid segment as described herein is useful, for example, to predict disease progression. The claimed methods allow for earlier detection of disease recurrence/progression and therefore earlier treatment of subjects with recurrent/progressive disease. In addition, knowledge of genetic changes that occur in prostate cancer enables the design of, and screening for, targeted therapeutic agents that interact with the targets. The interaction may be direct or indirect. Therapeutic agents are agents that improve survival in subjects with disease, including advanced disease.

Provided herein are methods of detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject, methods of identifying a subject at risk for developing prostate cancer, methods of determining the prognosis of a patient with prostate cancer, biomarkers for prostate cancer, and microarray technologies to identify molecular and genetic defects associated with prostate cancer onset or progression, and to correlate the expression of the biomarkers with the presence or stage of disease, thus providing diagnostic and prognostic markers for this disease. Such markers are useful clinically to determine therapeutic strategies for subjects and guide subject treatment.

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

The term "set of genes" refers to one or more genes. In certain embodiments, one or more genes is particularly expressed when the nucleic acid segment in the PSPHL gene locus is present. In other embodiments, one or more genes is particularly expressed when the nucleic acid segment in the PSPHL gene locus is present. The set of one or more genes expressed when the nucleic acid segment in the PSPHL gene locus is present may be overlapping, may be the same, or may be different (e.g. the set of genes may have one, two, three or more genes in common). The "set of genes" may refer to genes whose expression level, alone or in combination with other genes, is correlated with cancer or prognosis of cancer, for example prostate cancer. The correlation may relate to either an increased or decreased expression of the gene. For example, the expression of the gene may be indicative of cancer, or lack of expression of the gene may be correlated with poor prognosis in a cancer patient.

The term "detect" refers to identifying the presence, absence or amount of an object or molecule.

The term "nucleic acid" or "nucleic acid segment" as used herein refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl) uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine. In preferred embodiments, the nucleic acid segment is part of the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject. In further examples the nucleic acid segment comprises SEQ ID NO: 2.

The term "gene" refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragment are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide. In certain preferred embodiments, the gene is Phosphoserine phosphatase-like (PSPHL).

The phrase "Phosphoserine phosphatase-like (PSPHL) gene locus" is meant to refer to a gene locus on chromosome 7 that harbors a segment of DNA that can be either present or missing from the human genome. When the gene locus is present, the PSPHL gene is expressed in the prostate. When the gene locus is absent, the gene is not expressed in the prostate. Human PSPHL mRNA is encoded by GenBank Accession No. AJ001612. In certain embodiments, the PSPHL gene locus contains a nucleic acid segment comprising 133 base pairs of exon 1 of human PSPHL mRNA whose presence or absence corresponds to PSPHL expression.

As used herein the phrase "prostate cancer" refers to cancers of the prostate tissue and/or other tissues of the male genitalia, or reproductive or urinary tracts.

As used herein, the term "gene expression" refers to the process of converting genetic information encoded in a gene into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through "transcription" of the gene (i.e., via the enzymatic action of an RNA polymerase), and for protein encoding genes, into protein through "translation" of mRNA. Gene expression can be regulated at many stages in the process. "Up-regulation" or "activation" refers to regulation that increases the production of gene expression products (i.e., RNA or protein), while "down-regulation" or "repression" refers to regulation that decreases production.

In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5' and 3' end of the sequences that are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory sequences such as promoters and enhancers that control or influence the transcription of the gene. The 3' flanking region may contain sequences that direct the termination of transcription, post-transcriptional cleavage and polyadenylation.

As used herein, the term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, that is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product that is complementary to a nucleic acid strand is induced, (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

The phrase "determining a prognosis" or "providing a prognosis" refers to determining or providing information regarding the impact of the presence of cancer, for example prostate cancer, (e.g., as determined by the diagnostic methods of the present invention) on a subject's future health (e.g., expected morbidity or mortality, the likelihood of getting cancer, the risk of metastasis).

The term "measuring" means methods which include detecting the presence or absence of marker(s) in the sample, quantifying the amount of marker(s) in the sample, and/or qualifying the type of biomarker. Measuring can be accomplished by methods known in the art and those further described herein, including but not limited to microarray analysis (with Significance Analysis of Microarrays (SAM) software), SELDI and immunoassay. Any suitable methods can be used to detect and measure one or more of the markers described herein. These methods include, without limitation, mass spectrometry (e.g., laser desorption/ionization mass spectrometry), fluorescence (e.g. sandwich immunoassay), surface plasmon resonance, ellipsometry and atomic force microscopy.

"Detect" as used herein refers to identifying the presence, absence or amount of the object to be detected.

"Marker" or "biomarker" in the context of the present invention refer to a polypeptide (of a particular apparent molecular weight) or nucleic acid, which is differentially present in a sample taken from subjects having prostate cancer as compared to a comparable sample taken from control subjects (e.g., a person with a negative diagnosis or undetectable prostate cancer, normal or healthy subject). The term "biomarker" is used interchangeably with the term "marker." The biomarkers are identified by, for example, molecular mass in Daltons, and include the masses centered around the identified molecular masses for each marker, affinity binding, nucleic acid detection, etc.

A marker can be a polypeptide, which is detected at a higher frequency or at a lower frequency in samples of unaffected tissue from prostate cancer subjects compared to samples of affected tissue from prostate cancer subjects. A marker can be a polypeptide, which is detected at a higher frequency or at a lower frequency in samples of human unaffected tissue from prostate cancer subjects compared to samples of control subjects.

A marker can be a polypeptide, which is detected at a higher frequency or at a lower frequency in samples of human affected tissue from prostate cancer subjects compared to samples of control subjects.

A marker can be differentially present in terms of quantity, frequency or both.

"Subject" as used herein refers to any animal (e.g., a mammal), including, but not limited to, humans, non-human primates, rodents, and the like, which is to be the recipient of a particular treatment. Typically, the terms "subject" and "patient" are used interchangeably herein in reference to a human subject.

"At risk for cancer" refers to a subject with one or more risk factors for developing a specific cancer. Risk factors include, but are not limited to, ancestry, gender, age, genetic predisposition, environmental exposure, previous incidents of cancer, preexisting non-cancer diseases, and lifestyle.

"Unaffected tissue," as used herein refers to a tissue from a prostate cancer subject that is from a portion of tissue that does not have gross disease present, for example tissue that is about 1, 2, 5, 10, 20 or more cm from grossly diseased tissue.

A polypeptide or nucleic acid is differentially present between two samples if the amount of the polypeptide or nucleic acid in one sample is statistically significantly different from the amount of the polypeptide or nucleic acid in the other sample. For example, a polypeptide or nucleic acid is differentially present between the two samples if it is present at least about 25%, at least about 50%, at least about 75%, at least about 100%, at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% greater than it is present in the other sample, or if it is detectable in one sample and not detectable in the other.

Alternatively or additionally, a polypeptide or nucleic acid is differentially present between two sets of samples if the frequency of detecting the polypeptide or nucleic acid in the cancer subjects' samples is statistically significantly higher or lower than in the control samples. For example, a polypeptide or nucleic acid is differentially present between the two sets of samples if it is detected at least about 25%, at least about 50%, at least about 75%, at least about 100%, at least about 120%, at least about 130%, at least about 150%, at least about 180%, at least about 200%, at least about 300%, at least about 500%, at least about 700%, at least about 900%, or at least about 1000% more frequently or less frequently observed in one set of samples than the other set of samples.
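The percentage thresholds listed above can be read as a relative difference between two measurements. The following is a minimal sketch of that arithmetic only; the function names and the default 25% threshold are hypothetical choices for illustration, and the statistical significance test that the definition also requires is not performed here.

```python
def percent_greater(value_a: float, value_b: float) -> float:
    """Percent by which value_a exceeds value_b.

    The values may be amounts in two samples or detection frequencies
    in two sets of samples. value_b must be positive; the separate case
    "detectable in one sample and not detectable in the other" is not
    expressible as a percentage and is rejected here.
    """
    if value_b <= 0:
        raise ValueError("reference value must be positive")
    return (value_a - value_b) / value_b * 100.0


def meets_threshold(value_a: float, value_b: float,
                    threshold_percent: float = 25.0) -> bool:
    """True if value_a is at least threshold_percent greater than value_b."""
    return percent_greater(value_a, value_b) >= threshold_percent
```

For instance, a marker measured at 150 units in one sample and 100 units in the other is 50% greater, which satisfies the "at least about 25%" and "at least about 50%" thresholds but not the "at least about 75%" threshold.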

"Diagnostic" means identifying the presence or nature of a pathologic condition, i.e., cancer. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay, are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.

A "diagnostic amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of cancer. A diagnostic amount can be either an absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals). A "control amount" of a marker can be any amount or a range of amounts, which is to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a person without cancer. A control amount can be either an absolute amount (e.g., μg/ml) or a relative amount (e.g., relative intensity of signals).

As used herein, the term "sensitivity" is the percentage of subjects with a particular disease who are correctly identified as having the disease. For example, in the cancer group, the biomarkers of the invention have a sensitivity of about 80.0%-98.6%, and preferably a sensitivity of 85%, 87.5%, 90%, 92.5%, 95%, 97%, 98%, 99% or approaching 100%.

As used herein, the term "specificity" is the percentage of subjects without a particular disease (i.e., normal or healthy subjects) who are correctly identified as not having the disease. For example, the specificity is calculated as the number of non-cancer subjects (e.g., normal healthy subjects) who test negative, divided by the total number of non-cancer subjects. The specificity of the assays described herein may range from about 80% to 100%. Preferably the specificity is about 90%, 95%, or 100%.
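The definitions of sensitivity and specificity given above can be expressed as a short calculation from counts of true positives, false negatives, true negatives and false positives. This is a minimal sketch with hypothetical function names and illustrative inputs; it is not data from the present disclosure.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of diseased individuals who test positive."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no diseased individuals in the sample")
    return true_positives / diseased


def specificity(true_negatives: int, false_positives: int) -> float:
    """1 minus the false positive rate.

    The false positive rate is the proportion of those without the
    disease who test positive.
    """
    non_diseased = true_negatives + false_positives
    if non_diseased == 0:
        raise ValueError("no non-diseased individuals in the sample")
    false_positive_rate = false_positives / non_diseased
    return 1.0 - false_positive_rate
```

Multiplying either result by 100 gives the percentage form used in the definitions.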

The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins.

"Antibody" refers to a polypeptide ligand substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab)'2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.

The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide, refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times the background and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to marker "X" from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with marker "X" and not with other proteins, except for polymorphic variants and alleles of marker "X". This selection may be achieved by subtracting out antibodies that cross-react with marker "X" molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.

PROSTATE CANCER MARKERS

The present invention is based upon the discovery that the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject indicates an altered risk of cancer, in particular prostate cancer, and the application of this discovery in methods and kits for determining the risk of prostate cancer. Some of these markers are found at an elevated level and/or more frequently in samples from prostate cancer subjects compared to a control (e.g., subjects with diseases other than prostate cancer). Accordingly, this novel structural variation of the PSPHL locus, which is tightly linked to gene expression and demonstrates unusual patterns of population differentiation, provides useful information regarding the probability that a subject being tested is at risk for prostate cancer, and has prognostic value. The invention further provides biomarkers that find use in the diagnosis and characterization of prostate cancer (e.g. the determination of risk of developing prostate cancer).

Detection

The invention provides methods of detecting the presence or absence of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, where the presence or absence of the nucleic acid segment in the gene locus indicates an altered risk of cancer, for example prostate cancer.

Prostate cancer disproportionately affects men of African descent. In certain embodiments of the invention, the presence or absence of the nucleic acid segment in the PSPHL gene locus is detected in an African American subject, where the absence of the nucleic acid segment indicates an increased risk of prostate cancer in the African American subject.

As described herein, the nucleic acid segment is in the PSPHL gene locus of a subject. In particular examples, the nucleic acid segment comprises 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1. In more particular examples, the nucleic acid segment comprises SEQ ID NO: 2. SEQ ID NO: 2 is set forth below:

SEQ ID NO: 2 aagccacaggctccctggctggcgtcagctaaagtggctgttgggtgtccgcaggcttct 61 gcctggccgccgccgcctataagctaccaggaggagctttacgacttcccgtcctgcggg 121 aagtggcgggcacgatcgcaaggtagcgcagaagcttctcaatggccagcgccagctgca 181 gccccggcggcgcactcgcctcacctgagcctgggaggaaaattcttccaaggatgatct 241 cccactcagagctgaggaagcttttctactcagcagatgctgtgtgttttgatgttgaca

301 gcacggtcatcagtgaagaaggaatcggatgctttcattggatttggaggaaatgtgatc 361 aggcaacaagtcaaggataacgccaaatggtatatcactgattttgtagagctgctggga 421 gaaccggaagaataacatccattgtcatacagctccaaacaacttcagatgaatttttac 481 aagttacacagattgatactgtttgcttacaattgcctattacaacttgctataaaaagt 541 tggtacagatgatctgcactgtcaagtaaactacagttaggaatcctcaaagattggttt

601 gtttgtttttaactgtagttccagtattatatgatcactatcgatttcctggagagtttt 661 gtaatctgaattctttatgtatattcctagctatatttcatacaaagtgttttaagagtg 721 gagagtcaattaaacacctttactcttaggaatatagattcggcagccttcagtgaatat 781 tggtttttttccctttggtatgtcaataaaagtttatccatgtgtcagaaaaaaaaaaa

SEQ ID NO: 2 is the transcribed mRNA sequence. However, genomic sequences that are not transcribed can also be used to detect the presence of the insertion allele. For example, SEQ ID NO: 13, SEQ ID NO: 14 and SEQ ID NO: 15 as set forth below can be used as markers of the presence of the insertion allele.

SEQ ID NO: 13

CAGTGCAGCTGAGATCAGACTTCCATGTGTAACTCCCACTACCCTACCAG GATGCCTTTTCATAAAGGTAAGAAATGTAAATTTGGCCTTAATATACAAA GTTGCCAGGGCAGCACTGGGTCAGTTCTACATATAGTACTTCTACGTTCAT CAGCGGAAACTTTAAGGGAAGGTGAAAATGCTTCTAGAAGGCGACTGGA CACCAGCGCCTTTGGGCTCTTCTTCTAAGGCCAATAGTGACCTAAATTATT

GACTGACTGCTCCAATCAAGTGGGCAAAAGGGTACCAAGGCCGCCAACAT CAGACAAATTCACTTGAGGGCCTACCTATGTGCTTTGAAAGACAAAACTG CTGTTGTGAAGGACACTGTATTTCAGAAAAACATAATCATATTAACAACT AGTAACAATGTAAAATGCTGATGTGTTGAATGCTACTTTAGAAAAACATG TTAAAATCTACAAAAAAAAATTTATGATACAAAACTACGTTATCAATCAT

CTAGCTAGCTAACTATCTACAGACATGGTTTTCATTTCTGTTGCTCAGGAT GGAAAGCAGTGGGATGATCATAGCTCACTGCAGCCTTGAGCTCCTGGCCT CAAGTGATCCTCCTGCCTCCTAAGTAGCTGGGGCCACAGGTGGACACAGT GACACCTGGGTTTTTTCTTTTGTAGAGACAGGGTTTCACTACACTGCCCAG GCTGGTGTCAAACTTTGGAGTCTCGCTGTGTCACCCAGGCTGGGGTGCAGT GGTGGGGATCTCGGCTCACTGCAACCTCTGCCACCTGGGTTCAAGCGATT ACTTCCTGTATCAGCCTCCCGAGTAGCTGGGACTACAGGCATGTGCCACC ACACCCGGCTAATTTTTGTATTTTTTGTAGAGAACTGGGTTTCACCATGTT

GGCCAGGCTGGTCTCAAACTCCTGACCTCAGGTGATCCACCTGCCTCGGCC TCCCAAAGTGCTGGGATTACAGTCATGAGCCGATGAACACTTTCTTATGCT ATTAAATGGCCTAACCCAGGCGGGTCGTGGTGGCTCACGCCTGTAATCCC GAAAACTCGGATGGCCAAGGTAAGAAGATCACTTGAACCTAGGAATTCCA AACTGGCCTGGGCAACATAGCGAGACTCCCATCCCTACAAAAGATACAAA

AATTAGGCCTGGCGCACACCACGCTCGGCTAGTTTTTGTACCTTTTGTAGA GACAGGGTTGCGTCATGTTGCCTAAGCCGGTCTCGAACTCCTGAACCCAA GCCATCCATCCTCCCGCCTCGGCCTCCCAAAGTGCTGGAGATTACAGGGG CCCAGCCAGCCTCATGTTTTCCTTTTAAGCAGTCCCTTCCCTGTTGCACACT TGGGTTAGTTTTCTTTTTAATTTTTTTAAAACAGGGGGTTACCTCAATCTCG

CAGCCTGGAGTGCTGTGGTGGGATCACAGCTCATTGGAGCCTTGAACCTT GGGGTTCAAGTAGCTGGGGGGCTGAGGTAGGACTACAGAGATGGGGTCC CGCCATGTTGCCAGGCTGCTCTTGGCCTGAAGGGCATCTCCCGCTGGCCGT GCCCGGACATAGTTTTCTATTTTTGACCGACATAAACACTGTGCTGAGTCG GTGTTTGTCAACACACAGGACCTGGCGGGGAGGTCGCGGTTACCAGGCTC

CACTCTAAGTAGAAGACTGCCCAGCTCCAAGCACTGTACCTCCCGGTGAC GTCGCCGAACGCCCGCCCTGTGACGATACCTAAGGCCCACCTTCATGACG CCGCCGAAGGCCCGCCCCTGTGACGCCGCCGGAGGCCCGCCCCTCACGCG GAGCCAATCGGAACTCGAGGCGGGGCTGTTGGGTCTTCGGGAGCGCGCAT GCGCGGGGGGCCACAGGCTCCCTGGCTGGCGTCAGCTAAAGTGGCTGTTG

GGTGTCCGCAGGCTTCTGCCTGGCCGCCGCCGCCTATAAGCTACCAGGAG GAGCTTTACGACTTCCCGTCCTGCGGGAAGTGGCGGGCACGATCGCAAGG TAGCGCAGAAGCTTCTCAATGGCCAGCGCCAGCTGCAGCCCCGGCGGCGC ACTCGCCTCACCTGAGCCTGGgtACGTGCAGCCCCACAACACCTTCCCCAG CCAGGGCCCGGGGACCCCGGGAGCGTCCCCCGCCACCTGGCGCCGCTCAT

ACCTGGGCAAGGGTGGGACCCCACTGAGGCCCGCCACGCATTAGGGAGCT TGCACTTCCCGAGTTTTGACCTCTGACGGGCAGTTGTAATAGCATTAAAGT TTTTGAAATTTTGTAGCGGGGGTAGAAGGGGCTTGGAAAGGGAAGAAAAC ATCTTTTAAAATATAACGTTCCGGCCGGGCCCGGGGGTTAACCCTTTAATC CACATTTGGGAGGCCAGGCAGGGGATTACGAGGTAGGAGTTAAGACCAC CTGGCCGCATGGGGAACCCTGTTTTACTAAA

SEQ ID NO: 13 contains exon 1 and is assembled from sequences gnl|ti|945553014, gnl|ti|949228349, gi|41830605|gb|AADD01081234.1, and gnl|ti|954837275.

SEQ ID NO: 14

GAGATATAGATGGGGAGATTTTTAATTTAGTTTTTTATTTAAAATTGTGTT

TATAAGGAAAGAGATTATTATGTNTTTTGATAGAGAATTTGGGAGAGTTT GTTTAATTTAGAGAGAGATGGGTTAGNAAGATTTATAGGGTTGGANTGTG

ATTTANGAAAAGATTTAAGGTGTTTGGAGAATAATTTTTGGGGAGGAATT TTTGGAATAAAAAGAAAANTTTGGTAAAAANGGTGGATTTGGTTTTANAT GAAATAAAAAAATTAACCGGATGTGGTTGCACACGCCCTGTATNCCCAGC TACTCAGGAGGCTGAGGCACGAGAATCACTAGAACCCAGGAGGTGGAGG TTGCAGTAAGCCAAGTTCGTGCCACTACCCTCCAGCCTGGGCAACAGAGT

AAGACTCCATCTAAAAAAAAAATGAAGAAGAAGAAATTAGTGTAGTGTG GGAAGTGAAAAAAAAAAAAAGAAAAGGAAAAGAAAAATGATTGAATTC ATGAATACACTCTTATGTGGCCTGCACCGACTTTGACACAAATTAGATTGG CTTAGTAGGCAAGGGTGGGATCTTTTCATAATTTTATTTGATGTCTAAAAT ACATTTATCTTTTTTTCTTATAGGAGGAAAATTCTTCCAAGGATGATCTCC

CACTCAGAGCTGAGGAAGCTTTTCTACTCAGCAGATGCTGTGTGTTTTGAT GTTGACAGCACGGTCATCAGTGAAGAAGGAATCGGTGAGCTAGCCAAAAT CTGTGGTGTAGAGGACGCGGTGTCAGAAATGTAGGGATAGCATTTATTCA CTTTATGAAATGATAAAGAATTTTTTTTTTTTTTTTTTTTGAGACAGAGTCT CACTCTATTGCCCAGACTGGAGTGCAGTGGCACAATCTTGGCTTACTGCAA

CCTCCACCTCCCAGGTTCAAGCGATTATCCTGCCTCAGCCTCCTGAGTAGC TGGGATTACAGGCGTGTGCAACCACATCCGGCTAATTTTTTTATTTTTAGT AGAGATGGAGTTTCACCATGTTGGTCAGGCTGGTCTCAAACTCCTGACCTC GTGATCCACCCACCTCAGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGC CACCATGCCCGGCCATGAAATGATAAAGAATTTCTAAAGGGTGGCTGTTT

TGGATGAAGTGCTGGACCCTGGCTATAAAAGATGAGCACTAGGCTTTTTC TCTCACCCCTTAGAGTTGAATTTGATATATTGAGAACTNTGCTATCGCTCC NN

SEQ ID NO: 14 comprises a Trace sequence that has exon 2 (red underlined), and the underlined 3' sequence overlaps with gi|148184701|gb|ABBA01007462.1| Homo sapiens CTG_1103276812568, whole genome shotgun sequence.

SEQ ID NO: 15

CACCCACCTCAGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCAT GCCCGGCCATGAAATGATAAAGAATTTCTAAAGAGTGGCTGTTTTGGATG AAGTGCTGGACCCTGGCTATGGAACATGAGCACTAGGCTTTTTCTCTCACC CCTTAGAGTTGAATTTGATATATTCATTCATTTATATAAATATTTATGGCA AAAAAATATGGATAAGACATGTTCTTAGCTTGATTAGGGGAATGATGACC

ATCACTAACCAAAGAAGATGGCCTAGGGGGCAGGGCTCAAGTGTCCAGTG GCTTTGTGTCATGGTTGAAAGTGTGGCCTTAGTGGTTCCCTGGAAAGGTTC AGAGTCTCTTGGTGGATCCTGGGTCAGAGCCCCTCTCTCCCTCCCTCCCTC CTGCCCTCCCACCTCCTGCCCTGCAGCTGGGCACCACCCTCTGCAGCCCCA GTCCCCCAGTCATGCACCATGTCATTTTCTTTTTTTTTTTTTTTTCAGGACGG

AGTCTCGCTCTGTCACCAGGCTGGAGTGCAATGGTGCAATCTCGGCTCACT GCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCCACCTCCTGAG TAGCTGAAACTACAGGCACGCACCACCACACCCGGCTTATTTTTGTATTTTT AGTAGAGATGGTGTTTCACCATGTTGGCCAGCCTGGTCTTGAACTCCTGAC CTCGTGATCTGCCGGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATG

AGCCACCACACTCGGCCACCATTTTATTTTCAACTCCCTTTCATGGAAGAA CGTTTAGCCTTTGGTCTCTTTCTTGATTTATGACAGCTCGCGGCTTTCAAGA AAACTACCTATGAATAGGCTGTGTAACTTTTATTTATTTATTTATTTTTTGA GATGGAGTCTCTGTCTTCTAGGCTGGGGTGCAGTGGCATGATCTCGGCTCA CTGCAACCTCCCCCTCCCAGGTTCCAGCAATTCTCCTGCCTCAGCTTCCCA

AGTAGCTGGGATTACAGGCATGCACCACCACACCTGGCTAATTTTTGCATT TTTTTTTTATTAGAGATGGGGTTTTGTCATGTTGGCCAGGCTGGTCTCGAA CTCCTGGCTTCAGATGAGCTGCCCACCTCAGCTTCCCAAAGTTTTGGCATT ATAGGCATGAGTCACTGCACCCAGCTTGTTAATGTTATTTTCAAGCACACC TTCAAAAGTTTATTCCAAGGCCTTGCTCTCATAGCAGCAAAGCCTGTCTGT

TGCATAGTGGGGCTCTTGTTAGCATCTTCCTTTTGGTGGATGTTGTGTAGA CCCGGGGCAGTAAGGCATGTTAACCTCGAGGACATTGGACCTGGCTGGCC TGACTGTTGGGTCTCCCTCCTAGGACATGGCGAGCCATGGGTGGGGCAGT GTCTTTCAAAGCTGCTCTCACGGAGCACTTAGCCCCAATCCAGCCCTCCAG

[The portion of the sequence listing that follows at this point is not legible in the source.]

GACTGTGTGGTTCTTCTGGCACTTACACGTGGTCTTGTCTGGCCGGTTGTC TGGTCCTGTCTGTTTCTGCCTTTCCTCTTTCTCCAGGGAAAACCTAAGCTTT CCTTTTTGTCCTCATCTTGTGTTTTTCTGGGTCCATGGGCAGAGTAGAGTTC TAGAATGGTTTCCTAAAGCAGCCAAGCCCTACCCTTTGATTTCTAAGTACA TTTAGAAAAACACATATAAGGCTGGGCATGGTGGCTCACGCCTGTAATCC

CAGCACTTTGGGAGGCCGAAGTGGATGGATCACCAGGTCAGGAGTTCGAG ACCAGGCTAACCAACATGATGAAACCCTGTCTCTACTAAAAATACAAAAA TAAGCCGGGCACGGTAGCTCACGTCTGTAATCCCAGCACTTTGGGAGGCC AAGATGGGTGTATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAAC ATGGTGAAACCCAGTCTCTACTAAAAATACAAAAAAAGTAGCTGGGCGTG

GTGATAGGCGCCTGTAATCCCAGCTACCCAGGAGGCTGAGGGAGGAAAAT CACTGGAACCCGGGAGGCAGAGGTTGCTGTGAGCCGAGATCATGCCAGTG CACTCCAGCTAGGGCAACAGAGCAAGATTCCATCTCAAAAAAAAAAAAA AAAAAAATTAGCCGGGCATTGTGGCATGGGCCTATAATACTCAGGAGGCT GAGGCAGGAGAAGCGCTTGAACCCGGGAGGTAGAGGTTGCAGTGAGCCA

AGATTGCAGCCTTGCACTCCAGCCTGGGTGACAAGAGTGAAACTCTGTCT CAAAAAATAAAGAGAAAAACATATAGAAAACATTAACACCCCAGGCAGT ATACCTTGTCAAACATACCTCAGGCAAATGCATTCAGGAGAAGAAAATAC ATCTTATTTCCCTCTTCATGTTTCGTTTTTTTTTTT^ TCAGTGTTGGGTATTTTGTATTTTATTTTGCAGGGAGCTGGTAAGTCGCCT

ACAGGAGCGAAATGTTCAGGTTTTCCTAATATCTGGTGGCTTTAGGAGTAT TATAGAGCATGTTGCTTCAAAGCTCAATATCCCAGCAACCAATGTATTTGC CAGTAGGCTGAAATTCTACCTTAATGGTAAGATGTTAACGGTAACATGTTC CCTTTCTTAGCAGTTCCATTATTCAGTATTCTGGGTAATGTCTTTTGGAATG CAACTTGAACAGTCACATCAGAGTTAAATATTGAGATGAATGGTTCTCTTT

TTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGGCTGG GGTGCAGTGGCGCGATCTCGGCTCACTGCAAGCTCCCCCTCCCGGGTTCAC GCCATTCTCCTGCCTCAGCCTCCCCAGTAGCTGGGACTACAGGCGCCCGCT ACCATGTCAGGATAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCTCCGT GTTAGCCAGGATGGTGTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGC

CTCACAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCGAGAT GAATGGTTCTCTTAATTGATGTCTTTTGCCCTTTGGTACCCTTTGCTCAGCA AAACAGACTTAGATCATCACCTGTCTTAGCTTTATTATCTATAACATCACC TGTCTGTACATAAATGTCTGCATTTCG

SEQ ID NO: 15 contains exon 3 (nucleotides 504-625) and 3' sequence. The underlined sequence overlaps with the 3' end of gnl|ti|226793227 (SEQ ID NO: 14).

The presence of the insertion sequence can be detected by PCR of any of these sequences as set forth herein. In certain cases, the absence of the insertion sequence can be detected by absence of signal. In other cases, the absence of the insertion sequence can be detected by the presence of the deletion allele.

In still other cases, the presence of the deletion allele can be detected by PCR. Primers can be designed to the following exemplary sequences:

SEQ ID NO: 16

GTGGGCTCAGCCATCCTCCCAACTGAGCCTCCTGAGTAGCTGGGACTACA GGTGTGAGCCATCACACTCAACTGGTGTAGCTATTTTAGAAGACAAACTG GCAGTTTCTCAAAAGGCTAAACATACAGTCATCATATAATGCAACAATTT CACTCCTAGGCATATATCCCAGAGAAATAGAAATATATGTCCACACAAAA

ACTTGTACAGCAATCTTCATAGCAGCATTGTTCATAATAGCCAATACGTGG AAACAACCCAAATGTCCATCAACTGATGAACAGATAAACAAAATGCAGTG TGTCTCTACCATGGAATATTATTCAGCCACAGAAGAAATGAAATACTGAT ACACACTATGACATAAAGGAACTTTGAAAACATTGTGCTAAGAGGGAAAA AAAAGCCA

SEQ ID NO: 17

GTGGGCTCaAGCCATCCTCCCAACTGAGCCTCCTGAGTAGCTGGGACTACA GGTGTGAGCCATCACACTCAACTGGTGTAGCTATTTTAGAAGACAAACTG

GCAGTTTCTCAAAAGGCTAAACATACAGTCATCATATAATGCAACAATTT CACTCCTAGGCATATATCCCAGAGAAATAGAAATATATGTCCACACAAAA ACTTGTACAGCAATCTTCATAGCAGCATTGTTCgTAATAGCCAATACGTGG AAACAACCCAAATGTCCATCAACTGATGAACAGATAAACAAAATGCAGTG TGTCTCTACCATGGAATATTATTCAGCCACAGAAGAAATGAAATACTGAT

ACACACTATGACATAAAGGAACTTTGAAAACATTGTGCTAAGAGGGAAAA AAAAGCCA

SEQ ID NO: 18

GTGGGCTCaAGCCATCCTCCCAACTGAGCCTCCTGAGTAGCTGGGACTACA GGTGTGAGCCATCACACcCAACTGGTGTAGCTATTTTAGAAGACAAACTG GCAGTTTCT CAAAAGGCTA AACATACAGT CATCATATAA TGCAACAATTTCACTCCTAGGCATATATCCCAGAGAAATAGAAATATATG TCCACACAAAAACTTaTACAGCAATCTTCATAGCAGCATTaTTCATAATAG

CCAATACGTGGAAACAACCCAAATGTCCA TCAACTGATG AACAGATAAA CAAAATGCAGTGTGTCTCTA CCATGGAATA TTATTCAGCC ACAGAAGAAA TGAAATACTGATACACACTA TGACATAAAG GAACTTTGAA AACATTGTGC TAAGAGGGgAAAAAAAGCCA

Methods of the invention for determining the prostate cancer status, or the risk of developing prostate cancer of a subject, include for example, obtaining a biomarker profile from a sample taken from the subject; and comparing the subject's biomarker profile to a reference biomarker profile obtained from a reference population, wherein the comparison is capable of classifying the subject as belonging to or not belonging to the reference population; wherein the subject's biomarker profile and the reference biomarker profile comprise one or more markers as described herein.

The method may further comprise repeating the method at least once, wherein the subject's biomarker profile is obtained from a separate sample taken each time the method is repeated.

Samples from the subject may be taken at any time, for example, the samples may be taken 24 hours apart or any other time determined useful.

Such comparisons of the biomarker profiles can determine prostate cancer status or risk of prostate cancer in the subject with an accuracy of at least about 60%, 70%, 80%, 90%, 95%, and approaching 100% as shown in the examples which follow.

The reference biomarker profile can be obtained from a population comprising a single subject, at least two subjects, at least 20 subjects or more. The number of subjects will depend, in part, on the number of available subjects, and the power of the statistical analysis necessary.

The invention includes methods of qualifying prostate cancer status in a subject comprising:

(a) measuring at least one biomarker in a sample from the subject, and

(b) correlating the measurement with prostate cancer status.

The method may also comprise the step of measuring the at least one biomarker after subject management.

In a preferred embodiment, any one of the markers described herein or contemplated by the instant invention is used to make a correlation with the presence or absence of prostate cancer, wherein the prostate cancer may be any type or subtype of prostate cancer.

In another example, the biomarker is an insertion sequence corresponding to a nucleic acid segment in the PSPHL gene locus.

In another example, the biomarker is an insertion sequence set forth in SEQ ID NO: 2.

Optionally, the methods of the invention may further comprise generating data on immobilized subject samples on a biochip, by subjecting the biochip to laser ionization and detecting intensity of signal for mass/charge ratio; and transforming the data into computer readable form; and executing an algorithm that classifies the data according to user input parameters, for detecting signals that represent biomarkers present in prostate cancer subjects and are lacking in non-prostate cancer subject controls.

In some embodiments, the present invention provides methods for detection of the presence or absence of the nucleic acid segment in the PSPHL gene locus as described herein, wherein the absence of the sequence is associated with prostate cancer. In some embodiments, the presence or absence of the nucleic acid segment is detected in tissue samples (e.g., biopsy tissue). In other embodiments, detection is carried out in bodily fluids (e.g., including but not limited to, plasma, serum, whole blood, mucus, and urine). Exemplary methods are described below.

Direct Sequencing Assays

In some embodiments of the present invention, a nucleic acid segment, for example, but not limited to, a nucleic acid segment in the PSPHL gene locus, is detected using a direct sequencing technique. In these assays, DNA samples are first isolated from a subject using any suitable method. In some embodiments, the region of interest is cloned into a suitable vector and amplified by growth in a host cell (e.g., a bacterium). In other embodiments, DNA in the region of interest is amplified using PCR.

Following amplification, DNA in the region of interest (e.g., the region containing the insertion, the region containing the SNP) is sequenced using any suitable method, including but not limited to manual sequencing using radioactive marker nucleotides, and automated sequencing. The results of the sequencing are displayed using any suitable method. The sequence is examined and the presence or absence of a given SNP is determined.

PCR Assay

In some embodiments of the present invention, the presence or absence of a nucleic acid segment is detected using a PCR-based assay. In some embodiments, the PCR assay comprises the use of oligonucleotide primers that hybridize only to the insertion allele or only to the deletion allele (e.g., to the region of polymorphism). Both sets of primers are used to amplify a sample of DNA. In certain embodiments, the subject is homozygous for a deletion in the PSPHL gene locus. In other embodiments, the subject is heterozygous for a deletion in the PSPHL gene locus.
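The readout of such a two-primer-set assay can be pictured as a small decision table. The following Python sketch is illustrative only: the names `Genotype` and `call_genotype`, and the reduction of each primer set to a single amplicon-present or amplicon-absent result, are assumptions made for the example and are not part of the described assay.

```python
from enum import Enum


class Genotype(Enum):
    """Hypothetical labels for the outcome of the two allele-specific reactions."""
    INSERTION_HOMOZYGOUS = "insertion/insertion"
    HETEROZYGOUS = "insertion/deletion"
    DELETION_HOMOZYGOUS = "deletion/deletion"
    NO_CALL = "no call"


def call_genotype(insertion_amplicon: bool, deletion_amplicon: bool) -> Genotype:
    """Combine the results of the insertion-specific and deletion-specific primer sets.

    Each argument is True when the corresponding primer set produced a product.
    """
    if insertion_amplicon and deletion_amplicon:
        return Genotype.HETEROZYGOUS
    if insertion_amplicon:
        return Genotype.INSERTION_HOMOZYGOUS
    if deletion_amplicon:
        return Genotype.DELETION_HOMOZYGOUS
    # Neither reaction amplified: treat as an assay failure rather than a genotype.
    return Genotype.NO_CALL
```

Returning an explicit no-call when neither reaction amplifies keeps a failed reaction from being read as a genotype, which matters when absence of signal is itself one of the ways the deletion may be inferred.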

Hybridization Assays

In preferred embodiments of the present invention, the presence or absence of a nucleic acid segment is detected using a hybridization assay. In a hybridization assay, the presence or absence of a given SNP is determined based on the ability of the DNA from the sample to hybridize to a complementary DNA molecule (e.g., an oligonucleotide probe). A variety of hybridization assays using a variety of technologies for hybridization and detection are available. A description of a selection of assays is provided herein.
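The notion of a complementary probe can be made concrete with a short Python sketch. It is a simplification under stated assumptions: it models hybridization as perfect base-pair complementarity only, ignoring stringency, mismatches, and melting temperature, and the helper names `reverse_complement` and `probe_matches` are hypothetical. The example target is the first 20 bases of SEQ ID NO: 2 as printed above.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_matches(sample: str, probe: str) -> bool:
    """True if the probe is perfectly complementary to some stretch of the sample strand."""
    return reverse_complement(probe).upper() in sample.upper()


# First 20 bases of SEQ ID NO: 2, used here purely as an illustration.
target = "AAGCCACAGGCTCCCTGGCT"
probe = reverse_complement(target)  # a probe designed against that stretch
```

With these definitions, `probe_matches(target, probe)` is true, while a sample strand lacking the stretch gives false.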

(1) Direct Detection of Hybridization. In some embodiments, hybridization of a probe to the sequence of interest (e.g., a SNP or mutation) is detected directly by visualizing a bound probe (e.g., a Northern or Southern assay; See e.g., Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, NY [1991]). In these assays, genomic DNA (Southern) or RNA (Northern) is isolated from a subject. The DNA or RNA is then cleaved with a series of restriction enzymes that cleave infrequently in the genome and not near any of the markers being assayed. The DNA or RNA is then separated (e.g., on an agarose gel) and transferred to a membrane. A labeled (e.g., by incorporating a radionucleotide) probe or probes specific for the SNP or mutation being detected is allowed to contact the membrane under low, medium, or high stringency conditions. Unbound probe is removed and the presence of binding is detected by visualizing the labeled probe.

(2) Detection of Hybridization Using "DNA Chip" Assays. In some embodiments of the present invention, the nucleic acid segment is detected using a DNA chip hybridization assay. In this assay, a series of oligonucleotide probes are affixed to a solid support. The oligonucleotide probes are designed to be unique to a given SNP or mutation. The DNA sample of interest is contacted with the DNA "chip" and hybridization is detected. In some embodiments, the DNA chip assay is a GeneChip (Affymetrix, Santa Clara, Calif.; See e.g., U.S. Pat. Nos. 6,045,996; 5,925,525; and 5,858,659; each of which is herein incorporated by reference) assay. The GeneChip technology uses miniaturized, high-density arrays of oligonucleotide probes affixed to a "chip." Probe arrays are manufactured by Affymetrix's light-directed chemical synthesis process, which combines solid-phase chemical synthesis with photolithographic fabrication techniques employed in the semiconductor industry. Using a series of photolithographic masks to define chip exposure sites, followed by specific chemical synthesis steps, the process constructs high-density arrays of oligonucleotides, with each probe in a predefined position in the array. Multiple probe arrays are synthesized simultaneously on a large glass wafer. The wafers are then diced, and individual probe arrays are packaged in injection-molded plastic cartridges, which protect them from the environment and serve as chambers for hybridization.

The nucleic acid to be analyzed is isolated, amplified by PCR, and labeled with a fluorescent reporter group. The labeled DNA is then incubated with the array using a fluidics station. The array is then inserted into the scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the target, which is bound to the probe array. Probes that perfectly match the target generally produce stronger signals than those that have mismatches. Since the sequence and position of each probe on the array are known, by complementarity, the identity of the target nucleic acid applied to the probe array can be determined.
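The observation that perfectly matched probes generally give stronger signals than mismatched ones suggests a simple comparison rule, sketched below in Python. This is not the vendor's analysis algorithm: the function name `call_allele`, the probe labels, and the default `min_ratio` of 2.0 are all assumptions chosen for illustration.

```python
from typing import Dict, Optional


def call_allele(signals: Dict[str, float], min_ratio: float = 2.0) -> Optional[str]:
    """Pick the probe with the strongest signal if it clearly exceeds the runner-up.

    `signals` maps a probe label to its background-corrected fluorescence intensity.
    Returns None when no probe stands out by at least `min_ratio`.
    """
    if len(signals) < 2:
        raise ValueError("need at least two probes to compare")
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_name, best), (_, runner_up) = ranked[0], ranked[1]
    if best <= 0:
        return None  # nothing hybridized above background
    if runner_up <= 0 or best / runner_up >= min_ratio:
        return best_name
    return None  # signals too close to separate
```

For example, `call_allele({"insertion": 900.0, "deletion": 100.0})` selects the insertion probe, whereas intensities of 500 and 400 yield no call.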

In other embodiments, a DNA microchip containing electronically captured probes (Nanogen, San Diego, Calif.) is utilized (See e.g., U.S. Pat. Nos. 6,017,696; 6,068,818; and 6,051,380; each of which is herein incorporated by reference). Through the use of microelectronics, Nanogen's technology enables the active movement and concentration of charged molecules to and from designated test sites on its semiconductor microchip. DNA capture probes unique to a given SNP or mutation are electronically placed at, or "addressed" to, specific sites on the microchip. Since DNA has a strong negative charge, it can be electronically moved to an area of positive charge.

First, a test site or a row of test sites on the microchip is electronically activated with a positive charge. Next, a solution containing the DNA probes is introduced onto the microchip. The negatively charged probes rapidly move to the positively charged sites, where they concentrate and are chemically bound to a site on the microchip. The microchip is then washed and another solution of distinct DNA probes is added until the array of specifically bound DNA probes is complete. A test sample is then analyzed for the presence of target DNA molecules by determining which of the DNA capture probes hybridize with complementary DNA in the test sample (e.g., a PCR amplified gene of interest). An electronic charge is also used to move and concentrate target molecules to one or more test sites on the microchip. The electronic concentration of sample DNA at each test site promotes rapid hybridization of sample DNA with complementary capture probes (hybridization may occur in minutes). To remove any unbound or nonspecifically bound DNA from each site, the polarity or charge of the site is reversed to negative, thereby forcing any unbound or nonspecifically bound DNA back into solution away from the capture probes. A laser-based fluorescence scanner is used to detect binding.

In still further embodiments, an array technology based upon the segregation of fluids on a flat surface (chip) by differences in surface tension (Protogene, Palo Alto, Calif.) is utilized (See e.g., U.S. Pat. Nos. 6,001,311; 5,985,551; and 5,474,796; each of which is herein incorporated by reference). Protogene's technology is based on the fact that fluids can be segregated on a flat surface by differences in surface tension that have been imparted by chemical coatings. Once so segregated, oligonucleotide probes are synthesized directly on the chip by ink-jet printing of reagents. The array with its reaction sites defined by surface tension is mounted on an X/Y translation stage under a set of four piezoelectric nozzles, one for each of the four standard DNA bases. The translation stage moves along each of the rows of the array and the appropriate reagent is delivered to each of the reaction sites. For example, the A amidite is delivered only to the sites where amidite A is to be coupled during that synthesis step and so on. Common reagents and washes are delivered by flooding the entire surface and then removing them by spinning.

DNA probes unique for the SNP or mutation of interest are affixed to the chip using Protogene's technology. The chip is then contacted with the PCR-amplified genes of interest. Following hybridization, unbound DNA is removed and hybridization is detected using any suitable method (e.g., by fluorescence de-quenching of an incorporated fluorescent group).

In yet other embodiments, a "bead array" is used for the detection of polymorphisms (Illumina, San Diego, Calif.; See e.g., PCT Publications WO 99/67641 and WO 00/39587, each of which is herein incorporated by reference). Illumina uses a BEAD ARRAY technology that combines fiber optic bundles and beads that self-assemble into an array. Each fiber optic bundle contains thousands to millions of individual fibers depending on the diameter of the bundle. The beads are coated with an oligonucleotide specific for the detection of a given SNP or mutation. Batches of beads are combined to form a pool specific to the array. To perform an assay, the BEAD ARRAY is contacted with a prepared subject sample (e.g., DNA). Hybridization is detected using any suitable method.

(3) Enzymatic Detection of Hybridization. In some embodiments, hybridization of a bound probe is detected using a TaqMan assay (PE Biosystems, Foster City, Calif.; See e.g., U.S. Pat. Nos. 5,962,233 and 5,538,848, each of which is herein incorporated by reference). The assay is performed during a PCR reaction. The TaqMan assay exploits the 5'-3' exonuclease activity of DNA polymerases such as AMPLITAQ DNA polymerase. A probe, specific for a given allele or mutation, is included in the PCR reaction. The probe consists of an oligonucleotide with a 5'-reporter dye (e.g., a fluorescent dye) and a 3'-quencher dye. During PCR, if the probe is bound to its target, the 5'-3' nucleolytic activity of the AMPLITAQ polymerase cleaves the probe between the reporter and the quencher dye. The separation of the reporter dye from the quencher dye results in an increase of fluorescence. The signal accumulates with each cycle of PCR and can be monitored with a fluorimeter.

In still further embodiments, polymorphisms are detected using the SNP-IT primer extension assay (Orchid Biosciences, Princeton, N.J.; See e.g., U.S. Pat. Nos. 5,952,174 and 5,919,626, each of which is herein incorporated by reference). In this assay, SNPs are identified by using a specially synthesized DNA primer and a DNA polymerase to selectively extend the DNA chain by one base at the suspected SNP location. DNA in the region of interest is amplified and denatured. Polymerase reactions are then performed using miniaturized systems called microfluidics. Detection is accomplished by adding a label to the nucleotide suspected of being at the SNP or mutation location. Incorporation of the label into the DNA can be detected by any suitable method (e.g., if the nucleotide contains a biotin label, detection is via a fluorescently labeled antibody specific for biotin). Numerous other assays are known in the art.
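Because the TaqMan reporter signal accumulates with each PCR cycle, one common way to summarize a run is the first cycle at which fluorescence crosses a chosen threshold. The Python sketch below illustrates that idea only; the function name `threshold_cycle`, the use of a fixed threshold, and the sample readings are assumptions and do not reproduce any instrument's software.

```python
from typing import Optional, Sequence


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first 1-based PCR cycle whose reading reaches the threshold.

    `fluorescence` holds one reading per cycle, in cycle order.
    Returns None if the signal never reaches the threshold (probe target not detected).
    """
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle
    return None


# Made-up readings for illustration: the signal crosses 1.0 at the fourth cycle.
readings = [0.1, 0.2, 0.5, 1.2, 2.5]
crossing = threshold_cycle(readings, threshold=1.0)
```

A probe specific for the insertion allele that never crosses the threshold would, in this simplified picture, correspond to absence of signal for that allele.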

Other Detection Assays

Additional detection assays that are suitable for use in the present invention include, but are not limited to, enzyme mismatch cleavage methods (e.g., Variagenics, U.S. Pat. Nos. 6,110,684, 5,958,692, 5,851,770, herein incorporated by reference in their entireties); polymerase chain reaction; branched hybridization methods (e.g., Chiron, U.S. Pat. Nos. 5,849,481, 5,710,264, 5,124,246, and 5,624,802, herein incorporated by reference in their entireties); rolling circle replication (e.g., U.S. Pat. Nos. 6,210,884 and 6,183,960, herein incorporated by reference in their entireties); NASBA (e.g., U.S. Pat. No. 5,409,818, herein incorporated by reference in its entirety); molecular beacon technology (e.g., U.S. Pat. No. 6,150,097, herein incorporated by reference in its entirety); E-sensor technology (Motorola, U.S. Pat. Nos. 6,248,229, 6,221,583, 6,013,170, and 6,063,573, herein incorporated by reference in their entireties); INVADER assay (Third Wave Technologies; See e.g., U.S. Pat. Nos. 5,846,717; 6,090,543; 6,001,567; 5,985,557; and 5,994,069; each of which is herein incorporated by reference); cycling probe technology (e.g., U.S. Pat. Nos. 5,403,711, 5,011,769, and 5,660,988, herein incorporated by reference in their entireties); Dade Behring signal amplification methods (e.g., U.S. Pat. Nos. 6,121,001, 6,110,677, 5,914,230, 5,882,867, and 5,792,614, herein incorporated by reference in their entireties); ligase chain reaction (Barany, Proc. Natl. Acad. Sci. USA 88, 189-93 (1991)); and sandwich hybridization methods (e.g., U.S. Pat. No. 5,288,609, herein incorporated by reference in its entirety).

Mass Spectroscopy Assay

In some embodiments, a MassARRAY system (Sequenom, San Diego, Calif.) is used to detect the presence or absence of the nucleic acid segments as described herein. U.S. Pat. Nos. 6,043,031; 5,777,324; and 5,605,798, each of which is herein incorporated by reference, describe mass spectroscopy assays. DNA is isolated from blood samples using standard procedures. Next, specific DNA regions containing the mutation or SNP of interest, about 200 base pairs in length, are amplified by PCR. The amplified fragments are then attached by one strand to a solid surface and the non-immobilized strands are removed by standard denaturation and washing. The remaining immobilized single strand then serves as a template for automated enzymatic reactions that produce genotype specific diagnostic products.

Very small quantities of the enzymatic products, typically five to ten nanoliters, are then transferred to a SpectroCHIP array for subsequent automated analysis with the SpectroREADER mass spectrometer. Each spot is preloaded with light absorbing crystals that form a matrix with the dispensed diagnostic product. The MassARRAY system uses MALDI-TOF (Matrix Assisted Laser Desorption Ionization-Time of Flight) mass spectrometry. In a process known as desorption, the matrix is hit with a pulse from a laser beam. Energy from the laser beam is transferred to the matrix and it is vaporized, resulting in a small amount of the diagnostic product being expelled into a flight tube. Because the diagnostic product is charged, it is launched down the flight tube towards a detector when an electrical field pulse is subsequently applied to the tube. The time between application of the electrical field pulse and collision of the diagnostic product with the detector is referred to as the time of flight. This is a very precise measure of the product's molecular weight, as a molecule's mass correlates directly with time of flight, with smaller molecules flying faster than larger molecules. The entire assay is completed in less than one thousandth of a second, enabling samples to be analyzed in a total of 3-5 seconds including repetitive data collection. The SpectroTYPER software then calculates, records, compares and reports the genotypes at the rate of three seconds per sample.
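The dependence of time of flight on mass can be written out for an idealized linear time-of-flight analyzer. This is a textbook simplification offered as an assumption, since no formula is given here: an ion of mass $m$ and charge $ze$ is accelerated through a potential difference $U$ and then drifts through a field-free tube of length $L$. The symbols $m$, $z$, $e$, $U$, $L$, $v$ and $t$ are introduced only for this illustration.

```latex
% Energy gained in the accelerating field equals the ion's kinetic energy:
z e U = \tfrac{1}{2} m v^{2}
\quad\Longrightarrow\quad
v = \sqrt{\frac{2 z e U}{m}}

% Time to traverse the field-free drift tube of length L:
t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}

% Hence, for fixed L and U, the time of flight grows with the square root of m/z:
t \propto \sqrt{\frac{m}{z}}
```

Under these assumptions a heavier product of the same charge arrives later than a lighter one, which is the sense in which smaller molecules fly faster than larger molecules.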

Data Analysis

In some embodiments, a computer-based analysis program is used to translate the raw data generated by the detection assay (e.g., the presence, absence, or amount of a given nucleic acid segment) into data of predictive value for a clinician. The clinician can access the predictive data using any suitable means. Thus, in some preferred embodiments, the present invention provides the further benefit that the clinician, who is not likely to be trained in genetics or molecular biology, need not understand the raw data. The data is presented directly to the clinician in its most useful form. The clinician is then able to immediately utilize the information in order to optimize the care of the subject.

The present invention contemplates any method capable of receiving, processing, and transmitting the information to and from laboratories conducting the assays, information providers, medical personnel, and subjects. For example, in some embodiments of the present invention, a sample (e.g., a biopsy or a serum or urine sample) is obtained from a subject and submitted to a profiling service (e.g., clinical lab at a medical facility, genomic profiling business, etc.), located in any part of the world (e.g., in a country different from the country where the subject resides or where the information is ultimately used) to generate raw data. Where the sample comprises a tissue or other biological sample, the subject may visit a medical center to have the sample obtained and sent to the profiling center, or subjects may collect the sample themselves (e.g., a urine sample) and directly send it to a profiling center. Where the sample comprises previously determined biological information, the information may be directly sent to the profiling service by the subject (e.g., an information card containing the information may be scanned by a computer and the data transmitted to a computer of the profiling center using an electronic communication system).
Once received by the profiling service, the sample is processed and a profile is produced (i.e., expression data), specific for the diagnostic or prognostic information desired for the subject.

The profile data is then prepared in a format suitable for interpretation by a treating clinician. For example, rather than providing raw expression data, the prepared format may represent a diagnosis or risk assessment (e.g., likelihood of cancer being present or the subtype of cancer) for the subject, along with recommendations for particular treatment options. The data may be displayed to the clinician by any suitable method. For example, in some embodiments, the profiling service generates a report that can be printed for the clinician (e.g., at the point of care) or displayed to the clinician on a computer monitor.

In some embodiments, the information is first analyzed at the point of care or at a regional facility. The raw data is then sent to a central processing facility for further analysis and/or to convert the raw data to information useful for a clinician or patient. The central processing facility provides the advantage of privacy (all data is stored in a central facility with uniform security protocols), speed, and uniformity of data analysis. The central processing facility can then control the fate of the data following treatment of the subject. For example, using an electronic communication system, the central facility can provide data to the clinician, the subject, or researchers.

In some embodiments, the subject is able to directly access the data using the electronic communication system. The subject may choose further intervention or counseling based on the results. In some embodiments, the data is used for research use. For example, the data may be used to further optimize the inclusion or elimination of markers as useful indicators of a particular condition or stage of disease.

Antibodies

Antibodies are well known to those of ordinary skill in the science of immunology. As used herein, the term "antibody" means not only intact antibody molecules, but also fragments of antibody molecules that retain immunogen binding ability. Such fragments are also well known in the art and are regularly employed both in vitro and in vivo. Accordingly, as used herein, the term "antibody" means not only intact immunoglobulin molecules but also the well-known active fragments F(ab')₂ and Fab. F(ab')₂ and Fab fragments, which lack the Fc fragment of intact antibody, clear more rapidly from the circulation and may have less non-specific tissue binding than an intact antibody (Wahl et al., J. Nucl. Med. 24:316-325 (1983)). The antibodies of the invention comprise whole native antibodies, bispecific antibodies, chimeric antibodies, Fab, Fab', single chain V region fragments (scFv) and fusion polypeptides. In one embodiment, an antibody that binds PSPHL polypeptide (e.g., PSPHL or a PSPHL variant) is monoclonal. Alternatively, the anti-PSPHL antibody is a polyclonal antibody. The preparation and use of polyclonal antibodies are also known to the skilled artisan. The invention also encompasses hybrid antibodies, in which one pair of heavy and light chains is obtained from a first antibody, while the other pair of heavy and light chains is obtained from a different second antibody. Such hybrids may also be formed using humanized heavy and light chains. Such antibodies are often referred to as "chimeric" antibodies.

In general, intact antibodies are said to contain "Fc" and "Fab" regions. The Fc regions are involved in complement activation and are not involved in antigen binding. An antibody from which the Fc' region has been enzymatically cleaved, or which has been produced without the Fc' region, designated an "F(ab')₂" fragment, retains both of the antigen binding sites of the intact antibody. Similarly, an antibody from which the Fc region has been enzymatically cleaved, or which has been produced without the Fc region, designated an "Fab'" fragment, retains one of the antigen binding sites of the intact antibody. Fab' fragments consist of a covalently bound antibody light chain and a portion of the antibody heavy chain, denoted "Fd." The Fd fragments are the major determinants of antibody specificity (a single Fd fragment may be associated with up to ten different light chains without altering antibody specificity). Isolated Fd fragments retain the ability to specifically bind to immunogenic epitopes.

Antibodies can be made by any of the methods known in the art utilizing a PSPHL gene product (e.g., a polypeptide gene product) or immunogenic fragments thereof, as an immunogen. In embodiments, a synthetic PSPHL protein sequence is used to generate the PSPHL antibody. In other embodiments, said sequence corresponds to SEQ ID NO: 7 (MASASCSPGGALASPEPGRKILPRMISHSELRKLFYSADAVCFDVDSTVISEEGIGCFHWIWRKCDQATSQG).
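
By way of illustration only, the 72-residue sequence of SEQ ID NO: 7 can be cross-checked against the PSPHL transcript of SEQ ID NO: 1. The following Python sketch translates the open reading frame that begins at the ATG within "...CTCAATGGCC..." of SEQ ID NO: 1, using the standard genetic code. The open reading frame string was copied from SEQ ID NO: 1 as printed in this document, and the names ORF_FROM_SEQ_ID_1, SEQ_ID_7 and translate are hypothetical names introduced for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Open reading frame copied from SEQ ID NO: 1 (start codon through stop codon).
ORF_FROM_SEQ_ID_1 = (
    "ATGGCCAGCGCCAGCTGCAGCCCCGGCGGCGCACTCGCC"
    "TCACCTGAGCCTGGGAGGAAAATTCTTCCAAGGATGATCTCCCACTCAGA"
    "GCTGAGGAAGCTTTTCTACTCAGCAGATGCTGTGTGTTTTGATGTTGACAG"
    "CACGGTCATCAGTGAAGAAGGAATCGGATGCTTTCATTGGATTTGGAGGA"
    "AATGTGATCAGGCAACAAGTCAAGGATAA"
)

SEQ_ID_7 = "MASASCSPGGALASPEPGRKILPRMISHSELRKLFYSADAVCFDVDSTVISEEGIGCFHWIWRKCDQATSQG"


def translate(dna):
    """Translate codon by codon from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    peptide = translate(ORF_FROM_SEQ_ID_1)
    print(len(peptide), peptide == SEQ_ID_7)
```

Under these assumptions the translated product is 72 amino acids long and is identical to SEQ ID NO: 7.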

One method of obtaining antibodies is to immunize suitable host animals with an immunogen and to follow standard procedures for polyclonal or monoclonal antibody production. The immunogen will facilitate presentation of the immunogen on the cell surface. Immunization of a suitable host can be carried out in a number of ways. Nucleic acid sequences encoding a PSPHL polypeptide, or immunogenic fragments thereof, can be provided to the host in a delivery vehicle that is taken up by immune cells of the host. The cells will in turn express the receptor on the cell surface generating an immunogenic response in the host. Alternatively, nucleic acid sequences encoding a PSPHL polypeptide, or immunogenic fragments thereof, can be expressed in cells in vitro, followed by isolation of the receptor and administration of the receptor to a suitable host in which antibodies are raised. Using either approach, antibodies can then be purified from the host. Antibody purification methods may include salt precipitation (for example, with ammonium sulfate), ion exchange chromatography (for example, on a cationic or anionic exchange column preferably run at neutral pH and eluted with step gradients of increasing ionic strength), gel filtration chromatography (including gel filtration HPLC), and chromatography on affinity resins such as protein A, protein G, hydroxyapatite, and antiimmunoglobulin.

Antibodies can be conveniently produced from hybridoma cells engineered to express the antibody. Methods of making hybridomas are well known in the art. The hybridoma cells can be cultured in a suitable medium, and spent medium can be used as an antibody source. Polynucleotides encoding the antibody of interest can in turn be obtained from the hybridoma that produces the antibody, and then the antibody may be produced synthetically or recombinantly from these DNA sequences. For the production of large amounts of antibody, it is generally more convenient to obtain an ascites fluid. The method of raising ascites generally comprises injecting hybridoma cells into an immunologically naive histocompatible or immunotolerant mammal, especially a mouse. The mammal may be primed for ascites production by prior administration of a suitable composition; e.g., Pristane.

Monoclonal antibodies (Mabs) produced by methods of the invention can be "humanized" by methods known in the art. "Humanized" antibodies are antibodies in which at least part of the sequence has been altered from its initial form to render it more like human immunoglobulins. Techniques to humanize antibodies are particularly useful when non-human animal (e.g., murine) antibodies are generated. Examples of methods for humanizing a murine antibody are provided in U.S. Patent Nos. 4,816,567, 5,530,101, 5,225,539, 5,585,089, 5,693,762 and 5,859,205.

Kits

The invention features kits for use in identifying a subject at risk for developing prostate cancer. In preferred examples, the kits for use in identifying a subject at risk for developing prostate cancer comprise primers directed to amplify a 133 base pair sequence of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1, and instructions for use.

In certain embodiments, the invention features kits for the detection of AAIns or PSPHL. For example, a kit for detecting AAIns might include reagents for genomic DNA extraction, PCR reagents, and AAIns specific primers. A kit for detecting PSPHL gene expression might include reagents for mRNA isolation, RT-PCR reagents and PSPHL specific primers. A kit for detection of PSPHL protein expression may include primary antibodies against the PSPHL antigen coupled with general detection methods for specific binding.

In preferred embodiments, the kits of the invention feature primers for use in detecting a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject. Thus, in certain examples, the kits preferably comprise primers. For example, the primers, in certain embodiments, comprise the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4. In other embodiments, the primers comprise the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6. In other preferred embodiments, the kits comprise the nucleic acid sequences set forth as SEQ ID NO: 3 and SEQ ID NO: 4, and instructions for use, or the kits comprise the nucleic acid sequences set forth as SEQ ID NO: 5 and SEQ ID NO: 6, and instructions for use. The kits may further comprise instructions for use in a PCR assay, for example in multiplexed PCR. Other kits that are featured in the invention are kits for use in identifying a subject at risk for developing prostate cancer comprising an antibody directed to a PSPHL antigen, and instructions for use. The antibody may be monoclonal or polyclonal. The polyclonal antibody may be used to detect the 72AA antigen, for example the polyclonal antibody comprising a sequence encoded by SEQ ID NO: 7. Tissue sources for the detection of the AAIns genomic DNA, and of the expressed products including mRNA and protein, may include any tissue sources where genomic DNA, mRNA, or protein can be retrieved.

It is possible in certain embodiments that the kits of this invention could include a solid substrate having a hydrophobic function, such as a protein biochip (e.g., a Ciphergen ProteinChip array) and a buffer for washing the substrate, as well as instructions providing a protocol to measure the biomarkers of this invention on the chip and to use these measurements to diagnose prostate cancer.

In one aspect, the invention provides kits for detecting a biomarker for prostate cancer in an African American subject. The invention provides kits for detecting the presence (or absence) of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, wherein the presence or absence of the nucleic acid segment in the gene locus indicates an altered risk of cancer. The kits include PCR primers for at least one marker, preferably the nucleic acid comprising SEQ ID NO: 2 as described herein; however, the kit may include identification of more than one biomarker as described herein. The kit may further include instructions for use and correlation of the biomarker with disease status. The kit may also include a DNA array containing the complement of one or more of the biomarkers, reagents, and/or enzymes for amplifying or isolating sample DNA. The kits may include reagents for PCR, for example, probes and/or primers, and enzymes. The kits of the invention have many applications. For example, the kits can be used to differentiate whether a subject has prostate cancer or does not have prostate cancer (a negative diagnosis).

In one embodiment, a kit comprises: (a) a substrate comprising an adsorbent thereon, wherein the adsorbent is suitable for binding a biomarker, and (b) instructions to detect the marker or markers by contacting a sample with the adsorbent and detecting the biomarker or markers retained by the adsorbent. In some embodiments, the kit may comprise an eluant (as an alternative or in combination with instructions) or instructions for making an eluant, wherein the combination of the adsorbent and the eluant allows detection of the biomarkers using gas phase ion spectrometry. Such kits can be prepared from the materials described above, and the previous discussion of these materials (e.g., probe substrates, adsorbents, washing solutions, etc.) is fully applicable to this section and will not be repeated.

In another embodiment, the kit may comprise a first substrate comprising an adsorbent thereon (e.g., a particle functionalized with an adsorbent) and a second substrate onto which the first substrate can be positioned to form a probe, which is removably insertable into a gas phase ion spectrometer. In other embodiments, the kit may comprise a single substrate, which is in the form of a removably insertable probe with adsorbents on the substrate. In yet another embodiment, the kit may further comprise a pre-fractionation spin column (e.g., Cibacron blue agarose column, anti-HSA agarose column, K-30 size exclusion column, Q-anion exchange spin column, single stranded DNA column, lectin column, etc.).

In another embodiment, a kit comprises (a) an antibody that specifically binds to a biomarker; and (b) a detection reagent. Such kits can be prepared from the materials described above, and the previous discussion regarding the materials (e.g., antibodies, detection reagents, immobilized supports, etc.) is fully applicable to this section and will not be repeated. Optionally, the kit may further comprise pre-fractionation spin columns. In some embodiments, the kit may further comprise instructions for suitable operation parameters in the form of a label or a separate insert. Optionally, the kit may further comprise a standard or control information so that the test sample can be compared with the control information standard to determine if the test amount of a biomarker detected in a sample is consistent with a diagnosis of prostate cancer.

Reference cells may be normal cells (cells that are not prostate cancer cells) or prostate cells at a different stage from the prostate cancer cells being compared to. The reference cells may be primary cultured cells, fresh blood cells, established cell lines or other cells determined to be appropriate to one of skill in the art.

EXAMPLES

This invention is further illustrated by the following examples, which should not be construed as limiting. All documents mentioned herein are incorporated herein by reference.

The incidence and mortality rates of prostate cancer vary considerably among different geographic areas and ethnic groups. African American men have the highest incidence and mortality from prostate cancer in the world (Ries 2007, Chu 2003). They have a 60% increased risk of developing prostate cancer, twice the risk of developing distant disease, and over twice the mortality relative to Americans of European descent (Jemal 2007, Clegg 2002). Multiple reasons have been postulated to explain these findings including access to care, attitudes about care, socioeconomic and education differences, differences in type and aggressiveness of treatment, dietary, and genetic factors. Yet etiological factors accounting for this disparity remain elusive (Freedland and Isaacs 2005). For a long period of time, the consistent risk factors identified for prostate cancer in addition to race, have been age and family history.

Recently, a number of common genetic variants joined and expanded the known list of risk factors for human prostate cancer (Zheng 2008, Eeles 2008, Thomas 2008). This most recent finding was the result of combined efforts of multiple large-scale genome-wide association studies, in which single nucleotide polymorphism (SNP) allele frequencies were compared between prostate cancer cases and unaffected controls. Because treatments for early stage prostate cancer are effective, these newly established genetic risk variants, most of which have cumulative effect, may be useful for seeking out high-risk individuals for early screening and timely intervention, as the inherited risk alleles can be tested inexpensively anytime during the lifespan. However, whether these genetic findings can be translated to better control of prostate cancer in African American men is still largely unknown, owing to the relative scarcity of corresponding studies at similar scales in the African American population. Further, it is still unknown whether any of the established genetic variants may lead to phenotypic differences in gene expression or gene function. In the absence of functional validation and follow-up studies of these SNPs, it is presently not clear how they contribute to prostate cancer risk.

To better understand the potential genetic-based contribution to prostate cancer disparity, the experiments described herein employ an alternative approach focusing on gene expression studies to identify candidate genes that can be followed up for the assessment of related genetic variations contributing to the development of prostate cancer. The experiments describe a genome-wide gene expression analysis on surgical human prostate specimens and reveal differences in the mRNA expression of the PSPHL (Phosphoserine Phosphatase-Like) gene when comparing prostate tumors from patients of African and European descent. The cDNA for this gene was originally isolated from fibroblasts derived from a patient with Fanconi's anemia by the cDNA differential display technique and was described as a homologue to L-3-phosphoserine-phosphatase (PSPH) (Planitzer 1998). As described herein, the expression of the PSPHL gene was completely shut down in approximately 70% of prostate tumors from patients of European descent, but expressed at readily detectable levels in 80-90% of prostate tumors from patients of African descent. This observation was recently confirmed by another group in a published study that examined expression differences between prostate tumors derived from patients of European and African populations (Wallace 2008). Further, detailed follow-up genomic analysis revealed that the expression status of the PSPHL gene is 100% concordant with the presence or absence of an insertion allele at the PSPHL gene locus in the assayed genomes (i.e., the corresponding germ line DNA). The insertion sequence is absent in the genomes of individuals with negative prostate PSPHL expression and also absent in the current version of the assembled reference human genome.

The experiments described herein subsequently defined the insertion/deletion breakpoints based on TRACE DNA sequence data and developed a robust genotyping assay. Genotyping results in populations of different geographical ancestries validated the unusual pattern of population differentiation of this novel structural variation. The studies described herein are based on this novel finding.

Example 1. Expression microarray analysis using Agilent Whole Genome Expression Microarray was used to compare gene expression in surgical prostate cancer tissues derived from 12 European American patients and 8 African American patients (Figure 1). Using cut-off criteria of a 2-fold change in expression and p<0.05, six genes were found to be under-expressed in African American prostate tissues, and one gene, PSPHL, was found to be over-expressed in African American prostate tissues.

The PSPHL probe from the Agilent Whole Genome Expression Microarray was mapped to two homologous genes on chromosome 7p11.2; the two genes were PSPH and PSPHL, and further validation using gene-specific primers identified the gene as PSPHL, and not PSPH. PSPHL (Phosphoserine Phosphatase-Like) was previously identified by Planitzer et al (1998) as a human L-3-phosphoserine-phosphatase homologue that is significantly upregulated in FA fibroblasts. The sequence of the PSPHL transcript can be found in the NCBI GenBank database under accession number AJ001612 (SEQ ID NO: 1), shown below.

SEQ ID NO: 1 AAGCCACAGGCTCCCTGGCTGGCGTCAGCTAAAGTGGCTGTTGGGTGTCC

GCAGGCTTCTGCCTGGCCGCCGCCGCCTATAAGCTACCAGGAGGAGCTTT ACGACTTCCCGTCCTGCGGGAAGTGGCGGGCACGATCGCAAGGTAGCGCA GAAGCTTCTCAATGGCCAGCGCCAGCTGCAGCCCCGGCGGCGCACTCGCC TCACCTGAGCCTGGGAGGAAAATTCTTCCAAGGATGATCTCCCACTCAGA GCTGAGGAAGCTTTTCTACTCAGCAGATGCTGTGTGTTTTGATGTTGACAG

CACGGTCATCAGTGAAGAAGGAATCGGATGCTTTCATTGGATTTGGAGGA AATGTGATCAGGCAACAAGTCAAGGATAACGCCAAATGGTATATCACTGA TTTTGTAGAGCTGCTGGGAGAACCGGAAGAATAACATCCATTGTCATACA GCTCCAAACAACTTCAGATGAATTTTTACAAGTTACACAGATTGATACTGT TTGCTTACAATTGCCTATTACAACTTGCTATAAAAAGTTGGTACAGATGAT

CTGCACTGTCAAGTAAACTACAGTTAGGAATCCTCAAAGATTGGTTTGTTT GTTTTTAACTGTAGTTCCAGTATTATATGATCACTATCGATTTCCTGGAGA GTTTTGTAATCTGAATTCTTTATGTATATTCCTAGCTATATTTCATACAAAG TGTTTTAAGAGTGGAGAGTCAATTAAACACCTTTACTCTTAGGAATATAGA TTCGGCAGCCTTCAGTGAATATTGGTTTTTTTCCCTTTGGTATGTCAATAAA

AGTTTATCCATGTGTCAGAAAAAAAAAAA

The race-specific expression difference in PSPHL was further validated by RT-PCR using the gene-specific primer pairs Pl and P2 (Figures 2 and 3), using RNA isolated from the same tissue samples used in the microarray analysis (Figure 3). This confirmed that there is a higher frequency of PSPHL expression in tissues from African Americans (7 out of 9) compared to European Americans (3 of 12). Additional analyses showed that PSPHL is expressed in 12 of 14 African American cases, but only in 6 of 20 European American cases (data not shown).

Analysis of paired tissue samples determined the patient-specific expression pattern of PSPHL (Figures 4 and 5). Patients with a positive value for PSPHL expression in cancer tissues also had a positive value for the paired normal tissues, and vice versa. These results show that the expression pattern of PSPHL is all-or-none and that the terms "positive" and "negative" can be used to describe its expression. Further, its expression is individual-specific, i.e., if it is positive, it is positive in both normal and tumor tissues from the same individual and vice versa.

Example 2. PSPHL Gene Analysis

Planitzer et al (1998) described the overexpression of the PSPHL mRNA (the cDNA referred to by Planitzer as CO9) in fibroblasts isolated from a few Fanconi's anemia patients, and published the PSPHL cDNA sequence (GenBank Accession number AJ001612; SEQ ID NO: 1 as above). Two variants of the PSPHL mRNA were cloned: one is identical to AJ001612, and the other (identified in GenBank under Accession Number BC065228; SEQ ID NO: 8) has an additional insertion of 122bp that is in an equivalent position to nt 327 of AJ001612 (Figure 6). SEQ ID NO: 8 is shown below. Thus the two variants, if translated, would lead to two distinctive protein products. The function of the protein products is unknown.

SEQ ID NO: 8

CCGCAGGCTTCTGCCTGGCCGCCGCCGCCTATAAGCTACCAGGAGGAGCT TTACGACTTCCCGTCCTGCGGGAAGTGGCGGGCACGATCGCAAGGTAGCG CAGAAGCTTCTCAATGGCCAGCGCCAGCTGCAGCCCCGGCGGCGCACTCG CCTCACCTGAGCCTGGGAGGAAAATTCTTCCAAGGATGATCTCCCACTCA GAGCTGAGGAAGCTTTTCTACTCAGCAGATGCTGTGTGTTTTGATGTTGAC

AGCACGGTCATCAGTGAAGAAGGAATCGGACGGAGTCTCGCTCTGTCACC AGGCTGGAGTGCAATGGTGCAATCTCGGCTCACTGCAACCTCCGCCTCCT GGGTTCAGGCAGTTCTCCTGCCTCCACCTCCTGAGTAGCTGAAACTACAGG ATGCTTTCATTGGCTTTGGAGGAAATGTGATCAGGCAACAAGTCAAGGAT AACGCCAAATGGTATATCACTGATTTTGTAGAGCTGCTGGGAGAACCGGA AGAATAACATCCATTGTCATACAGCTCCAAACAACTTCAGATGAATTTTTA CAAGTTACACAGATTGATACTGTTTGCTTACAATTGCCTATTACAACTTGC TATAGAAAGTTGGTACAGATGATCTGCACTGTCAAGTAAACTACAGTTAG GAATCCTCAAAGATTGGTTTGTTTGTTTTTAACTGTAGTTCCAGTATTATAT

GATCACTATTGATTTCCTGGAGAGTTTTGTAATCTGAATTCTTTATGTATAT TCCTAGCTATATTTCATACAAAGTGTTTTAAGAGTGGAGAGTCAATTAAAC ACCTTTACTCTTAGGAAAAAAAAAAAAAAAAA

BLAST searches of the AJ001612 sequence against the assembled Human and Chimpanzee genome sequences using the UCSC genome browser identified matches for nt 327-830 of AJ001612 in both genomes assembled in March 2006 (Figure 14 and Figure 15). This matched sequence is the last exon of the gene based on the absence of gaps in this matched sequence in either genome. Alignment of the mRNA sequence AJ001612 to human genomic DNA shows that nt 1-326 of AJ001612 had no strong match in the human genomic DNA (see below), but matched, with more than 98% homology, to DNA sequences in the Chimpanzee genome (Figure 15). The first stretch of matched sequence corresponds to nt 3-214 of AJ001612, suggesting that nt 3-214 of AJ001612 is the sequence for the first exon. Likewise, the second exon is likely to be 113bp in size and corresponds to nt 215-327 of AJ001612 (Figure 15). Furthermore, the unique 122bp insertion present in BC065228 (Figure 6) does not match the assembled human genome sequence (not shown) but matched a continuous sequence in the Chimpanzee genome between predicted exon 2 and the last exon (Figure 16), suggesting that this 122bp insertion is exon 3 that is likely alternatively spliced out in AJ001612. Therefore, the human PSPHL has 4 predicted exons based on the alignment of AJ001612 and BC065228 with the assembled Chimpanzee genome (Figure 7).
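
By way of illustration only, the exon sizes quoted above follow directly from the stated nucleotide coordinates. The short Python sketch below treats the coordinates as 1-based, inclusive positions on the AJ001612 transcript, which is an assumption about the numbering convention; the names PREDICTED_EXONS and span_length are hypothetical.

```python
# Predicted exon coordinates on AJ001612 as stated in the text
# (assumed to be 1-based and inclusive).
PREDICTED_EXONS = {
    "exon 1": (3, 214),
    "exon 2": (215, 327),
}

# Exon 3 is present only in BC065228; its size is taken directly from the text.
EXON_3_LENGTH_BP = 122


def span_length(start, end):
    """Length in base pairs of a 1-based, inclusive coordinate range."""
    return end - start + 1


if __name__ == "__main__":
    for name, (start, end) in PREDICTED_EXONS.items():
        print(f"{name}: nt {start}-{end} = {span_length(start, end)} bp")
    print(f"exon 3 (BC065228 only): {EXON_3_LENGTH_BP} bp")
```

Under this convention nt 215-327 spans 113 bp, in agreement with the size given for the second exon.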

The results from the alignment of the AJ001612 sequence with the assembled Human and Chimpanzee genomes (Figures 14-16 and Figure 7) suggested that there is a genomic DNA indel (insertion or deletion) of unknown size that is present in the assembled Chimpanzee genome, but not present in the assembled Human genome. This segment of DNA (hereafter named AAIns for African American specific insertion; see below) aligns with nt 1-326 of the AJ001612 sequence (Figure 8) and includes the first 3 exons of the PSPHL gene and possibly 5 prime regulatory regions and part of intron 3. Deletion of this sequence may lead to the absence of PSPHL gene expression and may explain the correlation with PSPHL gene expression that is observed. The predicted exon structure and the mRNA variants are shown in Figure 9. Both the predicted exon structure and the involvement of AAIns were validated by the identification of the breakpoint and partial assembly of AAIns as described herein.

Example 3. Partial Assembly of AAIns and Identification of the Breakpoints

Although AAIns was not assembled into the human genome, it was reasoned that the partial or whole AAIns sequence may be assembled from the Celera Whole Genome Shotgun (WGS) sequence database as well as the Trace Archive that contained DNA sequences not assembled due to lack of consensus. Next, the Celera whole genome shotgun (WGS) sequence database was queried using the mRNA sequences from BC065228, which has sequences from all 4 exons.

First, a contig (ContigO, Figure 17) was assembled based on the WGS and Trace sequences as shown (Figure 17). This contig contains exon 1 and flanking genomic sequences. This sequence, which is part of AAIns, was not assembled into the human genome as previously mentioned. Second, a Trace genomic sequence of 1200 bp that contained part of intron 1, exon 2, and part of intron 2 was identified (gnl|ti|226793227, Figure 17). This sequence, which is part of AAIns, again was not assembled into the human genome as previously mentioned. Third, a WGS sequence of 4906bp that overlaps with gnl|ti|226793227 and contains the 122bp exon 3 was identified (gi| 1481847011, Figure 17). The size of intron 2 is therefore determined to be 831bp and is consistent with predictions from the alignment with the assembled Chimpanzee genome (Figure 9). This sequence, which is part of AAIns, again was not assembled into the human genome. Fourth, a WGS sequence of 96797bp was identified (gi|68978189|, Figure 17) that contains part of exon 3, complete intron 3 (11954bp), and complete exon 4. This sequence matched the assembled human genome at chr7: 55798238 (Figure 18), at a point which starts 1820 bp downstream of the exon 3 boundary. This is the 3' breakpoint for AAIns.

Finally, the Trace Archive was queried with a 100bp non-AAIns sequence from upstream of the 3' AAIns breakpoint in the assembled human genome, and a Trace sequence was identified (gnl|ti| 1656600323, Figure 19). The assembled human genome sequence breaks apart from this Trace sequence at chr7: 55798228 (Figure 19). This is the 5' AAIns breakpoint. The distance between the two breakpoints, in the assembled human genome, is 9 base pairs (55798228-55798238). To ensure that the Trace sequence (gnl|ti| 1656600323, as shown in Figure 17) contains AAIns sequence following the breakpoint, this sequence was aligned with the assembled Chimpanzee genome. This alignment resulted in a nearly perfect match with the Chimpanzee genome sequence (Figure 19).

To further validate the position of the breakpoint, the Chimpanzee genome was queried with a 200bp assembled human sequence flanking the breakpoints (Figure 20). As predicted, the human sequence was broken into two pieces with a sequence of 52765 bp in size separating the two pieces in the corresponding Chimpanzee genome. The three exons of the PSPHL gene are within this 52765 bp AAIns sequence. While the complete sequence of the human counterpart of the AAIns is not fully assembled, the gene structure of the human PSPHL locus has been largely decoded based on this bioinformatics approach.

Example 4. PSPHL gene expression and the detection of AAIns in prostate cancer cases and controls

Based on the data presented, which suggested that the PSPHL gene was expressed in African Americans at a higher frequency compared to European Americans, it was thought that (1) the expression of the gene is controlled by the presence or absence of this AAIns; and (2) AAIns is present in a higher proportion of African Americans when compared to European Americans.

Next, a new primer set, P3, was designed that specifically amplifies a 133 bp fragment of the AAIns (Figures 8 and 10). As Figure 10 shows, the presence of the AAIns is perfectly correlated with the expression of PSPHL in the African American samples tested. Furthermore, absence of PSPHL expression appears to correlate perfectly with the absence of AAIns (with the exception of a potential positive negative in sample 1134), which is very common in European Americans. This confirms that the expression of PSPHL is regulated by the structural variation of the PSPHL gene and, in particular, gene sequences associated with the AAIns that are required for expression of the PSPHL gene. Next, a case-control cohort comprised of 321 individuals from both racial groups was examined for the presence of AAIns using the same P3 PCR primers. When grouped according to race, 149 of 156 (95%) African Americans were AAIns positive, while only 49 of 165 European Americans (30%) were AAIns positive. This confirms that AAIns is present in a higher proportion of African Americans when compared to European Americans. There is no difference between prostate cancer cases and controls identified so far within the African American group. However, within the European American group, AAIns is detected in 35.2% of low grade cases (19/54), in 33% of high grade cases (18/55), but in only 21.4% (12/56) of healthy control individuals (Figure 11). Therefore, one conclusion that can be reached is that the absence of AAIns in the assembled human genome sequence is probably associated with a lack of representation of African Americans in the DNA samples pooled for the Human Genome Project.
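
By way of illustration only, the carrier frequencies reported above can be recomputed from the stated counts. The following Python sketch uses only the counts given in this example; the names BY_RACE, EUROPEAN_AMERICAN_STRATA, carrier_percent and pooled are hypothetical, and no statistical test is implied.

```python
# (AAIns-positive, total genotyped), as reported in this example.
BY_RACE = {
    "African American": (149, 156),
    "European American": (49, 165),
}

EUROPEAN_AMERICAN_STRATA = {
    "low grade cases": (19, 54),
    "high grade cases": (18, 55),
    "healthy controls": (12, 56),
}


def carrier_percent(positive, total):
    """Percentage of AAIns-positive individuals, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)


def pooled(strata):
    """Sum the positive and total counts across all strata."""
    positives = sum(p for p, _ in strata.values())
    totals = sum(t for _, t in strata.values())
    return positives, totals


if __name__ == "__main__":
    for group, (pos, tot) in {**BY_RACE, **EUROPEAN_AMERICAN_STRATA}.items():
        print(f"{group}: {pos}/{tot} = {carrier_percent(pos, tot)}%")
```

The three European American strata sum to 49 of 165, and the two racial groups together account for the 321 individuals in the cohort, so the reported counts are internally consistent.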

Example 5. Genotyping of the AAIns Locus

The results provided herein indicate that the presence of the AAIns sequence, and thus expression of the PSPHL gene product, is associated with populations with a modulated risk of developing prostate cancer. Thus, assaying for the presence of AAIns through molecular diagnostic techniques provides a method for identifying people with a modulated risk of developing prostate cancer. Furthermore, because AAIns is more prevalent in African Americans, detection of AAIns through molecular diagnostic techniques provides a genetic test for determining ancestral origin. However, primer set P3, while it reliably detects the presence of AAIns, cannot differentiate one copy (heterozygous) from two copies (homozygous) of the AAIns sequence in the genome. The identification of the breakpoint allowed the design of primer set P4 (Figure 12), which spans the entire AAIns, to detect the PSPHL-null allele. Here, the PSPHL-null allele refers to the allele that does not have the AAIns sequence. P4 primers would not detect the allele with the AAIns sequence because the amplicon would exceed 50kb.

The primer sets P3 and P4 are set forth by SEQ ID NOs: 3, 4, 5 and 6 as follows:

P3

SEQ ID NO: 3

Forward. TCAGCTAAAGTGGCTGTTGGGTGT

SEQ ID NO: 4

Reverse. AAGCTTCTGCGCTACCTTGCGA

P4

SEQ ID NO: 5

Forward. AGTCTTGCTATCTTGCCCAGGCTGAT

SEQ ID NO: 6

Reverse. GTAGAGACTGGGTTTCACCATGTTGG
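As a simple sanity check on the primer listing, the length and GC content of each oligonucleotide can be computed as sketched below in Python. The dictionary keys and helper names are hypothetical and chosen for illustration; the sequences are those of SEQ ID NOs: 3 to 6 with extraction whitespace removed. No melting temperature is estimated here, since that would require assumptions about reaction conditions not given in the text.

```python
# Primer sequences from SEQ ID NOs: 3-6 (5' to 3'); key names are illustrative.
PRIMERS = {
    "P3_forward": "TCAGCTAAAGTGGCTGTTGGGTGT",    # SEQ ID NO: 3
    "P3_reverse": "AAGCTTCTGCGCTACCTTGCGA",      # SEQ ID NO: 4
    "P4_forward": "AGTCTTGCTATCTTGCCCAGGCTGAT",  # SEQ ID NO: 5
    "P4_reverse": "GTAGAGACTGGGTTTCACCATGTTGG",  # SEQ ID NO: 6
}

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
```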

The validity of the genotype assay for the AAIns locus was demonstrated by multiplex PCR as shown in Figure 12. The multiplexed PCR allows the detection of both the AAIns allele and the PSPHL-null allele in a single reaction in samples that are heterozygous for the locus. First, samples were identified that were homozygous for the locus. Two samples, 1704 and 1957, are negative for AAIns (as they lack expression of PSPHL as shown in Figure 3) and are therefore known to be homozygous for the PSPHL-null allele (both chromosomes lack the AAIns sequence). Two other samples, 1665 and 1863, are positive for AAIns as detected by the P3 primers (Figure 10) but negative for the PSPHL-null allele, because there was no signal from the P4 primers (Figure 12, lane 6), and are therefore homozygous for the AAIns allele. A 1:1 mixture of a sample homozygous for the AAIns allele and a sample homozygous for the PSPHL-null allele would mimic a heterozygous sample. As shown in Figure 12 (lanes 1-4), all such mixtures mimicking AAIns heterozygotes are detected as positive by both the P3 and P4 primers.
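The genotype-calling logic implied by the multiplex assay can be summarized in a few lines of Python. This is a minimal sketch under the assumption that each reaction is scored simply as signal present or absent for the P3 (AAIns-specific) and P4 (PSPHL-null-specific) amplicons; the function name and the returned labels are hypothetical.

```python
def call_genotype(p3_signal, p4_signal):
    """Call the AAIns locus genotype from presence/absence of each amplicon.

    p3_signal: True if the AAIns-specific P3 product is detected.
    p4_signal: True if the PSPHL-null-specific P4 product is detected.
    """
    if p3_signal and p4_signal:
        return "heterozygous (AAIns / PSPHL-null)"
    if p3_signal:
        return "homozygous AAIns"
    if p4_signal:
        return "homozygous PSPHL-null"
    # Neither product amplified: the reaction is uninformative.
    return "no call"


if __name__ == "__main__":
    for p3, p4 in [(True, True), (True, False), (False, True), (False, False)]:
        print(p3, p4, "->", call_genotype(p3, p4))
```

Treating the absence of both products as "no call" rather than as a genotype is a design choice of this sketch, since a failed reaction cannot be distinguished from a true negative without an internal control.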

Figure 13 is a schematic diagram showing the AAIns locus relative to the PSPHL gene in the human and chimpanzee genomes. Figure 13 shows the position of the AAIns relative to the assembled genomic sequences and the PSPHL gene, the sizes of exons and introns when known, the human WGS and Trace sequences used to assemble the AAIns, the positions of primers used for genotyping, and the precise location of the breakpoints.

Example 7. Molecular Diagnostic Kits for Determining the Risk of Prostate Cancer and Ancestral Origin

Assaying for the presence of AAIns, or genotype of the locus, or the PSPHL gene products can be done by any molecular method known to those skilled in the art. The AAIns sequence can be detected by a number of methods including, but not limited to, Southern blotting, RFLP analysis, PCR, genome arrays (SNP microarrays, Array CGH), Sequenom Assays, and DNA sequencing.

For example, the presence or absence of AAIns is detected by PCR using primers specific to the AAIns genomic sequence and primers specific to the PSPHL-null allele (which lacks the AAIns sequence), and using genomic DNA from any tissue sample. In the example shown here, the P3 primer pair yielded an amplification product of 133 bp that is specific to AAIns, and the P4 primers detected the presence of the PSPHL-null allele. However, any other primer pairs that are specific to the AAIns, or that lie on opposite sides of the AAIns (so that the amplicon would span the breakpoints), can be used. The PCR product can be detected by gel electrophoresis, or the PCR primers may be fluorescently labeled for detection by any method for measuring fluorescence that is known in the art. Primer pairs that align with any sequence within the AAIns genomic sequence, with any sequence in the vicinity of the breakpoints (up to 10kb from the breakpoints), or with any sequence that spans the breakpoints can also be used in the methods of the invention as described herein. PSPHL gene expression can be detected by a number of methods including, but not limited to, Northern blotting, RT-PCR, real-time PCR, in-situ hybridization, and microarrays, or by any method that detects the gene product, such as Western blotting, ELISA, mass spectrometry, immunohistochemistry, or protein arrays.

In preferred examples, PSPHL gene expression is detected by RT-PCR using primers specific to PSPHL coding sequence and using RNA from any tissue sample where the gene is expressed. In the example shown here, PSPHL gene expression was detected using either the P1 or P2 primer pairs for RT-PCR, although any other primer pairs specific to PSPHL can be used. The RT-PCR product can be detected by gel electrophoresis, or the PCR primers may be fluorescently labeled for detection by any method for measuring fluorescence that is known in the art. Also anticipated are primer pairs that differentially amplify the different PSPHL expression products, such as the P2 primer pairs described herein.

The genome-wide gene expression analysis on surgical human prostate specimens revealed differences in the mRNA expression of the PSPHL (Phosphoserine Phosphatase-Like) gene when comparing prostate tumors from patients of African and European descent. The cDNA for this gene was originally isolated from fibroblasts derived from a patient with Fanconi's anemia by the cDNA differential display technique and was described as a homologue to L-3-phosphoserine-phosphatase (PSPH) (Planitzer 1998). It was found that the expression of the PSPHL gene was completely shut down in approximately 70% of prostate tumors from patients of European descent, but expressed at readily detectable levels in 80-90% of prostate tumors from patients of African descent. This observation was recently confirmed by another group in a published study that examined expression differences between prostate tumors derived from patients of European and African populations (Wallace 2008). Most recently, a detailed follow-up genomic analysis revealed that the expression status of the PSPHL gene is 100% concordant with the presence or absence of an insertion allele at the PSPHL gene locus in the assayed genomes (i.e., the corresponding germ line DNA). The insertion sequence is absent in the genomes of individuals with negative prostate PSPHL expression and also absent in the current version of the assembled reference human genome. Subsequently, the insertion/deletion breakpoints based on TRACE DNA sequence data were defined, and a robust genotyping assay was developed. Genotyping results in populations of different geographical ancestries validated the unusual pattern of population differentiation of this novel structural variation. The studies described herein are based on this novel finding.

Example 8. Differential PSPHL Gene Expression in Prostate Tumors from African Americans Compared to those from European Americans

In studies previously funded partially by the Howard/Hopkins Partnership (see attached letter from Dr. William Nelson), two sets of expression data were generated using two different microarray platforms to analyze expression differences in human prostate cancer specimens from patients of different geographic ancestries. Surgical prostate tissue specimens representing African Americans were procured and processed. The first expression dataset was generated using surgical prostate TUMOR tissues prepared from 12 European American cases and 9 African American cases. As shown in the "volcano" plot in Figure 21A, genes differentially expressed between the two tumor groups can be identified by applying cut-off values of fold expression change and empirically derived p values. Using cut-off criteria of expression changes greater than 2 fold (in either direction) and p<0.05, six genes were found to be under-expressed in African American prostate tumors (red dots at the top left side), while only one gene, PSPHL (Phosphoserine Phosphatase-Like), was found to be over-expressed in African American prostate tumors. The second expression dataset consists of 13 African American prostate cancer cases and 13 European American cases, each represented by paired NORMAL and TUMOR samples, and profiled using an in-house manufactured 20K cDNA microarray that we described in a previous publication (Dunn 2006). As shown in the extracted "heatmap" for the PSPHL gene (Figure 21B), 11 out of 13 African American prostate tissue pairs showed visually apparent high expression relative to a BPH common reference, while 9 out of 13 European American prostate tissue pairs expressed background levels of PSPHL similar to that in the BPH reference sample.
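The volcano-plot cut-offs described above (a change of more than 2 fold in either direction, and p<0.05) amount to a simple two-condition filter. The Python sketch below illustrates that filter only. It assumes the fold change is expressed as a ratio of African American to European American mean expression, and the function name and the example values in the comments are hypothetical; it does not reproduce how the p values in the study were derived.

```python
import math


def classify_gene(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Classify one gene by the fold-change and p-value cut-offs.

    fold_change: expression ratio (African American / European American), > 0.
    p_value: p value for the group comparison.
    """
    log2_fc = math.log2(fold_change)
    # A gene passes only if it changes by MORE than min_fold and has p < alpha.
    if p_value >= alpha or abs(log2_fc) <= math.log2(min_fold):
        return "not significant"
    return "over-expressed" if log2_fc > 0 else "under-expressed"


if __name__ == "__main__":
    # Hypothetical example values, not data from the study.
    print(classify_gene(4.0, 0.01))   # passes both cut-offs, higher in AA tumors
    print(classify_gene(0.25, 0.01))  # passes both cut-offs, lower in AA tumors
    print(classify_gene(4.0, 0.20))   # fails the p-value cut-off
```

Working on the log2 scale makes the "either direction" criterion symmetric, so that a 2 fold decrease (ratio 0.5) is treated the same way as a 2 fold increase.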

Example 9. Cloning, sequencing, and mapping of the PSPHL transcripts

The PSPHL (also named CO9) cDNA was initially cloned from fibroblasts of a Fanconi's anemia patient, sequenced, and described as a homologue to L-3-phosphoserine-phosphatase (PSPH) (Planitzer 1998). In that original study the PSPHL transcript was also detected in Raji cells, Burkitt lymphoma cells from an African American donor (Planitzer 1998). The published mRNA sequence (AJ001612) suggested a coding region of 216 bp (72 amino acids) with partial N-terminal homology to PSPH (Planitzer 1998). Because of the lack of follow-up confirmatory studies, the accuracy of the published mRNA sequences was first confirmed. 5' RACE (rapid amplification of cDNA ends) was performed using mRNA samples from prostate tissues positive for PSPHL mRNA expression. RACE products were subcloned and sequenced. Sequences were obtained for two alternative transcripts, termed variant 1 and variant 2 (Figure 22), which differed by a contiguous stretch of 122bp sequence (shown in yellow). This 122bp stretch was later confirmed as a single alternatively spliced exon (see Figure 23B). The sequence of variant 1 matched GenBank accession number AJ001612 over a contiguous stretch of sequence that included the full open reading frame (72 amino acids) (Figure 22). An antibody to this variant has been reported and used in a recent study (Kuo 2007). The sequence of variant 2 matched GenBank accession number BC065228, also over a contiguous stretch of sequence that included an open reading frame (91 amino acids) (Figure 22). The two open reading frames (marked with colored bars, Figure 22) started from the same consensus starting site but differed at the site of the spliced 122bp exon (Figure 22). In order to define the exon structure of these sequences, the two variant sequences were aligned by BLAST against the reference human genome. Nucleotides 1-326 of the variant 1 sequence (red) and nucleotides 1-447 of the variant 2 sequence (red and yellow) do not match the genomic DNA sequences in the reference human genome assembly (HG18). 
This unexpected finding strongly suggested that part of the PSPHL gene is deleted in the genome represented in the reference assembly, as there is no sequence gap in this region. Therefore, despite the previous cloning and our validation of the PSPHL cDNA, the complete coding sequence is absent in the corresponding genomic locus. The lack of information regarding the PSPHL gene structure, combined with the fact that the transcripts may not be detectable in many samples particularly those derived from individuals of European descent (Figure 21), may have contributed to the lack of literature on this gene and its expression products other than the three aforementioned studies (Planitzer 1998, Kuo 2007, Wallace 2008).
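The length figures quoted in this example can be cross-checked with simple arithmetic, sketched here in Python with hypothetical function names. The sketch assumes the 216 bp coding region is counted without the stop codon, which is consistent with 72 codons of 3 bp each. The observation that a 122 bp exon is not a multiple of three, and so would alter the downstream reading frame where it lies within coding sequence, is an inference from the stated length rather than a statement made in the source.

```python
def codon_count(coding_bp):
    """Number of codons in a coding region whose length is a multiple of 3."""
    if coding_bp % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    return coding_bp // 3


def shifts_reading_frame(exon_bp):
    """True if including an exon of this length changes the downstream frame."""
    return exon_bp % 3 != 0


if __name__ == "__main__":
    print("Codons in a 216 bp coding region:", codon_count(216))
    print("122 bp exon shifts frame:", shifts_reading_frame(122))
    # Size difference between the non-matching 5' regions of the two variants.
    print("447 - 326 =", 447 - 326, "bp")
```

The non-matching regions of the two variants (nt 1-447 and nt 1-326) differ by 121 nt, close to the 122 bp size of the alternatively spliced exon.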

Example 10. The presence or absence of a PSPHL DNA segment in the genome determines expression status of the PSPHL gene in prostate tissue.

The results presented herein suggest that the expression of the PSPHL gene in the human prostate was associated with a potential deletion/insertion variation, one form of structural DNA variation, in the human genome. Next, a total of 24 prostate cancer cases were identified from our banked fresh frozen prostate tumors, for which genomic DNA samples had also been prepared previously using blood or seminal vesicle samples from the matched individuals. Samples from African Americans (n=12) were intentionally enriched. For the genomic DNA samples, PCR was performed to detect the presence or absence of the DNA sequences corresponding to AJ001612, part of the segment that was absent in the reference genome. For the prostate tumor samples, RNA was extracted and RT-PCR was performed to detect the PSPHL mRNA. As shown in Figure 23, all nine cases negative for the targeted genomic DNA were also negative for the PSPHL transcript, while all of the 15 cases detected positive for the targeted segment of PSPHL genomic DNA were also positive for the PSPHL transcript. These results suggested that a novel structural variation of the PSPHL locus, likely in the form of insertion/deletion (indel), was responsible for the observed expression differences. The observed expression pattern in microarray studies (Figure 21) also suggested that the deletion allele was the predominant allele in European Americans while the insertion allele was the predominant allele in African Americans. All these speculations were confirmed by the subsequent identification and confirmation of the insertion/deletion breakpoints as well as the genotyping results in 3 populations (see Figures 23A, 24 below).

Example 11. In silico assembly of the PSPHL genomic sequence and development of a genotyping assay

Next, in silico analyses of genomic DNA sequences in the Trace Archive and other genomic sequences derived from whole genome shotgun sequencing were performed. Partial assembly of the insertion allele was possible, and the exon structure of the gene was defined, as well as the 5' and 3' breakpoints of the insertion/deletion locus. Figure 23B shows a schematic diagram comparing the insertion allele with the deletion allele (not to scale). Coordinates for the assembled reference genome, which is represented by the deletion allele, are marked below the line positions. Sizes of the exons and introns are marked above their respective positions in the insertion allele. The 5' breakpoint is at chr7 55798228 in the assembled human genome (deletion allele), and is separated by 9bp (GTGCGTCTA) from the 3' breakpoint at chr7 55798238. However, the sizes and sequences of the remaining gaps are still not known. Mapping of the breakpoints nevertheless allowed the design of a multiplexed genotyping assay for the PSPHL locus using primer sets (P3 and P4) at their respective sites marked with red vertical bars. Representative genotyping results in African Americans and European Americans are also shown. The genotyping assay differentiates the three different genotypes, with 0 copies (homozygous deletion, detected by a single lower band for the smaller amplicon, a predominant genotype in European Americans), 1 copy (heterozygous, detected by the presence of both amplicons), and 2 copies (homozygous insertion, detected by a single upper band for the larger amplicon, a predominant genotype in African Americans) of the insertion allele. Because the reference genome sequence is from a deletion allele, previous genome-wide analyses of structural variation that depended on the reference genome for experimental design and analysis would not detect this novel structural variation.

Example 12. PSPHL genotyping in 3 populations

Next, the above-mentioned genotyping assay was used to type an expanded set of genomic DNA samples derived from Yoruba trios (purchased from Coriell Institute for Medical Research), African Americans (AA), and European Americans (EA). Figure 24 summarizes the results from an initial typing of 999 genotypes. More than 96% of African Americans (n=335) and 100% of 30 Yoruba trios (n=90 genotypes) have at least one copy of the insertion allele (insertion/insertion and insertion/deletion) and therefore should express the PSPHL gene, at least in the prostate, according to Figure 23. In sharp contrast, approximately 70% of European Americans (n=574) have the homozygous deletion (deletion/deletion) genotype. The allele frequency distributions are in accordance with Hardy-Weinberg equilibrium in each of the 3 populations (data not shown) and consistent with a Mendelian pattern of inheritance in the Yoruba trios (data not shown), thus further validating the accuracy of the genotyping results.

PSPHL genotype frequencies in prostate cancer cases and controls

The distribution of structural variation between populations, like all other forms of genomic variation such as SNPs, is dominated and shaped by the common ancestry of humans in Africa some 50,000 years ago. Rare forms of genomic variation shared among different populations but with a high level of population differentiation, such as the PSPHL alleles, may be more likely to be associated with differential risk to diseases, as shown in previous such examples (Gonzalez 2005, Stefansson 2005). To determine whether the PSPHL genotype is associated with risk for prostate cancer, 1541 prostate cancer cases and 574 controls in the European American population were genotyped. Preliminary analysis did not reveal a statistically significant association between the PSPHL alleles and the risk for prostate cancer in the European American population. 
Parallel genotyping and analysis in prostate cancer cases (n=356) and controls (n=335) from the African American population, however, revealed an interesting trend. As shown in Figure 25, the deletion allele was more frequently detected in cases than in the controls. The frequencies for the deletion/deletion (D/D), deletion/insertion (D/I), and insertion/insertion (I/I) genotypes were 6.46%, 32.58%, and 60.96% in prostate cancer cases (n=356), and 3.6%, 27.76%, and 68.66% in the unaffected controls (n=335), respectively. Using the additive model, the deletion allele was found to be statistically associated with prostate cancer (p=0.03), prior to adjustment for individual ancestry. Further studies will therefore focus on assessment of this potential association using 700 cases and 700 controls following control for population stratification.
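The deletion allele frequency in each group follows directly from the genotype frequencies reported above, since each D/D individual carries two deletion alleles and each D/I individual carries one. The Python sketch below performs this conversion; the function name is hypothetical, and the calculation is a descriptive re-derivation only. It does not reproduce the additive-model test or the reported p value.

```python
def deletion_allele_frequency(dd_pct, di_pct, ii_pct):
    """Deletion allele frequency from D/D, D/I and I/I genotype percentages.

    The percentages are normalized by their sum, so small rounding error in
    the reported values (e.g. a total of 100.02) is tolerated.
    """
    total = dd_pct + di_pct + ii_pct
    return (dd_pct + 0.5 * di_pct) / total


if __name__ == "__main__":
    # Genotype percentages as reported for the African American groups.
    cases = deletion_allele_frequency(6.46, 32.58, 60.96)
    controls = deletion_allele_frequency(3.6, 27.76, 68.66)
    print(f"Deletion allele frequency, cases:    {cases:.3f}")
    print(f"Deletion allele frequency, controls: {controls:.3f}")
```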

Example 13. Association of the PSPHL gene and prostate cancer in a case-control African American population

The experiments described herein are aimed at determining the genetic contribution to prostate cancer disparity by further analysis of the PSPHL gene. The gene expression based approaches described herein have resulted in the discovery of a novel structural variation at the PSPHL locus that is tightly linked to gene expression and geographic ancestry. Given the potential importance of such genomic variations in the differential risk for diseases, and important leads from our preliminary data, the experiments described herein address the association and functional link between this novel structural human genome variation and disease risk in Americans of African descent.

In a recent report, a two-gene prostate tumor expression signature was shown to accurately differentiate prostate tumors from African American and European American patients (Wallace 2008). One of these two genes was PSPHL. This finding is consistent with the results described herein. The results described herein demonstrate that the deletion allele of the PSPHL gene was putatively associated with prostate cancer specifically in the African American case-control population (see, e.g., Figure 25), but no significant association was found between prostate cancer and the variants in the European American population. Thus, further experiments as described assess the potential association of the PSPHL genotype with the risk for developing human prostate cancer in an expanded African American case-control cohort. Such association studies must take into account population stratification, which can be augmented in the setting of ancestral variants and an admixed population, resulting in spurious associations. However, in a single gene association study, this confounding may be more effectively controlled. Accordingly, controls for population stratification will be applied. The experiments described herein make use of a well-established collection of DNA samples from cases and controls from the African American population (described below).

Study population and Case Control Samples

Cases for this study have been drawn from the large number of men undergoing radical prostatectomy (RP) for treatment of prostate cancer in the Department of Urology at the Johns Hopkins Hospital. A standard tissue processing procedure has been in place that takes portions of the non-cancer seminal vesicles, surgically excised along with the prostate, for storage at -80°C and extraction of high quality DNA. Each of these cases is assigned an anonymous tissue bank number and entered into a database that contains all the pathological information from the radical prostatectomy specimen, demographic information including age at surgery and self-reported racial information, and any follow-up information, which is updated annually. Of the over 1100 men undergoing this surgery at Hopkins annually, ~6% are African American; over 750 African American men have undergone RP at Hopkins and have seminal vesicle tissues available. Approximately 55% of the African American cases are from either Baltimore or Maryland, with the remainder primarily from the Mid-Atlantic region (VA, DE, WV, NJ, NC). Currently, over 500 DNA samples are being prepared from African American prostate cancer cases and are in various stages of being integrated into the study, and it is anticipated that an additional 200 cases could be available by the time the proposed studies begin, as we have intensified our DNA extraction effort from these cases. A summary of information regarding cases from which DNA has already been extracted is presented in Table 1, below.

Table 1

                   Age (avg)   Stdev   Range   PSA (ng/ml)   Stdev
controls (n=611)   58.2        9.3     45-92   1.25          0.82
cases (n=514)      56.7        7.1     36-74

Gleason Sum   %      Stage   %
5             1.1    T2      62
6             52.3   T3A     25.5
7             37.7   T3B     6.2
8             5.8    N1      3.4
9             2.6

The most common case has organ-confined pathologic stage (T2, 60%) and pathologic Gleason Sum 6 (52%). Controls for this study have been drawn from men undergoing screenings for prostate cancer in Baltimore and throughout Maryland. Over 600 blood samples from African American men who have serum PSA values below 4ng/ml and ages at screening of 45 years or older have been obtained from Dr. Partin, and DNA samples have been prepared. An additional 100 samples are to be procured and prepared prior to the start of the proposed study.

Genotyping

The current genotyping assay as described (Figure 23B) is a multiplexed PCR assay based on the partial assembly of the PSPHL locus. Primer set P3 (Forward. TCAGCTAAAGTGGCTGTTGGGTGT; Reverse. AAGCTTCTGCGCTACCTTGCGA) (SEQ ID NO: 3 and SEQ ID NO: 4) was designed to specifically amplify a 133 bp fragment of the insertion sequence in exon 1 of the PSPHL gene. The detection of the exon DNA sequence indicates the presence of the insertion allele (Figure 23B). The deletion allele will be detected by primer set P4 (Forward. AGTCTTGCTATCTTGCCCAGGCTGAT; Reverse. GTAGAGACTGGGTTTCACCATGTTGG) (SEQ ID NO: 5 and SEQ ID NO: 6), which would generate an amplicon only if the insertion sequence is absent. For genotyping quality control, we will test the Hardy-Weinberg proportions (HWP). Maximum likelihood estimates of allele frequencies will be tested for departures from HWP using chi-square goodness of fit tests.

Data analysis

One concern regarding the use of an ancestry-associated genotype for an association study is that it may lead to false positive associations, in which allele frequency differences detected among cases and controls are simply the result of unequal distributions of subgroups between the cases and controls. This confounding due to population stratification can be minimized if the cases and controls are drawn from the same genetic group. However, it is difficult to genetically classify individuals accurately based solely on self-report. The proposed study will incorporate information on ancestry proportions and population substructure by matching on ancestry. Data from a panel of 120 ancestry-informative markers (AIMs) will be generated for the set of prepared DNA samples as part of a separate DOD-funded project not related to this proposal. Individual ancestral proportions estimated from STRUCTURE using these AIMs will be used as a covariate in the analysis. 
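The Hardy-Weinberg quality-control check described under Genotyping can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the allele frequency is estimated by gene counting, the test has one degree of freedom (three genotype classes, one estimated parameter), and the genotype counts in the usage example are invented for demonstration rather than taken from the study.

```python
import math


def hwe_chi_square(n_dd, n_di, n_ii):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    Arguments are observed counts of D/D, D/I and I/I genotypes.
    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    n = n_dd + n_di + n_ii
    p = (2 * n_dd + n_di) / (2 * n)  # deletion allele frequency (gene counting)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_dd, n_di, n_ii)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(12, 93, 230)
    print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

A small p value would indicate departure from Hardy-Weinberg proportions, which in a quality-control setting is often taken as a prompt to re-examine the genotype calls.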
Each association test will be adjusted for covariates such as age and individual ancestral proportions. Descriptive statistics will be performed using the SAS statistical package (Version 9.1.3; SAS Institute Inc., Cary, NC). Genotype and allele frequencies will be estimated by gene counting using the SAS statistical package. Odds ratios, 95% confidence intervals and P values will be determined by logistic regression analyses from comparisons of genotype between individuals with prostate cancer and healthy controls, with adjustment for age and ancestry as covariates, using the SAS statistical package. Association will be tested under the assumption of a dominant, additive and recessive model of inheritance. Probability of significance for all analyses will be set at P < 0.05. While multiple genes in a causal pathway, working additively or interactively together to increase an individual's risk, is one scenario for prostate carcinogenesis, it is also possible that a single gene can impose a risk of prostate cancer. The results presented in Figure 25, though not adjusted for individual ancestry, suggest a high likelihood of a positive association between the PSPHL deletion allele and prostate cancer in the African American population. Further studies as described can be used to confirm the positive association, and subsequently genome-wide structural variation analysis in the same samples could help to identify multiple variants that have cumulative effects when combined. Therefore, the significance of the study is not limited to the targeted PSPHL gene only. Rather, technically feasible means are used to address the question of genetic contribution to prostate cancer disparity, which has the potential to be translated to better control of prostate cancer in the African American and the general population.
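The three inheritance models named above differ only in how a genotype is converted into a numeric predictor before regression. The Python sketch below shows one common coding, treating the deletion allele as the allele of interest (an assumption based on the trend discussed for Figure 25), together with an unadjusted odds ratio and a Woolf-type 95% confidence interval from a 2x2 table. It is not the SAS logistic regression with age and ancestry covariates described in the text; all names are hypothetical and the counts in the usage example are invented.

```python
import math

# Numeric coding of each genotype under the three models of inheritance,
# counting the deletion (D) allele as the allele of interest.
MODEL_CODING = {
    "dominant":  {"I/I": 0, "D/I": 1, "D/D": 1},
    "additive":  {"I/I": 0, "D/I": 1, "D/D": 2},
    "recessive": {"I/I": 0, "D/I": 0, "D/D": 1},
}


def encode(genotype, model):
    """Numeric predictor for a genotype under the chosen model."""
    return MODEL_CODING[model][genotype]


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


if __name__ == "__main__":
    print([encode(g, "additive") for g in ("I/I", "D/I", "D/D")])
    # Hypothetical 2x2 counts under a dominant coding, for illustration only.
    print(odds_ratio_ci(20, 80, 10, 90))
```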

Example 14. A functional link between the PSPHL gene and prostate cancer development and progression

Detailed functional studies on genes known to be functionally different among populations of different geographic ancestry constitute an alternative approach for investigating genetic contributions to prostate cancer disparity. For example, individuals homozygous for the inactivating DARC (Duffy Antigen Receptor for Chemokines) variant (Tournamille 1995) do not express the DARC protein, leading to the Duffy null phenotype that is prevalent in West Africa. While loss of DARC function has been functionally linked to resistance to malaria (Hamblin 2002), recent data also implicated its role in prostate cancer progression. The precise mechanism of function may involve the molecular interaction between DARC and a tumor suppressor named KAI1 that leads to subsequent cancer cell senescence and inhibition of a critical step in the metastatic cascade: extravasation (Bandyopadhyay 2006). These functional data suggest that the inactivation of DARC as seen in individuals of African descent, while protective for malaria, may allow for prostate cancer spread/metastasis due to the absence of interactions between KAI1 and DARC. In this aim, we will investigate the potential functional contribution of the PSPHL variants in human prostate cancer using two complementary approaches.

Experiments will be carried out to determine the gene expression correlates of the different PSPHL genotypes using whole genome expression analysis in paired normal and tumor prostate tissues derived from both African and European Americans. In vitro genetic manipulation of the gene in prostate cancer cell lines will be performed to interrogate its putative functional roles. Together, the results will determine whether PSPHL is functionally linked to prostate cancer and whether its expression correlates are altered during prostate cancer development and progression.

Prostate samples for gene expression analysis

Samples representing each genotype category for whole genome expression analysis, as shown in Table 2, will be collected.

Table 2

In one experiment, prostate tumor RNA samples from 214 prostate cancer cases, about half of which were paired with normal-appearing epithelial tissues from the matched cases, are collected. The majority of the 214 RNA cases can be matched to DNA samples based on tumor bank number, allowing us to genotype the corresponding individual before using these existing RNA samples. Identification of the required number of cases (n=10) may be a challenge for two of the genotype groups, African American cases with the deletion/deletion genotype and European American cases with the insertion/insertion genotype, owing to the relatively small percentage of these specified genotypes in the corresponding populations (Figure 23B).

Gene expression analysis

One possibility is that the different PSPHL genotypes are correlated to the expression signature of a set of genes in addition to the PSPHL gene itself. Using published microarray data (Wallace 2008) as well as microarray data generated by these inventors, the results suggested coordinated gene expression changes as a function of the PSPHL genotype in African American cases but not in European American cases. The Agilent whole human genome oligo arrays may be used to systematically analyze expression correlates of the PSPHL variants in paired normal and tumor samples from both African and European American cases. The Agilent array contains 44k 60mers in the sense orientation, and over 95% of the probes target regions within 600 bp of the 3' ends of the transcripts. Hybridization will be carried out using the conditions specified in the Agilent system, using the two-color design in which all test samples will be compared to a single reference sample with the D/ITM genotype. Following image analysis, processed ratio data will be imported to GeneSpring (Agilent Technologies) for further processing, visualization, and analysis. Both supervised and unsupervised approaches will be used to analyze the data, with the primary goals to (1) identify a set of genes that are correlated to the PSPHL genotype and PSPHL gene expression; (2) assess the concordance of the correlated genes in both normal tissues and cancer tissues, and in both African and European cases; (3) assess whether the correlated genes contribute to the differences seen in normal and tumor tissues. Detailed analytical approaches are not presented here due to space limitations, yet our previous studies and publications have demonstrated our expertise in gene expression analysis.

Cell line based functional studies

Recombinant antigens corresponding to the open reading frames presented in Figure 22 have been generated. Polyclonal antibody against one of the spliced variants (variant 1) has been made.

The synthetic PSPHL protein sequence used to generate the PSPHL antibody is shown below, and corresponds to SEQ ID NO: 7:

SEQ ID NO: 7

MASASCSPGGALASPEPGRKILPRMISHSELRKLFYSADAVCFDVDSTVISEEGIGCFHWIWRKCDQATSQG

This polyclonal antibody has been used to detect the 72AA antigen in prostate cells with D/D genotype (REPE-2) and transfected with cDNAs corresponding to the ORF region (ORFonly) or the full length cDNA (FLcDNA) (ORF plus the 5' and 3' untranslated regions) as shown in the inserted western blot. Monoclonal antibodies can also be used. The antibodies will first be screened using the recombinant antigen and then used to detect the PSPHL protein in cell lines and tissue samples with known PSPHL genotypes. It is possible that, if the protein is present in low abundance and the antibody affinity is not high enough, the antigen will be concentrated through immunoprecipitation to allow detection in biological specimens. Gene manipulation followed by functional analysis is then performed to assess whether experimental manipulation of PSPHL gene expression levels will have biological consequences in the form of altered cell proliferation, cell migration, survival, and adaptation to stress. The genotypes of commonly used human prostate cancer cells have been characterized (data not shown). Experimental depletion of PSPHL expression can be readily achieved using routine stable gene knockdown techniques in MDA-PCa-2b cells (D/I genotype), an AR positive cell line derived from an African American donor. The PSPHL protein will be expressed in E006AA cells, also a prostate cell line derived from an African American donor but with the rare D/D genotype, followed by the same functional assays. The effect of altered expression levels of PSPHL in the stable clones will be examined using a suite of assays including the MTS (Promega) assay to examine overall cell growth, the cell adhesion assay, the anoikis resistance assay (BD Biosciences) to examine the effect of PSPHL on anchorage-independent survival, as well as Annexin V-FITC kit (BD Biosciences) based apoptosis assays to examine the effect of PSPHL in apoptosis. 
In addition, global gene expression changes will be examined and validated by quantatative RT-PCR following altered P SPHL expression and establishment of the stable clones. Expected results and alternative approaches

By including multiple groups of samples, including paired normal and tumor samples from both African American and European American cases, the functional read-out of the PSPHL genotypes in the form of altered expression profiles will be assessed. It is possible that expression differences shown among the different genotypes in African Americans may not be mirrored in prostate specimens from European Americans, and that genes differentially expressed as a function of the genotype in tumor samples may not be replicated in the corresponding normal samples. The results will be used to provide functional insight regarding the differential impact of the genotype in different tissue histologies (i.e., normal or tumor) and different risk groups (i.e., European American and African American). The signature set of genes can be further analyzed using the OntoExpress software (Khatri 2002), Gene Set Enrichment Analysis (Subramanian 2005), or other gene ontology and pathway analysis tools to identify key gene expression changes in specific functional categories. From the function of these concordant and discordant genes, we may be able to draw mechanistic inferences regarding the context-dependent genetic contribution of the PSPHL genotype to the development of prostate cancer. This analysis will also determine whether there is any expression change, within cases with the same genotype, in the transition from normal prostate epithelium to prostate cancer. The PSPHL transcript levels were markedly different between normal and tumor samples derived from the same population, as recently reported (Wallace 2008). Although our microarray data did not show a difference between normal and cancer samples (Figure 2), a definitive assessment would be best achieved by follow-up RT-PCR, owing to the limitations of microarray analysis when a D/D genotype (negative expression) is used to generate expression ratios.
The results from the tissue-based studies can be further corroborated by those from cell line-based studies and by functional alterations resulting from manipulation of PSPHL expression levels. Results from these efforts will provide mechanistic insights into the functional role of PSPHL in prostate cancer disparity.
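The comparison of genotype-associated signatures across contexts described above can be illustrated with a minimal sketch. It assumes each context (for example, tumor versus normal samples, or African American versus European American cases) has already yielded a mapping from gene symbol to direction of change (+1 or -1) for genes called differentially expressed by genotype. The function name, the input format, and the working definitions of "concordant" and "discordant" are illustrative assumptions, not part of the specification.

```python
from typing import Dict, Set, Tuple


def split_signatures(
    context_a: Dict[str, int],
    context_b: Dict[str, int],
) -> Tuple[Set[str], Set[str]]:
    """Return (concordant, discordant) gene sets for two contexts.

    Hypothetical helper: each input maps a gene symbol to +1 (higher) or
    -1 (lower) expression associated with the genotype in that context.
    """
    # Concordant: called in both contexts with the same direction.
    concordant = {
        gene
        for gene in context_a.keys() & context_b.keys()
        if context_a[gene] == context_b[gene]
    }
    # Assumption: any gene called in at least one context that is not
    # concordant (missing from the other context, or opposite direction)
    # is treated as discordant.
    discordant = (context_a.keys() | context_b.keys()) - concordant
    return concordant, discordant
```

The two resulting sets could then be passed separately to a gene ontology or pathway tool of the kind mentioned above.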

Other Embodiments

From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.

All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.

REFERENCES

Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, Smith J, Mangion J, Roberton-Lowe C, Marshall AJ, Petretto E, Hodges MD, Bhangal G, Patel SG, Sheehan-Rooney K, Duda M, Cook PR, Evans DJ, Domin J, Flint J, Boyle JJ, Pusey CD, Cook HT. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature. 2006 Feb 16;439(7078):851-5.

Bandyopadhyay S, Zhan R, Chaudhuri A, Watabe M, Pai SK, Hirota S, Hosobe S, Tsukada T, Miura K, Takano Y, Saito K, Pauza ME, Hayashi S, Wang Y, Mohinta S, Mashimo T, Iiizumi M, Furuta E, Watabe K. Interaction of KAI1 on tumor cells with DARC on vascular endothelium leads to metastasis suppression. Nat Med. 2006 Aug;12(8):933-8.

Chu KC, Tarone RE, Freeman HP. Trends in prostate cancer mortality among black men and white men in the United States. Cancer. 2003 Mar 15;97(6):1507-16

Clegg L, Li FP, Hankey BF, Chu K, Edwards BK. Cancer survival among US whites and minorities: a SEER (Surveillance, Epidemiology, and End Results) Program population-based study. Arch Intern Med. 2002;162(17):1985-93.

Conrad D. F., and Hurles M. E. The population genetics of structural variation. Nature Genetics 39 (7s): S30-S36, 2007.

Dunn T., Chen S., Faith D., Hicks J., Platz E., Chen Y., Ewing C., Sauvageot J., Isaacs W., De Marzo A., Luo J. A Novel Role of Myosin VI in Human Prostate Cancer. American Journal of Pathology. 169(5): 1843-54. 2006.

Eeles RA, Kote-Jarai Z, Giles GG, Olama AA, Guy M, Jugurnauth SK, Mulholland S, Leongamornlert DA, Edwards SM, Morrison J, Field HI, Southey MC, Severi G, Donovan JL, Hamdy FC, Dearnaley DP, Muir KR, Smith C, Bagnato M, Ardern-Jones AT, Hall AL, O'Brien LT, Gehr-Swain BN, Wilkinson RA, Cox A, Lewis S, Brown PM, Jhavar SG, Tymrakiewicz M, Lophatananon A, Bryant SL; UK Genetic Prostate Cancer Study Collaborators; British Association of Urological Surgeons' Section of Oncology; UK ProtecT Study Collaborators, Horwich A, Huddart RA, Khoo VS, Parker CC, Woodhouse CJ, Thompson A, Christmas T, Ogden C, Fisher C, Jamieson C, Cooper CS, English DR, Hopper JL, Neal DE, Easton DF. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet. 2008 Mar;40(3):316-21.

Freedland SJ, Isaacs WB. Explaining racial differences in prostate cancer in the United States: sociology or biology? Prostate. 2005 Feb 15;62(3):243-52.

Gonzalez, E. et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 307, 1434-1440 (2005).

Hamblin, M. T.; Thompson, E. E.; Di Rienzo, A. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70: 369-383, 2002.

Jakobsson J, Ekström L, Inotsume N, Garle M, Lorentzon M, Ohlsson C, Roh HK, Carlström K, Rane A. Large differences in testosterone excretion in Korean and Swedish men are strongly associated with a UDP-glucuronosyl transferase 2B17 polymorphism. J Clin Endocrinol Metab. 2006 Feb;91(2):687-93.

Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun MJ. Cancer statistics, 2007. CA Cancer J Clin. 2007 Jan-Feb;57(1):43-66.

Khatri P, Draghici S, Ostermeier GC, Krawetz SA. Profiling gene expression using onto-express. Genomics 2002; 79(2):266-270.

Kittles RA, Chen W, Panguluri RK, Ahaghotu C, Jackson A, Adebamowo CA, Griffin R, Williams T, Ukoli F, Adams-Campbell L, Kwagyan J, Isaacs W, Freeman V, Dunston GM. CYP3A4-V and prostate cancer in African Americans: causal or confounding association because of population stratification? Hum Genet. 2002 Jun;110(6):553-60.

Kuo CH, Miyazaki D, Nawata N, Tominaga T, Yamasaki A, Sasaki Y, Inoue Y. Prognosis-determinant candidate genes identified by whole genome scanning in eyes with pterygia. Invest Ophthalmol Vis Sci. 2007 Aug;48(8):3566-75.

Luo, J., Duggan, D. J., Chen, Y., Sauvageot, J., Ewing, C.M., Bittner, M. L., Trent, J. M., and Isaacs, W.B. Human Prostate Cancer and Benign Prostatic Hyperplasia: Molecular Dissection by Gene Expression Profiling. Cancer Research 61(12): 4683-4688, 2001.

Luo, J., Zha, S., Gage, W.R., Dunn, T.A., Hicks, J., Bennett, C.J., Ewing, C.M., Platz, E.A., Ferdinandusse, S., Wanders, R. J., Trent, J. M., Isaacs, W. B., and De Marzo, A.M. α-Methylacyl-CoA Racemase: A New Molecular Marker for Prostate Cancer. Cancer Research 62(8): 2220-2226, 2002.

Park J, Chen L, Ratnashinge L, Sellers TA, Tanner JP, Lee JH, Dossett N, Lang N, Kadlubar FF, Ambrosone CB, Zachariah B, Heysek RV, Patterson S, Pow-Sang J. Deletion polymorphism of UDP-glucuronosyltransferase 2B17 and risk of prostate cancer in African American and Caucasian men. Cancer Epidemiol Biomarkers Prev. 2006 Aug;15(8):1473-8.

Perry GH, Ben-Dor A, Tsalenko A, Sampas N, Rodriguez-Revenga L, Tran CW, Scheffer A, Steinfeld I, Tsang P, Yamada NA, Park HS, Kim JI, Seo JS, Yakhini Z, Laderman S, Bruhn L, Lee C. The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet. 2008 Mar;82(3):685-95.

Planitzer SA, Machl AW, Rueckels M, Kubbies M. Identification of a novel c-DNA overexpressed in Fanconi's anemia fibroblasts partially homologous to a putative L-3-phosphoserine-phosphatase. Gene. 1998 Apr 14;210(2):297-306.

Ries LAG, Melbert D, Krapcho M, Mariotto A, Miller BA, Feuer EJ, Clegg L, Homer MJ, Howlader N, Eisner MP, Reichman M, Edwards BK (eds). SEER Cancer Statistics Review, 1975-2004, National Cancer Institute. Bethesda, MD, 2007.

Sebat J. Major changes in our DNA lead to major changes in our thinking. Nature Genetics 39 (7s): S3-S5, 2007.

Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129-137 (2005).

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005; 102(43): 15545-15550.

Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, Yu K, Chatterjee N, Welch R, Hutchinson A, Crenshaw A, Cancel-Tassin G, Staats BJ, Wang Z, Gonzalez-Bosquet J, Fang J, Deng X, Berndt SI, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cussenot O, Valeri A, Andriole GL, Crawford ED, Tucker M, Gerhard DS, Fraumeni JF Jr, Hoover R, Hayes RB, Hunter DJ, Chanock SJ. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet. 2008 Mar;40(3):310-5.

Tournamille, C; Colin, Y.; Cartron, J. P.; Le Van Kim, C. Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nature Genet. 10: 224-228, 1995.

Wallace TA, Prueitt RL, Yi M, Howe TM, Gillespie JW, Yfantis HG, Stephens RM, Caporaso NE, Loffredo CA, Ambs S. Tumor immunobiological differences in prostate cancer between African-American and European-American men. Cancer Res. 2008 Feb 1;68(3):927-36.

Zheng SL, Sun J, Wiklund F, Smith S, Stattin P, Li G, Adami HO, Hsu FC, Zhu Y, Bälter K, Kader AK, Turner AR, Liu W, Bleecker ER, Meyers DA, Duggan D, Carpten JD, Chang BL, Isaacs WB, Xu J, Grönberg H. Cumulative association of five genetic variants with prostate cancer. N Engl J Med. 2008 Feb 28;358(9):910-9. Epub 2008 Jan 16.

Claims

What is claimed is:

1. A method of detecting the presence or absence of a nucleic acid segment in the (Phosphoserine phosphatase-like) PSPHL gene locus of a subject, wherein the presence or absence of the nucleic acid segment in the gene locus indicates an altered risk of cancer.

2. The method of claim 1, wherein the cancer is prostate cancer.

3. The method of claim 1, wherein the presence or absence of the nucleic acid segment in the PSPHL gene locus is detected in an African American subject.

4. The method of claim 1, wherein the absence of the nucleic acid segment indicates an increased risk of prostate cancer in the African American subject.

5. The method of claim 1, wherein the nucleic acid segment comprises 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

6. The method of claim 1, wherein the nucleic acid segment comprises SEQ ID NO: 2.

7. The method of claim 1, wherein the nucleic acid segment comprises SEQ ID NO: 13.

8. The method of claim 1, wherein the nucleic acid segment comprises SEQ ID NO: 14.

9. The method of claim 1, wherein the nucleic acid segment comprises SEQ ID NO: 15.

10. The method of claim 1, wherein the presence of the insertion allele of the PSPHL gene locus is correlated with the expression of the PSPHL gene product.

11. The method of claim 1, wherein the absence of the insertion allele of the PSPHL gene locus is correlated with the absence of the PSPHL gene product.

12. The method of claim 1, wherein the deletion allele is associated with the expression of a set of genes.

13. The method of claim 1, wherein the subject is homozygous for a deletion in the PSPHL gene locus.

14. The method of claim 1, wherein the subject is heterozygous for a deletion in the PSPHL gene locus.

15. The method of claim 13, wherein the homozygous deletion allele is associated with the expression of a set of genes.

16. The method of claim 14, wherein the heterozygous deletion allele is associated with the expression of a set of genes.

17. The method of claim 10, wherein the expression of the PSPHL gene product is associated with the expression of a set of genes.

18. A method of determining the ancestry of a subject comprising detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus in a sample from a subject, wherein the presence or absence of the variation indicates the ancestry of the subject.

19. The method of claim 18, wherein the presence or absence of a nucleic acid segment is indicative of African American or European American ancestry.

20. The method of claim 18, wherein the absence of the nucleic acid segment identifies the subject as having an African American ancestry.

21. The method of any one of claims 1-20, further comprising selecting subjects with an increased risk of developing prostate cancer.

22. The method of claim 21, further comprising obtaining a sample from the subjects.

23. A biomarker for prostate cancer in an African American subject comprising an insertion in the PSPHL gene locus, wherein the presence of the biomarker is correlated with a decreased risk of prostate cancer.

24. The biomarker of claim 23, wherein the insertion encodes a nucleic acid comprising 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

25. The biomarker of claim 23, wherein the insertion encodes a nucleic acid comprising SEQ ID NO: 2.

26. The biomarker of claim 23, wherein the absence of the biomarker is correlated with an increased risk of prostate cancer in the African American subject.

27. The biomarker of claim 23, wherein the presence of the insertion in the PSPHL gene locus is correlated with the expression of the PSPHL gene product.

28. The biomarker of claim 23, wherein the insertion allele is associated with the expression of a set of genes.

29. A method of identifying a subject at risk for developing prostate cancer comprising: detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject to determine the genotype of the subject, wherein the absence of the nucleic acid segment in the gene locus indicates an increased risk of prostate cancer.

30. A method of determining the prognosis of a patient with prostate cancer comprising: detecting the presence or absence of a nucleic acid segment in the PSPHL gene locus of a subject, wherein the absence of the variation determines the prognosis of a patient with prostate cancer.

31. The method of claim 30, wherein prognosis determines the course of treatment.

32. The method of claim 29 or 30, wherein the subject is homozygous for a deletion in the PSPHL gene locus.

33. The method of claim 29 or 30, wherein the subject is heterozygous for a deletion in the PSPHL gene locus.

34. The method of claim 29 or 30, wherein the subject is selected from an African American population.

35. The method of claim 29 or 30, wherein the absence of the nucleic acid segment indicates an increased risk of, or risk of recurrence of, prostate cancer.

36. The method of claim 29 or 30, wherein the nucleic acid comprises 133 base pairs of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1.

37. The method of claim 29 or 30, wherein the nucleic acid comprises SEQ ID NO: 2.

38. The method of claim 29 or 30, wherein the presence of the insertion allele of the PSPHL gene locus is correlated with the expression of the PSPHL gene product.

39. The method of claim 29 or 30, wherein the absence of the insertion allele of the PSPHL gene locus is correlated with the absence of the PSPHL gene product.

40. The method of claim 32, wherein the homozygous deletion allele is associated with the expression of a set of genes.

41. The method of claim 33, wherein the heterozygous deletion allele is associated with the expression of a set of genes.

42. The method of claim 29 or 30 wherein the presence or absence of a nucleic acid segment in the PSPHL gene locus is determined using a polymerase chain reaction (PCR) assay.

43. The method of claim 42, wherein the PCR assay is a multiplexed PCR assay.

44. The method of claim 42, wherein the PCR is carried out using primers comprising the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4 and primers comprising the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6.

45. The method of claim 44, wherein the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4 amplify a 133 base pair fragment of the insertion sequence in exon 1 of the PSPHL gene.

46. The method of claim 44, wherein the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6 generate an amplicon only if the insertion sequence is absent.

47. The method of claim 29 or 30, wherein the subject has previously been treated for prostate cancer.

48. The method of claim 29 or 30, wherein the measurement is performed after surgery or therapy to treat prostate cancer.

49. An antibody to detect PSPHL protein in cells and tissues with PSPHL genotypes.

50. The antibody of claim 49, wherein the antibody is polyclonal.

51. The antibody of claim 49, wherein the antibody is monoclonal.

52. The antibody of claim 50, wherein the polyclonal antibody is directed to the 72AA antigen of prostate cells corresponding to SEQ ID NO: 7.

53. A kit for use in identifying a subject at risk for developing prostate cancer comprising: primers directed to amplify a 133 base pair sequence of exon 1 of human PSPHL mRNA encoded by GenBank Accession No. AJ001612 corresponding to SEQ ID NO: 1, and instructions for use.

54. The kit of claim 53, wherein the primers comprise the nucleic acid sequences as set forth as SEQ ID NO: 3 and SEQ ID NO: 4.

55. The kit of claim 53, wherein the primers comprise the nucleic acid sequences as set forth as SEQ ID NO: 5 and SEQ ID NO: 6.

56. A kit comprising primers comprising the nucleic acid sequences set forth as SEQ ID NO: 3 and SEQ ID NO: 4, and instructions for use.

57. A kit comprising primers comprising the nucleic acid sequences set forth as SEQ ID NO: 5 and SEQ ID NO: 6, and instructions for use.

58. A kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 17, and instructions for use.

59. A kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 18, and instructions for use.

60. A kit comprising primers designed against the nucleic acid sequence set forth as SEQ ID NO: 19, and instructions for use.

61. The kit of any one of claims 54-60, further comprising instructions for use in a PCR assay.

62. The kit of claim 61, wherein the PCR is multiplexed PCR.

63. A kit for use in identifying a subject at risk for developing prostate cancer comprising: an antibody directed to a PSPHL antigen, and instructions for use.

64. A kit comprising an antibody directed to a PSPHL antigen.

65. The kit of claim 63 or 64, wherein the antibody is monoclonal.

66. The kit of claim 63 or 64, wherein the antibody is polyclonal.

67. The kit of claim 66, wherein the polyclonal antibody is used to detect the 72AA antigen.

68. The kit of claim 67, wherein the polyclonal antibody is directed to a sequence encoded by SEQ ID NO: 7.
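The two-amplicon readout recited in claims 44 to 46 can be illustrated with a minimal sketch: the SEQ ID NO: 3 and 4 primer pair yields a product when the insertion sequence is present, and the SEQ ID NO: 5 and 6 primer pair yields a product only when it is absent. The function name, the boolean inputs, and the "I/I" label for insertion homozygotes are illustrative assumptions; the sketch also assumes amplicon presence has already been scored (for example, from a gel) and does not model assay failure beyond rejecting the case where neither product is seen.

```python
from enum import Enum


class Genotype(Enum):
    """PSPHL locus genotypes; the "I/I" label is an assumed notation."""

    INSERTION_HOMOZYGOUS = "I/I"
    HETEROZYGOUS = "D/I"
    DELETION_HOMOZYGOUS = "D/D"


def call_psphl_genotype(insertion_amplicon: bool, deletion_amplicon: bool) -> Genotype:
    """Infer a genotype from the presence of the two PCR products.

    insertion_amplicon: product from the insertion-specific primer pair.
    deletion_amplicon: product from the primer pair that amplifies only
    when the insertion sequence is absent.
    """
    if insertion_amplicon and deletion_amplicon:
        return Genotype.HETEROZYGOUS
    if insertion_amplicon:
        return Genotype.INSERTION_HOMOZYGOUS
    if deletion_amplicon:
        return Genotype.DELETION_HOMOZYGOUS
    # Neither product: the reaction cannot be interpreted.
    raise ValueError("no amplicon detected; genotype cannot be called")
```

For example, `call_psphl_genotype(True, True)` returns `Genotype.HETEROZYGOUS`, matching a sample that carries one insertion allele and one deletion allele.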