WO2013142982A1 - COLCA1 and COLCA2 and their use for the treatment and risk assessment of colon cancer

Info

Publication number
WO2013142982A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
expression
colca1
colon cancer
colca2
Prior art date
Application number
PCT/CA2013/000306
Other languages
French (fr)
Inventor
Thomas J. Hudson
Vanya PELTEKOVA
Mathieu LEMIRE
Original Assignee
Ontario Institute For Cancer Research
Priority date
Filing date
Publication date
Application filed by Ontario Institute For Cancer Research
Publication of WO2013142982A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53: Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574: Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407: Specifically defined cancers
    • G01N33/57419: Specifically defined cancers of colon
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00: Antineoplastic agents
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00: Medicinal preparations containing peptides
    • A61K38/16: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • A61K38/17: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • This invention relates to newly characterized COLCA1 and COLCA2 and their use in the treatment and risk assessment of colon cancer.
  • an isolated protein comprising SEQ ID NO. 1 or a functional fragment thereof.
  • an isolated protein comprising SEQ ID NO. 3, 5, 7, 9, 11 or a functional fragment thereof.
  • an expression vector comprising the nucleic acid described herein operably linked to an expression control sequence.
  • a cultured cell comprising the vector described herein.
  • a method of determining risk of colon cancer in a patient using a sample therefrom comprising: determining the level of expression of at least one of COLCA1 and COLCA2; and comparing the level of expression of the sample with a control sample; wherein a higher level of expression of at least one of COLCA1 and COLCA2 in the patient sample compared to the control indicates a low risk of colon cancer.
  • a diagnostic kit for determining risk of colon cancer in a patient comprising reagents for detecting the level of gene or protein expression of at least one of COLCA1 and COLCA2 in a patient sample and instructions for use.
  • a method of treating or preventing colon cancer in a subject comprising administering the protein described herein.
  • the protein described herein for treating or preventing colon cancer in a subject. In an aspect, there is provided a use of the protein described herein for treating or preventing colon cancer in a subject.
  • a use of the protein described herein in the preparation of a medicament for treating or preventing colon cancer in a subject comprising a therapeutically effective amount of the protein of claim 1 or 2 and a pharmaceutically acceptable carrier.
  • Figure 1 shows association analysis of cases and controls from the Ontario Familial Colorectal Cancer Registry.
  • A Manhattan plot showing the significance level, on the negative log scale, for all variants with frequency above 1% in 11 GWAS regions. Red dots indicate published GWAS SNPs.
  • B Quantile-quantile plots of significance levels against theoretical quantiles for unconditional tests of association. Red lines represent 95% confidence bands.
  • C Same plot as in (B), but with tests of association conditional on GWAS SNP genotypes.
  • D Same plot and data as in (C), restricted to tag SNPs at r² < 0.5.
  • E Architecture of the 11q23 locus.
  • From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs3802842); LD structure between all variants, with color shading showing the squared correlation coefficient r².
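The negative log scale used for the significance levels in Figure 1, and the comparison of observed significance against theoretical quantiles in the quantile-quantile panels, can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `neg_log10` and `qq_points` are hypothetical, base 10 is assumed for the log scale, and the expected quantile i / (n + 1) is one common convention that the source does not specify.

```python
import math


def neg_log10(p_values):
    """Convert p-values (each must be > 0) to the negative log10 scale."""
    return [-math.log10(p) for p in p_values]


def qq_points(p_values):
    """Pair expected and observed -log10(p) values for a quantile-quantile plot.

    Assumption: under the null hypothesis p-values are uniform on (0, 1),
    so the i-th smallest of n p-values is expected near i / (n + 1).
    """
    n = len(p_values)
    # Largest -log10(p) first, matching the smallest expected p-value.
    observed = sorted(neg_log10(p_values), reverse=True)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))
```

Points lying well above the diagonal (observed larger than expected) would correspond to variants more significant than the null distribution predicts.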
  • Figure 2 shows risk-associated genotypes correlate with decreased expression of 11q23.1 transcripts in colon tissues.
  • A Relative expression levels for C11orf53, C11orf92/COLCA1, C11orf93/COLCA2 and POU2AF1 in benign adjacent (BA) and tumor (T) samples as a function of rs3802842 genotype. For each transcript, expression data is shown for rs3802842 AA (blue bars), AC (yellow bars), and CC (red bars) genotypes. Error bars indicate SEM. P values are derived from one-way ANOVA followed by Student-Newman-Keuls test. * P < 0.01, ** P < 0.001.
  • C11orf92/COLCA1 (orange) and C11orf93/COLCA2 (blue), which are located on opposite DNA strands. Each gene contains a tandem of multiple non-coding first exons (1*) which are spliced to a set of constant exons (labeled as x2, x3, etc.).
  • C and D Luciferase expression in HeLa cells comparing risk (RH) and protective (PH) haplotypes at the COLCA1/COLCA2 bidirectional promoter.
  • Figure 3 shows Western blot and immunohistochemistry of COLCA1 expression in colon biopsy samples.
  • A COLCA1 expression in colon tissues obtained at the time of CRC or adenoma resections is higher in benign adjacent tissues compared to CRC tumors.
  • B COLCA1 expression is higher in benign adjacent colon tissues from patients homozygous for the protective rs3802842 allele (AA) compared to the risk allele (CC).
  • C, D, E and F Immunohistochemical staining for COLCA1 (brown; hematoxylin counterstain; scale bars, 50 µm) on human colon, benign adjacent (BA) (C and E) and tumor (T) (D and F) tissues for patients with protective (C and D) and risk (E and F) genotypes.
  • G-H 100x oil objective images (scale bars, 10 µm) of representative tissues immunostained with anti-human C11orf92 antibodies (brown; hematoxylin counterstain) identified strong characteristic COLCA1 positive signals in intracellular granules (G) and extracellular granules (H).
  • Cord blood CD34 " cells; Peripheral blood: CD123+ basophils, CD16- eosinophils, CD16+ neutrophils, mononuclear cell fraction (MNC), and polymorphonuclear cell fraction (PMN).
  • B Peripheral blood: PMN, CD4+ T cells, CD8+ T cells, CD56+ NK cells, CD14+ monocytes.
  • C Peripheral blood: CD19+ B cells, iDC dendritic cells; Cord blood CD34- and CD34+. All western blots were re-probed with anti-β-actin antibody; normal colon tissue was run as a positive control.
  • Eosinophils COLCA1 staining (red) with CD45 (blue) and eosinophil major basic protein (green).
  • Figure 6 shows architecture of the 8q23 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs16892766); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 7 shows architecture of the 8q24 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs6983267); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 8 shows architecture of the 9p24 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs719725); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 9 shows architecture of the 10p14 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs7894531); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 10 shows architecture of the 14q22 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4444235); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 11 shows architecture of the 15q13 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4779584); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 12 shows architecture of the 16q22 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs9929218); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 13 shows architecture of the 18q21 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4939827); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 14 shows architecture of the 19q13 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs10411210); LD structure between all variants, with color shading showing the squared correlation coefficient r².
  • Figure 15 shows architecture of the 20p12 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs961253); LD structure between all variants, with color shading showing the squared correlation coefficient r².
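The squared correlation coefficient used to shade the LD structure in the locus figures above can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts (0, 1 or 2) and computes the squared Pearson correlation between the genotype vectors of two variants; the source does not state how the coefficient was actually computed, and the function name `squared_correlation` is hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors.

    Assumption: each entry is an allele count (0, 1 or 2) for one sample.
    Returns NaN when either variant shows no variation across samples.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype vectors must be non-empty and equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return (sxy * sxy) / (sxx * syy)
```

A value near 1 means the two variants carry nearly the same information across samples, which is why a threshold on this coefficient can be used to select a reduced set of tag SNPs.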
  • Figure 16 shows mRNA expression profiles for C11orf92/COLCA1, C11orf93/COLCA2 and C11orf53.
  • RT-PCR expression of the transcripts was assessed on first strand cDNA for multiple tissue panels (Clontech, Inc) using primers specific for each gene. A β-actin primer pair was used as internal control.
  • A Human digestive system panel.
  • B Human immune Panel. For both panels, the amplified cDNA product sizes are indicated on the left and the transcript names are shown on the right. Acronyms are used for lymph node (LN), peripheral blood lymphocytes (PBL), bone marrow (BM) and fetal liver (FL).
  • Figure 17 shows tissue and cell line mRNA expression profiles for C11orf92/COLCA1, C11orf93/COLCA2 and C11orf53.
  • RT-PCR of transcripts was performed on first strand cDNA from multiple tissue panels (Clontech, Inc) using primers specific for each gene. A β-actin primer pair was used as an internal control.
  • A Human adult tissue panel.
  • B Human cell line panel. For both panels, the amplified cDNA product sizes are indicated on the left and the transcript names are shown on the right. Details on cell lines are found in the Materials and Methods section.
  • Figure 18 shows C11orf92/COLCA1 splice isoforms correlate with rs10891246 genotypes.
  • A Schematic of the long (B-L) and short (B-S) isoforms of C11orf92. Transcripts are shown as light gray bars (non-coding) and dark gray bars (coding); introns are indicated as thin black lines. Solid horizontal lines below the Isoform B-L transcript indicate TaqMan probe positions: probe S (red) and probe L (blue). SNP positions and alleles are drawn relative to each other; therefore, this map is not to physical scale.
  • B RT-PCR using isoform-specific primer sets for the B-L and B-S transcripts.
  • GWAS marker rs3802842 genotypes (AA, AC and CC) are indicated above. Colon tissue source is indicated below: benign adjacent (BA) and tumor (T).
  • C Relative expression of long (B-L) isoform, calculated by dividing the expression value of the isoform by the expression value of the housekeeping gene, GAPDH. All data are plotted as relative to the expression in non-tumor colon samples.
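The normalization described for panel C (isoform expression divided by GAPDH expression, then plotted relative to non-tumor colon samples) can be sketched as follows. This is an illustration only: the function names are hypothetical, the inputs are assumed to be linear-scale expression values rather than raw cycle thresholds, and using the mean of the non-tumor samples as the baseline is an assumption the source does not spell out.

```python
def relative_expression(target, reference):
    """Divide a target expression value by a housekeeping-gene value."""
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return target / reference


def relative_to_baseline(samples, baseline_samples):
    """Express normalized values relative to a baseline group.

    Both arguments are lists of (target, reference) pairs, for example
    (isoform B-L value, GAPDH value). Assumption: the baseline is the mean
    normalized value of the baseline group (e.g. non-tumor colon samples).
    """
    if not baseline_samples:
        raise ValueError("at least one baseline sample is required")
    baseline = [relative_expression(t, r) for t, r in baseline_samples]
    baseline_mean = sum(baseline) / len(baseline)
    return [relative_expression(t, r) / baseline_mean for t, r in samples]
```

With this convention a value of 1.0 means the sample matches the average normalized expression of the baseline group, and values below 1.0 indicate lower relative expression.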
  • Figure 19 shows ENCODE chromatin features, transcription factor sequence motifs and polymorphisms at the chromosome 11q23.1 locus.
  • ENCODE features in human lymphoblastoid cell lines for CTCF occupancy and histone modifications. Specific regions that are enriched with chromatin marks are shown in boxes. All SNPs in this region are listed and affected putative transcription factor (TF) binding sites (below the rsSNPs) are identified. The protective alleles are shown in blue within the brackets.
  • the core sequence of the transcription binding site (capital letters) is defined by the highest conserved consecutive position of the matrix (Genomatix, Software GmbH). The letters in red represent the conserved TF binding sequences with a degree of conservation more than 60 (ci-value > 60).
  • the red asterisk denotes the GWAS SNP rs3802842.
  • the rs5794738 SNP shows a 9 bp deletion.
  • Figure 20 shows Western blots validating the specificity of the anti-C11orf92 antibody.
  • C11orf92-GFP is a construct of COLCA1/C11orf92 cDNA fused with the GFP protein.
  • A Western blot probed with the anti-C11orf92 antibody.
  • B Western blot probed with the anti-GFP antibody using the same cell lysates as in panel A. UT and V are non-transfected and vector transfected controls. Colon tissue was used as a positive control.
  • Figure 21 shows COLCA1 protein expression in the mucosal stroma of colon tissues.
  • Representative (A) benign adjacent (BA) and (B) tumor (T) tissues were immunostained with anti-human C11orf92 antibody. Sections were counterstained with hematoxylin. Rabbit IgG stained sections are shown alongside as negative controls. Scale bars (bottom right) are 50 µm.
  • Figure 22 shows expression of the COLCA1/C11orf92 protein in the proximity of tumor cells.
  • A and B Double immunohistochemical staining shows COLCA1 protein (red) and the tumor cell-specific carcinoembryonic antigen (CEA, brown) expression in the colon benign adjacent (BA, left panel) and the tumor (T, right panel) tissues. Sections were counterstained with hematoxylin. Corresponding higher magnification images are shown in the bottom panels. Scale bars in the upper and lower panels are 50 µm and 20 µm, respectively.
  • Figure 23 shows expression of the COLCA1/C11orf92 protein in eosinophils as visualized by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 µm, 9 µm, and 3.8 µm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F represent z cross-sections.
  • the COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the eosinophil major basic protein (green).
  • Figure 24 shows expression of the COLCA1/C11orf92 protein in mast cells as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 µm, 9 µm, and 3.2 µm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the mast cell tryptase (green).
  • Figure 25 shows expression of the COLCA1/C11orf92 protein in neutrophils as viewed by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 µm, 9 µm, and 3.3 µm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the neutrophil elastase (green).
  • Figure 26 shows expression of the COLCA1/C11orf92 protein in macrophages as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 µm, 9 µm, and 3.6 µm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F represent z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the macrophage specific CD68 protein (green).
  • Figure 27 shows expression of COLCA1/C11orf92 protein in dendritic cells as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 µm, 9 µm, and 3.2 µm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the dendritic cell marker, CD83 (green).
  • Figure 28 shows immunofluorescence confocal images from colon tissue as color composites in two-dimensional (A) and three-dimensional opacity rendering (B).
  • COLCA1 protein is seen as granules (red) in close proximity to the tumor cells, CEA (grey blue). Scale bars in panels A and B are 20 µm and 12 µm, respectively.
  • Figure 29 shows the revised organization of the COLCA1 and COLCA2 genes in comparison with the RefSeq gene structure from the UCSC NCBI36/hg18 assembly. Red and blue boxes (exons) and lines with arrowheads (introns) represent COLCA1 or COLCA2 genes, and their location on minus or plus strands, respectively.
  • Exons are shown individually or as part of the transcripts that were identified by sequencing.
  • the 12 CRC-associated SNPs are shown as green bars with GWAS SNP rs3802842 as the purple bar. Thicker boxes represent coding regions.
  • the SGP program developed at the Genome Bioinformatics Laboratory shows gene predictions using mouse/human homology for COLCA2.
  • ENCODE histone methylation and acetylation marks indicate regulatory activity at the COLCA1/COLCA2 locus.
  • Figure 30 shows the organization of the COLCA1 gene.
  • (A) COLCA1 is organized into variable (yellow boxes) and constant exons (pink box), spanning genomic intervals of 6.4 Kb and 5.3 Kb, respectively.
  • the thick pink boxes represent coding region on the constant exon.
  • Figure 31 shows the organization of the COLCA2 gene.
  • Figure 32 shows Western blot analysis of COLCA2 proteins encoded by multiple transcripts.
  • A Distribution of alternatively spliced COLCA2 protein isoforms in colon biopsy samples from benign adjacent (BA) and tumor (T) tissues, peripheral blood leukocytes (PBL), CD34- cells, human myeloid (AML2 and HL60), and colon cancer (HT29) cell lines.
  • the migration pattern of five of them coincides with the predicted molecular weights (kDa) of five verified protein-encoded COLCA2 transcripts.
  • B Immunodetection of COLCA2 protein isoforms in colon benign adjacent (BA) and tumor (T) samples from patients homozygous for the risk (CC) or protective (AA) allele.
  • C COLCA2 transcript-specific RT-PCR. Total RNA extracted from normal colon and peripheral blood lymphocytes was reverse-transcribed into cDNA, followed by PCR amplification using isoform-specific primer sets (see Table 11).
  • M DNA molecular size marker.
  • COLCA1 and COLCA2 share a bidirectional promoter and are co-regulated.
  • Immunochemical studies of COLCA1 in colonic tissues reveal strong co-localization in cytoplasmic granules present in eosinophils, mast cells, neutrophils, macrophages and dendritic cells.
  • COLCA1 exists within extracellular granules in normal mucosa and at the periphery of colon cancer cells.
  • an isolated protein comprising SEQ ID NO. 1 or a functional fragment thereof.
  • polypeptide and protein are used interchangeably and mean proteins, protein fragments, modified proteins, amino acid sequences and synthetic amino acid sequences.
  • the polypeptide can be glycosylated or not.
  • “fragment” relating to a polypeptide or polynucleotide means a polypeptide or polynucleotide consisting of only a part of the intact polypeptide sequence and structure, or the nucleotide sequence and structure, of the reference gene.
  • the polypeptide fragment can include a C-terminal deletion and/or N-terminal deletion of the native polypeptide, or can be derived from an internal portion of the molecule.
  • a polynucleotide fragment can include a 3' and/or a 5' deletion of the native polynucleotide, or can be derived from an internal portion of the molecule.
  • an isolated protein comprising SEQ ID NO. 3, 5, 7, 9, 11 or a functional fragment thereof.
  • an isolated nucleic acid encoding the protein of any one of claims 1 3, 5, 7, 9 and 11.
  • the isolated nucleic acid comprises SEQ ID NO. 2, 4, 6, 8, 10 or 12.
  • an expression vector comprising the nucleic acid described herein operably linked to an expression control sequence.
  • a cultured cell comprising the vector described herein.
  • a method of determining risk of colon cancer in a patient using a sample therefrom comprising: determining the level of expression of at least one of COLCA1 and COLCA2; and comparing the level of expression of the sample with a control sample; wherein a higher level of expression of at least one of COLCA1 and COLCA2 in the patient sample compared to the control indicates a low risk of colon cancer.
  • the term “level of expression” or “expression level” as used herein refers to a measurable level of expression of the products of biomarkers, such as, without limitation, the level of messenger RNA transcript expressed or of a specific exon or other portion of a transcript, the level of proteins or portions thereof expressed of the biomarkers, the number or presence of DNA polymorphisms of the biomarkers, the enzymatic or other activities of the biomarkers, and the level of specific metabolites.
  • the term “control” refers to a specific value or dataset that can be used to prognose or classify the value, e.g. expression level or reference expression profile, obtained from the test sample associated with an outcome class.
  • the term “differentially expressed” or “differential expression” as used herein refers to a difference in the level of expression of the biomarkers that can be assayed by measuring the level of expression of the products of the biomarkers, such as the difference in level of messenger RNA transcript or a portion thereof expressed or of proteins expressed of the biomarkers. In a preferred embodiment, the difference is statistically significant.
  • the term “difference in the level of expression” refers to an increase or decrease in the measurable expression level of a given biomarker, for example as measured by the amount of messenger RNA transcript and/or the amount of protein in a sample as compared with the measurable expression level of a given biomarker in a control.
  • low risk refers to a lower risk of colon cancer as compared to a general or control population.
  • sample refers to any fluid, cell or tissue sample from a subject that can be assayed for biomarker expression products and/or a reference expression profile, e.g. genes differentially expressed in subjects.
  • the level of gene expression is determined and compared.
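The comparison step of the risk-assessment method described above (a higher expression level of at least one of COLCA1 and COLCA2 in the patient sample than in the control indicates a low risk) can be written as a minimal sketch. The function name and the dictionary layout are hypothetical, and the sketch deliberately encodes only the stated comparison: it applies no statistical test or threshold and is not a clinical tool.

```python
def indicates_low_risk(patient_levels, control_levels):
    """Return True if COLCA1 or COLCA2 is expressed at a higher level in the
    patient sample than in the control.

    Both arguments map gene names to expression levels measured on the same
    scale (an assumption of this sketch). Only genes present in both
    mappings are compared.
    """
    genes = ("COLCA1", "COLCA2")
    measured = [g for g in genes if g in patient_levels and g in control_levels]
    if not measured:
        raise ValueError("need a level for COLCA1 or COLCA2 in both samples")
    return any(patient_levels[g] > control_levels[g] for g in measured)
```

In practice the description calls for a difference in expression that is preferably statistically significant, so a real comparison would need replicate measurements and an appropriate test rather than a single greater-than check.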
  • RNA products of the biomarkers within a sample, including arrays, such as microarrays, RT-PCR (including quantitative RT-PCR), nuclease protection assays and Northern blot analyses.
  • nucleic acid includes DNA and RNA and can be either double stranded or single stranded.
  • hybridize or “hybridizable” refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid.
  • the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0 x sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0 x SSC at 50°C may be employed.
  • probe refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence.
  • the probe hybridizes to an RNA product of the biomarker or a nucleic acid sequence complementary thereto.
  • the length of probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
  • primer refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH).
  • the primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors, including temperature, sequences of the primer and the methods used.
  • a primer typically contains 15-25 or more nucleotides, although it can contain fewer or more.
  • the level of gene expression is determined by hybridizing a labelled probe to at least one of COLCA1 and COLCA2 mRNA and detecting labelled probe hybridized to the mRNA.
  • the level of gene expression is determined on a DNA microarray.
  • the method further comprises polymerase chain reaction (PCR) to amplify the mRNA.
  • the level of gene expression is determined by using a tag based analysis, preferably serial analysis of gene expression (SAGE).
  • the level of protein expression is determined and compared.
  • a person skilled in the art will appreciate that a number of methods can be used to determine the amount of a protein product of the biomarker of the invention, including immunoassays such as Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE and immunocytochemistry.
  • the level of protein expression is determined by binding a COLCA1 or COLCA2 specific antibody to COLCA1 or COLCA2 respectively and detecting the presence of the resulting protein-antibody complex.
  • the sample is a colon tissue sample. In other embodiments, the sample is a peripheral blood sample.
  • a use of the protein described herein for determining the risk of colon cancer comprising reagents for detecting the level of gene or protein expression of at least one of COLCA1 and COLCA2 in a patient sample and instructions for use.
  • the instructions correlate to the method steps described herein.
  • a method of treating or preventing colon cancer in a subject comprising administering the protein described herein.
  • the protein described herein for treating or preventing colon cancer in a subject.
  • a use of the protein described herein for treating or preventing colon cancer in a subject. In an aspect, there is provided a use of the protein described herein in the preparation of a medicament for treating or preventing colon cancer in a subject.
  • a pharmaceutical composition for the treatment of colon cancer comprising a therapeutically effective amount of the protein of claim 1 or 2 and a pharmaceutically acceptable carrier.
  • pharmaceutically acceptable carrier means any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like that are physiologically compatible.
  • pharmaceutically acceptable carriers include one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like, as well as combinations thereof.
  • isotonic agents for example, sugars, polyalcohols such as mannitol, sorbitol, or sodium chloride in the composition.
  • Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives or buffers, which enhance the shelf life or effectiveness of the pharmacological agent.
  • therapeutically effective amount refers to an amount effective, at dosages and for a particular period of time necessary, to achieve the desired therapeutic result.
  • a therapeutically effective amount of the pharmacological agent may vary according to factors such as the disease state, age, sex, and weight of the individual, and the ability of the pharmacological agent to elicit a desired response in the individual.
  • a therapeutically effective amount is also one in which any toxic or detrimental effects of the pharmacological agent are outweighed by the therapeutically beneficial effects.
  • fusion protein refers to a composite polypeptide, i.e., a single contiguous amino acid sequence, made up of two (or more) distinct, heterologous polypeptides that are not normally or naturally fused together in a single amino acid sequence.
  • a fusion protein may include a single amino acid sequence that contains two entirely distinct amino acid sequences or two similar or identical polypeptide sequences, provided that these sequences are not normally found together in the same configuration in a single amino acid sequence found in nature.
  • Fusion proteins may generally be prepared using either recombinant nucleic acid methods, i.e., as a result of transcription and translation of a recombinant gene fusion product, which fusion comprises a segment encoding a polypeptide of the invention and a segment encoding a heterologous polypeptide, or by chemical synthesis methods well known in the art. Fusion proteins may also contain a linker polypeptide in between the constituent polypeptides of the fusion protein.
  • the OFCCR is a member of the National Cancer Institute Cooperative Family Registries for Colorectal Cancer Studies (Colon CFRs) (http://epi.grants.cancer.gov/CFR/about_colon.html) (42).
  • the OFCCR includes 3,770 population-based CRC cases diagnosed in the province of Ontario, Canada between 1997 and 2000 and between 2002 and 2006, with an age at the time of diagnosis of 20 to 74 years. Age- and sex-matched control subjects with no personal history of CRC were recruited by telephone from a list of randomly selected residential telephone numbers.
  • A) 40 CRC cases and 40 controls were sequenced in the SNP discovery phase of the project; and B) an additional 1,121 CRC cases and 1,153 controls from OFCCR were used in the genotyping.
  • the OFCCR has also recruited pedigrees showing autosomal dominant transmission classified as Familial Colorectal Cancer Type X.
  • 25 probands and 15 affected siblings were selected for sequencing. All have been well characterized to have microsatellite stable tumors and known high penetrant syndromes have been excluded by pre-screening for mutations in genes causing familial CRC.
  • DNA sample GM12155 (CEPH/UTAH Pedigree 1408, NIGMS Human Genetic Cell Repository) was also used for sequencing. Including the CEPH, 121 samples were sequenced. One sequenced DNA (allegedly from a sporadic case) turned out to be identical to the DNA of another sequenced sample and two DNA samples (one proband and one kin of another proband) likely contaminated each other. We were thus left with sequences for 117 distinct samples, 1 DNA mixture and 1 CEPH control, for 119 DNA samples. Targeted sequencing
  • the regions that we selected to undergo sequencing include: 10 regions identified by genome-wide association scan that harbor common susceptibility variants for colorectal cancer (CRC) (rs16892766 [8q23.3], rs10505477 [8q24], rs10795668 [10p14]; rs3802842 [11q23]; rs4444235 [14q22]; rs4779584 [15q13]; rs9929218 [16q22]; rs4939827 [18q21]; rs10411210 [19q13]; rs961253 [20p12]) (15) and an additional region identified (rs719725 [9p24]) (13) that did not replicate in the last stage of the study, but replicated independently elsewhere (18).
  • CRC colorectal cancer
  • the 11 GWAS regions were defined to be the largest regions that include all SNPs in linkage disequilibrium with the risk variants identified (r² threshold of 0.20), based on release 23a of the CEU HapMap data. The sum of these regions is 2.29 Megabases. Table 1 provides an overview of genomic intervals targeted for variant discovery.
  • the UCSC-Table Browser function with repeats masked on Human Genome build was used to identify unique sequences within the 11 selected colorectal cancer associated genomic regions (Table 1).
  • the selected genomic sequences spanning 2.3 Mb were used to design oligonucleotides for sequence capture (SC) arrays (www.nimblegen.com/seqcap, Roche NimbleGen Inc.).
  • Standard bioinformatics filters that check for genomic uniqueness against an indexed human genome were used to select capture oligonucleotides.
  • the NimbleGen proprietary repeat-masking method was used to remove repetitive sequences.
  • the capture oligonucleotides of 60-75 bp were designed with an optimized and empirically tested algorithm (version 2.0) to achieve optimal isothermal hybridization across the microarray.
  • Genomic DNA fragmentation: DNA (5 µg at 25 ng/µl concentration) was sonicated using a BioRuptor (Diagenode Inc.) with high power intensity at a pulse of 30 seconds followed by a 30 second rest, for a total procedure time of 25 min.
  • the DNA fragment-size distribution of 300 to 500 (+/- 50) bp was confirmed on a Bioanalyzer 2100 DNA chip (Agilent Inc.).
  • the fragmented DNA samples were concentrated to 75 µl using a Speed-Vac (Thermo Savant).
  • T4 DNA ligase buffer containing 10 mM ATP, 4 µl dNTP mix, 5 µl T4 DNA Polymerase, 1 µl of Klenow DNA polymerase and 5 µl of T4 polynucleotide kinase (PNK) were added and DNA ends were filled at 20°C for 30 minutes.
  • QIAquick PCR purification kit (QIAGEN Inc.) was used and DNA was eluted in 32 µl of water.
  • Microcon 100K membrane (Millipore, Inc.) with size cut-off of 300 bp ssDNA and 125 bp ds DNA was used. This procedure was performed by centrifugation at 500xg for 15 min at 25°C, followed by recovery of the concentrated DNA by inverting the sample reservoir and centrifuging at 1000xg for 3 min.
  • the adaptor-ligated DNA fragments were PCR amplified (9 cycles) using the Phusion High-Fidelity PCR master mix with HF buffer (New England Biolabs) and 200 nM of each of Illumina PCR primers PE1.1 and PE2.1 according to the Illumina whole-genome fragment library protocol. After PCR clean-up using the Zymo DNA Clean and Concentrator-25 Kit (Zymo Research Inc.), the DNA amounts in the fragment libraries were quantified by NanoDrop ND-1000 spectrophotometer (Thermo Scientific) prior to sequence capture hybridization procedure.
  • Array Capture hybridization: Purified adaptor-ligated DNA fragments (5 µg) were mixed with 60 µg Cot-1 DNA (Invitrogen) in a total volume of 4.8 µl of water. As a control, a 10⁻⁵ dilution of the same DNA fragment library was prepared but not used for hybridization. Sixteen µl of hybridization mix was prepared as per manufacturer's instructions (NimbleGen Arrays User's Guide: Sequence Capture Array Delivery v3.0, Roche NimbleGen Inc.). Briefly, DNA/Cot-1 solution was mixed with 8 µl of 2xSC Hybridization buffer and 3.2 µl of SC Component A (Sequence Capture Hybridization Kit, Roche NimbleGen Inc.).
  • the hybridization mix was incubated at 95°C for 10 min, and then kept at 42°C until ready to use.
  • the array was prepared for hybridization by attachment of the X1 Mixer (Roche NimbleGen Inc.) filled with the hybridization mix and loaded onto the NimbleGen Hybridization System. Hybridization was carried out at 42°C for 68-72 hrs with station mix mode "B". At the end, the mixer was disassembled and the array was washed in a 50 ml wash tube with NimbleGen wash buffers (Sequence Capture Wash and Elution Kit, Roche NimbleGen Inc.).
  • Captured DNA was eluted from the array by using the Elution Chamber ES1 and the Elution System (Roche NimbleGen Inc.). Briefly, the chamber was filled with 425 µl of water (pre-warmed at 95°C) and incubated for 5 min at 95°C. The eluted DNA was collected from the chamber and kept on ice. This procedure was repeated to achieve 3 elution samples, which were pooled.
  • Eluted DNA was dried in a SpeedVac (Thermo Savant), rehydrated in 300 µl of water and amplified (20 PCR cycles) using the Phusion High-Fidelity PCR master mix with HF buffer (New England Biolabs) and 200 nM of Illumina PCR primers PE1.1 and PE2.1 according to the Illumina whole-genome fragment library protocol. This procedure was performed on both hybridized (captured) and non-hybridized (non-captured) DNA libraries. The captured and non-captured PCR products were subjected to PCR clean up and quantification as described above. Genomic enrichment was determined by using 10 ng/µl aliquots of captured and non-captured DNA samples and Custom TaqMan Expression Assay.
  • Fold enrichment was calculated by comparing the cycle threshold of the gDNA amplification of captured and non-captured sample. Assuming that the DNA concentration doubles every cycle, enrichment was calculated as 2^N, with N being the difference between the cycle thresholds.
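The enrichment calculation can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example cycle-threshold values are hypothetical, and the sign convention (non-captured Ct minus captured Ct, so that an enriched sample gives a positive N) is an assumption, since the text only says N is the difference between the cycle thresholds.

```python
def fold_enrichment(ct_non_captured: float, ct_captured: float) -> float:
    """Fold enrichment as 2**N under the doubling-per-cycle assumption.

    N is taken here as Ct(non-captured) - Ct(captured); this direction is an
    assumption made for illustration, chosen so that earlier amplification of
    the captured sample yields enrichment greater than 1.
    """
    n = ct_non_captured - ct_captured
    return 2.0 ** n


# Hypothetical example: captured library crosses threshold 10 cycles earlier.
print(fold_enrichment(28.0, 18.0))
```

With the hypothetical values above, a 10-cycle difference corresponds to 2^10, i.e. roughly a thousand-fold enrichment.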
  • Post-enrichment DNA libraries were sequenced on Illumina Genome Analyzer II instruments as paired-end 2x76 bp reads, following the manufacturer's protocols and using the standard sequencing primers. Image analyses and base callings were performed by the Genome Analyzer Pipeline version 1.3 with default parameters and default filtering.
  • the average coverage per base in all GWAS regions was 53.6 reads/base; the proportion of bases covered by at least one read was 96.4% and the proportion of bases covered by at least 6 reads was 85.1% (Table 3).
  • the average coverage was 1.4-fold higher to 76.4, the proportion of bases covered by at least one read was 98.7% and the proportion of bases covered by at least 6 reads was 93.8%.
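The three coverage figures quoted above (mean reads per base, fraction of bases with at least one read, fraction with at least 6 reads) can be computed from a per-base depth vector. The sketch below is illustrative only; the function name and the toy depth list are hypothetical and do not come from the study's pipeline.

```python
def coverage_summary(depths, min_depth=6):
    """Return (mean depth, fraction with depth >= 1, fraction with depth >= min_depth).

    `depths` is a sequence of per-base read counts for the targeted region.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / n
    frac_covered = sum(1 for d in depths if d >= 1) / n
    frac_min = sum(1 for d in depths if d >= min_depth) / n
    return mean_depth, frac_covered, frac_min


# Toy example with four bases (hypothetical depths).
print(coverage_summary([0, 3, 6, 11]))
```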
  • Genotyping of 2,380 samples was done at core facilities of the McGill University and Genome Quebec Innovation Centre (http://gqinnovationcenter.com) using established protocols. After excluding variants (325 SNPs and 17 indels) that failed to generate genotyping calls in > 95% of samples, 7,149 submitted SNPs were deemed to have yielded successful assays. Of these, 1,169 putative SNPs and 390 putative indels were monomorphic. For indels that turned out to be monomorphic (390 indels, 381 of which were detected in only one sample), a large majority of samples that were supporting the indel with at least 6 reads had 4 times as many reads not supporting it (a fraction less than 20%). In retrospect, most of the indels supported by at least 20% of the reads in at least one individual turned out to be validated by genotyping, with minimal misclassification.
  • Variants with frequencies less than 1% in either the cases or the controls were collectively analyzed within each region using the method of Madsen and Browning: in brief, for each sample, a weighted count of all minor alleles observed over all SNPs was computed, where weights were based on the inverse standard deviation of the minor allele frequency. This weighting scheme puts more weight on the less common SNPs, which has the desirable effect that the contribution of true rare risk alleles is not diluted by combining it with more common non-risk alleles. Then a rank-based test was applied on the weighted counts and significance was computed with a permutation procedure.
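The weighted-count approach described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the weight formula (frequency estimated in controls with a +1 pseudocount, weight equal to the square root of n·q·(1−q)) follows the Madsen and Browning formulation as generally described, and the one-sided permutation p-value on the case rank sum is a simplification. All function names are hypothetical.

```python
import math
import random


def mb_weights(genotypes, status):
    """Per-variant weights; rarer variants get smaller weights, hence larger 1/w."""
    n_total = len(genotypes)
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    n_u = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_u = sum(g[i] for g in controls)      # minor alleles seen in controls
        q = (m_u + 1) / (2 * n_u + 2)          # pseudocount keeps q away from 0
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights


def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def case_rank_sum(genotypes, status):
    """Sum of ranks of the weighted minor-allele counts over cases (status == 1)."""
    weights = mb_weights(genotypes, status)
    scores = [sum(a / w for a, w in zip(g, weights)) for g in genotypes]
    return sum(r for r, s in zip(average_ranks(scores), status) if s == 1)


def permutation_p(genotypes, status, n_perm=1000, seed=0):
    """One-sided permutation p-value; weights are recomputed for each shuffle."""
    rng = random.Random(seed)
    observed = case_rank_sum(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if case_rank_sum(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `genotypes` is a list with one entry per sample, each a list of minor-allele counts (0, 1 or 2) per rare variant in the region, and `status` marks cases as 1 and controls as 0.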
  • SNPs were tested for association using the Cochran-Armitage test for trend. Significance levels for SNPs with minor allele frequency less than 5% were empirically evaluated with a permutation procedure that consists of randomly reassigning case or control status to all samples. A minimum of 5000 random replicates was used; this number was increased to guarantee that the ratio of the estimated p-value to its standard error is at least 10.
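The permutation procedure with its stopping rule can be sketched as below. This is a hedged illustration rather than the study's code: the trend statistic uses additive scores 0/1/2 and the identity that the Cochran-Armitage chi-square equals N times the squared correlation between status and genotype; the binomial standard error sqrt(p(1−p)/B) for the estimated p-value and the `max_perm` cap are assumptions, and all names are hypothetical.

```python
import random


def trend_chi2(genotypes, status):
    """Cochran-Armitage trend chi-square with additive scores (equals N * r^2)."""
    n = len(genotypes)
    g_sum = sum(genotypes)
    y_sum = sum(status)
    sxy = sum(g * y for g, y in zip(genotypes, status)) - g_sum * y_sum / n
    sgg = sum(g * g for g in genotypes) - g_sum ** 2 / n
    syy = y_sum - y_sum ** 2 / n               # status is coded 0/1
    if sgg == 0 or syy == 0:
        return 0.0
    return n * sxy * sxy / (sgg * syy)


def precise_enough(hits, n_perm, ratio=10):
    """True when p_hat / SE(p_hat) >= ratio, with SE = sqrt(p(1-p)/n_perm).

    Rearranged into integer arithmetic: hits * n_perm >= ratio^2 * (n_perm - hits).
    """
    return hits * n_perm >= ratio ** 2 * (n_perm - hits)


def empirical_p(genotypes, status, min_perm=5000, max_perm=200000, seed=0):
    """Permutation p-value; replicates grow past min_perm until precise enough.

    max_perm is an assumed safety cap, not something stated in the text.
    """
    rng = random.Random(seed)
    observed = trend_chi2(genotypes, status)
    labels = list(status)
    hits = 0
    n_perm = 0
    while n_perm < max_perm:
        rng.shuffle(labels)                    # randomly reassign case/control status
        n_perm += 1
        if trend_chi2(genotypes, labels) >= observed:
            hits += 1
        if n_perm >= min_perm and precise_enough(hits, n_perm):
            break
    return hits / n_perm, n_perm
```

Under this standard-error assumption the ratio criterion amounts to observing roughly 100 permuted statistics at least as extreme as the observed one, so smaller p-values require proportionally more replicates.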
  • conditional tests of association which condition upon the presence or the absence of a GWAS risk allele on the same haplotype as the test allele, were performed using UNPHASED (45).
  • RIN RNA integrity number
  • the cDNA was synthesized using 3.5 µg of total RNA and the Superscript III First Strand Synthesis System following the manufacturer's recommendations (#18080-051, Invitrogen, Inc.). In parallel, an identical reaction was carried out in absence of reverse transcriptase. This RT-minus control served to ensure that the PCR amplification was not from genomic DNA.
  • PCR primers and probes specific for C11orf92, C11orf93, C11orf53, and POU2AF1 were designed using the sequence data obtained from NCBI (http://www.ncbi.nlm.nih.gov/) and Primer Express software (Applied Biosystems). Primer sequences and PCR product sizes are shown in Table 7.
  • Real-time quantitative PCR was carried out using the SYBR Green or TaqMan Gene Expression Assays (Applied Biosystems) on the 7900HT Fast Real-Time PCR System (Applied Biosystems). Three technical replicates were run for each sample. Standard curves comprising dilutions of homologous standards derived from a known starting concentration of mRNA were included on each plate. SDS2.2.2 software (Applied Biosystems) was used for relative quantification analysis of gene expression by relative standard curve method, and GAPDH and β-actin genes (Applied Biosystems) served as endogenous controls.
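The relative standard curve method can be sketched as follows. This is a generic illustration and not the SDS software's algorithm: it assumes a linear relation between Ct and log10 of the input quantity, fits it by ordinary least squares, interpolates unknowns, and divides the target quantity by the endogenous-control quantity. Function names and the dilution values in the usage note are hypothetical.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the input quantity of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, control_ct, target_curve, control_curve):
    """Target quantity normalized to the endogenous-control quantity."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    control_qty = quantity_from_ct(control_ct, *control_curve)
    return target_qty / control_qty
```

For example, a hypothetical ten-fold dilution series of 1000, 100 and 10 units with Ct values 21, 24 and 27 gives a slope of about -3 and an intercept of about 30; the Ct passed in for each sample would typically be the mean of its three technical replicates.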
  • Tissue distribution for C11orf92, C11orf93, and C11orf53 transcripts was analyzed by reverse transcription PCR (RT-PCR) in cDNAs derived from multiple human tissues and cell lines (MTC panels, Clontech, Inc.). These include the human digestive system panel (# 636746), the human immune system panel (# 636748), the human panel II (#636743) and a human cell line panel (# 636753).
  • the human cell line panel includes human embryonic kidney 293 (HEK-293), ovarian carcinoma (SKOV-3), skin epidermoid carcinoma (A-431), epithelial-like osteosarcoma (Saos-2), prostate carcinoma (Du145), non-small cell lung carcinoma (H1299), uterine cervical carcinoma (HeLa) and breast adenocarcinoma (MCF7) cell lines.
  • HEK-293 human embryonic kidney 293
  • SKOV-3 ovarian carcinoma
  • A-431 skin epidermoid carcinoma
  • Saos-2 epithelial-like osteosarcoma
  • Du145 prostate carcinoma
  • HeLa uterine cervical carcinoma
  • MCF7 breast adenocarcinoma
  • Luciferase reporter assays: The effects of the CRC associated SNPs on C11orf92 promoter activities were assessed by dual luciferase reporter assay (Promega, Madison, WI).
  • forward and reverse primers were: 5'-gtatctcgagtgagcactcactatgt-3' and 5'-ttgtataagcttgccaaacttgtcattgtttcc-3'.
  • forward and reverse primers were: 5'-ttgtatctcgaggccaaacttgtcattgtttcc-3' and 5'-ttgtataagctttgagcactcactatgtggaaag-3'.
  • the restriction sites that were introduced in primer sequences to aid cloning are underlined.
  • the amplicons were resolved on 1.5% (w/v) agarose gels, purified using a QIAquick Gel Extraction kit (QIAGEN Inc., Toronto, Canada), and cloned into the promoter-less pGL3-basic vector containing the firefly luciferase reporter gene (Promega, Madison, WI).
  • the constructs were sequenced to verify all 10 SNPs. Longer versions of COLCA1 promoter-luciferase reporter constructs encompassing all 12 SNPs and covering ~5 kbp genomic regions were also generated from the genomic DNA of the abovementioned CRC patients.
  • the forward (5'-gtatctcgagtgagcactcactatgt-3') and the reverse (5'-gaatcaagcttgctgcttggttcactgttccttca-3') primers were used for PCR amplification.
  • the restriction sites that were introduced in primer sequences to aid cloning are underlined.
  • the constructs were sequenced to verify all 12 SNPs. No luciferase activity was observed with these 2 constructs carrying the risk and protective haplotypes.
  • HeLa cells (ATCC, Manassas, VA) were transfected with the experimental pGL3 promoter-luciferase constructs (2 µg) using FuGENE HD reagent as per manufacturer's protocol (Promega, USA).
  • the plasmid pRL-null containing the Renilla luciferase gene was co-transfected to normalize for transfection efficiencies.
  • the promoter-less pGL3-basic vector served as a negative control.
  • the reporter activities were expressed as relative light intensity unit (RLU) ratios of the firefly/Renilla luciferase activities after subtraction of the background autoluminescence of non-transfected cells.
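The normalization of reporter activity can be sketched as below. This is a minimal illustration, assuming that background readings from non-transfected cells are available separately for the firefly and Renilla channels; the function name and the example luminescence values are hypothetical.

```python
def normalized_rlu(firefly, renilla, firefly_bg, renilla_bg):
    """Firefly/Renilla ratio after subtracting non-transfected-cell background."""
    firefly_net = firefly - firefly_bg
    renilla_net = renilla - renilla_bg
    if renilla_net <= 0:
        raise ValueError("Renilla signal does not exceed background")
    return firefly_net / renilla_net


# Hypothetical readings for one well.
print(normalized_rlu(1200.0, 600.0, 200.0, 100.0))
```

Dividing by the co-transfected Renilla signal corrects for well-to-well differences in transfection efficiency, so ratios from different constructs can be compared directly.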
  • RLU relative light intensity unit
  • the GeneRacer Kit (Invitrogen, Inc.) was used for 5' and 3' RACE experiments.
  • the gene-specific primer sequences for RACE experiments are described in Table 8.
  • the identification of ESTs within the C11orf92 and C11orf93 genes by RACE experiments provided us with information on complete transcript sequences and spliced isoforms of these genes.
  • the RefSeq gene structure from UCSC NCBI36/hg18 assembly for C11orf92/COLCA1 predicted a transcript of 5443 bp with a coding region of 375 bp that encodes a protein of 124 amino acids (Appendix 1), and for C11orf93/COLCA2, a transcript of 1414 bp, with coding region of 465 bp, that encodes a protein of 154 amino acids (Appendix 2).
  • transcript isoforms: To obtain full-length cDNAs of COLCA1 and COLCA2 isoforms, PCR was performed on cDNAs from normal colon, tumor and benign adjacent colon tissues (OFCCR Biobank), peripheral blood lymphocytes, SUDHL4 and OCI-LY10 cells using transcript-specific primers (Table 11). All PCR reactions were performed using Hot Start Taq DNA polymerase according to manufacturer's instructions (Sigma-Aldrich, St Louis, MO). The amplification conditions were: 30 s at 94°C for denaturation, 30 s at 55-58°C for annealing, and 90 s at 72°C for extension, for a total of 35 cycles. The PCR products were cloned into TOPO TA vector (Invitrogen, Carlsbad, CA) and inserts were sequenced on an ABI PRISM 310 genetic analyzer (Applied Biosystems).
  • mRNA-seq advanced strand-specific RNA sequencing
  • the directional mRNA-Seq sample preparation kit (Illumina, Inc.) was used to generate libraries from total RNA for high-throughput RNA sequencing on Illumina Genome Analyzer II, according to the manufacturer's instructions.
  • the adaptor-ligated libraries were gel size-selected at 200 bp and PCR enriched to create final libraries prior to sequencing using 76 bp reads.
  • 4 lanes of Illumina GAIIx were sequenced for each sample. Image analysis and base calling were done by Illumina pipeline, version 1.2.3, with recommended default filtering parameters.
  • Reads are aligned to the human reference genome (NCBI Build 36.1) using Bowtie 0.12.7 (46) and TopHat 1.3.0 (47).
  • For the C11orf92 transcript, the average coverage of bases for normal, benign adjacent, and tumor samples is 8.9, 7.8, and 3.1, respectively.
  • the average coverage of bases for normal, benign adjacent, and tumor samples for C11orf93 is 24.5, 27.0, and 3.9, respectively.
  • Integrative Genomics Viewer (IGV) (49) was used to confirm and visualize the expression levels of C11orf92, C11orf93 and nearby genes.
  • PBL peripheral blood lymphocytes
  • CD34+ and CD34- cells were isolated using EasySep human CD34 positive selection kit according to the manufacturer protocol (StemCell Technologies Inc.). Cells were stained with CD34 + APC (clone 581, BD Biosciences Inc.) and the purity of the cells confirmed by FACS (>95%).
  • Peripheral blood mononuclear (MNC) and polymorphonuclear (PMN) cell fractions were isolated from whole blood using discontinuous Histopaque density gradient kit (Sigma-Aldrich), according to manufacturer's instructions. Briefly, blood diluted 1:2 in RPMI 1640 media (Sigma-Aldrich) was added onto the top of two Histopaque layers, Histopaque-1077 and Histopaque-1119, to create three interfacing layers. By the effect of the centrifugal force, the PMN (lower) and MNC (upper) fractions were isolated simultaneously; the cells were withdrawn and washed three times with PBS, and cell pellets were lysed using modified RIPA buffer.
  • MNC mononuclear
  • PMN polymorphonuclear
  • Human NK (CD56+), monocytes (CD14+), neutrophil (CD16+), B cell (CD19+), basophil (CD123+), eosinophil (CD16-), CD8+ T, CD4+ T and CD14+ monocyte-derived immature dendritic (iMoDC) whole-cell lysates were purchased from 3H Biomedical, Uppsala, Sweden.
  • Human basophils were purified from peripheral blood by depletion of lymphocytes, monocytes, NK cells, B cells and plasmacytoid dendritic cells from blood mononuclear cells, followed by CD123 positive selection. The purity of basophils (>90%) was confirmed by CD123-FITC staining.
  • Human eosinophils were purified from peripheral blood by a two-step method: 1) the gradient separation of granulocytes; followed by 2) the CD16 depletion of neutrophils. The purity of eosinophils (>90%) was confirmed by May-Grunewald-Giemsa staining.
  • Human CD4+ and CD8+ T cells were isolated from peripheral blood mononuclear cells by CD4 positive and CD8 positive selection, respectively. The purity of both fractions was higher than 90%.
  • Human NK cells were isolated from peripheral blood mononuclear cells by CD56 positive selection. The purity of NK cells (>90%) was confirmed by CD56 staining. Human monocytes were purified from mononuclear cells by CD14 positive selection.
  • The purity of monocytes (>90%) was confirmed by CD14 staining.
  • Human neutrophils were purified from peripheral blood by gradient separation, followed by CD16 positive selection. The purity of neutrophils (>90%) was confirmed by CD16 staining.
  • iMoDC Monocyte-derived immature dendritic cells
  • the iMoDC were CD86 ⁇ CD80low, CD40+, CD11b+, CD14-, and CD123-.
  • Human B cells were purified from peripheral blood mononuclear cells by CD19 positive selection. The purity of B cells was higher than 90%. All lysates were provided in modified RIPA buffer containing protease inhibitors (Roche, Applied Science).
  • CD34+ cord blood cells (2x10^5) were cultured in H5100 medium (StemCell Technologies Inc.) supplemented with recombinant stem cell factor (rhSCF, 100 ng/ml), human Interleukin 6 (rhIL-6, 50 ng/ml), human Interleukin 3 (rhIL-3, 50 ng/ml) and human GM-CSF (20 ng/ml) [all from R&D Systems].
  • rhSCF recombinant stem cell factor
  • rhIL-6 human Interleukin 6
  • Kit+ CD34- CD11b- CD11c- cells were isolated using an Aria II cell sorter (BD Bioscience Inc.). Purity of these cells was confirmed by Giemsa staining of the cytospins.
  • Cell pellets or frozen colon tissue samples were homogenized in RIPA buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% w/v NP-40, 0.25% Na-deoxycholate, 1 mM EDTA, 1 mM PMSF, 1 mM Na3VO4), supplemented with protease inhibitor cocktail (Roche, Applied Science). Lysates were then centrifuged (5 min, 10,000 x g, 4°C) and supernatants were collected. Protein concentration was measured using the BCA protein assay kit (Thermo Scientific, Pierce).
  • COLCA1 expression: membranes were incubated with the polyclonal rabbit anti-human C11orf92/COLCA1 antibody (Atlas Antibodies, AB) at a 1:500 dilution in 5% BSA in TBST for 3 hrs at room temperature. Blots were then washed with TBST and incubated for 1 hour at room temperature with horseradish peroxidase-conjugated goat anti-rabbit IgG (Santa Cruz Biotechnology, Inc.) at 1:7500 dilution in 5% skim milk in TBST. To identify the protein size, the Precision Plus Protein Western C standards (BioRad Laboratories, Inc.) were used.
  • the blots were incubated with StrepTactin-HRP conjugate (BioRad Laboratories, Inc.) at 1:40000 dilution in the presence of 5% skim milk in TBST.
  • the immunoreactive bands were visualized by Immun-Star HRP chemiluminescence kit (BioRad Laboratories, Inc.) according to manufacturer's instructions.
  • polyclonal anti-human beta-actin antibody Cell Signaling Technology Inc.
  • horseradish peroxidase-conjugated goat anti-rabbit IgG Santa Cruz Biotechnology, Inc.
  • Paraffin tissue sections (4 µm) were deparaffinized in xylene and rehydrated in graded ethanol. Tissue sections were microwaved (micro MED T/T Mega, Milestone Microwave Lab System) for 3 min in antigen unmasking solution (Vector Laboratories, Inc.) for antigen retrieval and then incubated with 3% H2O2 for 15 min to quench endogenous peroxidase activity. Nonspecific absorption was minimized by Background Sniper (Biocare Medical) in Tris-buffered saline (TBS) for 10 min.
  • the sections were incubated overnight at 4°C in a solution containing polyclonal rabbit anti-human C11orf92/COLCA1 (Atlas Antibodies, AB) antibody at a dilution of 1:100, and mouse monoclonal anti-human carcinoembryonic antigen (CEA) antibody (clone COL-1, Biocare Medical) at a dilution of 1:200.
  • the Vectastain Elite ABC kit, diaminobenzidine tetrahydrochloride (DAB), and Vulcan Fast Red chromogen Kit2 were used to detect immune complexes as described by the suppliers (Vector Laboratories Inc., and Biocare Medical). Sections were counterstained with Mayer's hematoxylin. The images were acquired on an Olympus BX61 microscope fitted with an Olympus DP72 camera using the CellSens Standard proprietary acquisition software (Olympus, Markham, Ontario, Canada).
  • biopsies from colonic tissues (5 mm) were obtained endoscopically. Histopathological examination on hematoxylin and eosin (H & E) stain confirmed the presence of tumor and benign adjacent mucosa.
  • the triple immunofluorescence staining protocol included the following primary antibodies: rabbit polyclonal human anti-C11orf92/COLCA1 (Atlas Antibodies, Sigma), rat monoclonal anti-human CD45 antibody (Santa Cruz Biotechnology, Inc), monoclonal mouse antibody against human basophils 2D7 (BioLegend, Inc.), human eosinophil major basic protein, clone BMK13 (EMD Millipore), human mast cell tryptase, clone G3 (Chemicon, Int.), human CD68 (KP1) (Santa Cruz Biotechnology, Inc.) and human NCAM (2Q692) monoclonal antibody against CD56 positive cells (Santa Cruz Biotechnology, Inc.).
  • as secondary antibodies for immunofluorescence microscopy, goat antisera against rabbit, rat and mouse IgG conjugated to Alexa Fluor 594, 647, and 488 (Invitrogen, Inc.) were used.
  • the secondary antibodies did not produce nonspecific labeling on colon sections when exposed to PBS only.
  • the specificity of the rabbit polyclonal C11orf92 antibodies has been validated by incubation of the sections with normal rabbit IgG (Millipore), followed by incubation with goat anti-rabbit Alexa 594 conjugated antibody. Cryosections of human colon tissues (4 µm) were fixed in cold acetone for 10 min and rehydrated in PBS prior to incubation in 10% goat serum blocking solution for 30 min. Sections were then incubated with primary antibodies overnight at 4°C. After three washes in PBS, the sections were incubated with appropriate Alexa Fluor conjugated secondary antibodies (Invitrogen, Inc.) for 60 min.
  • Co-localization of COLCA1 with specific immune cell markers was validated using three-dimensional deconvolution microscopy (Quorum, WaveFX Spinning Disc Confocal Microscope System) with optimized Yokogawa CSU X1, Hamamatsu EM-CCD digital camera, Leica DMI6000B inverted research grade motorized microscope (Quorum Technologies, Guelph, Canada), and the Volocity 5.2.2 acquisition software (Improvision/PerkinElmer, Massachusetts, USA). Deconvolution of the images was done using Huygens Essential 4.0 deconvolution software (Scientific Volume Imaging, Hilversum, the Netherlands). High-power images of the single cells were taken at sequential 0.1-µm intervals along the z-axis. Resultant image stacks were analyzed using a three-dimensional deconvolution algorithm.
  • COLCA1/C11orf92 and COLCA2/C11orf93 are arranged head-to-head on opposite strands of chromosome 11q23.
  • Gene and protein expression studies reveal their presence in several immune cell types located in the colonic mucosa and lower levels of expression correlating with the risk alleles identified by GWAS studies.
  • the manipulation of COLCA function represents a potential target to prevent colon cancer.
  • Sequenced samples include genomic DNA from 40 sporadic CRC cases and 40 matched controls selected from the 2,380 samples from the Ontario Familial Colorectal Cancer Registry (OFCCR) that were previously genotyped by GWAS (13, 14, 15) and 25 probands and 15 affected siblings selected from pedigrees showing autosomal dominant transmission that were selected based on absence of mutations in genes causing familial CRC.
  • OFCCR Ontario Familial Colorectal Cancer Registry
  • Fig. 1A is a Manhattan plot representation of the association levels of Cochran-Armitage tests for trend between variants with frequency above 1% in cases and controls combined and risk of CRC in 1,030 cases and 1,061 controls.
  • the OFCCR sample was used in the discovery of 5 of these regions: 8q24 (73), 9p24 (78), 11q23 (14), 16q22 (15) and 19q13 (75). There are signals of association in 4 of these 5 regions, the exception being the 19q13 region for which the published SNP did not replicate.
  • Q-Q quantile-quantile
  • the associated 11q23 region was first reported in a Scottish study (14) and subsequently refined using 10,638 cases and 10,457 controls from Europe, North America and Australia (21).
  • the region includes three uncharacterized protein-coding genes (C11orf53, C11orf92, and C11orf93).
  • POU2AF1 also known as BOB1
  • BOB1, a nearby gene which is 51 kb distal to rs3802842, was also deemed a possible candidate as it was observed to be differentially expressed in the cells of patients with several forms of lymphoma and leukemia (22-24).
  • C11orf53: decreased expression of C11orf53 in the tumor samples from individuals is associated with the number of risk alleles, but no correlations are observed in the benign adjacent colonic tissue. Furthermore, no association is found between POU2AF1 expression levels and rs3802842 genotypes.
  • tissue panels representing the gastrointestinal tract and organs of the immune system Expression of C1 1orf92 and C1 1orf93 is observed from the esophagus to the rectum (Fig. 16A), multiple immune organs (Fig. 16B), and other tissues such as prostate, testis, and ovary (Fig. 17A).
  • C1 1orf92 and C11orf93 transcripts are also expressed in CRC cell line Caco-2, but not in HCT1 16 (another CRC line) and HeLa (Fig. 17B).
  • CA2 genes provides clues to the similarities in their expression levels. They are arranged head- to-head on opposite strands and share common regulatory region (Fig. 2B). To investigate the cis-regulatory potential of the most common protective and risk haplotypes, we cloned three independent triplicate DNA fragments of -4.2 kbp for each allele of rs3802842, as well as 10 additional variants (for 9 SNPs and rs5794738, a 9 bp indel), into luciferase reporter vectors (Fig. 2C).
  • COLCA1 has multiple alternative 5' non-coding exons, and one constant exon that includes coding sequence for a 124-amino acid protein.
  • COLCA2 has 8 exons, with variable exons 1 to 4 added in various combinations to constant exons 5 to 8 to generate a minimum of five transcripts yielding different protein isoforms ranging from 154 to 379 amino acids in length; additional protein isoforms that are predicted based on Western blots are described later.
  • the revised gene models allow in silico predictions of functional correlates for alleles contained on the protective/risk haplotypes related to protein isoforms, composition, and regulation.
  • One of the most strongly CRC-associated variants at this locus is rs10891246 that is in LD with GWAS SNP rs3802842 (r ⁇ O.99) and can affect both candidate genes.
  • rs10891246 coincides with a splice site resulting in a short and a long version of exon 1, a non-coding exon (Supplementary Methods), which we named C11orf92B-L and C11orf92B-S (Fig. 18A).
  • C11orf92B-L: the long isoform
  • C11orf92B-S: the short isoform
  • Fig. 18D: the short isoform
  • Chromatin features in a human lymphoblastoid cell line (Fig. 19) at the COLCA1, COLCA2 and C11orf53 loci were obtained from ENCODE (http://www.genome.gov/ENCODE/) (26).
  • The densities for four histone modifications and occupancy of CTCF binding sites generated by ChIP-seq reveal strong signals at the bi-directional promoter of COLCA1 and COLCA2 (Fig. 19).
  • COLCA1 protein expression is stronger in homozygotes for the protective A allele than in homozygotes for the C allele (Fig. 3B), which is also in agreement with RNA expression data.
  • Immunohistochemistry with anti-COLCA1 antibody of benign adjacent colon tissue and colon tumor from two donors with AA and CC genotypes is shown in Fig. 3C-3F (Fig. 21 shows negative control data). Positive staining is observed in the lamina propria of all biopsies, but not in normal epithelium or epithelium-derived tumor cells.
  • COLCA1 expression can be observed in stromal cells that are mono- and multi-nuclear. At higher magnification, COLCA1 expression is cytoplasmic and often appears to be part of granular structures (Fig. 3G). In addition, cell-free COLCA1 is observed in normal adjacent tissue and in some cases the COLCA1 signal appears to infiltrate spaces between epithelial cells (Fig. 3H). Finally, multiple COLCA1-expressing cells can be seen to surround tumor cells (Fig. 3I-3J, Fig. 22). To determine the immune cell populations that express COLCA1 at the protein level, we examined COLCA1 protein expression in immune cells derived from peripheral blood, cord blood and colonic tissues using purified cell populations (Fig.
  • COLCA1 is expressed strongly in a polymorphonuclear fraction that was further resolved to include eosinophils (strongest signal) and neutrophils, and more weakly in a mononuclear fraction including CD14+ monocytes, but not in lymphocytes (Fig. 4A-4C).
  • Cell lysates obtained from cord blood that had been separated into CD34+ and CD34- fractions showed no or minimal expression of COLCA1 (Fig. 4A and 4C).
  • Cord blood cells cultured in conditions to promote mast cell differentiation were also negative.
  • Cryosections of benign colon tissues adjacent to tumors, and tumor tissues themselves, were interrogated using triple immunofluorescence methods with several antibodies used as immune cell-specific markers. Strong COLCA1 expression is shown in eosinophils (Fig. 4D) and moderate expression is observed in mast cells, neutrophils, macrophages and dendritic cells (Fig. 4E-H, Fig. 23-27). Within all COLCA1-positive immunofluorescent cells, COLCA1 signal is present in granular structures, consistent with intracellular granules that are characteristic of several immune cell lineages.
  • immunoreactive bands ranging from 17 to 47 kDa that potentially represent 8 COLCA2 protein isoforms are observed in different permutations in all samples tested (Fig. 32A), including colonic tissues, peripheral blood and 17 cell lines (data not shown) representing multiple cell types.
  • Eosinophils have tumoricidal functions. Abundance of eosinophils in gastrointestinal cancers is a favorable prognostic factor (36). Eosinophils may induce apoptosis and directly kill tumor cells via the release of eosinophilic cationic protein, eosinophil-derived neurotoxin, TNF-α and granzyme A (37). Eosinophil products can degrade necrotic materials from tumor and other stressed cells through production of reactive oxygen species (38).
  • Eosinophils have been recognized as regulators of tissue homeostasis in peripheral tissues with high turnover and active stem cell populations, such as the gastrointestinal tract and the endometrium; this function may be as important as the more recognized role of eosinophils as end-stage effector cells (39).
  • Eosinophils, which are the highest expressers of COLCA1, contain granular structures that are known to harbor pre-formed proteins that can be secreted by exocytosis, by piecemeal degranulation, or as extracellular vesicles that are typically in the size range of 150-300 nm (40). The latter structures have only recently been characterized as receptor-mediated secretory organelles that respond to IFN-γ and eotaxin to elicit secretion of their content. Extracellular COLCA1 staining in colon tissues has a pattern similar to that described for extracellular eosinophil-derived granules (41).
  • COLCA1 and COLCA2 point to potential anti-tumoral properties. These could be through intrinsic cytocidal activities as secreted proteins, immunomodulatory functions, or biochemical interactions with other molecules that are co-secreted by immune cells or released by tumors. Collectively, the polymorphic regulation of COLCA1 and COLCA2 potentially represents the first inherited mechanistic link in humans between microenvironmental factors and cancer predisposition.
  • EYFYPSTDCV DFAPSAAATS DFYKRETNCD ICYS- (SEQ ID NO. 3)
  • Results for successfully genotyped coding-nonsynonymous SNPs. The table includes the number of carriers among the sequenced samples (40 Cases, 40 Controls, 25 Probands and 15 Kins), and the number of Probands and Kins who share the alternative allele, based on genotype data (Shared). Genotype counts are in the format AA/AB/BB, where A is the minor allele and B the major allele; genotype counts only include self-declared "white" samples, and exclude the sequenced samples. Cochran-Armitage test for trends significance levels are included (PvTrend).
  • Table 6 Additional SNPs discovered from sequencing the coding exon of C11orf92.
  • the table includes the number of carriers among the sequenced samples (40 Cases, 40 Controls, 25 Probands and 15 Kins), and the number of Probands and Kins who share the alternative allele, based on genotype data (Shared).
  • Genotype counts are in the format AA/AB/BB, where A is the minor allele and B the major allele. For each variant, genotypes were called from chromatograms only in the 384-well plates in which an alternative allele was detected.
  • R-GSP-S > -(nested)-Cllorfl2 5 ' - CCCCAGGAGCCCTCCCAGGCGCTGA
  • Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631-7 (2008).
  • Giubelan, E. Lazar, A. Dema, et al, B-cell transcription factors Pax-5, Oct-2, BOB.1, Bcl-6, and MUM1 are useful markers for the diagnosis of nodular lymphocyte predominant Hodgkin lymphoma. Rom. J. Morphol. Embryol. 52, 69-74 (2011).
  • S. Advani, K. Lim, S. Gibson, M. Shadman, T. Jin, E. Copelan, M. Kalaycio, et al, OCT-2 expression and OCT-2/BOB.1 co-expression predict prognosis in patients with newly diagnosed acute myeloid leukemia. Leuk. Lymphoma. 51, 606-12 (2010).

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Zoology (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Hospice & Palliative Care (AREA)
  • Urology & Nephrology (AREA)
  • Oncology (AREA)
  • Hematology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • General Chemical & Material Sciences (AREA)
  • Cell Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Food Science & Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Toxicology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

This invention relates to newly characterized COLCA1 and COLCA2 and their use in the treatment and risk assessment of colon cancer.

Description

COLCA1 AND COLCA2 AND THEIR USE FOR THE TREATMENT AND RISK ASSESSMENT OF COLON CANCER
RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No. 61/616477 filed on March 28, 2012, which is hereby incorporated by reference.
FIELD OF THE INVENTION
This invention relates to newly characterized COLCA1 and COLCA2 and their use in the treatment and risk assessment of colon cancer.
BACKGROUND OF THE INVENTION
Most genetic risk studies focus on cell-autonomous factors. However, the role of non-cell-autonomous factors in cancer risk is poorly understood, and there is increasing evidence implicating the microenvironment in both the suppression and promotion of cancer. Genome-wide association studies (GWAS) for more than two dozen cancers have identified more than 150 loci that individually confer modest increases in cancer risk, although few of these have led to the precise identification of causal genes and alleles (1). Of the 16 published loci for colorectal cancer (CRC) (2, 3, 4), functional studies implicate SMAD7 (5), MYC (6, 7), CDH1 (8), and EIF3H (9), whose expression correlates with common sequence variants in their regulatory elements, as well as other plausible candidates such as BMP2, BMP4 and Gremlin1 (2) that are expressed in colon epithelial cells and are involved in the SMAD7/TGF-beta pathway, which modulates cell proliferation (10).
SUMMARY OF THE INVENTION
In an aspect, there is provided an isolated protein comprising SEQ ID NO. 1 or a functional fragment thereof.
In a further aspect, there is provided an isolated protein comprising SEQ ID NO. 3, 5, 7, 9, 11 or a functional fragment thereof.
In a further aspect, there is provided an isolated nucleic acid encoding the protein of any one of claims 1 3, 5, 7, 9 and 11.
In an aspect, there is provided an expression vector comprising the nucleic acid described herein operably linked to an expression control sequence.
In an aspect, there is provided a cultured cell comprising the vector described herein.
In an aspect, there is provided a method of determining risk of colon cancer in a patient using a sample therefrom comprising: determining the level of expression of at least one of COLCA1 and COLCA2; and comparing the level of expression of the sample with a control sample; wherein a higher level of expression of at least one of COLCA1 and COLCA2 in the patient sample compared to the control indicates a low risk of colon cancer.
In an aspect, there is provided the protein described herein for determining the risk of colon cancer.
In an aspect, there is provided a use of the protein described herein for determining the risk of colon cancer.
In an aspect, there is provided a diagnostic kit for determining risk of colon cancer in a patient comprising reagents for detecting the level of gene or protein expression of at least one of COLCA1 and COLCA2 in a patient sample and instructions for use.
In an aspect, there is provided a method of treating or preventing colon cancer in a subject comprising administering the protein described herein.
In an aspect, there is provided the protein described herein for treating or preventing colon cancer in a subject.
In an aspect, there is provided a use of the protein described herein for treating or preventing colon cancer in a subject.
In an aspect, there is provided a use of the protein described herein in the preparation of a medicament for treating or preventing colon cancer in a subject.
In an aspect, there is provided a pharmaceutical composition for the treatment of colon cancer comprising a therapeutically effective amount of the protein of claim 1 or 2 and a pharmaceutically acceptable carrier.
BRIEF DESCRIPTION OF FIGURES
These and other features of the preferred embodiments of the invention will become more apparent in the following detailed description in which reference is made to the appended drawings wherein:
Figure 1 shows association analysis of cases and controls from the Ontario Familial Colorectal Cancer Registry. (A) Manhattan plot showing the significance level, on the negative log scale, for all variants in frequency above 1% in 11 GWAS regions. Red dots indicate published GWAS SNPs. (B) Quantile-quantile plots of significance levels against theoretical quantiles for unconditional tests of association. Red lines represent 95% confidence bands. (C) Same plot as in (B), but with tests of association conditional on GWAS SNP genotypes. (D) Same plot and data as in (C), restricted to tag SNPs at an r2 threshold of 0.5. (E) Architecture of the 11q23 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs3802842); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
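The significance levels plotted in Figure 1 come from Cochran-Armitage tests for trend, reported on the negative log scale. As a minimal sketch of that calculation (not the inventors' analysis pipeline), the Python below parses case and control genotype counts written in the AA/AB/BB format, scores each genotype by its number of copies of allele A, and returns a two-sided p-value from the normal approximation. The additive scoring, the function names and the example counts are all assumptions made for illustration.

```python
import math


def parse_counts(text):
    """Parse genotype counts written as 'AA/AB/BB' into a tuple of three ints."""
    aa, ab, bb = (int(part) for part in text.split("/"))
    return aa, ab, bb


def trend_test(case_counts, control_counts, weights=(2, 1, 0)):
    """Cochran-Armitage test for trend on a 2 x 3 genotype table.

    weights scores AA/AB/BB by copies of allele A (additive coding, assumed).
    Returns (z statistic, two-sided p-value from the normal approximation).
    """
    cases = parse_counts(case_counts)
    controls = parse_counts(control_counts)
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    if n_total == 0:
        raise ValueError("no samples in the table")

    # Weighted difference between case and control genotype counts.
    t_stat = sum(
        w * (r * n_controls - s * n_cases)
        for w, r, s in zip(weights, cases, controls)
    )

    # Variance of the statistic under the null hypothesis of no association.
    squares = sum(w * w * n * (n_total - n) for w, n in zip(weights, col_totals))
    cross = sum(
        weights[i] * weights[j] * col_totals[i] * col_totals[j]
        for i in range(3)
        for j in range(i + 1, 3)
    )
    variance = (n_cases * n_controls / n_total) * (squares - 2 * cross)
    if variance <= 0:
        raise ValueError("trend test is undefined for this table")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def neg_log10(p_value):
    """Significance on the negative log scale, as used in a Manhattan plot."""
    return -math.log10(p_value)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = trend_test("30/20/10", "10/20/30")
    print(f"z = {z:.3f}, p = {p:.3g}, -log10(p) = {neg_log10(p):.2f}")
```

When cases and controls have identical genotype proportions the statistic is zero and the p-value is 1; a table with no variation in genotype raises an error rather than returning a misleading value.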
Figure 2 shows risk-associated genotypes correlate with decreased expression of 11q23.1 transcripts in colon tissues. (A) Relative expression levels for C11orf53, C11orf92/COLCA1, C11orf93/COLCA2 and POU2AF1 in benign adjacent (BA) and tumor (T) samples as a function of rs3802842 genotype. For each transcript, expression data is shown for rs3802842 AA (blue bars), AC (yellow bars), and CC (red bars) genotypes. Error bars indicate SEM. P values are derived from one-way ANOVA followed by Student-Newman-Keuls test. *P < 0.01, **P < 0.001. (B) Genomic organization of C11orf92/COLCA1 (orange) and C11orf93/COLCA2 (blue), which are located on opposite DNA strands. Each gene contains a tandem of multiple non-coding first exons (1*) which are spliced to a set of constant exons (labeled as x2, x3, etc.). (C and D) Luciferase expression in HeLa cells comparing risk (RH) and protective (PH) haplotypes at the COLCA1/COLCA2 bidirectional promoter. Both the protective (blue) and risk (red) haplotypes of the 4.2 kb genomic regions from three heterozygous patients were subcloned into a firefly promoter-less luciferase reporter vector (pGL3 basic). Shown are ratios of Firefly luciferase expression to Renilla luciferase expression (expressed from co-transfected plasmids) measured at 24 h after transfection. P#1, P#2, and P#3 are CRC patients, heterozygous for GWAS rs3802842. Transfection with the promoter-less pGL3 vector is denoted as 'V'. All values are expressed as mean +/- SEM for n=3. *P < 0.05.
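The reporter readout in panels (C) and (D) is a Firefly-to-Renilla ratio summarized as mean +/- SEM over three replicates. A small sketch of that arithmetic follows; the function names and luminescence readings are hypothetical, and the SEM is taken as the sample standard deviation divided by the square root of n, which is an assumption about how the error bars were computed.

```python
import math
import statistics


def luciferase_ratios(firefly, renilla):
    """Firefly/Renilla ratio for each replicate well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla need the same number of replicates")
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two replicates")
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


if __name__ == "__main__":
    # Hypothetical luminescence readings for one construct, n=3.
    ratios = luciferase_ratios([1200.0, 1500.0, 1350.0], [400.0, 450.0, 420.0])
    mean, sem = mean_and_sem(ratios)
    print(f"ratio = {mean:.2f} +/- {sem:.2f}")
```

Dividing by the co-transfected Renilla signal corrects each well for transfection efficiency before the haplotypes are compared.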
Figure 3 shows Western blot and immunohistochemistry of COLCA1 expression in colon biopsy samples. (A) COLCA1 expression in colon tissues obtained at the time of CRC or adenoma resections is higher in benign adjacent tissues compared to CRC tumors. (B) COLCA1 expression is higher in benign adjacent colon tissues from patients homozygous for the protective rs3802842 allele (AA) compared to the risk allele (CC). (C, D, E and F) Immunohistochemical staining for COLCA1 (brown; hematoxylin counterstain; scale bars, 50 μm) on human colon, benign adjacent (BA) (C and E) and tumor (T) (D and F) tissues for patients with protective (C and D) and risk (E and F) genotypes. (G-H) 100x oil objective images (scale bars, 10 μm) of representative tissues immunostained with anti-human C11orf92 antibodies (brown; hematoxylin counterstain) identified strong characteristic COLCA1-positive signals in intracellular granules (G) and extracellular granules (H). (I-J) Double immunohistochemical staining for COLCA1 (red; hematoxylin; scale bars, 50 μm) and tumor-specific CEA (carcinoembryonic antigen) marker (brown) on paraffin-embedded tissues from colorectal cancer patients shows the immediate proximity and specific aggregation of COLCA1-positive cells around and within the tumor in both benign adjacent (I) and tumor tissue (J); see Fig. 27 for higher magnification images.
Figure 4 shows identification and subcellular localization of COLCA1-expressing immune cells. (A-C) Cell-type specific Western blots for COLCA1. (A) Cord blood: CD34- cells; Peripheral blood: CD123+ basophils, CD16- eosinophils, CD16+ neutrophils, mononuclear cell fraction (MNC), and polymorphonuclear cell fraction (PMN). (B) Peripheral blood: PMN, CD4+ T cells, CD8+ T cells, CD56+ NK cells, CD14+ monocytes. (C) Peripheral blood: CD19+ B cells, iDC dendritic cells; Cord blood: CD34- and CD34+. All Western blots were re-probed with anti-β-actin antibody; normal colon tissue was run as a positive control. (D-H) Cryosections from colon tissue biopsies were subjected to triple immunofluorescence staining as described in the Methods section. Co-localization of COLCA1 with specific immune cell markers was determined using three-dimensional deconvolution microscopy. High-power images of single immune cells were taken at sequential z-axis depths separated by 0.1-0.3 μm. (D) Eosinophils: COLCA1 staining (red) with CD45 (blue) and eosinophil major basic protein (green). (E) Mast cells: COLCA1 staining (red) with CD45 (blue) and mast cell tryptase (green). (F) Neutrophils: COLCA1 staining (red) with CD45 (blue) and neutrophil elastase (green). (G) Macrophages: COLCA1 staining (red) with CD45 (blue) and CD68 (green). (H) Dendritic cells: COLCA1 staining (red) with CD45 (blue) and CD83 (green). Additional images are provided in Fig. 22-26.
Figure 5 shows polymorphism discovery, quality filters, and genotyping of 2.3 Mb at 11 loci identified by GWAS.
Figure 6 shows architecture of the 8q23 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs16892766); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 7 shows architecture of the 8q24 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs6983267); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 8 shows architecture of the 9p24 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs719725); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 9 shows architecture of the 10p14 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs7894531); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 10 shows architecture of the 14q22 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4444235); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 11 shows architecture of the 15q13 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4779584); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 12 shows architecture of the 16q22 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs9929218); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 13 shows architecture of the 18q21 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs4939827); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 14 shows architecture of the 19q13 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs10411210); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 15 shows architecture of the 20p12 locus. From top to bottom: base position and known genes; percentage of samples with at least 6x sequence coverage as a function of base position; significance of tests of association, on the negative log scale, with the red dot indicating GWAS SNP (rs961253); LD structure between all variants, with color shading showing the squared correlation coefficient r2.
Figure 16 shows mRNA expression profiles for C11orf92/COLCA1, C11orf93/COLCA2 and C11orf53. RT-PCR expression of the transcripts was assessed on first strand cDNA for multiple tissue panels (Clontech, Inc) using primers specific for each gene. A β-actin primer pair was used as an internal control. (A) Human digestive system panel. (B) Human immune panel. For both panels, the amplified cDNA product sizes are indicated on the left and the transcript names are shown on the right. Acronyms are used for lymph node (LN), peripheral blood lymphocytes (PBL), bone marrow (BM) and fetal liver (FL).
Figure 17 shows tissue and cell line mRNA expression profiles for C11orf92/COLCA1 , C11orf93/COLCA2 and C11orf53. RT-PCR of transcripts was performed on first strand cDNA from multiple tissue panels (Clontech, Inc) using primers specific for each gene. A β-actin primer pair was used as an internal control. (A) Human adult tissue panel. (B) Human cell line panel. For both panels, the amplified cDNA product sizes are indicated on the left and the transcript names are shown on the right. Details on cell lines are found in the Materials and Methods section.
Figure 18 shows C11orf92/COLCA1 splice isoforms correlate with rs10891246 genotypes. (A) Schematic of the long (B-L) and short (B-S) isoforms of C11orf92. Transcripts are shown as light gray bars (non-coding) and dark gray bars (coding); introns are indicated as thin black lines. Solid horizontal lines below the isoform B-L transcript indicate TaqMan probe positions: probe S (red) and probe L (blue). SNP positions and alleles are drawn relative to each other; therefore, this map is not to physical scale. (B) RT-PCR using isoform-specific primer sets for the B-L and B-S transcripts. GWAS marker rs3802842 genotypes (AA, AC and CC) are indicated above. Colon tissue source is indicated below: benign adjacent (BA) and tumor (T). (C) Relative expression of long (B-L) isoform, calculated by dividing the expression value of the isoform by the expression value of the housekeeping gene, GAPDH. All data are plotted as relative to the expression in non-tumor colon samples. (D) Relative expression of short (B-S) isoform, calculated as for (C). Error bars indicate SEM for AA (n=18), AC (n=21), and CC (n=3). P values were calculated using one-way ANOVA followed by Student-Newman-Keuls test. *P < 0.05.
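The relative expression in panels (C) and (D) is a two-step normalization: each isoform value is divided by the GAPDH value for the same sample, and the result is expressed relative to non-tumor colon samples. The sketch below illustrates that arithmetic under one assumption that the caption does not state, namely that the reference is the mean of the GAPDH-normalized non-tumor samples. The function names and numbers are hypothetical.

```python
import statistics


def normalize_to_gapdh(isoform, gapdh):
    """Isoform expression divided by GAPDH expression for one sample."""
    if gapdh <= 0:
        raise ValueError("GAPDH expression must be positive")
    return isoform / gapdh


def relative_expression(samples, reference_samples):
    """Express each sample relative to the mean of the reference samples.

    samples and reference_samples are lists of (isoform, gapdh) pairs.
    Using the mean of the non-tumor samples as the reference is an assumption.
    """
    reference = statistics.mean(
        [normalize_to_gapdh(iso, ref) for iso, ref in reference_samples]
    )
    return [normalize_to_gapdh(iso, ref) / reference for iso, ref in samples]


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    non_tumor = [(2.0, 1.0), (4.0, 1.0)]
    tumor = [(3.0, 2.0)]
    print(relative_expression(tumor, non_tumor))
```

With this convention the non-tumor group averages to 1, so values below 1 in the tumor group read directly as reduced expression.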
Figure 19 shows ENCODE chromatin features, transcription factor sequence motifs and polymorphisms at the chromosome 11q23.1 locus. ENCODE features in human lymphoblastoid cell lines for CTCF occupancy and histone modifications. Specific regions that are enriched with chromatin marks are shown in boxes. All SNPs in this region are listed and affected putative transcription factor (TF) binding sites (below the rsSNPs) are identified. The protective alleles are shown in blue within the brackets. The core sequence of the transcription binding site (capital letters) is defined by the highest conserved consecutive position of the matrix (Genomatix, Software GmbH). The letters in red represent the conserved TF binding sequences with a degree of conservation more than 60 (ci-value > 60). The red asterisk denotes the GWAS SNP rs3802842. The rs5794738 SNP shows a 9 bp deletion.
Figure 20 shows Western blots validate the specificity of anti-C11orf92 antibody. C11orf92-GFP is a construct of COLCA1/c11orf92 cDNA fused with the GFP protein. (A) Western blot probed with the anti-C11orf92 antibody. (B) Western blot probed with the anti-GFP antibody using the same cell lysates as in panel A. UT and V are non-transfected and vector transfected controls. Colon tissue was used as a positive control.
Figure 21 shows COLCA1 protein expression in the mucosal stroma of colon tissues. Representative (A) benign adjacent (BA) and (B) tumor (T) tissues were immunostained with anti-human C11orf92 antibody. Sections were counterstained with hematoxylin. Rabbit IgG stained sections are shown alongside as negative controls. Scale bars (bottom right) are 50 μm.
Figure 22 shows expression of the COLCA1/C11orf92 protein in the proximity of tumor cells. (A and B) Double immunohistochemical staining shows COLCA1 protein (red) and the tumor cell-specific carcinoembryonic antigen (CEA, brown) expression in the colon benign adjacent (BA, left panel) and the tumor (T, right panel) tissues. Sections were counterstained with hematoxylin. Corresponding higher magnification images are shown in the bottom panels. Scale bars in the upper and lower panels are 50 μm and 20 μm, respectively.
Figure 23 (A-F) shows expression of the COLCA1/C11orf92 protein in eosinophils as visualized by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 μm, 9 μm, and 3.8 μm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F represent z cross-sections. The COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the eosinophil major basic protein (green).
Figure 24 (A-F) shows expression of the COLCA1/C11orf92 protein in mast cells as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 μm, 9 μm, and 3.2 μm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the mast cell tryptase (green).
Figure 25 (A-F) shows expression of the COLCA1/C11orf92 protein in neutrophils as viewed by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 μm, 9 μm, and 3.3 μm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the neutrophil elastase (green).
Figure 26 (A-F) shows expression of the COLCA1/C11orf92 protein in macrophages as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 μm, 9 μm, and 3.6 μm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F represent z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the macrophage-specific CD68 protein (green).
Figure 27 (A-F) shows expression of COLCA1/C11orf92 protein in dendritic cells as seen by three-dimensional deconvolution microscopy. Confocal images from the colon tissue show color composites at different magnifications. Scale bars in panels A, B, and C-F are 18 μm, 9 μm, and 3.2 μm, respectively. Grey scale components of the merged image in panel C are shown in panels D-F (blue, green and red, respectively). The upper panels in C-F show z cross-sections. COLCA1 protein (red) co-resides with the immune cell marker CD45 (blue) and with the dendritic cell marker, CD83 (green).
Figure 28 shows immunofluorescence confocal images from colon tissue as color composites in two-dimensional (A) and three-dimensional opacity rendering (B). COLCA1 protein is seen as granules (red) in close proximity to the tumor cells, CEA (grey blue). Scale bars in panels A and B are 20 μm and 12 μm, respectively.
Figure 29 shows revised organization of the COLCA1 and COLCA2 genes in comparison with the RefSeq gene structure from the UCSC NCBI36/hg18 assembly. Red and blue boxes (exons) and lines with arrowheads (introns) represent COLCA1 or COLCA2 genes, and their location on minus or plus strands, respectively. Exons are shown individually or as part of the transcripts that were identified by sequencing. The 12 CRC-associated SNPs are shown as green bars with GWAS SNP rs3802842 as the purple bar. Thicker boxes represent coding regions. The SGP program developed at the Genome Bioinformatics Laboratory shows gene predictions using mouse/human homology for COLCA2. ENCODE histone methylation and acetylation marks indicate regulatory activity at the COLCA1/COLCA2 locus.
Figure 30 shows organization of the COLCA1 gene. (A) COLCA1 is organized into variable (yellow boxes) and constant exons (pink box), spanning genomic intervals of 6.4 kb and 5.3 kb, respectively. (B) Splicing of variable exons to the constant exon generates different transcripts, which produce the same COLCA1 protein. The thick pink boxes represent the coding region on the constant exon.
Figure 31 shows organization of the COLCA2 gene. (A) COLCA2 is organized into variable (1 to 4; yellow boxes) and constant exons (5 to 8; blue boxes), spanning genomic intervals of 2.1 and 8.1 kb, respectively. The positions of the exons are drawn relative to each other, but not to scale. (B) Splicing of the variable exons to the four constant exons generates multiple transcript isoforms, five of which were confirmed by sequencing, as indicated.
Figure 32 shows Western blot analysis of COLCA2 proteins encoded by multiple transcripts. (A) Distribution of alternatively spliced COLCA2 protein isoforms in colon biopsy samples from benign adjacent (BA) and tumor (T) tissues, peripheral blood leukocytes (PBL), CD34- cells, human myeloid (AML2 and HL60), and colon cancer (HT29) cell lines. Immunodetection of COLCA2 protein by rabbit polyclonal human anti-COLCA2 antibody (antigen designed at the C-terminus of the protein) recognizes 8 immunoreactive bands (A-H). The migration pattern of five of them (isoforms 1 to 5), relative to the protein standard marker (M), coincides with the predicted molecular weights (kDa) of five verified protein-encoded COLCA2 transcripts. (B) Immunodetection of COLCA2 protein isoforms in colon benign adjacent (BA) and tumor (T) samples from patients homozygous for the risk (CC) or protective (AA) allele. (C) COLCA2 transcript-specific RT-PCR. Total RNA extracted from normal colon and peripheral blood lymphocytes was reverse-transcribed into cDNA followed by PCR amplification using isoform-specific primer sets (see Table 11). (M) DNA molecular size marker.
DETAILED DESCRIPTION
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details.
In this study of a chromosome 11q23 locus that is genetically associated with colorectal cancer risk, we investigated two previously uncharacterized transcripts named C11orf92 and C11orf93 and showed that decreased RNA expression correlates with risk for colorectal cancer. Renamed Colorectal Cancer Associated 1 and 2, COLCA1 and COLCA2 share a bidirectional promoter and are co-regulated. Immunochemical studies of COLCA1 in colonic tissues reveal strong co-localization in cytoplasmic granules present in eosinophils, mast cells, neutrophils, macrophages and dendritic cells. Furthermore, COLCA1 exists within extracellular granules in normal mucosa and at the periphery of colon cancer cells. Thus, our study potentially provides the first inherited mechanistic link between microenvironmental factors and human cancer predisposition.
In an aspect, there is provided an isolated protein comprising SEQ ID NO. 1 or a functional fragment thereof. As used herein, "polypeptide" and "protein" are used interchangeably and mean proteins, protein fragments, modified proteins, amino acid sequences and synthetic amino acid sequences. The polypeptide can be glycosylated or not.
As used herein, "fragment" relating to a polypeptide or polynucleotide means a polypeptide or polynucleotide consisting of only a part of the intact polypeptide sequence and structure, or the nucleotide sequence and structure, of the reference gene. The polypeptide fragment can include a C-terminal deletion and/or N-terminal deletion of the native polypeptide, or can be derived from an internal portion of the molecule. Similarly, a polynucleotide fragment can include a 3' and/or a 5' deletion of the native polynucleotide, or can be derived from an internal portion of the molecule.
In a further aspect, there is provided an isolated protein comprising SEQ ID NO. 3, 5, 7, 9, 11 or a functional fragment thereof.
In a further aspect, there is provided an isolated nucleic acid encoding the protein of any one of claims 1, 3, 5, 7, 9 and 11. Preferably, the isolated nucleic acid comprises SEQ ID NO. 2, 4, 6, 8, 10 or 12.
In an aspect, there is provided an expression vector comprising the nucleic acid described herein operably linked to an expression control sequence.
In an aspect, there is provided a cultured cell comprising the vector described herein.
In an aspect, there is provided a method of determining risk of colon cancer in a patient using a sample therefrom comprising: determining the level of expression of at least one of COLCA1 and COLCA2; and comparing the level of expression of the sample with a control sample; wherein a higher level of expression of at least one of COLCA1 and COLCA2 in the patient sample compared to the control indicates a low risk of colon cancer. The term "level of expression" or "expression level" as used herein refers to a measurable level of expression of the products of biomarkers, such as, without limitation, the level of messenger RNA transcript expressed or of a specific exon or other portion of a transcript, the level of proteins or portions thereof expressed of the biomarkers, the number or presence of DNA polymorphisms of the biomarkers, the enzymatic or other activities of the biomarkers, and the level of specific metabolites.
As used herein, the term "control" refers to a specific value or dataset that can be used to prognose or classify the value e.g. expression level or reference expression profile obtained from the test sample associated with an outcome class. A person skilled in the art will appreciate that the comparison between the expression of the biomarkers in the test sample and the expression of the biomarkers in the control will depend on the control used.
The term "differentially expressed" or "differential expression" as used herein refers to a difference in the level of expression of the biomarkers that can be assayed by measuring the level of expression of the products of the biomarkers, such as the difference in level of messenger RNA transcript or a portion thereof expressed or of proteins expressed of the biomarkers. In a preferred embodiment, the difference is statistically significant. The term "difference in the level of expression" refers to an increase or decrease in the measurable expression level of a given biomarker, for example as measured by the amount of messenger RNA transcript and/or the amount of protein in a sample as compared with the measurable expression level of a given biomarker in a control.
The term "low risk" as used herein in respect of colon cancer refers to a lower risk of colon cancer as compared to a general or control population. The term "sample" as used herein refers to any fluid, cell or tissue sample from a subject that can be assayed for biomarker expression products and/or a reference expression profile, e.g. genes differentially expressed in subjects.
In some embodiments, the level of gene expression is determined and compared.
A person skilled in the art will appreciate that a number of methods can be used to detect or quantify the level of RNA products of the biomarkers within a sample, including arrays, such as microarrays, RT-PCR (including quantitative RT-PCR), nuclease protection assays and Northern blot analyses.
The term "nucleic acid" includes DNA and RNA and can be either double stranded or single stranded. The term "hybridize" or "hybridizable" refers to the sequence specific non-covalent binding interaction with a complementary nucleic acid. In a preferred embodiment, the hybridization is under high stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. For example, 6.0 x sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0 x SSC at 50°C may be employed.
The term "probe" as used herein refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence. In one example, the probe hybridizes to an RNA product of the biomarker or a nucleic acid sequence complementary thereof. The length of probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length.
The term "primer" as used herein refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors, including temperature, sequences of the primer and the methods used. A primer typically contains 15-25 or more nucleotides, although it can contain fewer or more. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
Preferably, the level of gene expression is determined by hybridizing a labelled probe to at least one of COLCA1 and COLCA2 mRNA and detecting labelled probe hybridized to the mRNA. In some embodiments, the level of gene expression is determined on a DNA microarray. In some embodiments, the method further comprises polymerase chain reaction (PCR) to amplify the mRNA. In some embodiments, the level of gene expression is determined by using a tag based analysis, preferably serial analysis of gene expression (SAGE).
In some embodiments, the level of protein expression is determined and compared.
In addition, a person skilled in the art will appreciate that a number of methods can be used to determine the amount of a protein product of the biomarker of the invention, including immunoassays such as Western blots, ELISA, and immunoprecipitation followed by SDS-PAGE and immunocytochemistry.
Preferably, the level of protein expression is determined by binding a COLCA1 or COLCA2 specific antibody to COLCA1 or COLCA2 respectively and detecting the presence of the resulting protein-antibody complex.
In some embodiments, the sample is a colon tissue sample. In other embodiments, the sample is a peripheral blood sample.
In an aspect, there is provided the protein described herein for determining the risk of colon cancer.
In an aspect, there is provided a use of the protein described herein for determining the risk of colon cancer. In an aspect, there is provided a diagnostic kit for determining risk of colon cancer in a patient comprising reagents for detecting the level of gene or protein expression of at least one of COLCA1 and COLCA2 in a patient sample and instructions for use. Preferably, the instructions correlate to the method steps described herein.
In an aspect, there is provided a method of treating or preventing colon cancer in a subject comprising administering the protein described herein.
In an aspect, there is provided the protein described herein for treating or preventing colon cancer in a subject.
In an aspect, there is provided a use of the protein described herein for treating or preventing colon cancer in a subject. In an aspect, there is provided a use of the protein described herein in the preparation of a medicament for treating or preventing colon cancer in a subject.
In an aspect, there is provided a pharmaceutical composition for the treatment of colon cancer comprising a therapeutically effective amount of the protein of claim 1 or 2 and a pharmaceutically acceptable carrier.
As used herein, "pharmaceutically acceptable carrier" means any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like that are physiologically compatible. Examples of pharmaceutically acceptable carriers include one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like, as well as combinations thereof. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol, sorbitol, or sodium chloride in the composition. Pharmaceutically acceptable carriers may further comprise minor amounts of auxiliary substances such as wetting or emulsifying agents, preservatives or buffers, which enhance the shelf life or effectiveness of the pharmacological agent.
As used herein, "therapeutically effective amount" refers to an amount effective, at dosages and for a particular period of time necessary, to achieve the desired therapeutic result. A therapeutically effective amount of the pharmacological agent may vary according to factors such as the disease state, age, sex, and weight of the individual, and the ability of the pharmacological agent to elicit a desired response in the individual. A therapeutically effective amount is also one in which any toxic or detrimental effects of the pharmacological agent are outweighed by the therapeutically beneficial effects.
As used herein "fusion protein" refers to a composite polypeptide, i.e., a single contiguous amino acid sequence, made up of two (or more) distinct, heterologous polypeptides that are not normally or naturally fused together in a single amino acid sequence. Thus, a fusion protein may include a single amino acid sequence that contains two entirely distinct amino acid sequences or two similar or identical polypeptide sequences, provided that these sequences are not normally found together in the same configuration in a single amino acid sequence found in nature. Fusion proteins may generally be prepared using either recombinant nucleic acid methods, i.e., as a result of transcription and translation of a recombinant gene fusion product, which fusion comprises a segment encoding a polypeptide of the invention and a segment encoding a heterologous polypeptide, or by chemical synthesis methods well known in the art. Fusion proteins may also contain a linker polypeptide in between the constituent polypeptides of the fusion protein.
The advantages of the present invention are further illustrated by the following examples. The examples and their particular details set forth herein are presented for illustration only and should not be construed as a limitation on the claims of the present invention.
EXAMPLES
MATERIAL AND METHODS
Sequencing, genotyping, and genetic analysis of 11 GWAS loci
DNA samples used for sequencing and/or genotyping
The study was approved by the research ethics boards of the University of Toronto and Mount Sinai Hospital, Toronto.
Ontario Familial Colorectal Cancer Registry (OFCCR): The OFCCR is a member of the National Cancer Institute Cooperative Family Registries for Colorectal Cancer Studies (Colon CFRs) (http://epi.grants.cancer.gov/CFR/about_colon.html) (42). The OFCCR includes 3,770 population-based CRC cases diagnosed in the province of Ontario, Canada between 1997 and 2000 and between 2002 and 2006, with an age at the time of diagnosis of 20 to 74 years. Age- and sex-matched control subjects with no personal history of CRC were recruited by telephone from a list of randomly selected residential telephone numbers. In the current study: A) 40 CRC cases and 40 controls were sequenced in the SNP discovery phase of the project; and B) an additional 1,121 CRC cases and 1,153 controls from OFCCR were used in the genotyping.
The OFCCR has also recruited pedigrees showing autosomal dominant transmission classified as Familial Colorectal Cancer Type X. For this study, 25 probands and 15 affected siblings were selected for sequencing. All have been well characterized to have microsatellite stable tumors and known high penetrant syndromes have been excluded by pre-screening for mutations in genes causing familial CRC.
DNA sample GM12155 (CEPH/UTAH Pedigree 1408, NIGMS Human Genetic Cell Repository) was also used for sequencing. Including the CEPH sample, 121 samples were sequenced. One sequenced DNA (allegedly from a sporadic case) turned out to be identical to the DNA of another sequenced sample and two DNA samples (one proband and one kin of another proband) likely contaminated each other. We were thus left with sequences for 117 distinct samples, 1 DNA mixture and 1 CEPH control, for 119 DNA samples.
Targeted sequencing
Region selection
The regions that we selected to undergo sequencing include: 10 regions identified by genome-wide association scan that harbor common susceptibility variants for colorectal cancer (CRC) (rs16892766 [8q23.3], rs10505477 [8q24], rs10795668 [10p14]; rs3802842 [11q23]; rs4444235 [14q22]; rs4779584 [15q13]; rs9929218 [16q22]; rs4939827 [18q21]; rs10411210 [19q13]; rs961253 [20p12]) (15) and an additional region identified (rs719725 [9p24]) (13) that did not replicate in the last stage of the study, but replicated independently elsewhere (18).
The 11 GWAS regions were defined to be the largest regions that include all SNPs in linkage disequilibrium with the risk variants identified (r² ≥ 0.20), based on release 23a of the CEU HapMap data. The sum of these regions is 2.29 Megabases. Table 1 provides an overview of genomic intervals targeted for variant discovery.
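As an illustration of the region definition above, the following is a minimal sketch, not the procedure actually used, of how the boundaries of one region could be derived from pairwise LD values. The function name, the input layout and the example numbers are hypothetical, and the threshold is assumed to be inclusive (r² ≥ 0.20).

```python
def ld_region(index_pos, snps, r2_threshold=0.20):
    """Return (start, end) of the largest interval spanning the index SNP and
    every SNP whose r^2 with the index SNP meets the threshold.

    snps: iterable of (position, r2_with_index_snp) pairs (hypothetical input).
    """
    positions = [index_pos] + [pos for pos, r2 in snps if r2 >= r2_threshold]
    return min(positions), max(positions)


# Made-up positions and r^2 values, for illustration only.
example_snps = [(1000, 0.05), (1500, 0.35), (2200, 0.90), (3100, 0.22), (4000, 0.10)]
region = ld_region(2000, example_snps)
print(region)  # (1500, 3100)
```

Summing `end - start` over all regions defined this way would give a total target size comparable to the figure quoted above.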
Colorectal cancer capture array design
The UCSC-Table Browser function with repeats masked on Human Genome build (HG18, March 2008) was used to identify unique sequences within the 11 selected colorectal cancer associated genomic regions (Table 1). The selected genomic sequences spanning 2.3 Mb were used to design oligonucleotides for sequence capture (SC) arrays (www.nimblegen.com/seqcap, Roche NimbleGen Inc.). Standard bioinformatics filters that check for genomic uniqueness against an indexed human genome were used to select capture oligonucleotides. The NimbleGen proprietary repeat-masking method was used to remove repetitive sequences. The capture oligonucleotides of 60-75 bp were designed with an optimized and empirically tested algorithm (version 2.0) to achieve optimal isothermal hybridization across the microarray.
Whole-genome fragment libraries
Whole-genome fragment libraries were prepared using a modification of the paired-end genomic DNA sample preparation protocol (PE1021001, Illumina Inc.). All adaptors and primers were obtained from Illumina Inc.
Genomic DNA fragmentation
DNA (5 μg at 25 ng/μl concentration) was sonicated using a BioRuptor (Diagenode Inc.) with high power intensity at a pulse of 30 seconds followed by a 30 second rest, for a total procedure time of 25 min. The DNA fragment-size distribution of 300 to 500 (+/- 50) bp was confirmed on a Bioanalyzer 2100 DNA chip (Agilent Inc.). The fragmented DNA samples were concentrated to 75 μl using a Speed-Vac (Thermo Savant).
End repairing of sheared DNA
To the 75 μl fragmented DNA, 10 μl of T4 DNA ligase buffer containing 10 mM ATP, 4 μl dNTP mix, 5 μl T4 DNA Polymerase, 1 μl of Klenow DNA polymerase and 5 μl of T4 polynucleotide kinase (PNK) were added and DNA ends were filled at 20°C for 30 minutes. To purify the end-labeled fragments, QIAquick PCR purification kit (QIAGEN Inc.) was used and DNA was eluted in 32 μl of water.
Addition of 'A' to the 3' end of DNA fragments
Five μl Klenow buffer, 10 μl of 1 mM dATP and 3 μl of Klenow exo (3' to 5' exo minus) were added to the purified DNA sample and incubated at 37°C for 30 minutes. MinElute PCR purification kit (QIAGEN, Inc.) was used to remove reaction components and labeled DNA fragments were eluted in 10 μl of water.
Ligation of adapters
Twenty-five μl of 2X DNA ligase buffer, 10 μl of PE adapter oligo mix and 5 μl of DNA ligase were added to the DNA sample and incubated at 20°C for 15 minutes, followed by purification (QIAquick PCR Purification kit, QIAGEN, Inc.) and elution of DNA with 55 μl water.
Size selection by Millipore ultrafiltration and enrichment by PCR
To remove the unligated and self-ligated adaptors and to concentrate the adaptor-ligated DNA fragments, Microcon 100K membrane (Millipore, Inc.) with size cut-off of 300 bp ssDNA and 125 bp dsDNA was used. This procedure was performed by centrifugation at 500 x g for 15 min at 25°C, followed by recovery of the concentrated DNA by inverting the sample reservoir and centrifuging at 1000 x g for 3 min.
A standard preparation of 5 μg of genomic DNA yielded 1 μg of size-selected material, which was insufficient for one capture selection hybridization. To increase the amount of sequencing template, the adaptor-ligated DNA fragments were PCR amplified (9 cycles) using the Phusion High-Fidelity PCR master mix with HF buffer (New England Biolabs) and 200 nM of each of Illumina PCR primers PE1.1 and PE2.1 according to the Illumina whole-genome fragment library protocol. After PCR clean-up using the Zymo DNA Clean and Concentrator-25 Kit (Zymo Research Inc.), the DNA amounts in the fragment libraries were quantified by NanoDrop ND-1000 spectrophotometer (Thermo Scientific) prior to sequence capture hybridization procedure.
Array Capture hybridization
Purified adaptor-ligated DNA fragments (5 μg) were mixed with 60 μg Cot-1 DNA (Invitrogen) in a total volume of 4.8 μl of water. As a control, a 10⁻⁵ dilution of the same DNA fragment library was prepared but not used for hybridization. Sixteen μl of hybridization mix was prepared as per manufacturer's instructions (NimbleGen Arrays User's Guide: Sequence Capture Array Delivery v3.0, Roche NimbleGen Inc.). Briefly, DNA/Cot-1 solution was mixed with 8 μl of 2xSC Hybridization buffer and 3.2 μl of SC Component A (Sequence Capture Hybridization Kit, Roche NimbleGen Inc.). The hybridization mix was incubated at 95°C for 10 min, and then kept at 42°C until ready to use. In parallel, the array was prepared for hybridization by attachment of the X1 Mixer (Roche NimbleGen Inc.) filled with the hybridization mix and loaded onto the NimbleGen Hybridization System. Hybridization was carried out at 42°C for 68-72 hrs with station mix mode "B". At the end, the mixer was disassembled and the array was washed in a 50 ml wash tube with NimbleGen wash buffers (Sequence Capture Wash and Elution Kit, Roche NimbleGen Inc.).
Array elution and post-hybridization DNA preparation
Captured DNA was eluted from the array by using the Elution Chamber ES1 and the Elution System (Roche NimbleGen Inc.). Briefly, the chamber was filled with 425 μl of water (pre-warmed at 95°C) and incubated for 5 min at 95°C. The eluted DNA was collected from the chamber and kept on ice. This procedure was repeated to achieve 3 elution samples, which were pooled. Eluted DNA was dried in a SpeedVac (Thermo Savant) and rehydrated in 300 μl of water and amplified (20 PCR cycles) using the Phusion High-Fidelity PCR master mix with HF buffer (New England Biolabs) and 200 nM of Illumina PCR primers PE1.1 and PE2.1 according to the Illumina whole-genome fragment library protocol. This procedure was performed on both hybridized (captured) and non-hybridized (non-captured) DNA libraries. The captured and non-captured PCR products were subjected to PCR clean up and quantification as described above. Genomic enrichment was determined by using 10 ng/μl aliquots of captured and non-captured DNA samples and Custom TaqMan Expression Assay.
Quantitative PCR by TaqMan expression assay with region-specific probes
To quantify the fold enrichment of the selected colorectal cancer specific regions, 5 loci were randomly chosen for qPCR analysis. As a negative control, two loci from the non-target regions were randomly selected to design primers and probe sets (Table 2). To ensure better performance, target sequences were analyzed for specificity and absence of repetitive sequences.
Fold enrichment was calculated by comparing the cycle threshold of the gDNA amplification of captured and non-captured sample. Assuming that the DNA concentration doubles every cycle, enrichment was calculated by 2^N, with N being the difference between the cycle thresholds.
Illumina Sequencing
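To illustrate the qPCR fold-enrichment estimate (2^N, with N being the difference between the cycle thresholds of the non-captured and captured samples), a minimal sketch is given below. The function name and the cycle-threshold values are hypothetical and serve only as an example of the arithmetic.

```python
def fold_enrichment(ct_non_captured, ct_captured):
    """Estimate fold enrichment as 2**N, where N is the difference between the
    cycle thresholds, assuming the DNA amount doubles every PCR cycle."""
    n = ct_non_captured - ct_captured
    return 2.0 ** n


# Hypothetical cycle thresholds for one target locus.
print(fold_enrichment(ct_non_captured=28.0, ct_captured=18.0))  # 1024.0
```

A captured sample that crosses the threshold 10 cycles earlier than the non-captured sample would thus correspond to roughly a 1000-fold enrichment under the doubling assumption.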
Post-enrichment DNA libraries were sequenced on Illumina Genome Analyzer II instruments as paired-end 2x76 bp reads, following the manufacturer's protocols and using the standard sequencing primers. Image analyses and base callings were performed by the Genome Analyzer Pipeline version 1.3 with default parameters and default filtering.
We generated a total of 6.3 billion short 76-bp paired-end reads from 317 flow cell lanes. Due to failure of the second run on two flow cells, 15 of these lanes only produced single-end data. Moreover, a partial failure in one flow cell resulted in 8 lanes with 76x38 bp paired-end data.
Read mapping and consensus calling
Alignment was done using MAQ 0.7.1 (43) onto the March 2006 human reference sequence (NCBI Build 36.1). We first aligned reads from each lane individually, removed any potential PCR duplicates and then combined all alignments belonging to the same DNA samples. Reads that were excluded include reads that did not pass quality filtering; reads that did not align to the target regions or that did not align in correct pairs; reads that aligned with more than 7 mismatches, with low mapping quality or with too many high-quality mismatches. Once aligned, paired-end reads revealed that fragment sizes were smaller than the desired size of 300-500 bases, and that both reads were often overlapping.
We also used MAQ 0.7.1 for consensus calling. Only reads with mapping quality value above 0 that mapped onto the reference sequence with no more than 7 mismatches (with Phred-like quality of mismatched bases totaling 60 or less) were used to call consensus; moreover, for paired-end reads, the reads had to be mapped in correct pairs (one read on each strand, with no more than 1000 bases between both reads). We defined a base to be callable if a probe on the capture array covered that base and if all 76-mers from the reference sequence that include that base align uniquely onto the reference sequence. The average coverage per base in all GWAS regions was 53.6 reads/base; the proportion of bases covered by at least one read was 96.4% and the proportion of bases covered by at least 6 reads was 85.1% (Table 3). In exons of RefSeq genes, the average coverage was 1.4-fold higher, at 76.4 reads/base; the proportion of bases covered by at least one read was 98.7% and the proportion of bases covered by at least 6 reads was 93.8%.
Variant calling
Only bases covered by at least 6 reads, with Phred-like consensus quality of 20 or more, were considered for variant calling. SNPs were called at positions where at least one sample showed on its consensus sequence any nucleotide other than the nucleotide found on the reference sequence. Short insertions and short deletions (indels) were called in a sample whenever they were detected by 6 reads, with at least one read on each strand, irrespective of the number of reads that do not support the presence of the indel. This choice was guided by the observation that the sequences of both ends of a fragment were often overlapping, making it less likely for a fragment containing an indel to see one of its two ends map without gaps onto the reference genome, in order to serve as an anchor for the alignment of the other end.
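The calling thresholds described above can be sketched as simple filters. This is an illustrative sketch only: the function names and input representation are hypothetical, the real consensus calls were produced by MAQ, and the indel rule is read here as "at least 6 supporting reads".

```python
MIN_DEPTH = 6          # minimum number of reads covering the base
MIN_CONS_QUAL = 20     # minimum Phred-like consensus quality
MIN_INDEL_READS = 6    # minimum number of reads detecting an indel


def is_snp(ref_base, consensus_base, depth, consensus_quality):
    """A position is a SNP candidate in a sample if it passes the depth and
    quality filters and its consensus differs from the reference base."""
    passes_filters = depth >= MIN_DEPTH and consensus_quality >= MIN_CONS_QUAL
    return passes_filters and consensus_base != ref_base


def is_indel(forward_reads, reverse_reads):
    """An indel is called from the reads supporting it (per strand); reads
    that do not support the indel are deliberately ignored."""
    total = forward_reads + reverse_reads
    return total >= MIN_INDEL_READS and forward_reads >= 1 and reverse_reads >= 1


print(is_snp("A", "G", depth=10, consensus_quality=30))  # True
print(is_indel(forward_reads=5, reverse_reads=1))        # True
print(is_indel(forward_reads=6, reverse_reads=0))        # False
```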
Comparison of sequence data with existing GWAS datasets
We compared the data from each sample to genotypes already available from high-density genotyping arrays: 1) Affymetrix 100K and 500K arrays in OFCCR; and 2) Illumina 1M-Duo arrays for the set of probands and kin from high-risk families. The number of available SNPs in the target regions varied from 748 to 1781 depending on the sample. One sequenced DNA (allegedly from a sporadic case) turned out to be identical to the DNA of another sequenced sample, and two DNA samples (one proband and one kin of another proband) likely contaminated each other: both showed identity of sequences at nearly all polymorphic sites, they showed 1.5-fold more heterozygous sites than expected, and both mixtures showed ~90% identity with both sets of genotypes. We were thus left with sequences for 117 distinct samples, 1 DNA mixture and 1 CEPH control, sequenced in a total of 317 lanes.
A total of 109,959 non-missing genotypes in 117 samples were available for comparison purposes (we excluded the DNA mixture and the CEPH sample). Requiring a minimum of 6 reads and a minimum consensus base quality of 20 results in good sensitivity and specificity to detect alternative alleles; overall concordance is 99.05%. Among genotypes homozygous for the reference alleles, sequencing revealed an alternative allele in only 0.09% of instances. Among genotypes involving at least one alternative allele, sequencing identified an alternative allele (one or two copies) in 98.4% of instances. Most of the discordant calls are heterozygotes for which the alternative alleles were not detected, likely because of an insufficient number of reads.
Overlap with dbSNP and 1000 Genomes submissions
We compared all identified variants with submissions to dbSNP version 132 and release 20101123 of the 1000 Genomes project (1kG). To be classified as "known", all alleles that we observed for the variant had to be in dbSNP or 1kG. The locations of indels in dbSNP, as well as the nucleotides involved, lack a universal rule and are often the results of arbitrary choices (e.g., the deletion of either AG or GA in TAGAC results in an equivalent sequence). For each indel, we thus created a list of all equivalent indels, and used this list to compare with dbSNP submissions. MAQ reports the location of an indel at the smallest possible genomic position.
Genotyping
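To illustrate the list of equivalent indels used for the dbSNP comparison, the sketch below enumerates every deletion of the same length that yields the same resulting sequence, using the TAGAC example from the text. It handles deletions only and works on a short local sequence; the function name and the 0-based coordinates are assumptions made for the example, not the implementation that was used.

```python
def equivalent_deletions(reference, start, length):
    """Return all (start, deleted_string) pairs (0-based) whose deletion from
    `reference` gives the same sequence as deleting
    reference[start:start + length]."""
    target = reference[:start] + reference[start + length:]
    matches = []
    for pos in range(len(reference) - length + 1):
        if reference[:pos] + reference[pos + length:] == target:
            matches.append((pos, reference[pos:pos + length]))
    return matches


# Deleting AG (position 1) or GA (position 2) from TAGAC both give TAC.
print(equivalent_deletions("TAGAC", 1, 2))  # [(1, 'AG'), (2, 'GA')]
```

The first entry of the returned list is the representation at the smallest position, which corresponds to the convention attributed to MAQ in the text.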
Design of an iSelect genotyping array
We designed a custom 10,640-bead iSelect array from Illumina. We first submitted a list of 10,531 SNPs and 708 short indels to Illumina in order to get SNP scores. We did not include in this list SNPs that are not biallelic, either according to the nucleotides identified by sequencing, or according to dbSNP (version 129 at the time), or a combination of both, and removed biallelic SNPs that did not involve the reference allele. We ignored indels that were not biallelic with respect to the string and length of nucleotides that are deleted or inserted, according to our data, or dbSNP, or a combination of both (these include short microsatellites or homopolymers). We excluded indels whenever one of their equivalent indels harbors a SNP, either according to our data or dbSNP. For each indel we submitted two probes, one on each strand, carefully designed such that a one-base extension would result in a polymorphic site with the ability to discriminate between indel-carrying and non-carrying chromosomes. Of the 10,531 submitted SNPs and sequences, 9,365 (88.9%) had adequate score (above 0.4) to be attempted. We further trimmed the list by excluding SNPs for which genotypes were already available, and excluding SNPs based on LD: starting with a SNP with low Illumina score and moving up the list, the SNP was excluded if a SNP with high enough score is in high LD with it (r² ≥ 0.9), with the additional restriction that a SNP can be used as a tag only once, in order to minimize losses if the tag would fail genotyping. Only frequent enough SNPs in the CEU HapMap samples (>8%) were used in this pruning process. For short indels, scores were obtained from both orientations and the orientation with highest score was retained; when one orientation would result in a C/G or A/T variant at the base of interest (requiring two beads per variant), then the other orientation was preferred as long as it had a high enough Illumina score.
None of the SNPs that are coding, or that are from the GWAS literature, were involved in the pruning process. The final design included 7,474 putative SNPs and 600 putative indels.
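The LD-based pruning described above can be sketched as a greedy procedure. This is a minimal sketch under stated assumptions, not the procedure actually run: the function name, the data layout, the `min_tag_score` parameter (the text says only "high enough score") and all example values are hypothetical, and the sketch assumes that a SNP serving as a tag is itself always retained. The frequency filter and the exemption of coding and GWAS SNPs are not modeled.

```python
def prune_by_ld(scores, r2, min_tag_score, r2_threshold=0.9):
    """Greedy LD pruning.

    scores: dict mapping SNP id -> design score
    r2:     dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2
    Returns (kept, excluded), where excluded maps each dropped SNP to its tag.
    """
    excluded = {}        # dropped SNP -> SNP that tags it
    used_tags = set()    # each SNP may serve as a tag only once
    by_score = sorted(scores, key=scores.get)   # lowest score first
    for snp in by_score:
        if snp in used_tags:
            continue     # assumption: a SNP acting as a tag is kept
        for cand in reversed(by_score):         # best-scoring candidates first
            if cand == snp or cand in excluded or cand in used_tags:
                continue
            if scores[cand] < min_tag_score:
                continue
            if r2.get(frozenset((snp, cand)), 0.0) >= r2_threshold:
                excluded[snp] = cand
                used_tags.add(cand)
                break
    kept = [s for s in scores if s not in excluded]
    return kept, excluded


# Made-up scores and r^2 values, for illustration only.
scores = {"snpA": 0.45, "snpB": 0.95, "snpC": 0.60, "snpD": 0.90}
r2 = {
    frozenset({"snpA", "snpB"}): 0.95,
    frozenset({"snpC", "snpB"}): 0.97,
    frozenset({"snpC", "snpD"}): 0.50,
}
kept, excluded = prune_by_ld(scores, r2, min_tag_score=0.8)
print(kept)      # ['snpB', 'snpC', 'snpD']
print(excluded)  # {'snpA': 'snpB'}
```

In the example, snpC is in high LD with snpB but is kept, because snpB has already been used once as the tag for snpA.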
High-throughput genotyping
Genotyping of 2,380 samples was done at core facilities of the McGill University and Genome Quebec Innovation Centre (http://gqinnovationcenter.com) using established protocols. After excluding variants (325 SNPs and 17 indels) that failed to generate genotyping calls in > 95% of samples, 7,149 submitted SNPs were deemed to have yielded successful assays. Of these, 1 ,169 putative SNPs and 390 putative indels were monomorphic. For indels that turned out to be monomorphic (390 indels, 381 of which were detected in only one sample), a large majority of samples that were supporting the indel with at least 6 reads had 4 times as many reads not supporting it (a fraction less than 20%). In retrospect, most of the indels supported by at least 20% of the reads in at least one individual turned out to be validated by genotyping, with minimal misclassification.
DNA Sequencing by Capillary Electrophoresis
Due to a failure to genotype a known missense mutation (rs61753533) using Illumina's 10,640-bead iSelect array, all genotyped DNA samples were also sequenced for the coding exon of C11orf92 using ABI 3730xl DNA Analyzer systems. Additional SNPs not discovered in the targeted sequencing experiment can be found in Table 6.
Genetic association analyses
Genetic association analyses were conducted for 1,030 cases and 1,061 control samples, all homogeneous in terms of ancestry, as confirmed by principal component analysis of the correlation between their genomes. We excluded the subset of samples that were sequenced in the SNP discovery phase of the project.
Genetic analysis of rare variants
Variants with frequencies less than 1% in either the cases or the controls were collectively analyzed within each region using the method of Madsen and Browning: in brief, for each sample, a weighted count of all minor alleles observed over all SNPs was computed, where weights were based on the inverse standard deviation of the minor allele frequency. This weighting scheme puts more weight on the less common SNPs, which has the desirable effect that the contribution of true rare risk alleles is not diluted by combining it with more common non-risk alleles. Then a rank-based test was applied on the weighted counts and significance was computed with a permutation procedure. Choosing the set of rare variants based on their frequency in either the cases or the controls (as opposed to their frequency in the combined sample of cases and controls) puts no restriction on how high the frequency of a risk allele may reach in the cases, or how high the frequency of a protective allele may reach in the controls, which is a desirable effect. As the rank-based test we chose the test of van Elteren, a version of a class of tests also known as stratified Wilcoxon tests (44), where we first stratify the samples based on their genotype at the known GWAS SNP located in the region. These tests are conditional tests and thus the established associations in the regions do not confound our results. Significance levels were estimated using a minimum of 5000 random permutations within each GWAS genotype stratum, in order to preserve the association at the GWAS locus.
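The weighted-count and stratified rank-test procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study. The weight formula (allele frequency estimated in controls with pseudo-counts) follows the published Madsen and Browning description as we understand it. The stratum weighting 1/(N_h + 1) is the usual van Elteren choice. The sketch keeps the weighted scores fixed while permuting case/control labels within strata, which is a simplification. All function names are hypothetical.

```python
import math
import random


def mb_weights(genotypes, is_control):
    """Per-variant weights from control allele frequencies (with pseudo-counts)."""
    n_total = len(genotypes)
    controls = [g for g, c in zip(genotypes, is_control) if c]
    weights = []
    for j in range(len(genotypes[0])):
        minor = sum(g[j] for g in controls)
        q = (minor + 1) / (2 * len(controls) + 2)
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights


def weighted_scores(genotypes, weights):
    """Weighted count of minor alleles for each sample (rarer variants count more)."""
    return [sum(c / w for c, w in zip(g, weights)) for g in genotypes]


def midranks(values):
    """Ranks starting at 1, with ties receiving the average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def stratified_stat(scores, is_case, strata):
    """Sum over strata of the case rank sum scaled by 1/(N_h + 1), centered at its null mean."""
    total = 0.0
    for h in set(strata):
        idx = [i for i, s in enumerate(strata) if s == h]
        ranks = midranks([scores[i] for i in idx])
        n_cases = sum(1 for i in idx if is_case[i])
        rank_sum = sum(r for r, i in zip(ranks, idx) if is_case[i])
        total += rank_sum / (len(idx) + 1) - n_cases / 2
    return total


def permutation_p(scores, is_case, strata, n_perm=5000, seed=0):
    """Two-sided p-value from label permutations carried out within each stratum."""
    rng = random.Random(seed)
    observed = abs(stratified_stat(scores, is_case, strata))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        for h in set(strata):
            idx = [i for i, s in enumerate(strata) if s == h]
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, v in zip(idx, shuffled):
                labels[i] = v
        if abs(stratified_stat(scores, labels, strata)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here genotypes is a list of per-sample lists of minor-allele counts (0, 1 or 2), and strata would hold each sample's genotype at the GWAS SNP, so that permutations never move a sample across GWAS genotype classes.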
Analyzing rare variants by collapsing the (weighted) counts of rare alleles in each region, in cases and controls, and then evaluating their combined effects with a rank-based test does not provide additional information of significance (data not shown). The number of rare variants (frequency less than 1% in either the cases or the controls) varies from region to region, but scales linearly with the region size, with on average one rare variant per approximately 1 kb.
Genetic analysis of variants > 1%
SNPs were tested for association using the Cochran-Armitage test for trend. Significance levels for SNPs with minor allele frequency less than 5% were empirically evaluated with a permutation procedure that consists of randomly re-assigning case or control status to all samples. A minimum of 5000 random replicates was used; this number was increased to guarantee that the ratio of the estimated p-value to its standard error is at least 10. In the GWAS regions, conditional tests of association, which condition upon the presence or the absence of a GWAS risk allele on the same haplotype as the test allele, were performed using UNPHASED (45).
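As an illustration of the two calculations in this paragraph, the sketch below computes the Cochran-Armitage trend statistic for a 2 x 3 genotype table with additive scores (0, 1, 2), and the number of permutation replicates needed so that the estimated p-value is at least 10 times its binomial standard error. The function names and the additive scoring are assumptions; the replicate rule follows from SE = sqrt(p(1-p)/n), which gives n >= 100(1-p)/p for a ratio of 10.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Trend test on genotype counts; returns (chi-square with 1 df, asymptotic p-value)."""
    n = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * c for t, c in zip(scores, n))
    sum_t2n = sum(t * t * c for t, c in zip(scores, n))
    T = N * sum_tr - R * sum_tn
    var = R * S * (N * sum_t2n - sum_tn ** 2) / N
    if var == 0:
        return 0.0, 1.0
    chi2 = T * T / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def min_replicates(p, ratio=10, floor=5000):
    """Smallest replicate count with p / SE(p) >= ratio, never below the floor."""
    return max(floor, math.ceil(ratio ** 2 * (1 - p) / p))
```

For example, an estimated p-value of 0.05 needs only 1,900 replicates to reach the target ratio, so the floor of 5000 applies, whereas a p-value near 0.001 requires roughly 100,000 replicates.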
Expression studies of 11q23 candidate genes
Description of tissue samples
Two biospecimen repositories provided blood, tumor and benign adjacent colonic tissues (collected at the time of surgery and cryopreserved in liquid nitrogen): 1) The Ontario Familial Colorectal Cancer Registry, described above; and 2) the Ontario Tumour Bank (OTB; www.ontariotumorbank.ca). OTB has archived more than 80,000 samples from 5 centers distributed throughout Ontario (London, Hamilton, Mississauga, Kingston and Ottawa). Genotyping data for rs3802842 were available for more than 1,500 sample sets, which allowed us to select matched pairs of colon tumors and benign adjacent tissues, based on three genotypic classes. We generated an 11q23 tissue panel comprising 18 samples that are homozygous (AA) for the protective allele, 21 (AC) heterozygotes and 3 (CC) homozygotes. We also purchased 3 normal human colon total RNA samples from Ambion, Inc. (#AM7986), Clontech Laboratories (#636553), and OriGene Technologies, Inc. (#CR560347), which we used as reference samples in gene expression studies (described below).
Total RNA expression studies
RNA isolation, real-time PCR assays and sequence analysis
Total RNA was isolated from colon tissues using TRIzol (Invitrogen, Inc.). RNA quality was determined using the RNA 6000 Nano Chip kit and the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc, Palo Alto, CA). Concentrations were determined as ng per μl and integrity of the RNA was scored with an RNA integrity number (RIN) that grades RNA integrity from 0 (decayed) to 10 (intact). To remove contaminating genomic DNA, isolated RNA was treated with RNase-free DNase I (Invitrogen) prior to RT-PCR. The cDNA was synthesized using 3.5 μg of total RNA and the Superscript III First Strand Synthesis System following the manufacturer's recommendations (#18080-051, Invitrogen, Inc.). In parallel, an identical reaction was carried out in the absence of reverse transcriptase. This RT-minus control served to ensure that the PCR amplification was not from genomic DNA. PCR primers and probes specific for C11orf92, C11orf93, C11orf53, and POU2AF1 were designed using the sequence data obtained from NCBI (http://www.ncbi.nlm.nih.gov/) and Primer Express software (Applied Biosystems). Primer sequences and PCR product sizes are shown in Table 7. Real-time quantitative PCR was carried out using the SYBR Green or TaqMan Gene Expression Assays (Applied Biosystems) on the 7900HT Fast Real-Time PCR System (Applied Biosystems). Three technical replicates were run for each sample. Standard curves comprising dilutions of homologous standards derived from a known starting concentration of mRNA were included on each plate. SDS2.2.2 software (Applied Biosystems) was used for relative quantification analysis of gene expression by the relative standard curve method, and GAPDH and β-actin genes (Applied Biosystems) served as endogenous controls.
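The relative standard curve method mentioned above can be sketched in a few lines. This is a generic illustration, not the SDS2.2.2 implementation: a line is fitted to threshold cycle (Ct) against log10 of the standard quantity, sample quantities are interpolated from that line, and the target quantity is divided by the endogenous control quantity. The function names, and the choice to average Ct values over technical replicates before interpolation, are assumptions.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, curve):
    """Interpolate a quantity from a Ct value using a fitted (slope, intercept)."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_cts, control_cts, target_curve, control_curve):
    """Target quantity normalized to the endogenous control quantity."""
    target_ct = sum(target_cts) / len(target_cts)      # mean of technical replicates
    control_ct = sum(control_cts) / len(control_cts)
    return quantity_from_ct(target_ct, target_curve) / quantity_from_ct(control_ct, control_curve)
```

A slope near -3.32 corresponds to a doubling of product per cycle, which is a common check on the quality of the dilution series.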
Expression analyses in tissues and cell lines
Tissue distribution for C11orf92, C11orf93, and C11orf53 transcripts was analyzed by reverse transcription PCR (RT-PCR) in cDNAs derived from multiple human tissues and cell lines (MTC panels, Clontech, Inc.). These include the human digestive system panel (#636746), the human immune system panel (#636748), the human panel II (#636743) and a human cell line panel (#636753). The human cell line panel includes human embryonic kidney 293 (HEK-293), ovarian carcinoma (SKOV-3), skin epidermoid carcinoma (A-431), epithelial-like osteosarcoma (Saos-2), prostate carcinoma (Du145), non-small lung carcinoma (H1299), uterine cervical carcinoma (HeLa) and breast adenocarcinoma (MCF7) cell lines. These MTC panels have been normalized to several housekeeping genes and against each other to ensure accurate assessment of target mRNA abundance and to allow comparisons among different tissues and cells.
Luciferase reporter assays
The effects of the CRC associated SNPs on C11orf92 promoter activities were assessed by dual luciferase reporter assay (Promega, Madison, WI).
Generation of luciferase expression constructs
To generate C11orf92/COLCA1 and C11orf93/COLCA2 promoter-luciferase reporter gene constructs representing risk (RH (C)) and protective (PH (A)) haplotypes, the ~4.2 kbp genomic regions covering 10 SNPs (chr11:110,672,725-110,673,538) were PCR amplified from the genomic DNA of three CRC patients that are heterozygous for the GWAS SNP rs3802842. For COLCA1, forward and reverse primers were: 5'-gtatctcgagtgagcactcactatgt-3' and 5'-ttgtataagcttgccaaacttgtcattgtttcc-3'. For COLCA2, forward and reverse primers were: 5'-ttgtatctcgaggccaaacttgtcattgtttcc-3' and 5'-ttgtataagctttgagcactcactatgtggaaag-3'. The restriction sites that were introduced in primer sequences to aid cloning are underlined. The amplicons were resolved on 1.5% (w/v) agarose gels, purified using a QIAquick Gel Extraction kit (QIAGEN Inc., Toronto, Canada), and cloned into the promoter-less pGL3-basic vector containing the firefly luciferase reporter gene (Promega, Madison, WI). The constructs were sequenced to verify all 10 SNPs. Longer versions of COLCA1 promoter-luciferase reporter constructs encompassing all 12 SNPs and covering ~5 kbp genomic regions were also generated from the genomic DNA of the abovementioned CRC patients. The forward (5'-gtatctcgagtgagcactcactatgt-3') and the reverse (5'-gaatcaagcttgctgcttggttcactgttccttca-3') primers were used for PCR amplification. The restriction sites that were introduced in primer sequences to aid cloning are underlined. The constructs were sequenced to verify all 12 SNPs. No luciferase activity was observed with these 2 constructs carrying the risk and protective haplotypes.
Transient transfection and Dual-Luciferase Reporter assay
HeLa cells (ATCC, Manassas, VA) were transfected with the experimental pGL3 promoter-luciferase constructs (2 μg) using FuGENE HD reagent as per the manufacturer's protocol (Promega, USA). The plasmid pRL-null containing the Renilla luciferase gene was co-transfected to normalize for transfection efficiencies. The promoter-less pGL3-basic vector served as a negative control. After 48 hrs, cells were harvested and lysates were used to measure firefly and Renilla luciferase activities using the Dual Luciferase Reporter Assay kit (Promega, Madison, WI) and the luminometer (Hybrid Microplate Reader, BioTek Synergy).
The reporter activities were expressed as relative light intensity unit (RLU) ratios of the firefly/Renilla luciferase activities after subtraction of the background autoluminescence of non-transfected cells.
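The normalization just described amounts to a background-corrected ratio, which can be written as a short sketch. The function and argument names are hypothetical, and the use of a separate background reading for each luciferase channel is an assumption.

```python
def normalized_reporter_activity(firefly, renilla, bg_firefly, bg_renilla):
    """Firefly/Renilla RLU ratio after subtracting non-transfected cell background."""
    firefly_net = firefly - bg_firefly
    renilla_net = renilla - bg_renilla
    if renilla_net <= 0:
        # Without a Renilla signal above background the well cannot be normalized.
        raise ValueError("Renilla signal does not exceed background")
    return firefly_net / renilla_net


# Hypothetical readings, for illustration only.
ratio = normalized_reporter_activity(1100, 600, 100, 100)
```

Dividing by the co-transfected Renilla signal corrects each well for differences in transfection efficiency, so ratios from the risk and protective haplotype constructs can be compared directly.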
Statistical analyses
Data were analyzed using SigmaStat (Version 2.0; Systat Software, Inc.). Group comparisons were performed with one-way ANOVA followed by Student-Newman-Keuls tests. Significance was set at p<0.05. Results are expressed as mean ± standard error of the mean (SEM).
Cloning and characterization of C11orf92/COLCA1 and C11orf93/COLCA2 transcript isoforms
Rapid amplification of 5' and 3' cDNA ends (RACE)
Total RNA was isolated from human normal colon tissues, tumor and benign adjacent colon tissues (OFCCR Biobank), human white blood cells, a colon cancer cell line (Caco-2), a B-cell lymphoma cell line (SU-DHL-4), and an immunoblastic cell line (OCI-LY10) using the TRIzol reagent (Life Technologies, Inc.). The GeneRacer Kit (Invitrogen, Inc.) was used for 5' and 3' RACE experiments. The PCR products were cloned into TOPO TA vector (Invitrogen, Carlsbad, CA) and inserts were sequenced on an ABI PRISM 310 genetic analyzer (Applied Biosystems). The gene-specific primer sequences for RACE experiments are described in Table 8. The identification of ESTs within the C11orf92 and C11orf93 genes by RACE experiments provided us with information on complete transcript sequences and spliced isoforms of these genes. The RefSeq gene structure from UCSC NCBI36/hg18 assembly for C11orf92/COLCA1 predicted a transcript of 5443 bp with a coding region of 375 bp that encodes a protein of 124 amino acids (Appendix 1), and for C11orf93/COLCA2, a transcript of 1414 bp, with a coding region of 465 bp, that encodes a protein of 154 amino acids (Appendix 2).
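The stated protein lengths are consistent with the coding region sizes when the stop codon is counted as part of the coding region, as the small check below shows. The helper name is hypothetical and the stop-codon convention is an assumption inferred from the numbers.

```python
def protein_length(cds_bp, includes_stop=True):
    """Amino acid count encoded by a coding region of the given length in bp."""
    if cds_bp % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    codons = cds_bp // 3
    # The stop codon occupies one codon but adds no residue.
    return codons - 1 if includes_stop else codons


colca1_aa = protein_length(375)  # C11orf92/COLCA1
colca2_aa = protein_length(465)  # C11orf93/COLCA2
```

With these inputs the function returns 124 and 154 residues, matching the predicted COLCA1 and COLCA2 proteins.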
Cloning of transcript isoforms
To obtain full length cDNAs of COLCA1 and COLCA2 isoforms, PCR was performed on cDNAs from normal colon, tumor and benign adjacent colon tissues (OFCCR Biobank), peripheral blood lymphocytes, SU-DHL-4 and OCI-LY10 cells using transcript-specific primers (Table 11). All PCR reactions were performed using Hot Start Taq DNA polymerase according to the manufacturer's instructions (Sigma-Aldrich, St Louis, MO). The amplification conditions were: 30 s at 94°C for denaturation, 30 s at 55-58°C for annealing, and 90 s at 72°C for extension, for a total of 35 cycles. As described above, the PCR products were cloned into TOPO TA vector (Invitrogen, Carlsbad, CA) and inserts were sequenced on an ABI PRISM 310 genetic analyzer (Applied Biosystems).
Semi-quantitative RT-PCR
To examine the expression levels of the COLCA1 specific isoforms, C11orf92B-L and C11orf92B-S, semi-quantitative real-time RT-PCR was performed as described above. Primer and probe sequences and calculated PCR product sizes are shown in Table 7.
RNA sequencing
As an alternate approach for identification and characterization of unique mRNA transcripts of genes at the 11q23 CRC locus, advanced strand-specific RNA sequencing (mRNA-seq) was performed on normal colon tissue, tumor and benign adjacent tissues from a patient that was heterozygous for rs3802842. The directional mRNA-Seq sample preparation kit (Illumina, Inc.) was used to generate libraries from total RNA for high-throughput RNA sequencing on the Illumina Genome Analyzer II, according to the manufacturer's instructions.
Briefly, total RNA was extracted from tissues using TRIzol (Invitrogen, Inc.) and quality was assessed using the RNA 6000 Nano Chip kit and the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc, Palo Alto, CA). Sera-mag oligo dT beads were used to isolate mRNA prior to fragmentation (RNA fragmentation kit, Ambion), first strand cDNA synthesis using the Superscript II reverse-transcriptase (Invitrogen, Inc.), second strand cDNA synthesis using the E. coli DNA pol I (Invitrogen, Inc.), and RNase H treatment. The double stranded cDNA library was processed with the Illumina Genomic DNA Sample Prep kit (Illumina, Inc). The adaptor-ligated libraries were gel size-selected at 200 bp and PCR enriched to create final libraries prior to sequencing using 76 bp reads. To ensure the presence of sufficient coverage for both C11orf92 and C11orf93, 4 lanes of Illumina GAIIx were sequenced for each sample. Image analysis and base calling were done by the Illumina pipeline, version 1.2.3, with recommended default filtering parameters. Reads were aligned to the human reference genome (NCBI Build 36.1) using Bowtie 0.12.7 (46) and Tophat 1.3.0 (47). For the C11orf92 transcript, the average coverage of bases for normal, benign adjacent, and tumor samples was 8.9, 7.8, and 3.1, respectively. Similarly, the average coverage of bases for normal, benign adjacent, and tumor samples for C11orf93 was 24.5, 27.0, and 3.9, respectively. Integrative Genomics Viewer (IGV) (49) was used to confirm and visualize the expression levels of C11orf92, C11orf93 and nearby genes.
Mapping of transcription factors and chromatin marks
To map the epigenetic landscape of the 11q23 CRC locus we used the information available from the Encyclopedia of DNA Elements (ENCODE) international consortium (48, 50) hosted by the University of California Santa Cruz through their Genome Browser (51; http://genome.ucsc.edu). The ENCODE data have been generated from actual lab experiments (Chromatin Immunoprecipitation Sequencing (ChIP-Seq), DNase hypersensitivity and histone modification studies) to provide evidence of putative function, as opposed to predictive algorithms that infer function at any given locus. We generated high resolution epigenomic maps of the entire 55 kb region (from position 110630000 to position 110685000, Hg18) using data provided by the ENCODE pilot project (http://genome.ucsc.edu), which overlaps the CRC associated signal encompassing C11orf53, C11orf92 and C11orf93. For localization of the putative promoters and TFBSs (transcription factor binding sites) we used PromoterInspector, http://www.genomatix.de/online_help/help_gems/PromoterInspector_help.html and MatInspector, http://www.genomatix.de/online_help/help_matinspector/matinspector_help. To further investigate whether the presence of a SNP disrupts or creates a putative TFBS, we used TRANSFAC, http://www.gene-regulation.com/pub/databases.html#transfac. In addition, we demarcated specific patterns of histone modifications in lymphoblastoid cell lines at the promoter regions of C11orf53 and bidirectional C11orf92/C11orf93.
Protein expression and histochemical studies
Isolation of peripheral blood lymphocytes, blood cells fractionation and cell purification
Isolation of peripheral blood lymphocytes
Total peripheral blood lymphocytes (PBL) were purified from heparinized blood by gradient density sedimentation using Ficoll/Hypaque (Sigma Aldrich). Blood diluted 1:2 in RPMI 1640 media (Sigma Aldrich) was added below the Ficoll layer to create two interfacing layers. By the effect of the centrifugal force, lymphocytes were isolated, the layer with the cells withdrawn and washed three times with phosphate buffered saline (PBS). Cell pellets were then resuspended in modified RIPA buffer (25 mM Tris-HCl pH 7.6, 150 mM NaCl, 1% NP-40, 1% sodium deoxycholate, 0.1% SDS) that contained protease inhibitors (Roche, Applied Science). Protein concentration was determined by the BCA Protein Assay Kit (Thermo Scientific, Pierce).
Isolation and purification of cord blood progenitors
Cord blood was collected according to approved procedures at the University Health Network. CD34+ and CD34- cells were isolated using the EasySep human CD34 positive selection kit according to the manufacturer's protocol (StemCell Technologies Inc.). Cells were stained with CD34+ APC (clone 581, BD Biosciences Inc.) and the purity of the cells was confirmed by FACS (>95%).
Blood cells fractionation and cell purification
Peripheral blood mononuclear (MNC) and polymorphonuclear (PMN) cell fractions were isolated from whole blood using a discontinuous Histopaque density gradient kit (Sigma-Aldrich), according to the manufacturer's instructions. Briefly, blood diluted 1:2 in RPMI 1640 media (Sigma Aldrich) was added onto the top of two Histopaque layers, Histopaque-1077 and Histopaque-1119, to create three interfacing layers. By the effect of the centrifugal force, the PMN (lower) and MNC (upper) fractions were isolated simultaneously, the cells withdrawn and washed three times with PBS, and cell pellets were lysed using modified RIPA buffer.
Human NK (CD56+), monocytes (CD14+), neutrophil (CD16+), B cell (CD19+), basophil (CD123+), eosinophil (CD16-), CD8+ T, CD4+ T and CD14+ monocyte-derived immature dendritic (iMoDC) whole-cell lysates were purchased from 3H Biomedical, Uppsala, Sweden. Human basophils were purified from peripheral blood by depletion of lymphocytes, monocytes, NK cells, B cells and plasmacytoid dendritic cells from blood mononuclear cells, followed by CD123 positive selection. The purity of basophils (>90%) was confirmed by CD123-FITC staining. Human eosinophils were purified from peripheral blood by a two-step method: 1) the gradient separation of granulocytes; followed by 2) the CD16 depletion of neutrophils. The purity of eosinophils (>90%) was confirmed by May-Grünwald-Giemsa staining. Human CD4+ and CD8+ T cells were isolated from peripheral blood mononuclear cells by CD4 positive and CD8 positive selection, respectively. The purity of both fractions was higher than 90%. Human NK cells were isolated from peripheral blood mononuclear cells by CD56 positive selection. The purity of NK cells (>90%) was confirmed by CD56 staining. Human monocytes were purified from mononuclear cells by CD14 positive selection. The purity of monocytes (>90%) was confirmed by CD14 staining. Human neutrophils were purified from peripheral blood by gradient separation, followed by CD16 positive selection. The purity of neutrophils (>90%) was confirmed by CD16 staining. Monocyte-derived immature dendritic cells (iMoDC) were isolated from peripheral blood CD14 positive monocytes cultured with rhGM-CSF and rhIL-4. The iMoDC were CD86\ CD80low, CD40+, CD11b+, CD14-, and CD123-. Human B cells were purified from peripheral blood mononuclear cells by CD19 positive selection. The purity of B cells was higher than 90%. All lysates were provided in modified RIPA buffer containing protease inhibitors (Roche, Applied Science).
Isolation and purification of mast cells
CD34+ cord blood cells (2 x 10^5) were cultured in H5100 medium (StemCell Technologies Inc.) supplemented with recombinant stem cell factor (rhSCF, 100 ng/ml), human Interleukin 6 (rhIL-6, 50 ng/ml), human Interleukin 3 (rhIL-3, 50 ng/ml) and human GM-CSF (20 ng/ml) [all from R&D Systems]. Non-adherent cells were passaged every week to expand cells. After 3 weeks, IL-3 and GM-CSF were no longer added to the culture. After 42 days, the cells were harvested and Kit+CD34-CD11b-CD11c- cells were isolated using an Aria II cell sorter (BD Bioscience Inc.). Purity of these cells was confirmed by Giemsa staining of the cytospins.
SDS-PAGE and Western blot analysis
Cell pellets or frozen colon tissue samples were homogenized in RIPA buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% w/v NP-40, 0.25% Na-deoxycholate, 1 mM EDTA, 1 mM PMSF, 1 mM Na3VO4), supplemented with protease inhibitor cocktail (Roche, Applied Science). Lysates were then centrifuged (5 min, 10,000 x g, 4°C) and supernatants were collected. Protein concentration was measured using the BCA protein assay kit (Thermo Scientific, Pierce). Forty μg of total proteins from colonic tissues and blood fractions (with the exception of the eosinophil and basophil fractions, 20 μg per lane) were fractionated on 12% Tris-Glycine gels (BioRad Laboratories, Inc.), transferred onto polyvinylidene difluoride (PVDF) membranes (BioRad Laboratories, Inc.) and incubated with the blocking buffer (2% BSA, Sigma-Aldrich) in TBST (50 mM Tris, 150 mM NaCl, 0.1% Tween 20, pH 7.4), overnight at 4°C. To examine COLCA1 expression, membranes were incubated with the polyclonal rabbit anti-human C11orf92/COLCA1 antibody (Atlas Antibodies, AB) at a 1:500 dilution in 5% BSA in TBST for 3 hrs at room temperature. Blots were then washed with TBST and incubated for 1 hour at room temperature with horseradish peroxidase-conjugated goat anti-rabbit IgG (Santa Cruz Biotechnology, Inc.) at 1:7500 dilution in 5% skim milk in TBST. To identify the protein size, the Precision Plus Protein Western C standards (BioRad Laboratories, Inc.) were used. The blots were incubated with StrepTactin-HRP conjugate (BioRad Laboratories, Inc.) at 1:40000 dilution in the presence of 5% skim milk in TBST. The immunoreactive bands were visualized by the Immun-Star HRP chemiluminescence kit (BioRad Laboratories, Inc.) according to the manufacturer's instructions. For loading control, polyclonal anti-human beta-actin antibody (Cell Signaling Technology Inc.) and horseradish peroxidase-conjugated goat anti-rabbit IgG (Santa Cruz Biotechnology, Inc.) were used at 1:10000 dilutions in 5% skim milk in TBST.
Immunohistochemistry on formalin fixed paraffin embedded tissue sections
Paraffin tissue sections (4 μm) were deparaffinized in xylene and rehydrated in graded ethanol. Tissue sections were microwaved (micro MED T/T Mega, Milestone Microwave Lab System) for 3 min in antigen unmasking solution (Vector Laboratories, Inc.) for antigen retrieval and then incubated with 3% H2O2 for 15 min to quench endogenous peroxidase activity. Nonspecific absorption was minimized by Background Sniper (Biocare Medical) in Tris-buffered saline (TBS) for 10 min. The sections were incubated overnight at 4°C in a solution containing polyclonal rabbit anti-human C11orf92/COLCA1 (Atlas Antibodies, AB) antibody at a dilution of 1:100, and mouse monoclonal anti-human carcinoembryonic antigen (CEA) antibody (clone COL-1, Biocare Medical) at a dilution of 1:200. The controls for immunohistochemistry included incubation solution alone or purified rabbit IgG (Cell Signaling Technology Inc.). The Vectastain Elite ABC kit, diaminobenzidine tetrahydrochloride (DAB), and Vulcan Fast Red Chromogen Kit 2 were used to detect immune complexes as described by the suppliers (Vector Laboratories Inc., and Biocare Medical). Sections were counterstained with Mayer's hematoxylin. The images were acquired on an Olympus BX61 microscope fitted with an Olympus DP72 camera using the CellSens Standard proprietary acquisition software (Olympus, Markham, Ontario, Canada).
Immunofluorescence on cryosections
For immunofluorescence analysis and resin embedding, biopsies from colonic tissues (5 mm) were obtained endoscopically. Histopathological examination on hematoxylin and eosin (H & E) stain confirmed the presence of tumor and benign adjacent mucosa. The triple immunofluorescence staining protocol included the following primary antibodies: rabbit polyclonal human anti-C11orf92/COLCA1 (Atlas Antibodies, Sigma), rat monoclonal anti-human CD45 antibody (Santa Cruz Biotechnology, Inc), monoclonal mouse antibody against human basophils 2D7 (BioLegend, Inc.), human eosinophil major basic protein, clone BMK13 (EMD Millipore), human mast cell tryptase, clone G3 (Chemicon, Int.), human CD68 (KP1) (Santa Cruz Biotechnology, Inc.) and human NCAM (2Q692) monoclonal antibody against CD56 positive cells (Santa Cruz Biotechnology, Inc.). As secondary antibodies for immunofluorescence microscopy, goat antisera against rabbit, rat and mouse IgG conjugated to Alexa Fluor 594, 647, and 488 (Invitrogen, Inc.) were used.
The secondary antibodies did not produce nonspecific labeling on colon sections when exposed to PBS only. The specificity of the rabbit polyclonal C11orf92 antibodies was validated by incubation of the sections with normal rabbit IgG (Millipore), followed by incubation with goat anti-rabbit Alexa 594 conjugated antibody. Cryosections of human colon tissues (4 μm) were fixed in cold acetone for 10 min and rehydrated in PBS prior to incubation in 10% goat serum blocking solution for 30 min. Sections were then incubated with primary antibodies overnight at 4°C. After three washes in PBS, the sections were incubated with the appropriate Alexa Fluor conjugated secondary antibodies (Invitrogen, Inc.) for 60 min. Triple immunofluorescence labeling was performed simultaneously, with primary antibodies derived from different species to avoid cross-reactivity, followed by incubation with an appropriate fluorochrome conjugated secondary antibody. Slides were mounted in Vectashield mounting medium (Vector Laboratories, Ltd.).
Co-localization of COLCA1 with specific immune cell markers was validated using three-dimensional deconvolution microscopy (Quorum, WaveFX Spinning Disc Confocal Microscope System) with optimized Yokogawa CSU X1, Hamamatsu EM-CCD digital camera, Leica DMI6000B inverted research grade motorized microscope (Quorum Technologies, Guelph, Canada), and the Volocity 5.2.2 acquisition software (Improvision/PerkinElmer, Massachusetts, USA). Deconvolution of the images was done using Huygens Essential 4.0 deconvolution software (Scientific Volume Imaging, Hilversum, the Netherlands). High-power images of the single cells were taken at sequential 0.1-μm z-axis steps. Resultant image stacks were analyzed using a three-dimensional deconvolution algorithm.
RESULTS AND DISCUSSION
Here, we report the characterization of two genes at a chromosome 11q23 GWAS locus. COLCA1/C11orf92 and COLCA2/C11orf93 are arranged head-to-head on opposite strands of chromosome 11q23. Gene and protein expression studies reveal their presence in several immune cell types located in the colonic mucosa and lower levels of expression correlating with the risk alleles identified by GWAS studies. The manipulation of COLCA function represents a potential target to prevent colon cancer.
We used microarray-based target selection methods (11), coupled to next generation sequencers (12), to interrogate 2.3 Mb of DNA including exonic, intronic and intergenic intervals at 11 CRC loci identified by GWAS (Table 1). Sequenced samples include genomic DNA from 40 sporadic CRC cases and 40 matched controls selected from the 2,380 samples from the Ontario Familial Colorectal Cancer Registry (OFCCR) that were previously genotyped by GWAS (13, 14, 15) and 25 probands and 15 affected siblings selected from pedigrees showing autosomal dominant transmission that were selected based on absence of mutations in genes causing familial CRC. We generated 8.6 Gb of usable sequencing data in the target regions, corresponding to an average of 53.6 reads per base of genomic target and 76.4 reads per exonic base (Table 3). We identified 10,577 putative SNPs and 1,492 putative insertion/deletions (indels), of which 2,830 SNPs (26.8%) and 945 indels (63.3%) are not present in dbSNP version 132 and the 20101123 sequence and alignment release of the 1000 Genomes project (16) (Fig. 5 and Table 4). Genotyping of 7,732 putative variants in 1,030 cases and 1,061 controls from the OFCCR revealed a high overall concordance (98.6%) between sequence-based genotypes and array-based genotypes, although there were many false-positive indels (66.9%) and a moderate rate of false positives for SNPs (16.4%), which generally corresponded to putative variants detected at low coverage in only one sequenced sample and/or misalignment at the codon level (17). We confirmed 30 coding-synonymous SNPs in the original sequenced samples (Table 5), but allele frequencies in an independent set of 2,091 genotyped samples were generally low, and none were associated with CRC after correction for multiple hypothesis testing.
Two missense mutations in UTP23, an rRNA-processing protein coding gene, show borderline significance (rs1133950 p = 0.0471; rs16888728 p = 0.06145) and are in high linkage disequilibrium (r2=0.4, D'=0.85 for rs1133950; r2=0.42, D'=0.73 for rs16888728) with GWAS SNP rs3802842 located 148 kb centromeric to UTP23. Additional rare missense SNPs showing borderline significance include rs3847262 in TPD52L3 (p=0.04422) and rs73039449 in GPATCH1 (p=0.046).
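For readers comparing the r2 and D' values quoted above, the following sketch computes both measures from allele and haplotype frequencies using the standard definitions. The function name and the frequency-based inputs are assumptions; the values in the text were presumably obtained from phased or inferred haplotypes in the study sample.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, |D'|) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical frequencies: allele A (20%) only ever occurs with allele B (50%).
r2, d_prime = ld_stats(0.2, 0.2, 0.5)
```

In this hypothetical case D' equals 1 while r2 is only 0.25, which illustrates how alleles with different frequencies can show a high D' alongside a moderate r2.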
Fig. 1A is a Manhattan plot representation of the association levels of Cochran-Armitage tests for trends between variants in frequency above 1% in cases and controls combined and risk of CRC in 1,030 cases and 1,061 controls. The OFCCR sample was used in the discovery of 5 of these regions: 8q24 (13), 9p24 (18), 11q23 (14), 16q22 (15) and 19q13 (15). There are signals of association in 4 of these 5 regions, the exception being the 19q13 region for which the published SNP did not replicate. As a result of these associations, a quantile-quantile (Q-Q) plot illustrating all significance levels clearly shows deviation of the weight of the distribution toward greater levels of associations (Fig. 1B). This shift in weight is caused by linkage disequilibrium (LD) with the SNPs identified in previous GWAS. Performing conditional tests of association by conditioning on the presence or the absence of a GWAS risk allele on the same haplotypes as the test alleles shifts the Q-Q plot distribution back toward the null (Fig. 1C). The apparent inflation that remains in the lower tail of the distribution in Fig. 1C is solely due to the large number of correlated SNPs in our comprehensive map: by only considering SNPs that are modestly correlated with each other (r2<0.5), the lower tail of the distribution falls within 95% confidence bands (Fig. 1D). We observe four loci where the GWAS SNPs (rs16892766/8q23.3; rs6983267/8q24; rs3802842/11q23; rs9929218/16q22) reach significance levels p<0.01 in this sample set. At two of these loci (8q23 and 11q23), we observe relatively small intervals of 4.9 and 5.8 kb that contain several variants in high linkage disequilibrium (r2>0.8) with the GWAS SNPs. At 8q24 and 16q22, high linkage disequilibrium extends over much larger regions spanning more than 33 and 113 kb respectively. Fig. 1E (11q23 locus) and Fig. 6 to Fig. 15 (ten other loci) show the location of all SNPs with minor allele frequencies above 1%, the significance levels of tests of association and comprehensive LD maps among common variants. Given the existence of strong candidate genes at 8q23 (EIF3H) (19, 20), 8q24 (MYC) (13), and 16q22 (CDH1) (15, 20), the subsequent functional analyses focus on 11q23.
The associated 11q23 region was first reported in a Scottish study (14) and subsequently refined using 10,638 cases and 10,457 controls from Europe, North America and Australia (21). The C allele of rs3802842 was shown to predispose to CRC (OR=1.17 per allele, P = 1.08 x 10^-12). The region includes three uncharacterized protein-coding genes (C11orf53, C11orf92, and C11orf93). POU2AF1 (also known as BOB1), a nearby gene located 51 kb distal to rs3802842, was also deemed a possible candidate, as it was observed to be differentially expressed in the cells of patients with several forms of lymphoma and leukemia (22-24). Given the absence of association for the three coding-nonsynonymous SNPs in C11orf53 and C11orf92 that were detected and genotyped in the OFCCR sample (Tables 5 and 6), we hypothesized that the causal mechanism could involve regulation of a nearby gene. We characterized the gene expression levels of the four transcripts in tissues obtained during colon resections. We tested benign adjacent colonic tissue as well as colonic tumors from individuals who are homozygous for the protective (A) or risk (C) allele of rs3802842, or heterozygous (A/C) (Fig. 2A). Risk alleles correlate with decreased gene expression of C11orf92 and C11orf93 in benign adjacent colonic tissues as well as in tumors. We also note that C11orf53 expression in tumor samples decreases with the number of risk alleles, but no correlation is observed in benign adjacent colonic tissue. Furthermore, no association is found between POU2AF1 expression levels and rs3802842 genotypes. We also examined tissue panels representing the gastrointestinal tract and organs of the immune system. Expression of C11orf92 and C11orf93 is observed from the esophagus to the rectum (Fig. 16A), in multiple immune organs (Fig. 16B), and in other tissues such as prostate, testis, and ovary (Fig. 17A).
C11orf92 and C11orf93 transcripts are also expressed in the CRC cell line Caco-2, but not in HCT116 (another CRC line) or HeLa (Fig. 17B).
The genomic organization of the C11orf92/COLCA1 and C11orf93/COLCA2 genes provides clues to the similarities in their expression levels. They are arranged head-to-head on opposite strands and share a common regulatory region (Fig. 2B). To investigate the cis-regulatory potential of the most common protective and risk haplotypes, we cloned three independent triplicate DNA fragments of ~4.2 kbp for each allele of rs3802842, as well as 10 additional variants (9 SNPs and rs5794738, a 9 bp indel), into luciferase reporter vectors (Fig. 2C). When tested in both orientations, higher luciferase activity is observed for clones carrying the protective haplotype than for those carrying the risk haplotype (Fig. 2D), which is consistent with the gene expression results observed in colon-derived tissues. Using RNA-Seq, RACE and RT-PCR experiments, we further analyzed the region and refined the structure and organization of the C11orf92/COLCA1 and C11orf93/COLCA2 genes to complement the annotations present in public databases (Fig. 29-31 and Tables 9 and 10). In brief, COLCA1 has multiple alternative 5' non-coding exons and one constant exon that includes the coding sequence for a 124-amino acid protein. COLCA2 has 8 exons, with variable exons 1 to 4 added in various combinations to constant exons 5 to 8 to generate a minimum of five transcripts yielding different protein isoforms ranging from 154 to 379 amino acids in length; additional protein isoforms that are predicted based on Western blots are described later. The revised gene models allow in silico predictions of functional correlates for alleles contained on the protective/risk haplotypes related to protein isoforms, composition, and regulation. One of the most strongly CRC-associated variants at this locus is rs10891246, which is in LD with the GWAS SNP rs3802842 (r2=0.99) and can affect both candidate genes.
For COLCA1, rs10891246 coincides with a splice site, resulting in long and short versions of exon 1, a non-coding exon (Supplementary Methods), which we named C11orf92B-L and C11orf92B-S (Fig. 18A). Interestingly, the long isoform (C11orf92B-L) appears to have decreased expression correlating with the CRC risk haplotype (Fig. 18B and 18C), while the short isoform (C11orf92B-S) is unchanged in benign adjacent tissues and increases in expression in tumor tissues from individuals bearing risk haplotypes (Fig. 18D).
Chromatin features in a human lymphoblastoid cell line (Fig. 19) at the COLCA1, COLCA2 and C11orf53 loci were obtained from ENCODE (http://www.genome.gov/ENCODE/) (26). The densities for four histone modifications and the occupancy of CTCF binding sites generated by ChIP-seq reveal strong signals at the bi-directional promoter of COLCA1 and COLCA2 (Fig. 19). Analysis of variants and putative transcription factor binding sites reveals that some of the variants in high LD with rs3802842 overlap such sites: p53/rs6589218, CREB/rs10891245, E2F/rs11213823, FoxB1/rs4520624 and FoxB1/rs5794738 (a 9-bp deletion). We were able to obtain an antibody to COLCA1, which we validated for specificity against an overexpressed COLCA1-GFP fusion protein (Fig. 20). Western blot analyses of COLCA1 in colorectal tissues show strong expression in benign adjacent tissues obtained at the time of colon cancer or adenoma resections, but weak expression in sigmoid and rectal tumors (Fig. 3A), which is consistent with RNA expression data. When comparing benign adjacent tissues from eight individuals having different genotypes for rs3802842 (Fig. 3B), COLCA1 protein expression is stronger in homozygotes for the protective A allele than in homozygotes for the C allele, which is also in agreement with RNA expression data. Immunohistochemistry with anti-COLCA1 antibody of benign adjacent colon tissue and colon tumor from two donors with AA and CC genotypes is shown in Fig. 3C-3F (Fig. 21 shows negative control data). Positive staining is observed in the lamina propria of all biopsies, but not in normal epithelium or epithelium-derived tumor cells. COLCA1 expression can be observed in stromal cells that are mono- and multi-nuclear. At higher magnification, COLCA1 expression is cytoplasmic and often appears to be part of granular structures (Fig. 3G).
In addition, cell-free COLCA1 is observed in normal adjacent tissue and in some cases the COLCA1 signal appears to infiltrate spaces between epithelial cells (Fig. 3H). Finally, multiple COLCA1-expressing cells can be seen to surround tumor cells (Fig. 3I-3J, Fig. 22). To determine the immune cell populations that express COLCA1 at the protein level, we examined COLCA1 protein expression in immune cells derived from peripheral blood, cord blood and colonic tissues using purified cell populations (Fig. 4A-4C) and cell-type specific immunofluorescence techniques (Fig. 4D-4H). In peripheral blood, COLCA1 is expressed strongly in a polymorphonuclear fraction that was further resolved to include eosinophils (strongest signal) and neutrophils, and more weakly in a mononuclear fraction including CD14+ monocytes, but not in lymphocytes (Fig. 4A-4C). Cell lysates obtained from cord blood that had been separated into CD34+ and CD34- fractions showed no or minimal expression of COLCA1 (Fig. 4A and 4C). Cord blood cells cultured in conditions to promote mast cell differentiation (data not shown) were also negative. Cryosections of benign colon tissues adjacent to tumors, and tumor tissues themselves, were interrogated using triple immunofluorescence methods with several antibodies used as immune cell-specific markers. Strong COLCA1 expression is shown in eosinophils (Fig. 4D) and moderate expression is observed in mast cells, neutrophils, macrophages and dendritic cells (Fig. 4E-H, Fig. 23-27). Within all COLCA1-positive immunofluorescent cells, COLCA1 signal is present in granular structures, consistent with intracellular granules that are characteristic of several immune cell lineages.
Double immunohistochemical staining for COLCA1 and the tumor-specific marker CEA (carcinoembryonic antigen) on paraffin-embedded tissues from colorectal cancer patients disclosed the immediate proximity and specific aggregation of COLCA1-positive cells and/or extracellular granules around and within the tumor (Fig. 28).
One or more immunoreactive bands ranging from 17 to 47 kDa that potentially represent 8 COLCA2 protein isoforms are observed in different permutations in all samples tested (Fig. 32A), including colonic tissues, peripheral blood and 17 cell lines (data not shown) representing multiple cell types. We observed greater protein expression (immunoreactive band E) in homozygotes having the protective A allele compared to homozygotes for the C allele (Fig. 32B). Several lines of evidence link COLCA1 and COLCA2 with colorectal cancer: 1) genetic association at 11q23; 2) correlation between the number of protective alleles and increased RNA and protein expression in multiple cell types present in colon cancer tissues; 3) protein expression in many mucosal immune cells of the colon implicated in tumor immunity; 4) physical proximity between COLCA1-containing immune cells and/or extracellular granules and colon cancer cells;
The identification of cell types such as eosinophils, mast cells, neutrophils, and macrophages that express COLCA1 provides clues to its function(s). These cell populations reside in the gastrointestinal microenvironment and play important roles in inflammation and immunity (28-30). The significance of eosinophils and mast cells in cancer remains controversial, particularly for mast cells, which have been implicated in tumor promotion through angiogenesis, tissue remodeling and regulatory T-cell dysfunction (31-33). Mast cell depletion can lead to remission of polyps in mouse models (34). However, mast cells may also be damaging to tumor cells via secretion of cytokines and proteolytic enzymes. In a breast cancer study of 4,444 cases, stromal mast cells correlated with favorable prognosis (35).
The literature provides a stronger case for eosinophils having tumoricidal functions. Abundance of eosinophils in gastrointestinal cancers is a favorable prognostic factor (36). Eosinophils may induce apoptosis and directly kill tumor cells via the release of eosinophilic cationic protein, eosinophil-derived neurotoxin, TNF-α and granzyme A (37). Eosinophil products can degrade necrotic materials from tumor and other stressed cells through production of reactive oxygen species (38). More recently, resident immune cells such as eosinophils have been recognized as regulators of tissue homeostasis in peripheral tissues with high turnover and active stem cell populations such as the gastrointestinal tract and the endometrium; this function may be as important as the more recognized role of eosinophils as end-stage effector cells (39).
Eosinophils, which are the highest expressers of COLCA1, contain granular structures that are known to harbor pre-formed proteins that can be secreted by exocytosis, by piecemeal degranulation or as extracellular vesicles that are typically in the size range of 150-300 nm (40). The latter structures have only recently been characterized as receptor-mediated secretory organelles that respond to IFN-γ and eotaxin to elicit secretion of their content. Extracellular COLCA1 staining in colon tissues has a pattern similar to that described for extracellular eosinophil-derived granules (41).
The lower expression levels of COLCA1 and COLCA2 in individuals at higher risk for colon cancer and the presence of COLCA1 (and potentially COLCA2, as a result of its co-regulation with COLCA1) point to potential anti-tumoral properties. These could be through intrinsic cytocidal activities as secreted proteins, immunomodulatory functions, or biochemical interactions with other molecules that are co-secreted by immune cells or released by tumors. Collectively, the polymorphic regulation of COLCA1 and COLCA2 potentially represents the first inherited mechanistic link in humans between microenvironmental factors and cancer predisposition. Although preferred embodiments of the invention have been described herein, it will be understood by those skilled in the art that variations may be made thereto without departing from the spirit of the invention or the scope of the appended claims. All documents disclosed herein, including those in the following reference list, are incorporated by reference.
Appendix 1
COLCA1
cDNA:
ATGGAGTCTTGCTCAGTCGCCCAGGCTGGAGTGCTAACCTC
TCCTTTCATGTGGAGATGGACAGGGATGGCAGGAGCACTGAGTGCTCTTG
ACAACACCATTGAAGATGATGCTGACGATCAGCTACCCTGTGGAGAAGGC
AGGCCAGGCTGGGTGAGAGGGGAGCTCCTTGGAAGTCAGGGGGTCTGTAA
GGACAGCAAGGATCTCTTTGTCCCAACCTCCAGCAGCCTTTATGGGTGCT
TCTGTGTTGGCCTTGTTTCTGGGATGGCCATCTCAGTGCTGTTGCTGGCT
AGCGATTTCAGAAAACTAGATTTTTCTAGGCCTGAGCCCTGTTTTGAGAA
AGAAGCTTCCCTCTGGTTTGTAGCTCAACATTAA (SEQ ID NO. 2)
MESCSVAQAGVLTSPFMWRWTGMAGALSALDNTIEDDADDQLPCGEGRPG WVRGELLGSQGVCKDSKDLFVPTSSSLYGCFCVGLVSGMAISVLLLASDF RKLDFSRPEPCFEKEASLWFVAQH (SEQ ID NO. 1)
Protein size (aa) : 124
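The relationship between the COLCA1 cDNA (SEQ ID NO. 2) and the 124-residue protein can be checked with a short translation sketch. The code below is an illustrative check using the standard genetic code; the names `CODON_TABLE`, `COLCA1_CDNA` and `translate` are introduced here for illustration and do not appear in the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# COLCA1 coding sequence as listed in Appendix 1 (SEQ ID NO. 2).
COLCA1_CDNA = (
    "ATGGAGTCTTGCTCAGTCGCCCAGGCTGGAGTGCTAACCTC"
    "TCCTTTCATGTGGAGATGGACAGGGATGGCAGGAGCACTGAGTGCTCTTG"
    "ACAACACCATTGAAGATGATGCTGACGATCAGCTACCCTGTGGAGAAGGC"
    "AGGCCAGGCTGGGTGAGAGGGGAGCTCCTTGGAAGTCAGGGGGTCTGTAA"
    "GGACAGCAAGGATCTCTTTGTCCCAACCTCCAGCAGCCTTTATGGGTGCT"
    "TCTGTGTTGGCCTTGTTTCTGGGATGGCCATCTCAGTGCTGTTGCTGGCT"
    "AGCGATTTCAGAAAACTAGATTTTTCTAGGCCTGAGCCCTGTTTTGAGAA"
    "AGAAGCTTCCCTCTGGTTTGTAGCTCAACATTAA"
)

def translate(cdna):
    """Translate a coding sequence from its first base up to the first stop codon."""
    seq = cdna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)

colca1_protein = translate(COLCA1_CDNA)
```

The 375-nucleotide coding sequence corresponds to 124 codons plus one stop codon, which agrees with the coding length given in Table 9.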
Appendix 2
Computed theoretical pI/Mw of COLCA2 isoforms:
Isoform 1 :
cDNA :
cgcacttcccggcgcgattcctggacgcacactgcagggtagtggtcagggccggagccgcc ccggactcctgcggccacctccaccttgaaagctcgcccggctcggcttggcctaacagcca gtccccaacctactcactgcggcttttcggggcccggggctgaggctgcccctgctcccagc gctcaggggtccctcccagcgaacccctcttgagcgcgcccgccacctcccgcgagccgggg cagtccgaccgggccccgccagccccggggcggcccctaccgaagggcggcctcccggcggaj gtgcagggtcatcgggagccccggacggctttccccagctcgcgctgcgcggactcgcccc agcctcagtgcgcctcggaggaaaaaaccggccccttggccccccccaaggaaaaaaggggggaaggggggggii
jgaget9_¾§ΐ§¾ gcaaagacg i'.^CsQ^^g t ::t : ci.-gc ggactgtattttgagcctgaaccaatttcttccacgcccaattatttgcaac ggggagaattttccagttgtgtttcatgtgaagaaaactcaagctgcctcgaccagatcttt gattcctaccttcagacagagATGcacccggagcctttgctcaattccacacaaagtgctcc acaccatttcccagacagcttccaggccacccctttctgctttaaccagagcctgatcccag gatcaccttcaaattcctccattctctctggctccttagactacagttactcgccagtgcag ctgccttcatatgctccagagaattacaattcccctgcttctctggacaccagaacctgtgg ctaccccccagaagaccattcctaccaacacttgtcetcacacgcccagta
cctcggccaccacctccatctgctactgcgcatcgtgtgaggcagaggacttggatgctctc caggcagcagagtacttctacccgagcacagactgtgtggactttgccccctcagcagccgc caccagtgatttctataagagggaaacaaactgtgacatctgctatagtTAAtagaaattac agtaattcagaacatggcatgggtatatctatttttctaccacgtctagatgacactgcaaa atatgcaacttggtaacacaatatcccaagcacagtttacatgtcactatttccaattttct gatgctaagcattcatatgaagtcctcagacccggtcacagcgccactcctactttgt (SEQ
ID NO. 4)
10 20 30 40 50 60
MHPEPLLNST QSAPHHFPDS FQATPFCFNQ SLIPGSPSNS SILSGSLDYS YSPVQLPSYA
70 80 90 100 110 120
PENYNSPASL DTRTCGYPPE DHSYQHLSSH AQYSCFSSAT TSICYCASCE AEDLDALQAA
130 140 150
EYFYPSTDCV DFAPSAAATS DFYKRETNCD ICYS- (SEQ ID NO. 3)
Protein size (aa) : 154
Theoretical pI/Mw: 4.27 / 16850.25
Isoform 2 :
cDNA :
Figure imgf000048_0001
10 20 30 40 50 60
MSEKPKVYQG VRVKITVKEL LQQRRAHQAA SGGTRSGGSS VHLSDPVAPS SAGLYFEPEP
70 80 90 100 110 120
ISSTPNYLQR GEFSSCVSCE ENSSCLDQIF DSYLQTEMHP EPLLNSTQSA PHHFPDSFQA
130 140 150 160 170 180
TPFCFNQSLI PGSPSNSSIL SGSLDYSYSP VQLPSYAPEN YNSPASLDTR TCGYPPEDHS
190 200 210 220 230 240
YQHLSSHAQY SCFSSATTSI CYCASCEAED LDALQAAEYF YPSTDCVDFA PSAAATSDFY
250
KRETNCDICY S- (SEQ ID NO. 5)
Protein size (aa) : 251
Theoretical pI/Mw: 4.66 / 27416.97
Isoform 3 :
cDNA:
Figure imgf000049_0001
(SEQ ID NO. 8)
10 20 30 40 50 60
MSDGARKRRR KGTRESGQKA KTEGSEGGSS SGLAPLLEKP KVYQGVRVKI TVKELLQQRR
70 80 90 100 110 120
AHQAASGGTR SGGSSVHLSD PVAPSSAGLY FEPEPISSTP NYLQRGEFSS CVSCEENSSC
130 140 150 160 170 180
LDQIFDSYLQ TEMHPEPLLN STQSAPHHFP DSFQATPFCF NQSLIPGSPS NSSILSGSLD
190 200 210 220 230 240
YSYSPVQLPS YAPENYNSPA SLDTRTCGYP PEDHSYQHLS SHAQYSCFSS ATTSICYCAS
250 260 270 280
CEAEDLDALQ AAEYFYPSTD CVDFAPSAAA TSDFYKRETN CDICYS- (SEQ ID NO. 7)
Protein size (aa) : 286
Theoretical pI/Mw: 5.15 / 31027.97
Isoform 4 :
cDNA :
ccaccgacccagctcctgctgaccacgATGtcggatggagcgcggaaaaggcggagaaaagg gacgagggagagcgggcagaaggcaaagacagaagggagcgagggagggagttcctcgggcc tggcccctttactaggtcagtctgcaggtacctcgccggcccaggacggggctggccaaacc tcaccgcttgctcccgggctggcttccagaccaagggcacgcagaggtcggagcctgcccag aagccacacctggccagaaaaaccgaaggtgtatqaaggt^
aggagctgctgcagcaaagftfegggcacaccaggcggcptccgggggaacc
- · . · - 1 - gactgtattttgagcctga accaatttcttccacgcccaattatttgcaacggggagaattttccagttgtgtttcatgtg aagaaaactcaagctgcctcgaccagatctttgattcctaccttcagacagagatgcacccg gagcctttgctcaattccacacaaagtgctccacaccatttcccagacagcttccaggccac ccctttctgctttaaccagagcctgatcccaggatcaccttcaaattcctccattctctctg gctccttagactacagttactcgccagtgcagctgccttcatatgctccagagaattacaat tcccctgcttctctggacaccagaacctgtggctaccccccagaagaccattcctaccaaca cttgtcctcacacgcccagtacagctgcttctcctcggccaccacctccatctgctactgcg catcgfgtg¾ggcagaggac tggatgc]t
gactgtgtggactttgccccctcagcagccgccaccagtgatttctataagagggaaacaaa ctgtgacatctgctatagtTAAtagaaattacagtaattcagaacatggcatgggtatatct atttttctaccacgtctagatgacactgcaaaatatgcaacttggtaacacaatatcccaag cacagtttacatgtcactatttccaattttctgatgctaagcattcat¾t
cccggtcacagcgccactcctactttgt
(SEQ ID NO. 10)
10 20 30 40 50 60
MSDGARKRRR KGTRESGQKA KTEGSEGGSS SGLAPLLGQS AGTSPAQDGA GQTSPLAPGL
70 80 90 100 110 120
ASRPRARRGR SLPRSHTWPE KPKVYQGVRV KITVKELLQQ RRAHQAASGG TRSGGSSVHL
130 140 150 160 170 180
SDPVAPSSAG LYFEPEPISS TPNYLQRGEF SSCVSCEENS SCLDQIFDSY LQTEMHPEPL
190 200 210 220 230 240
LNSTQSAPHH FPDSFQATPF CFNQSLIPGS PSNSSILSGS LDYSYSPVQL PSYAPENYNS
250 260 270 280 290 300
PASLDTRTCG YPPEDHSYQH LSSHAQYSCF SSATTSICYC ASCEAEDLDA LQAAEYFYPS
310 320
TDCVDFAPSA AATSDFYKRE TNCDI
Protein size (aa) : 328
Theoretical pI/Mw: 5.85 / 35380.77
Isoform 5 :
cDNA :
gttttaaatacttggcatgagggggaaaaaactcaaacctatacccgaaggtataaaacatt aattctgtttgtctcctgttccagggacctgggtgcagccggatgctttgtgaaattttaaa aatgttgcatctctgcttctctccctgccccaggagccctcccaggcgctgagtccctgctg ctgcagcatccagccaccgacccagctcctgctgaccacgATGtcggaacagagaggcacaa agtcaaggcatgcgagtaagggccggggatccggagaggacgcgagccacgctgtggccgct cccactcagcgcctgtcccggagggcagaggcccagaggccgagttggccgctccggacgtc cgcgtaccccaggcccccgcccgccctcccacgtatccactgttgtgtcctgagacacgcgg tgcggaccctgcgcgcccagaagctgctacggggggagctagaggctcacagtgagcctggg acgggggcagagcgcaccgactgcactgagccgagctgtggcggtccctgcgcggaccccga ggggcgcccgctgcccgcggagtccggagccgatcggggaacggggtggtggggctgcgaaa aaccgaaggtgtatcaaggtgtccgagtgaagatcacagtgaaggagctgctgcagcaaaga cgggcacaccaggcggcctccgggggaacc rr: 1;3:;
· gactgtattttgagcctgaaccaatttcttccacgccca attatttgcaacggggagaattttccagttgtgtttcatgtgaagaaaactcaagctgcctc gaccagatctttgattcctaccttcagacagagatgcacccggagcctttgctcaattccac acaaagtgctccacaccatttcccagacagcttccaggccacccctttctgctttaaccaga gcctgatcccaggatcaccttcaaattcctccattctctctggctccttagactacagttac tcgccagtgcagctgccttca atgctccagagaattacaattcccctgcttctctggacac cagaacctgtggctaccccccagaagaccattcctaccaacacttgtcctcacacgcccagt acagctgcttctcctcggccaccacctccatctgctactgcgcatcgtgtgaggcagaggac ttggatgctctccaggcagcagagtacttctacccgagcacagac^
ctcagcagpegccaccagtgattt^
AAtagaaat acagtaattcagaacatggcatgggtatatc atttttctaccacgtctaga tgacactgcaaaatatgcaacttggtaacacaatatcccaagcacagtttacatgtcactat ttccaattttctgatgctaagcatt^
tactttgt (SEQ ID NO. 12)
10 20 30 40 50 60
MSEQRGTKSR HASKGRGSGE DASHAVAAPT QRLSRRAEAQ RPSWPLRTSA YPRPPPALPR
70 80 90 100 110 120
IHCCVLRHAV RTLRAQKLLR GELEAHSEPG TGAERTDCTE PSCGGPCADP EGRPLPAESG
130 140 150 160 170 180
ADRGTGWWGC EKPKVYQGVR VKITVKELLQ QRRAHQAASG GTRSGGSSVH LSDPVAPSSA
190 200 210 220 230 240
GLYFEPEPIS STPNYLQRGE FSSCVSCEEN SSCLDQIFDS YLQTEMHPEP LLNSTQSAPH
250 260 270 280 290 300
HFPDSFQATP FCFNQSLIPG SPSNSSILSG SLDYSYSPVQ LPSYAPENYN SPASLDTRTC
310 320 330 340 350 360
GYPPEDHSYQ HLSSHAQYSC FSSATTSICY CASCEAEDLD ALQAAEYFYP STDCVDFAPS
370
AAATSDFYKR ETNCDICYS- (SEQ ID NO. 11)
Protein size (aa) : 379
Theoretical pI/Mw: 5.72 / 41151.31
Table 1
Figure imgf000052_0001
Legend to Table 1. Characteristics of the 11 GWAS regions that were selected for sequence capture.
Table 2
ID  TaqMan Probe  Forward Primer  Reverse Primer
CRC* Specific
1  TGGGTGTGTTCTTCTATT  GCCTTTGATCCCGGTTTTC  TTCTAGACCAGGAGATTATGGACGTA
2  AGTACAATGTGTTGGAAAC  CGTCACCCTCCATGTCTCCTA  GCATTGGTGGGTGCAAACT
3  AGTAAATCAACTAACTGGAGGAG  AAGGGCATGGCCAATAGGA  TGTACAGAGCTTGAATTGGGTAGAA
4  ACCCAAAGACAATGGG  CCAGGCTAAAGAATCTGGACTCA  CGTGTTCAACAATCTTCGATGGT
5  ACAGTGAGTACGTGAATC  GAAATGGCCTTGGTGTGCTT  TCCCCCCACTCCCAACTT
Negative Control
1  CCACCCACATACCGCC  GGCTCAGCCTCTTGAAGGG  TGGGTGATGTGCCGTGAAG
2  CCATCACTCCAGCTTTT  AGCACACTCTGACATCAGTCTCT  CGCTTGTCTTCTAATGTCTGTGATAGT
Legend to Table 2: TaqMan probes and primers for validation of SC target enrichment.
Table 3
Figure imgf000054_0001
Legend to Table 3. Coverage and percent of bases covered over all target regions, and in exons of genes from the target regions. Ranges are taken over 119 samples. Among the bases in the target regions, 1,721,644 are callable (18,662 in exons) and 571,216 are other bases (993 in exons).
Table 4
Figure imgf000055_0001
Legend to Table 4: Classification of 10,577 SNPs into known (bold) or novel (regular font) classes, functional categories and frequency classes for the alternative alleles.
Table 5
Refseq Gene  Position (hg18)  dbSNP132  Reference Allele  Alternative Allele  Minor Allele  Amino acid change  Cases  Controls  Probands and Kins  Shared  Genotype Counts Cases  Genotype Counts Controls  PvTrend
UTP23  chr8-117853095  rs1133950  A  C  C  |Q  5  6  2  0  5/103/922  1/86/974  0.0471
UTP23  chr8-117853099  -  G  A  A  P|H  0  1  0  0  0/0/1030  0/0/1061  NA
UTP23  chr8-1178531??  rs16888728  C  T  T  P|L  9  8  5  1  18/198/814  10/1?3/868  0.06145
TPD52L3  chr9-6318947  rs3847262  T  C  T  F|L  40  41  40  14  3/111/902  3/86/961  0.04422
TPD52L3  chr9-6318996  rs117022582  G  A  A  G|K  0  1  1  0  0/18/1011  0/21/1040  0.7293
C11orf92  chr11-110672350  -  C  T  T  G|R  0  1  0  0  0/1/1029  0/0/1061  0.4965
C11orf92  chr11-110672395  rs61753533  C  T  T  A|T  1  0  3  0  0/29/945  0/36/959  0.4106
BMP4  chr14-53487197  -  C  T  T  R|H  0  0  1  0  0/0/1030  0/0/1061  NA
BMP4  chr14-53487272  rs17563  A  G  A  V|A  34  33  32  9  194/534/302  210/529/322  0.9814
BMP4  chr14-53488567  -  C  G  G  A|P  0  1  0  0  0/2/1028  0/1/1060  0.6064
SCG5  chr15-30771245  -  C  T  T  R|*  0  1  0  0  0/0/1030  0/0/1061  NA
GREM1  chr15-30810286  rs111262341  C  G  G  P|R  0  1  0  0  0/5/1025  0/12/1049  0.1568
ZFP90  chr16-67154531  rs61746929  C  T  T  H|Y  2  2  2  1  0/31/999  0/32/1029  1
ZFP90  chr16-67154720  -  C  T  T  P|S  0  1  0  0  0/0/1030  0/3/1058  0.2388
ZFP90  chr16-67154831  -  G  A  A  D|N  0  0  1  0  0/5/1025  0/4/1057  0.7602
ZFP90  chr16-67155142  -  A  C  C  |N  1  0  0  0  0/0/1030  0/2/1059  0.5045
ZFP90  chr16-67155498  -  C  G  G  S|C  0  1  0  0  0/0/1030  0/0/1061  NA
CDH3  chr16-67269951  -  C  T  T  T|M  1  0  0  0  0/0/1030  0/0/1061  NA
CDH3  chr16-67276089  rs34394404  G  A  A  V|I  2  3  0  0  0/35/995  0/37/1024  1
CDH3  chr16-67276614  rs34494880  G  A  A  R|H  7  3  3  0  5/1?4/891  10/140/911  0.4995
CDH3  chr16-67279026  -  G  A  A  V|M  0  1  0  0/6/1024  0/2/1059  0.1728
CDH3  chr16-67279034  rs1126933  G  C  C  Q|H  22  24  27  8  179/518/331  173/533/355  0.4307
GPATCH1  chr19-38280610  -  A  G  G  Y|C  0  1  0  0  0/3/1027  0/1/1060  0.3686
GPATCH1  chr19-38292604  rs2287679  T  C  C  L|P  17  19  21  6  65/368/597  70/394/597  0.4621
GPATCH1  chr19-38297140  rs10416265  A  G  G  H|R  18  20  21  6  66/381/583  74/398/589  0.5414
GPATCH1  chr19-38297152  rs10421769  T  C  C  L|S  22  24  25  8  129/466/435  131/484/446  0.9947
GPATCH1  chr19-38307917  rs73039449  C  T  T  A|V  0  0  1  0  0/29/1001  0/16/1045  0.0458
WDR88  chr19-38320429  -  T  C  C  F|L  1  0  0  0  0/2/1028  0/1/1060  0.6264
WDR88  chr19-38330426  rs74994260  C  G  G  A|G  7  4  6  1  6/108/916  3/123/935  0.7304
WDR88  chr19-38339219  rs11881580  T  C  C  C|R  6  7  6  1  9/152/869  6/160/894  0.87
Legend to Table 5: Results for successfully genotyped coding-nonsynonymous SNPs. The table includes the number of carriers among the sequenced samples (40 Cases, 40 Controls, 25 Probands and 15 Kins), and the number of Probands and Kins who share the alternative allele, based on genotype data (Shared). Genotype counts are in the format AA/AB/BB, where A is the minor allele and B the major allele; genotype counts only include self-declared "white" samples, and exclude the sequenced samples. Significance levels of the Cochran-Armitage test for trend are included (PvTrend).
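For readers who wish to relate the AA/AB/BB genotype counts to the PvTrend column, the following is a minimal sketch of a Cochran-Armitage test for trend with additive weights (0, 1, 2) and a two-sided normal approximation. The function name and the variance convention are assumptions made here; the specification does not state which implementation was used, so the sketch need not reproduce the tabulated values exactly (for example, 0.0471 is reported for rs1133950).

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Two-sided p-value of the Cochran-Armitage trend test.

    cases, controls: genotype counts in the same order (e.g. AA, AB, BB).
    Returns None when the statistic is undefined (e.g. a monomorphic
    variant), which corresponds to the NA entries in Table 5.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    if N == 0:
        return None
    totals = [r + s for r, s in zip(cases, controls)]
    # Trend statistic: weighted difference between case and control counts.
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    variance = (R * S / N) * (N * sum_w2n - sum_wn ** 2)
    if variance == 0:
        return None
    z = T / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Genotype counts for rs1133950 (UTP23) as listed in Table 5.
p_rs1133950 = cochran_armitage_trend((5, 103, 922), (1, 86, 974))
```

Because the statistic depends only on the ordering of the genotype classes, reversing the weights to (2, 1, 0) yields the same p-value.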
Table 6
Refseq Gene  Position (hg18)  dbSNP132  Reference Allele  Alternative Allele  Minor Allele  Amino acid change  Sporadic  Controls  Familial  Shared  Genotype Counts Cases  Genotype Counts Controls
C11orf92 chr11-110671999 A A 0/1/182 0/0/164
C11orf92 chr11-110672067 Deletion C Deletion C FS 0/1/183 0/0/184
C11orf92 chr11-110672081 C C C|C 0/0/173 0/1/163
C11orf92 chr11-110672131 rs60504131 C C L|L 0/0/173 0/1/163
C11orf92 chr11-110672213 rs61764070 A A K|K 0/4/1151 0/3/1112
C11orf92 chr11-110672230 T T V|F 0/1/178 0/0/176
C11orf92 chr11-110672235 Insertion G Insertion G FS 0/0/184 0/1/183
Legend to Table 6: Additional SNPs discovered from sequencing the coding exon of C11orf92. The table includes the number of carriers among the sequenced samples (40 Cases, 40 Controls, 25 Probands and 15 Kins), and the number of Probands and Kins who share the alternative allele, based on genotype data (Shared). Genotype counts are in the format AA/AB/BB, where A is the minor allele and B the major allele. For each variant, genotypes were called from chromatograms only in the 384-well plates in which an alternative allele was detected.
Table 7
SybrGreen qPCR
Genes Forward Primer Reverse Primer PCR Product, bp
C11orf53  5'-TGGAGCCCTACTTCCCCCAGGA  5'-GCTGGGAGAGGCAACTCGTGCT  249
POU2AF1  5'-CTGGCGACCTACACCACAG  5'-TCATGGGGCACATACTCGGT  167
C11orf93  5'-TTCTCTCTGGCTCCTTAGACTACAGTT  5'-GGGCGTGTGAGGACAAGTGT  150
C11orf92  5'-TGCTGGGGTGTCCTCCCAGTG  5'-CCCAGCCTGGCCTGCCTTCT  236
GAPDH  5'-TACTAGCGGTTTTACGGGCG  5'-TCGAACAGGAGGAGCAGAGAGCGA  166
β-actin  5'-CATGTACGTTGCTATCCAGGC  5'-TCTCCTTAATGTCACGCACGAT  250
TaqMan qPCR
Genes  Forward Primer  Reverse Primer  Probe  PCR Product, bp
C11orf92Taq  F-TTCTGTGTTGGCCTTGTTTCTG  R-TCTGAAATCGCTAGCCAGCAA  P-ATGGCCATCTCAGTGCT  63
C11orf93Taq  F-TGACATGTAAACTGTGCTTGGGATA  R-ACCACGTCTAGATGACACTGCAA  P-TGTGTTACCAAGTTGCATAT  70
92_B-S_Taq  F-TTCGCCCCCGCTCCTA  R-AAACGCCTGCCCCAGAA  P-TCTCCCCTTCTTCCCCT  70
92_B-L_Taq  F-TCGCTCCCTTCTGTCTTTGC  R-GCGGAAAAGGCGGAGAAA  P-TCTGCCCGCTCTC  64
GAPDH(FAM)  Assay ID Hs99999905_m1
Legend to Table 7: Primers and Probes Sequences for SybrGreen and TaqMan qPCR Assays.
Table 8
MTP RT-PCR
Genes Forward Primer Reverse Primer PCR Product bp
C11orf53  5'-GCTGGGGCTCAGTCATACTC  5'-GGGCGGTAGGTCATTTGACA  120
C11orf93  5'-TTCTCTCTGGCTCCTTAGACTACAGTT  5'-GGGCGTGTGAGGACAAGTGT  150
C11orf92  5'-TGCTGGGGTGTCCTCCCAGTG  5'-CCCAGCCTGGCCTGCCTTCT  236
β-actin  5'-CATGTACGTTGCTATCCAGGC  5'-TCTCCTTAATGTCACGCACGAT  250
RACE
Genes
F-GSP-3'-C11orf92  5'-TGCCCCTGGCCCGCCTGACCCGA
F-GSP-3'-(nested)-C11orf92  5'-AGGCAGGCCAGGCTGGGTGAGAGGGGAGC
R-GSP-5'-C11orf92  5'-TGGGTACCTGGGGTCTCAGGGTTGCTCTGGGCCT
R-GSP-5'-(nested)-C11orf92  5'-CCCCAGGAGCCCTCCCAGGCGCTGA
R-GSP-5'-C11orf93  5'-CGCCTGGTGTGCCCGTCTTTGC
F-GSP-3'-C11orf93  5'-CCGAAGGTGTATCAAGGTGTCCGAGTGA
Legend to Table 8: Primer Sequences for RT-PCR MTP and 5'/3'-RACE Experiments.
Table 9. Gene structure of COLCA1 and characterization of its splice transcripts.
Figure imgf000061_0001
Structure  mRNA (bp)  Coding (bp)  Protein (aa)  Protein (kDa)  Exon boundaries (NCBI36/hg18)
(1f)+2  5537  375  124  13.4  1f (110,675,188:110,674,929); 2 (110,674,601:110,669,323)
(1bS)+2  5499  375  124  13.4  1bS (110,675,749:110,675,528); 2 (110,674,601:110,669,323)
(1bL)+2  5581  375  124  13.4  1bL (110,675,831:110,675,528); 2 (110,675,601:110,669,323)
(1d)+2  5366  375  124  13.4  1d (110,676,167:110,676,079); 2 (110,674,601:110,669,323)
(1a)+2  5499  375  124  13.4  1a (110,680,983:110,680,762); 2 (110,674,601:110,669,323)
Table 10. Gene structure of COLCA2 and characterization of its splice isoforms.
Figure imgf000062_0001
Structure  mRNA (bp)  Coding (bp)  Protein (aa)  Protein (kDa)  Exon boundaries (NCBI36/hg18)
(1+2)+5+6+7+8  1558  1140  379  41.1  1 (110,674,291:110,674,523); 2 (110,674,610:110,674,993); 5 (110,676,461:110,676,556); 6 (110,680,862:110,680,917); 7 (110,682,328:110,682,558); 8 (110,684,004:110,684,564)
(1+4L)+5+6+7+8  1205  987  328  35.4  1 (110,674,291:110,674,523); 4L (110,675,721:110,675,952); 5 (110,676,461:110,676,556); 6 (110,680,862:110,680,917); 7 (110,682,328:110,682,558); 8 (110,684,004:110,684,564)
(1+4S)+5+6+7+8  1079  861  286  31.0  1 (110,674,291:110,674,523); 4S (110,675,721:110,675,825); 5 (110,676,461:110,676,556); 6 (110,680,862:110,680,917); 7 (110,682,328:110,682,558); 8 (110,684,004:110,684,564)
(1)+5+6+7+8  974  756  251  27.4  1 (110,674,291:110,674,523); 5 (110,676,461:110,676,556); 6 (110,680,862:110,680,917); 7 (110,682,328:110,682,558); 8 (110,684,004:110,684,564)
(3)+5+6+7+8  1414  465  154  16.9  3 (110,675,241:110,675,659); 5 (110,676,461:110,676,556); 6 (110,680,862:110,680,917); 7 (110,682,328:110,682,558); 8 (110,684,004:110,684,564)
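The coding lengths and protein sizes reported in Tables 9 and 10 can be cross-checked with simple arithmetic: a coding sequence of n amino acids should span 3 x (n + 1) nucleotides once the stop codon is counted. The sketch below applies this check to the tabulated values; the dictionary and function names are illustrative assumptions, not part of the specification.

```python
# (coding length in bp, protein length in aa) as reported in Tables 9 and 10.
REPORTED = {
    "COLCA1": (375, 124),
    "COLCA2 (1+2)+5+6+7+8": (1140, 379),
    "COLCA2 (1+4L)+5+6+7+8": (987, 328),
    "COLCA2 (1+4S)+5+6+7+8": (861, 286),
    "COLCA2 (1)+5+6+7+8": (756, 251),
    "COLCA2 (3)+5+6+7+8": (465, 154),
}

def coding_length_consistent(coding_bp, protein_aa):
    """True if coding_bp equals three bases per residue plus one stop codon."""
    return coding_bp == 3 * (protein_aa + 1)

consistency = {
    name: coding_length_consistent(bp, aa) for name, (bp, aa) in REPORTED.items()
}
```

All six entries satisfy the relation, which supports the reading of the flattened table columns given above.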
Table 11. Primers sequences for amplification of the COLCA2 transcripts by RT-PCR experiments.
Figure imgf000063_0001
Reference List
1. C. C. Chung, S. J. Chanock, Current status of genome-wide association studies in cancer. Hum. Genet. 130, 59-78 (2011).
2. P. Tomlinson, L. G. Carvajal-Carmona, S. E. Dobbins, A. Tenesa, A. M. Jones, et al, Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet. 7, e1002105 (2011).
3. R. Cui, Y. Okada, S. G. Jang, J. L. Ku, J. G. Park, et al, Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut. 60, 799-805 (2011).
4. U. Peters, C. M. Hutter, L. Hsu, F. R. Schumacher, D. V. Conti, et al, Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum. Genet. (2011). Jul 15. [Epub ahead of print]
5. P. Broderick, L. Carvajal-Carmona, A. M. Pittman, E. Webb, K. Howarth, et al, A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315-7 (2007).
6. M. M. Pomerantz, N. Ahmadiyeh, L. Jia, P. Herman, M. P. Verzi, et al, The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882-4 (2009).
7. S. Tuupanen, M. Turunen, R. Lehtonen, O. Hallikas, S. Vanharanta, et al, The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat. Genet. 41, 885-90 (2009).
8. A. M. Pittman, P. Twiss, P. Broderick, S. Lubbe, I. Chandler, et al, The CDH1-160C>A polymorphism is a risk factor for colorectal cancer. Int. J. Cancer. 125, 1622-5 (2009).
9. A. M. Pittman, S. Naranjo, S. E. Jalava, P. Twiss, Y. Ma, et al, Allelic variation at the 8q23.3 colorectal cancer risk locus functions as a cis-acting regulator of EIF3H. PLoS Genet. 6, e1001126 (2010).
10. N. Bellam, B. Pasche, Tgf-beta signaling alterations and colon cancer. Cancer Treat. Res. 155, 85-103 (2010).
11. T. J. Albert, M. N. Molla, D. M. Muzny, L. Nazareth, D. Wheeler, et al, Direct selection of human genomic loci by microarray hybridization. Nat. Methods. 4, 903-5 (2007).
12. K. M. Wong, T. J. Hudson, J. D. McPherson, Unraveling the genetics of cancer: genome sequencing and beyond. Annu. Rev. Genomics Hum. Genet. 12, 407-30 (2011).
13. W. Zanke, C. M. Greenwood, J. Rangrej, R. Kustra, A. Tenesa, et al, Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat. Genet. 39, 989-94 (2007).
14. A. Tenesa, S. M. Farrington, J. G. Prendergast, M. E. Porteous, M. Walker, et al, Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631-7 (2008).
15. R. S. Houlston, E. Webb, P. Broderick, A. M. Pittman, M. C. Di Bernardo, et al, Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 40, 1426-35 (2008).
16. 1000 Genomes Project Consortium, A map of human genome variation from population-scale sequencing. Nature. 467, 1061-73 (2010).
17. P. Markova-Raina, D. Petrov, High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Res. 21, 863-74 (2011).
18. J. N. Poynter, J. C. Figueiredo, D. V. Conti, K. Kennedy, S. Gallinger, et al, Variants on 9p24 and 8q24 are associated with risk of colorectal cancer: results from the Colon Cancer Family Registry. Cancer Res. 67, 11128-32 (2007).
19. P. Tomlinson, E. Webb, L. Carvajal-Carmona, P. Broderick, K. Howarth, et al, A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 40, 623-30 (2008).
20. A. M. Pittman, E. Webb, L. Carvajal-Carmona, K. Howarth, M. C. Di Bernardo, Refinement of the basis and impact of common 11q23.1 variation to the risk of developing colorectal cancer. Hum. Mol. Genet. 17, 3720-7 (2008).
21. L. G. Carvajal-Carmona, J. B. Cazier, A. M. Jones, K. Howarth, P. Broderick, et al, Fine-mapping of colorectal cancer susceptibility loci at 8q23.3, 16q22.1 and 19q13.11: refinement of association signals and use of in silico analysis to suggest functional variation and unexpected candidate target genes. Hum. Mol. Genet. 20, 2879-88 (2011).
22. R. L. Auer, J. Starczynski, S. McElwaine, F. Bertoni, A. C. Newland, et al, Identification of a potential role for POU2AF1 and BTG4 in the deletion of 11q23 in chronic lymphocytic leukemia. Genes Chromosomes Cancer. 43, 1-10 (2005).
23. R. Herbeck, D. Teodorescu Brinzeu, M. Giubelan, E. Lazar, A. Dema, et al, B-cell transcription factors Pax-5, Oct-2, BOB.1, Bcl-6, and MUM1 are useful markers for the diagnosis of nodular lymphocyte predominant Hodgkin lymphoma. Rom. J. Morphol. Embryol. 52, 69-74 (2011).
24. S. Advani, K. Lim, S. Gibson, M. Shadman, T. Jin, E. Copelan, M. Kalaycio, et al, OCT-2 expression and OCT-2/BOB.1 co-expression predict prognosis in patients with newly diagnosed acute myeloid leukemia. Leuk. Lymphoma. 51, 606-12 (2010).
25. W. Wei, V. Pelechano, A. I. Jarvelin, L. M. Steinmetz, Functional consequences of bidirectional promoters. Trends Genet. 27, 267-76 (2011).
26. ENCODE Project Consortium, R. M. Myers, J. Stamatoyannopoulos, M. Snyder, I. Dunham, A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, (2011).
27. M. D. Vesely, M. H. Kershaw, R. D. Schreiber, M. J. Smyth, Natural innate and adaptive immunity to cancer. Annu. Rev. Immunol. 29, 235-71 (2011).
28. S. A. Woodruff, J. C. Masterson, S. Fillon, Z. D. Robinson, G. T. Furuta, Role of eosinophils in inflammatory bowel and gastrointestinal diseases. J. Pediatr. Gastroenterol. Nutr. 52, 650-61 (2011).
29. K. Khazaie, N. R. Blatner, M. W. Khan, F. Gounari, E. Gounaris, et al, The significant role of mast cells in cancer. Cancer Metastasis Rev. 30, 45-60 (2011).
30. D. Gregory, A. M. Houghton, Tumor-associated neutrophils: new targets for cancer therapy. Cancer Res. 71, 2411-16 (2011).
31. V. G. Peddareddigari, D. Wang, R. N. Dubois, The tumor microenvironment in colorectal carcinogenesis. Cancer Microenviron. 3, 149-66 (2010).
32. S. Maltby, K. Khazaie, K. M. McNagny, Mast cells in tumor growth: angiogenesis, tissue remodelling and immune-modulation. Biochim. Biophys. Acta. 1796, 19-26 (2009).
33. N. R. Blatner, A. Bonertz, P. Beckhove, E. C. Cheon, S. B. Krantz, et al, In colorectal cancer mast cells contribute to systemic regulatory T-cell dysfunction. Proc. Natl. Acad. Sci. U. S. A. 107, 6430-5 (2010).
34. E. Gounaris, S. E. Erdman, C. Restaino, M. F. Gurish, D. S. Friend, et al, Mast cells are an essential hematopoietic component for polyp development. Proc. Natl. Acad. Sci. U. S. A. 104, 19977-82 (2007).
35. B. Rajput, D. A. Turbin, M. C. U. Cheang, D. K. Voduc, S. Leung, et al, Stromal mast cells in invasive breast cancer are a marker of favourable prognosis: a study of 4,444 cases. Breast Cancer Res. Treat. 107, 249-57 (2008).
36. R. Lotfi, J. Lee, M. T. Lotze, Eosinophilic granulocytes and damage-associated molecular pattern molecules (DAMPs): role in the inflammatory response within tumors. J. Immunother. 30, 16-28 (2007).
37. F. Legrand, V. Driss, M. Delbeke, S. Loiseau, E. Hermann, et al, Human eosinophils exert TNF-α and granzyme A-mediated tumoricidal activity toward colon carcinoma cells. J. Immunol. 185, 7443-51 (2010).
38. R. Lotfi, G. I. Herzog, R. A. DeMarco, D. Beer-Stolz, J. J. Lee, et al, Eosinophils oxidize damage-associated molecular pattern molecules derived from stressed cells. J. Immunol. 183, 5023-31 (2009).
39. J. J. Lee, E. A. Jacobsen, M . McGarry, R.P. Schleimer, N.A. Lee, Eosinophils in health and disease: the LIAR hypothesis. Clin. Exp. Allergy. 40,
563-75 (2010).
40. R.C. Melo, L.A. Spencer, S.A. Perez, J.S. Neves, S.P. Bafford, E.S. Morgan, A.M. Dvorak, ef al, Vesicle-mediated secretion of human eosinophil granule- derived major basic protein. Lab Invest. 89, 769-81 (2009). 41. J. S. Neves, S.A. Perez, L.A. Spencer, R.C. Melo, L. Reynolds, I. Ghiran, S.
Mahmudi-Azer, et al, Eosinophil granules function extracellularly as receptor- mediated secretory organelles. Proc. Natl. Acad. Sci. U S A. 105,18478-83 (2008).
42. M. Cotterchio, G. McKeown-Eyssen, H. Sutherland, G. Buchan, M. Aronson, et al., Ontario familial colon cancer registry: methods and first-year response rates. Chronic Dis. Can. 21, 81-86 (2000).
43. H. Li, J. Ruan, R. Durbin, Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851-8 (2008).
44. Y. D. Zhao, Sample size estimation for the van Elteren test - a stratified Wilcoxon-Mann-Whitney test. Statist. Med. 25, 2675-2687 (2006).
45. Dudbridge F, Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Hum. Hered. 66, 87-98 (2008)
46. B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Ultrafast and memory- efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
47. C. Trapnell, L. Pachter, S. L. Salzberg, TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009). E. Birney, ENCODE Project Consortium, Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816 (2007). J. T. Robinson, H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander, ef al, Integrative Genomics Viewer. Nat. Biotechnol. 29, 24-26 (2011 ). ENCODE Project Consortium, The ENCODE (ENCyclopaedia Of DNA Elements) Project. Science 306, 636-640 (2004). W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, et al., The human genome browser at UCSC. Genome Res. 12, 996-1006 (2002).

Claims

1. An isolated protein comprising SEQ ID NO. 1 or a functional fragment thereof.
2. An isolated protein comprising SEQ ID NO. 3, 5, 7, 9, 11 or a functional fragment thereof.
3. An isolated nucleic acid encoding the protein of any one of claims 1 and 2.
4. The isolated nucleic acid of claim 3, comprising SEQ ID NO. 2.
5. The isolated nucleic acid of claim 3, comprising SEQ ID NO. 4, 6, 8, 10 or 12.
6. An expression vector comprising the nucleic acid of claim 5 operably linked to an expression control sequence.
7. A cultured cell comprising the vector of claim 6.
8. A method of determining risk of colon cancer in a patient using a sample therefrom comprising: a. determining the level of expression of at least one of COLCA1 and COLCA2; and b. comparing the level of expression from step (a) with a control sample; wherein a higher level of expression of at least one of COLCA1 and COLCA2 in the patient sample compared to the control indicates a low risk of colon cancer.
9. The method of claim 8, wherein the level of gene expression is determined and compared.
10. The method of claim 9, wherein the level of gene expression is determined by hybridizing a labelled probe to at least one of COLCA1 and COLCA2 mRNA and detecting labelled probe hybridized to the mRNA.
11. The method of any one of claims 9-10, wherein the level of gene expression is determined on a DNA microarray.
12. The method of any one of claims 9-11, further comprising polymerase chain reaction (PCR) to amplify the mRNA.
13. The method of claim 9, wherein the level of gene expression is determined by using a tag based analysis, preferably serial analysis of gene expression (SAGE).
14. The method of claim 8, wherein the level of protein expression is determined and compared.
15. The method of claim 14, wherein the level of protein expression is determined by binding a COLCA1 or COLCA2 specific antibody to COLCA1 or COLCA2 respectively and detecting the presence of the resulting protein-antibody complex.
16. The method of any one of claims 8-15, wherein the sample is a colon tissue sample.
17. The method of any one of claims 8-15, wherein the sample is a peripheral blood sample.
18. The protein of claims 1 or 2 for determining the risk of colon cancer.
19. Use of the protein of claims 1 or 2 for determining the risk of colon cancer.
20. A diagnostic kit for determining risk of colon cancer in a patient comprising reagents for detecting the level of gene or protein expression of at least one of COLCA1 and COLCA2 in a patient sample and instructions for use.
21. The kit of claim 20, wherein the instructions correlate to the method steps of any one of claims 8-17.
22. A method of treating or preventing colon cancer in a subject comprising administering the protein of claims 1 or 2.
23. The protein of claims 1 or 2 for treating or preventing colon cancer in a subject.
24. Use of the protein of claims 1 or 2 for treating or preventing colon cancer in a subject.
25. Use of the protein of claims 1 or 2 in the preparation of a medicament for treating or preventing colon cancer in a subject.
26. A pharmaceutical composition for the treatment of colon cancer comprising a therapeutically effective amount of the protein of claim 1 or 2 and a pharmaceutically acceptable carrier.
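
The comparison step recited in the risk-determination method (expression of at least one of COLCA1 and COLCA2 in a patient sample versus a control) can be illustrated with a minimal sketch. Everything below is an illustrative assumption and not part of the claims: the function names, the dictionary layout, and the use of a simple mean over replicate measurements are hypothetical, and no particular assay, unit, or statistical threshold is implied.

```python
from statistics import mean

# Gene names taken from the claims; all other names here are hypothetical.
GENES = ("COLCA1", "COLCA2")


def compare_expression(patient_levels, control_levels):
    """For each gene measured in both samples, report whether the mean
    patient expression level is higher than the mean control level."""
    higher = {}
    for gene in GENES:
        if gene in patient_levels and gene in control_levels:
            higher[gene] = mean(patient_levels[gene]) > mean(control_levels[gene])
    return higher


def indicates_low_risk(patient_levels, control_levels):
    """Return True when at least one of COLCA1 and COLCA2 is expressed at a
    higher level in the patient sample than in the control (assumed reading
    of the claimed comparison; a real assay would need a validated cutoff)."""
    return any(compare_expression(patient_levels, control_levels).values())


# Made-up replicate measurements in arbitrary units, for illustration only.
patient = {"COLCA1": [2.0, 2.4], "COLCA2": [1.0, 1.2]}
control = {"COLCA1": [1.5, 1.7], "COLCA2": [1.1, 1.3]}
print(compare_expression(patient, control))
print(indicates_low_risk(patient, control))
```

A plain greater-than comparison is used only to mirror the wording "higher level of expression"; in practice the comparison would likely rest on normalized measurements and a statistical test, neither of which is specified here.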
PCT/CA2013/000306 2012-03-28 2013-03-28 Colca1 and colca2 and their use for the treatment and risk assessment of colon cancer WO2013142982A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261616477P 2012-03-28 2012-03-28
US61/616,477 2012-03-28

Publications (1)

Publication Number Publication Date
WO2013142982A1 (en) 2013-10-03

Family

ID=49258007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2013/000306 WO2013142982A1 (en) 2012-03-28 2013-03-28 Colca1 and colca2 and their use for the treatment and risk assessment of colon cancer

Country Status (1)

Country Link
WO (1) WO2013142982A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008118258A2 (en) * 2007-02-06 2008-10-02 Genizon Biosciences Inc. Genemap of the human genes associated with adhd

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SKOG M. ET AL.: "Expression and prognostic value of transcription factor PROX1 in colorectal cancer", BR J. CANCER, vol. 105, no. 9, 25 October 2011 (2011-10-25), pages 1346 - 1351 *
SLATTERY M.L. ET AL.: "Interferon-signaling pathway: associations with colon and rectal cancer risk and subsequent survival", CARCINOGENESIS, vol. 32, no. 11, 22 August 2011 (2011-08-22), pages 1660 - 1667 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108676885A (en) * 2018-06-26 2018-10-19 华中科技大学鄂州工业技术研究院 Stage of RCC diagnostic marker
CN111370065A (en) * 2020-03-26 2020-07-03 北京吉因加医学检验实验室有限公司 Method and device for detecting cross-sample contamination rate of RNA
CN111370065B (en) * 2020-03-26 2022-10-04 北京吉因加医学检验实验室有限公司 Method and device for detecting cross-sample contamination rate of RNA

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13768904

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13768904

Country of ref document: EP

Kind code of ref document: A1