WO2009047532A2 - Cancer susceptibility loci - Google Patents


Info

Publication number
WO2009047532A2
Authority
WO
WIPO (PCT)
Prior art keywords
polymorphism
site
cancer
individual
risk
Prior art date
Application number
PCT/GB2008/003454
Other languages
French (fr)
Other versions
WO2009047532A3 (en)
Inventor
Malcolm Dunlop
Ian Tomlinson
Harry Campbell
Richard Somerset Houlston
Original Assignee
Cancer Research Technology Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cancer Research Technology Limited
Publication of WO2009047532A2
Publication of WO2009047532A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/172: Haplotypes

Definitions

  • the present invention relates to the identification of genetic loci associated with susceptibility to tumour and cancer development.
  • the present invention provides a method of determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region selected from:
  • Chromosome 8 117650000-117870000;
  • Chromosome 10 8730000-8810000;
  • Chromosome 10 76789000-76811000;
  • Chromosome 4 63108000-63170000;
  • Chromosome 1 231840000-231868000.
  • Position numbering in the above refers to position numbering in the stated chromosome as shown in UCSC March 2006 reference assembly; NCBI build 36.1. This applies to all other position numbering given in the present application unless expressly indicated otherwise.
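The region list above can be expressed as a simple coordinate lookup. The following is a minimal sketch, assuming positions are given in the same NCBI build 36.1 numbering and that region endpoints are inclusive (the text does not say whether they are). The names `REGIONS` and `find_region` are hypothetical and introduced only for illustration.

```python
from typing import Optional, Tuple

# Regions as listed above (UCSC March 2006 assembly; NCBI build 36.1).
# Treating the endpoints as inclusive is an assumption of this sketch.
REGIONS = [
    ("chr8", 117_650_000, 117_870_000),
    ("chr10", 8_730_000, 8_810_000),
    ("chr10", 76_789_000, 76_811_000),
    ("chr4", 63_108_000, 63_170_000),
    ("chr1", 231_840_000, 231_868_000),
]


def find_region(chrom: str, pos: int) -> Optional[Tuple[str, int, int]]:
    """Return the listed region containing (chrom, pos), or None if there is none."""
    for region in REGIONS:
        r_chrom, start, end = region
        if chrom == r_chrom and start <= pos <= end:
            return region
    return None
```

A position reported against a different genome build would need to be converted to build 36.1 coordinates before such a lookup is meaningful.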
  • the present invention provides a method of determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in the region of the SMAD7 gene (e.g., 18q21.1) and/or in the region of the HMPS/CRAC1 locus (e.g., 15q13.3).
  • the present invention provides a method of assessing an individual for a cancer condition or tumour, comprising: providing a sample obtained from said individual, and determining in said sample the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region as set out above (i.e., a region selected from chr8: 117650000-117870000, chr10: 8730000-8810000, chr10: 76789000-76811000, chr4: 63108000-63170000 and chr1: 231840000-231868000, the region of the SMAD7 gene, e.g., 18q21.1, and/or the region of the HMPS/CRAC1 locus, e.g., 15q13.3).
  • the assessment of the individual may be for diagnostic or prognostic purposes.
  • the assessment is of the risk of that individual to tumour or cancer, e.g., colorectal tumour or cancer.
  • the invention also provides a method for identifying an individual who is at risk or likely to be at risk of tumour or cancer, the method comprising: determining in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region as set out above.
  • the regions identified by the present inventors as being associated with risk may contain transcribed sequences, e.g., encoding RNA and/or proteins.
  • the present invention provides a method of assessing an individual for a cancer condition or tumour or a method of identifying a patient who is at risk or likely to be at risk of tumour or cancer, the method comprising determining in a sample obtained from an individual the transcription level of a gene located in a region as set out above.
  • the kits may comprise reagents for determining, in a genomic sample obtained from the individual, the allele present at said sites of polymorphism.
  • the kit may comprise amplification reagents for amplifying all or part of the regions described above, from a genomic sample obtained from an individual.
  • Amplification reagents may include buffers, nucleotides, taq or other polymerase and/or one or more oligonucleotide primers which bind specifically within a region as described above and are suitable for amplifying a region containing one or more sites of polymorphism, for example by PCR.
  • the kit may comprise a labelled oligonucleotide probe which binds to an allele at a site of polymorphism in a region as described above.
  • the kit may comprise multiple such probes, e.g., for detection of multiple polymorphisms, which may be supported on a solid substrate.
  • the kit may comprise a microarray.
  • the invention in another aspect, relates to a kit for detecting, in a sample obtained from an individual, the transcription level of a gene located in a region as described above.
  • the kit may comprise reagents for the detection of mRNA and/or protein, e.g., an oligonucleotide probe or an antibody.
  • the reagents may be detectably labelled (directly or indirectly).
  • the reagents may be immobilised on a solid support or may comprise a tag which is suitable for immobilisation on a solid support.
  • a kit as described herein may comprise one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).
  • the kit may further comprise instructions for using the kit in accordance with a method described herein.
  • a kit may further comprise control nucleic acid.
  • kits may be used for identifying a patient who is at risk of or likely to be at risk of cancer or tumour.
  • this invention relates to the identification of SMAD7 as a susceptibility gene in cancer.
  • the present inventors have shown a large number of polymorphisms in the SMAD7 locus to be associated with a significant increase in CRC risk. This association is clearly of value, e.g., in assessing an individual's likely risk. Moreover, the present inventors have identified a novel polymorphism in the SMAD7 locus, and provided evidence that this polymorphism is located in a gut enhancer region and can affect the enhancer potential of this region. This causative role is unexpected, as many SNPs are not themselves candidates for causality.
  • HMPS/CRAC1 locus are associated with cancer, and particularly colorectal cancer.
  • HMPS/CRAC1 colorectal cancer
  • HMPS/CRAC1 locus contains variants that increase cancer susceptibility in the general population.
  • the present invention provides a method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene.
  • the present invention provides a method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
  • the present invention provides a method of assessing an individual for a cancer condition or an adenoma, comprising: providing a sample obtained from said individual, and determining in said sample the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus.
  • the assessment of said individual may be for diagnostic or prognostic purposes.
  • the assessment is of the risk of that individual to adenoma or cancer, e.g., colorectal cancer or adenoma.
  • the present invention provides a method of identifying a patient who is at risk of or likely to be at risk of colorectal adenomas or colorectal cancer, the method comprising: determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus; the presence of a variant allele at the one or more sites being indicative that the individual is at risk of colorectal cancer or colorectal adenoma.
  • kits for determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene and/or in the region of the HMPS/CRAC1 locus may comprise reagents for determining in a genomic sample obtained from the individual, the presence or absence of a variant allele at one or more sites of polymorphism in the genomic region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus.
  • the kit may comprise amplification reagents for amplifying all or part of the SMAD7 gene, and/or the HMPS/CRAC1 locus, from a genomic sample obtained from an individual.
  • Amplification reagents may include buffers, nucleotides, taq or other polymerase and/or one or more oligonucleotide primers which bind specifically to the SMAD7 gene and/or the HMPS/CRAC1 locus and are suitable for amplifying a region of the gene containing one or more sites of polymorphism, for example by PCR.
  • the kit may comprise a labelled oligonucleotide probe which binds to an allelic variant at a site of polymorphism in the genomic region of the SMAD7 gene, or in the region of the HMPS/CRAC1 locus.
  • the kit may comprise multiple such probes, e.g., for detection of multiple polymorphisms, which may be supported on a solid substrate.
  • the kit may comprise a microarray.
  • the kit may comprise one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).
  • the kit may further comprise instructions for using the kit in accordance with a method described herein.
  • a kit may further comprise control nucleic acid.
  • kits may be used for identifying a patient who is at risk of or likely to be at risk of colorectal adenoma or cancer, wherein the presence of a variant allele at the one or more sites is indicative that the individual is at risk of colorectal adenoma or cancer.
  • the methods of assessing cancer may be methods of assessing the risk of a colorectal cancer other than that forming part of or arising from HMPS.
  • the individuals may be individuals not having the clinical phenotype multiple polyps in the large bowel, characteristic of HMPS. It may be preferred that the cancer to be assessed is of non-polyposis origin.
  • Variant alleles of SMAD7 or of the HMPS/CRAC1 locus may alter (i.e., increase or decrease) the risk of cancer in an individual.
  • a SMAD7 variant allele may contain one or more mutations relative to a reference sequence, which may be a sequence as given below.
  • a SMAD7 variant allele may contain one or more mutations relative to the sequence of the SMAD7 genomic region as set out below, e.g., between bases 44,700,221 and 44,731,079 of the human Genome 2006 Build, and the surrounding region.
  • the mutation may be a substitution, deletion, copy number change or insertion.
  • the variation may be one or more single nucleotide polymorphisms.
  • One or more of the mutations may be contained within intron 3 of SMAD7.
  • the presence of a variant allele may be identified at one, two, three or more sites of polymorphism.
  • the sites of polymorphism assessed may include one, two or three of rs4939827, rs12953717 and rs4464148 (described further below), optionally in combination with one, two, three, four, five or more other sites of polymorphism known to be associated with cancer or adenoma risk, e.g., colorectal cancer risk.
  • a method described herein may comprise determining the presence or absence of a T at SNP rs4939827 in the genomic nucleic acid sample obtained from the individual.
  • polymorphism may be assessed at rs4939827 plus one or more additional sites.
  • Said one or more additional sites may, for example, include rs4464148 and/or rs12953717.
  • a HMPS/CRAC1 variant allele may contain one or more polymorphisms relative to a reference sequence, which may be the sequence residing between bases 29,775,416 and 34,124,337 on chromosome 15 of the Human Genome 2006 Build (http://genome.ucsc.edu).
  • the mutation may be a substitution, copy number change, deletion or insertion.
  • the variant may be one or more single nucleotide polymorphisms.
  • the sites of polymorphism assessed may include one or both of rs4779584 or rs10318 (details of which are given below), optionally in combination with one, two, three, four, five or more other sites of polymorphism known to be associated with cancer risk, e.g., colorectal cancer risk, which may for instance be sites in the SMAD7 gene region or other sites known to be associated with colorectal cancer risk.
  • one such other site may be rs6983267, details of which are given below.
  • the sites of polymorphism assessed may include rs4779584 and rs6983267.
  • the presence or absence of the variant allele at the site may be determined in one or both copies of the region in the genome of the individual.
  • homozygosity of the risk allele may be associated with a higher risk of the disease than heterozygosity.
  • the presence of a variant allele at the one or more sites of polymorphism may be determined by any convenient technique, including amplification of all or part of the genomic region of the SMAD7 gene, including the SMAD7 gene itself, or of the HMPS/CRAC1 locus; sequencing all or part of the genomic region of the SMAD7 gene, including the SMAD7 gene itself, or of the HMPS/CRAC1 locus; and/or hybridisation of a probe which is specific for a variant allele. Suitable methods are described in more detail below (see Methods of determining an allele present at a site of polymorphism). Moreover, the present inventors have noted that certain SNPs in the SMAD7 sequence alter the mRNA expression levels. Therefore, in a further aspect, the present invention provides a method of assessing an individual for a cancer condition, comprising: determining the expression level of SMAD7 in a sample obtained from said individual.
  • the method may be a method of assessing the risk of cancer in said patient, e.g., the risk of colorectal cancer.
  • a lower SMAD7 expression may be associated with an increased risk of cancer.
  • the expression level of a gene may be measured by measuring the level of an expression product such as mRNA or protein. Methods of quantitatively measuring mRNA and protein are well known in the art.
  • allele refers to one of several alternative forms of a given DNA sequence (which may or may not be present in an exon or in a gene).
  • haplotype refers to the identity of two or more polymorphic variants occurring within genomic DNA on the same strand of DNA.
  • a cancer or cancer condition as described herein may include any type of solid cancer and malignant lymphoma and especially leukaemia, sarcomas, skin cancer, bladder cancer, breast cancer, uterus cancer, ovary cancer, prostate cancer, lung cancer, colorectal cancer, cervical cancer, liver cancer, head and neck cancer, oesophageal cancer, pancreas cancer, renal cancer, stomach cancer and cerebral cancer.
  • a tumour may be a tumour of any of the above tissues or organs. It may be an adenoma. Methods of the invention may be particularly useful in assessing the risk of colorectal cancer or tumour, e.g., adenoma. Colorectal cancer may in some embodiments be further divided into colon cancer or rectal cancer.
  • assessment of the risk or susceptibility of an individual may comprise assessing the risk of an early or late onset of the cancer or tumour, e.g., an onset at earlier or later than 60 years old.
  • the sample obtained from the individual may be a nucleic acid sample, e.g., a sample of genomic nucleic acid.
  • the sample may be a protein sample, and the presence of the variant allele may be determined by changes in the amino acid sequence.
  • the sample may be an RNA, e.g., mRNA sample.
  • Suitable nucleic acid- or protein- containing samples may include a tissue or cell sample, such as a biopsy, or a biological fluid sample, such as a blood sample or a swab.
  • the individual or patient referred to herein is an individual or patient showing no symptoms of cancer or tumour. Risk assessment may take place for individuals who are considered to be free of the disease/condition at the time the sample is taken.
  • the individual may be a healthy individual, having no symptoms of colorectal disease.
  • known germline mutations in known genes account for a very low percentage of inherited risk of cancer. Much of the remaining variation of genetic risk may be attributable to a large number of susceptibility loci, some of which will be common, each exerting a small influence on risk. An individual having disease-associated alleles, particularly at several positions, may be considered at a higher risk, and may be subject to an appropriate regime of monitoring and/or may take precautionary steps to reduce that risk. Thus, it is of value to identify further risk markers.
  • the present inventors have identified novel susceptibility loci associated with cancer/tumour susceptibility.
  • the inventors have identified single nucleotide polymorphisms within these loci, associated with susceptibility.
  • these SNPs may not themselves be directly causative of the increased susceptibility.
  • polymorphisms can be associated with a causative change, so as to serve as useful markers of risk without being directly causative themselves.
  • the region of linkage disequilibrium with the SNP is likely to contain both the causative gene/mutation and other useful markers.
  • the inventors have identified the SNP rs16892766 in the linkage disequilibrium block chr8: 117.65 Mb-117.87 Mb.
  • the region containing the site of polymorphism may be chromosome 8: 117690773-117712909.
  • the inventors have identified the SNP rs10795668 in the linkage disequilibrium block chr10: 8.73-8.81 Mb.
  • Suitable markers/polymorphisms may be a substitution, copy number change, deletion or insertion. Most preferably, the marker/polymorphism may be one or more single nucleotide polymorphisms.
  • preferred polymorphism may be one or more of:
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs16892766.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of an A at rs2488704.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs4355419.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs2282428.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a G at rs10795668.
  • Some embodiments of the invention may comprise determining the allele present at one or more sites of polymorphism in the region of the SMAD7 gene.
  • SMAD7 (Mothers against decapentaplegic homolog 7) acts as an intracellular antagonist of TGFβ signalling by binding stably to the receptor complex and blocking activation of downstream signalling events.
  • the human SMAD7 protein sequence has an exemplary database entry AAL68977, gi: 18418630.
  • Exemplary human nucleic acid sequences for the four exons are given in database entries AF026556.1, GI:18418626, GI:18418627, GI: 18418628 and GI: 1841829.
  • the human SMAD7 gene is located at 18q21.1, and an exemplary sequence of the human SMAD7 gene is set out between bases 44,700,221 and 44,731,079 of the Human Genome 2006 Build (http://genome.ucsc.edu) for chromosome 18.
  • one or more sites of polymorphism may be located in intron 3 of the SMAD7 gene. In some embodiments it may be preferred that one or more sites of polymorphism is located in the region Chr 18: 44,700,221-44,716,898.
  • Exemplary polymorphisms located in the region of SMAD7 include:
  • rs4939827 (Chr18: position 44707461) CTCACAGCCTCATCCAAAAGAGGAAA [C/T] AGGACCCCAGAGCTCCCTCAGACTC
  • rs12953717 (Chr18: position 44707927) GCATTTCACACCAACCTCGCATGCAG [C/T] CTCCCGGTAAGTTCAGCTCATCCCT
  • rs4464148 (Chr18: position 44713030) CGGGGGAACAGACAGAGAAGGATGAA [C/T] GTGAAAAGGAAACACCCTGGTAACT
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a T at rs4939827, a T at rs12953717 and/or a C at rs4464148.
  • exemplary haplotypes associated with risk are: TTC, TTT and TCT.
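As an illustration of how the flanking sequences given above could be used to read the allele at a site from a sequenced fragment, the following is a minimal sketch for rs4939827. It assumes the read is on the same strand as the flanking sequences shown, contains both flanks in full, and has no sequencing errors; reverse-complement reads and inexact matches are not handled. The function name `call_allele` is hypothetical.

```python
from typing import Optional

# Flanking sequences for rs4939827 exactly as given in the text above.
LEFT_FLANK = "CTCACAGCCTCATCCAAAAGAGGAAA"
RIGHT_FLANK = "AGGACCCCAGAGCTCCCTCAGACTC"


def call_allele(read: str,
                left: str = LEFT_FLANK,
                right: str = RIGHT_FLANK) -> Optional[str]:
    """Return the base between the two flanks, or None if the flanks are not found."""
    read = read.upper()
    i = read.find(left)
    if i == -1:
        return None
    site = i + len(left)  # index of the polymorphic base
    # The right flank must start immediately after the polymorphic base.
    if read[site + 1 : site + 1 + len(right)] != right:
        return None
    return read[site]
```

Under the statements above, a returned `"T"` at rs4939827 would correspond to the allele described as indicating higher risk in some embodiments.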
  • a polymorphism in the SMAD7 gene region may be a SNP newly identified by the present inventors.
  • This SNP is a C/G polymorphism at 44703563bp of chromosome 18.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a G at position 44703563bp.
  • exemplary polymorphisms in the SMAD7 region are rs8085824, rs34007497, rs4044177 and rs12953717.
  • An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs8085824, a G at rs34007497, AAGAA at rs4044177 and a T at rs12953717.
  • Methods of the invention may involve analysis of haplotypes comprising more than one SNP.
  • An exemplary haplotype is between markers rs6507874 and rs36025258.
  • Another exemplary haplotype is between rs9946510 and rs12967711.
  • a risk haplotype may include a G at position 44703563bp.
  • the method may comprise determining the presence or absence of a haplotype shown in table 7, where the allele is shown for each marker (in order) within the specified region (see figure 2c for the order of the markers).
  • methods of the invention may comprise assessing the allele present at at least one site of polymorphism located in "block 1" (from rs6507874 to rs36025258) and/or at least one site of polymorphism located in "block 2" (rs9946510 to rs12967711).
  • the assessment may be of a haplotype in block 1 and/or a haplotype in block 2.
  • Some embodiments of the invention may comprise determining the allele present at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
  • HMPS/CRAC1 locus referred to herein is located at 15q13.3-q14, and an exemplary sequence is given between bases 29,775,416 and 34,124,377 on chromosome 15 of the human genome 2006 build (http://genome.ucsc.edu).
  • one or more sites of polymorphism may be located in the region chr15: 30,782,050-30,841,010.
  • Exemplary polymorphisms located in the region of the HMPS/CRAC1 locus include:
  • rs4779584 TAGAACTTGTTGATAAGCCATTCTTC [C/T] GAACAGAAACCATAACTATATACAC
  • rs10318 CAAGATATTTGTGGTCTTGATCATAC [C/T] TATTAAAATAATGCCAAACACCAAA
  • a higher risk of colorectal cancer or adenoma may be associated with a T at rs10318 and/or a T at rs4779584.
  • Other possible polymorphisms in the region of the HMPS/CRAC1 locus may include: rs12906413; rs11853552; rs11857190; rs12148790; rs11857997; rs8034965ins/del (a C/- polymorphism at chr15:30,799,068); rs3743103; NFN28; and/or rs1129456.
  • a higher risk of colorectal cancer or adenoma may be associated with a T at rs12906413, a G at rs11853552, an A at rs11857190, a T at rs11857997, and/or an A at rs1129456.
  • Additional polymorphisms may be polymorphisms in linkage disequilibrium with any of the above (e.g., having an r² greater than 0.3, more preferably greater than 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9, and/or a value of D' of greater than 0.5, more preferably greater than 0.6, 0.7, 0.8 or 0.9).
  • other polymorphisms may have an r² greater than 0.3, more preferably greater than 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9, and/or a value of D' of greater than 0.5, more preferably greater than 0.6, 0.7, 0.8 or 0.9, with rs16892766, rs10795668, rs2488704, rs4355419, rs2282428, rs4939827, rs12953717, rs4464148, rs4779584 and/or rs10318.
  • table 5 lists 22 SNPs in addition to rs4939827, rs12953717, rs4464148: all 22 of these SNPs have an r² of at least 0.5 with one or more of rs4939827, rs12953717, rs4464148 and all have an association with risk of developing CRC at the 5% statistical level.
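The linkage disequilibrium thresholds above are stated without formulas. The sketch below uses the standard population-genetics definitions of r² and D' for two biallelic sites, computed from one haplotype frequency and two allele frequencies. That the application intends exactly these definitions is an assumption. The function names `ld_stats` and `in_ld`, and the default thresholds taken from the broadest values quoted above (r² > 0.3, D' > 0.5), are illustrative only.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r squared, |D'|) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    # D' normalises D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def in_ld(r2: float, d_prime: float,
          r2_min: float = 0.3, d_prime_min: float = 0.5) -> bool:
    """Apply the 'and/or' thresholds quoted in the text (either criterion suffices)."""
    return r2 > r2_min or d_prime > d_prime_min
```

For example, two sites whose risk alleles always occur together at frequency 0.3 give D = 0.3 - 0.09 = 0.21, r² = 0.21² / 0.0441 = 1 and D' = 1.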
  • a method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a C at rs16892766, a G at rs10795668, an A at rs2488704, a C at rs4355419 or a C at rs2282428.
  • a method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a T at rs4939827, a T at rs12953717, a C at rs4464148, a T at rs10318, and/or a T at rs4779584.
  • a method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a G at position 44703563bp of chromosome 18.
  • exemplary sites in linkage disequilibrium with rs16892766 are rs11986063 and rs6983626.
  • exemplary sites in linkage disequilibrium with rs10795668 are rs706771, rs7898455 and rs827405.
  • rs7898455 tgacagcttcattgcaggcatatgaa [G/T] ttccaggcagaagacccagataagc
  • the regions in linkage disequilibrium with the identified SNPs may contain transcribed sequences, e.g., coding for proteins and/or RNAs.
  • the site of polymorphism may be located in one of these transcribed sequences.
  • the present invention relates to methods comprising determining the transcription level from one of these genes.
  • the expression level of a gene may be measured by measuring the level of an expression product such as mRNA or protein. Methods of quantitatively measuring mRNA and protein are well known in the art.
  • genes located in the regions set out above are EIF3S3 (NM_003756), C8orf53 (NM_032334), BC031880, LOC389936, FLJ3802842, KCNK1 (NM_002245) and SMAD7 (as previously described).
  • EIF3S3 is known to regulate cell growth and viability, and its overexpression is a feature of breast, prostate and hepatocellular cancers.
  • the work of the present inventors supports this gene as a causative gene in colorectal cancer/adenoma also.
  • preferred embodiments may comprise assessing the transcription level of EIF3S3 in order to assess colorectal cancer/tumours in an individual or to identify an individual at risk of colorectal cancer/tumours.
  • EIF3S3 may be a therapeutic target in the treatment of colorectal cancer or tumours, and thus may be used in methods of screening for therapeutic compounds for the treatment of these conditions.
  • Modulators of EIF3S3 may be useful as therapeutics in the treatment (including preventative treatment) of these conditions, e.g., antibodies against EIF3S3, nucleic acid inhibitors having a sequence complementary to EIF3S3 (such as antisense, RNAi molecules such as siRNA and miRNA, ribozymes and the like) and vectors encoding nucleic acid inhibitors.
  • the alleles present at a plurality of sites of polymorphism are assessed.
  • At least one site of polymorphism is located in a region as set out above, and one or more additional polymorphism known to be associated with the disease/condition is assessed.
  • one such other site may be rs6983267, details of which are given below.
  • a higher risk in an individual may be associated with a G at this position.
  • a plurality of sites may be located in one or more regions as set out herein.
  • the plurality of polymorphisms may comprise a plurality of polymorphisms within a given region as described herein, and/or may include polymorphisms in a plurality of different regions.
  • by "plurality" herein is meant at least 2, and in some embodiments 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 100 or more.
  • the risk of developing a condition may be increased in an individual the more risk-associated alleles they have. For instance, counting two for a homozygote, the risk of CRC increases with increasing numbers of variant alleles for the five loci rs6983267, rs4779584, rs4939827, rs10795668 and rs16892766 (as discussed further in example 7).
  • one or more sites of polymorphism according to the present invention may be assessed as part of a panel.
  • the panel may comprise (but is not limited to) :
  • rs16892766 and rs4939827 and optionally rs12953717 and/or rs4464148;
  • rs10795668 and rs4939827 and optionally rs12953717 and/or rs4464148;
  • the newly identified polymorphism at chr 18 44703563bp, with one or more of rs16892766, rs11986063, rs6983626, rs10795668, rs4779584 and rs10318.
  • any of the above combinations, and any of the individual sites of polymorphism taught herein, or combinations thereof, may also be assessed together with rs6983267.
  • the sites of polymorphism assessed may include rs4779584 and rs6983267.
  • the sites may include rs16892766, rs10795668, rs4939827, rs4779584 and rs6983267.
  • the allele present at the site may be determined in one or both copies of the region in the genome of the individual.
  • homozygosity of the risk allele may be associated with a higher risk of the disease than heterozygosity.
  • the present invention additionally comprises methods of assigning a regime of treatment and/or monitoring to an individual based on the number of risk-associated alleles they have at a plurality of sites of polymorphism as taught herein.
  • said plurality of sites of polymorphism may be or comprise any of the combinations set out above.
  • said polymorphisms may be or comprise rs16892766, rs10795668, rs4939827, rs6983267 and rs4779584 or a site in linkage disequilibrium therewith.
  • a regime of monitoring may be assigned (e.g., of regular colonoscopic examination) if an individual has seven or more risk-associated alleles at said sites, where homozygous alleles are counted as two and heterozygous alleles as one.
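By way of illustration only, the allele-counting rule described above (two for a homozygote, one for a heterozygote, with monitoring assigned at seven or more risk-associated alleles) may be sketched as follows. The function names, the genotype encoding and the placeholder allele symbol "R" are assumptions made for this example; the actual risk allele at each site is not restated here.

```python
# Five-site panel named in the text; "R" is a placeholder for the
# risk-associated allele at each site, "N" for any other allele.
PANEL = ["rs16892766", "rs10795668", "rs4939827", "rs6983267", "rs4779584"]
RISK_ALLELES = {site: "R" for site in PANEL}  # hypothetical encoding


def count_risk_alleles(genotypes, risk_alleles=RISK_ALLELES):
    """Count risk-associated alleles: 2 for a homozygote, 1 for a heterozygote.

    genotypes maps each site to a two-character genotype string, e.g. "RN".
    """
    return sum(genotypes[site].count(allele) for site, allele in risk_alleles.items())


def assign_monitoring(genotypes, threshold=7):
    """Return True if the individual carries at least `threshold` risk alleles."""
    return count_risk_alleles(genotypes) >= threshold


example = dict(zip(PANEL, ["RR", "RR", "RR", "RN", "NN"]))
print(count_risk_alleles(example), assign_monitoring(example))
```

With five sites the count ranges from 0 to 10, so the threshold of seven selects individuals carrying risk alleles at most of the assessed sites.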
  • Determining the allele present at a particular site of polymorphism may in some embodiments comprise determining the nucleotide or sequence of nucleotides at that site. In other embodiments, determining the allele present at a particular site of polymorphism may comprise determining the presence or absence of the disease-associated allele.
  • the allele at the one or more sites of polymorphism may be determined by any convenient technique, including amplification of all or part of the region containing the site of polymorphism, sequencing all or part of the region containing the site of polymorphism, and/or hybridisation of a probe specific for an allele at the site of polymorphism.
  • a specific amplification reaction such as PCR using one or more pairs of primers may conveniently be employed to amplify all or part of the region of interest, for example, the portion of the sequence containing or suspected of containing the one or more sites of polymorphism.
  • the amplification may be allele specific, such that the presence or absence of amplification product is indicative of the presence of that allele (e.g., the risk-associated allele).
  • the amplified nucleic acid may be sequenced as above, and/or tested in any other way to determine the presence or absence of an allele (e.g., the risk-associated allele) at the one or more sites of polymorphism.
  • the method may comprise in some embodiments ligation-based methods.
  • Such a method may comprise hybridising a first and second probe to a first target domain comprising a polymorphism of interest (e.g., a SNP) at an interrogation position.
  • Either the end of the first probe or the beginning of the second probe contains a nucleotide or nucleotides at a detection position, which aligns with the interrogation position on the target. The two probes can be ligated (optionally, following filling of a gap between them) only if there is complementarity between the detection position and the interrogation position.
  • Each probe has a primer sequence for amplification, such that the ligated probe comprises both an upstream and downstream primer and can be amplified, e.g., by PCR.
  • the presence of the amplification product indicates a match between the interrogation and the detection position.
  • different primers comprising different nucleotides at the interrogation position can be used (optionally, in the same reaction) , each comprising a label which allows for its detection.
  • the label may be a sequence of nucleotides which allows the amplified ligation product derived from the primer to be directed to a particular site on an array.
  • Suitable amplification reactions include the polymerase chain reaction (PCR) .
  • PCR comprises repeated cycles of denaturation of template nucleic acid, annealing of primers to template, and elongation of the primers along the template.
  • PCR is well known in the art and is described for example in "PCR protocols; A Guide to Methods and Applications", Eds. Innis et al, 1990, Academic Press, New York; Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263, (1987); Ehrlich (ed), PCR technology, Stockton Press, NY, 1989; and Ehrlich et al, Science, 252:1643-1650, (1991).
  • the number of cycles, the respective conditions of the individual steps, the composition of reagents within the reaction tube, or any other parameter of the reaction set-up may be varied or adjusted by the skilled person, depending on the circumstances. Additional steps (such as initial denaturing, hot-start, touchdown, enzyme time release PCR, replicative PCR) may also be employed.
  • PCR transcript displacement activation
  • repair chain reaction
  • ligase chain reaction ligation activated transcription
  • SDA strand displacement amplification
  • TMA transcription mediated amplification
  • the binding of a probe to genomic nucleic acid in the sample, or amplification products thereof may be determined.
  • the probe may comprise a nucleotide sequence which binds specifically to a nucleic acid sequence which contains a particular allele (e.g., the risk-associated allele, or a non-risk-associated allele) at one or more sites of polymorphism and does not bind specifically to the nucleic acid sequence which does not contain that allele at the one or more polymorphic sites.
  • the oligonucleotide probe may comprise a label and binding of the probe may be determined by detecting the presence of the label.
  • hybridisation will generally be preceded by denaturation to produce single-stranded DNA.
  • the hybridisation may be part of an amplification procedure such as PCR, or may be part of a probing procedure not involving amplification.
  • An example procedure would be a combination of PCR and low stringency hybridisation.
  • Binding of a probe to target nucleic acid may be measured using any of a variety of techniques at the disposal of those skilled in the art.
  • probes may be radioactively, fluorescently or enzymatically labelled.
  • Other methods not employing labelling of probe include examination of restriction fragment length polymorphisms, amplification using PCR, RNase cleavage and allele specific oligonucleotide probing.
  • Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the DNA fragments on the filter and binding determined.
  • Suitable selective hybridisation conditions for oligonucleotides of 17 to 30 bases include hybridization overnight at 42°C in 6X SSC and washing in 6X SSC at a series of increasing temperatures from 42°C to 65°C.
  • Other suitable conditions and protocols are described in Molecular Cloning: a Laboratory Manual: 3rd edition, Sambrook & Russell (2001) Cold Spring Harbor Laboratory Press NY and Current Protocols in Molecular Biology, Ausubel et al. eds. John Wiley & Sons (1992).
  • genomic nucleic acid may be analysed using a nucleic acid array.
  • a nucleic acid array comprises a population of nucleic acid sequences immobilised on a support. Each sequence in the population has a particular defined position on the support.
  • Nucleic acid arrays are well known in the art and may be produced in a number of ways. For example, the nucleic acid sequence may be amplified using the polymerase chain reaction from a cell or library of sequences, or synthesized ex situ using an oligonucleotide synthesis device, and subsequently deposited using a microarraying apparatus. Alternatively, the nucleic acid sequence may be synthesized in situ on the microarray using a method such as piezoelectric deposition of nucleotides .
  • the number of sequences deposited on the array generally may vary upwards from at least 10, 100, 1000, or 10,000 to between 10,000 and several million depending on the technology employed.
  • the array is a specialised, small array, e.g., comprising nucleic acids capable of hybridising to no more than 1000 different sequences, optionally no more than 500, 400, 300, 200 or 100 different sequences.
  • the kit or array may comprise a nucleic acid capable of hybridising to each of the possible alleles at the site of polymorphism to be analysed (i.e., each of the alleles which may be found in the population) .
  • the kit or array may comprise one or more controls.
  • the nucleic acid array is a genomic array comprising a population of genomic sequences from an individual having a cancer, e.g., colorectal cancer.
  • a genomic tiling path array that covers the regions of interest (e.g., in some embodiments the SMAD7 gene locus or the HMPS/CRAC1 locus) may be employed.
  • every immobilised nucleic acid, typically each of the same size, corresponds to a specific genomic region, with different immobilised nucleic acids containing nucleotide sequences corresponding to shifts of one or more nucleotides relative to each other along the genomic region.
  • a tiling array may be designed such that each nucleic acid from a stretch of genomic sequence that is on the array differs from its adjacent nucleic acid by a shift of a single base pair, so that a series of nucleic acids will represent a moving window across the stretch of genomic sequence.
  • an array may comprise overlapping immobilised nucleic acid sequences with shifts as small as one nucleotide and as large as the entire size of the nucleic acid, as well as non-overlapping nucleic acids.
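By way of illustration only, the moving-window design of a tiling array described above may be sketched as follows. The function name, its parameters and the short example sequence are assumptions made for this example and do not describe any particular array product.

```python
def tiling_probes(sequence, probe_length, step=1):
    """Return (start, probe) pairs forming a moving window across `sequence`.

    step=1 gives adjacent probes shifted by a single base; step equal to
    probe_length gives non-overlapping probes.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [
        (start, sequence[start:start + probe_length])
        for start in range(0, last_start + 1, step)
    ]


# Hypothetical 8-base stretch tiled with 4-base probes.
for start, probe in tiling_probes("ACGTACGT", 4):
    print(start, probe)
```

Intermediate values of `step` give partially overlapping probes, corresponding to the range of designs between single-nucleotide shifts and non-overlapping nucleic acids.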
  • Genomic sequences immobilised on an array may be hybridised with a labelled oligonucleotide probe using standard techniques .
  • the nucleic acid array may comprise a population of oligonucleotide sequences which correspond to alleles at sites of polymorphism in the genome.
  • the immobilised oligonucleotide probes may then be hybridised with labelled genomic nucleic acid, for example restriction fragments or amplification products, comprising all or part of the region of interest from an individual.
  • nucleic acid sequences on the array to which a labelled probe or nucleic acid hybridises may be determined, for example by measuring and recording the label intensity at each position in the array, for example, using an automated DNA microarray reader. These sequences correspond to the sequence which is present at the site of polymorphism in the individual, and allow the presence of the allele at the site of polymorphism to be determined.
  • Nucleic acid or an amplified region thereof may be sequenced to identify or determine the presence of a particular allele at one or more sites of polymorphism in the genomic region of interest.
  • An allele may be identified by comparing the sequence obtained with a reference genomic sequence, as described above.
  • Sequencing may be performed using any one of a range of standard techniques. Sequencing of an amplified product may, for example, involve precipitation with isopropanol, resuspension and sequencing using a TaqFS+ Dye terminator sequencing kit. Extension products may be electrophoresed on an ABI 377 DNA sequencer and data analysed using Sequence Navigator software.
  • sequence information can be retained and subsequently searched without recourse to the original nucleic acid itself.
  • scanning a database of sequence information using sequence analysis software may identify a sequence alteration or mutation.
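By way of illustration only, comparison of an obtained sequence with a reference sequence may be sketched as follows. The sketch assumes that the two sequences have already been aligned and are of equal length; the function name and the example sequences are hypothetical.

```python
def find_differences(reference, observed):
    """List (position, reference_base, observed_base) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned
    and of equal length; alignment itself is outside this sketch.
    """
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, obs_base)
        for index, (ref_base, obs_base) in enumerate(zip(reference, observed))
        if ref_base != obs_base
    ]


print(find_differences("ACGTTAGC", "ACGTCAGC"))
```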
  • Following the identification of an individual who is at risk or likely to be at risk of tumour or cancer (e.g., colorectal adenoma or cancer), the individual may be assigned an appropriate program of treatment (e.g., preventative treatment) and/or monitoring.
  • the assessment of risk may affect whether monitoring is offered, and/or may influence decisions on the frequency of monitoring or the age at which monitoring begins.
  • Monitoring methods include analysis of blood samples (e.g., for molecular cancer markers in blood serum), analysis of stool or urine samples (e.g., for blood or molecular cancer markers), endoscopy, imaging methods, physical examination, biopsy and other methods that will be apparent to the skilled person.
  • endoscopy such as flexible colonoscopy and/or fecal occult blood tests.
  • individuals may be advised on lifestyle factors (including diet, weight management, smoking) that may affect the risk of developing a tumour/cancer.
  • FIG. 1 The SMAD7 locus. (a) SNP single marker association results. This panel shows P values for association testing drawn from the GWA study covering SMAD7 and 100kb of sequence upstream and downstream of the gene. The analysis was based on the test allele. All known genes and transcripts in the area are shown (University of California Santa Cruz March 2006 assembly; National Centre for Biotechnology Information Build 36.1). (b) Recombination rate (cM/Mb) across the region derived from HapMap project data (release 21a). (c) The interval situated in SMAD7, between B and C, for targeted resequencing.
  • FIG. 2 (a) LD structure of SMAD7. The 5 SNPs with the strongest evidence for an association with colorectal cancer are denoted in blue (rs8085824, Novel 1, rs34007497, rs4044177 and rs12953717); (b) Pair-wise linkage disequilibrium (r²) metrics of the 25 SNPs calculated in Haploview (v4.0) software. The values indicate the LD relationship between each pair of SNPs; the darker the shading, the greater the extent of LD. Shown are the two haplotype blocks defined within the region.
  • FIG. 3 (a) The tested regions contain an enhancer that promotes reporter gene expression in the rectal region of Xenopus tadpoles.
  • the bright field image above shows a 5-day tadpole embryo.
  • the rectal region is indicated by an arrow.
  • the fluorescent image below shows a detail of the rectal region of a Xenopus transgenic embryo in which GFP expression is promoted by the enhancer.
  • the intensity of the rectal expression promoted by the enhancer from the Protective or the Risk haplotypes was measured relative to the signal observed in a fixed area in the muscles region (boxed, no arrow) , which was considered as 100%.
  • the DNA tested contains either the protective or risk variants of both rs8085824 and Novel 1 (1 and 3) or solely Novel 1 (2 and 4); (e) Box-whisker plot of the relative expression observed in transgenic embryos harboring the Protective or the Risk DNA promoting GFP expression.
  • the enhancer from the risk haplotype/allele shows a significantly decreased enhancer activity.
  • SNP Novel 1 (a) Mutation Surveyor output of SNP Novel 1 (b) Genomic sequence surrounding SNPs rs8085824 and Novel 1 corresponding to position Chr18:44,703,059-44,703,778; UCSC; March 2006 assembly (NCBI build 36.1). Shown in blue is the sequence corresponding to mod052296. Primer sequences used to generate the Xenopus laevis reporter gene expression construct are shown in bold. The region conserved between Homo sapiens and Canis familiaris (dog) is underlined; however, the immediate sequence encompassing Novel 1 is conserved in all primates.
  • FIG. 8 P values throughout the region of study. log10(Pallele) values from the Stage 1 genotyping are shown, together with the locations of SGNE1, GREM1 and FMN1.
  • Figure 10 Linkage disequilibrium relationships between rs4779584, rs10318 and the additional SNPs typed from sites of putative functional importance in this region. Genotypes are taken from Stage 1 samples. Haploview v3.2 was used to calculate LD values. The pairwise r² values are shown in each block.
  • FIG. 12 The 8q23.3 locus. (a) SNP single marker association results. This panel shows P values from the joint analysis of Phases 1 and 2. All known genes (EIF3S3) and predicted transcripts (c8orf53) in the local area are shown. Positions are those of the UCSC March 2006 assembly (NCBI build 36.1). The top SNP rs16892766 (red) was followed up in the additional phases.
  • FIG. 13 The 10p14 locus. (a) SNP single marker association results. This panel shows P values from the joint analysis of Phases 1 and 2. No genes (predicted or otherwise) reside in the local area. Positions are those of the UCSC March 2006 assembly (NCBI build 36.1). The top SNP rs10795668 (red) was followed up in the additional phases.
  • CRC Colorectal cancer
  • GWA genome-wide association
  • Inherited susceptibility underlies ~30% of all colorectal cancer (CRC) 1.
  • High-penetrance germline mutations in APC, the mismatch repair (MMR) genes, MUTYH/MYH, SMAD4, ALK3 and STK11/LKB1 account for <5% of disease incidence 2, with much of the variation in genetic risk likely to be a consequence of combinations of less penetrant variants that, individually, may be common and detectable through genome-wide association (GWA).
  • The first SNP, rs6983267, mapping to 8q24.21, has previously been implicated as a risk factor for prostate cancer 4,5.
  • The second SNP, rs4939827, maps to SMAD7, a gene that encodes a component of the TGF-β signalling pathway.
  • An additional 2 SNPs in SMAD7 (rs12953717 and rs4464148) were among the most extreme P values from the unadjusted analysis.
  • The strength of the association at rs4939827 reached P = 3.07 × 10⁻⁷ under the trend test (Table 1).
  • Haplotype; odds ratio; 95% confidence interval; P value
  • TCT; 1.12; 1.04-1.21; 4.2 × 10⁻³
  • CCC; 1.04; 0.87-1.24; 0.67
  • CTC; 1.07; 0.87-1.32; 0.52
  • Rare (<1%); 1.34; 0.99-1.81; 0.06
  • 1 Includes 320 individuals with high-risk adenomas. 2 Combined analysis based on cancer cases only.
  • Although 18q21.1 contains another protein-coding gene (CR621005) and a predicted gene of unknown function (KIAA0427), the decay in LD away from SMAD7 intron 3, incorporating rs4939827, rs12953717 and rs4464148, provides little support for either being the location of a causal variant.
  • We searched for mutations and additional polymorphisms in SMAD7 by re-sequencing the coding region of the gene and its associated 5' and 3' UTRs in the genomic DNA of 65 individuals [35 carrying high risk haplotypes and 30 non-carriers].
  • SMAD7, or Mothers against decapentaplegic homolog 7, belongs to the SMAD family of proteins, which belong to the TGFβ superfamily of ligands 9.
  • SMAD7 is involved in cell signalling as a TGFβ type 1 receptor antagonist, blocking TGFβ1 and activin from associating with the receptor and blocking access to SMAD2. It is an inhibitory SMAD (I-SMAD) and is enhanced by SMURF2. Perturbation of SMAD7 expression has been documented to influence the progression of CRC 10. We therefore looked for a relationship among allelic imbalance, mRNA expression and genotype.
  • Loss of chromosome 18q is very common in individuals with CRC 11, but we did not observe any association between SMAD7 genotype and allelic loss, or the alleles affected by such loss, in 248 individuals with CRC and 49 CRC cell lines (P > 0.45 for rs4939827, rs12953717 and rs4464148). Although CRC cell lines expressed a high level of SMAD7 mRNA, comparing genotype with expression phenotype is inherently problematic owing to loss of heterozygosity on 18q, usually accompanied by aneusomy and/or polysomy. We therefore focused on SMAD7 expression in 101 lymphoblastoid cell lines.
  • locus should contribute to ~15% of all CRC and ~0.8% of the familial risk (genotypic risk around 1.4), which, although modest, has the potential through interaction with other common alleles to substantially increase an individual's risk.
  • the contribution of the locus to CRC risk is highly significant.
  • Participants Panel A 940 cases with colorectal neoplasia (443 males, 497 females) ascertained through the Colorectal Tumour Gene Identification (CORGI) consortium. All had at least one first-degree relative affected by CRC and one or more of the following phenotypes: CRC at age 75 or less; any colorectal adenoma at age 45 or less; ≥3 colorectal adenomas at age 75 or less; or a large (>1 cm diameter) or aggressive (villous and/or severely dysplastic) adenoma at age 75 or less.
  • Panel C 2,012 CRC cases (1,218 males, 794 females; mean age at diagnosis 59.0 years; SD ± 8.2) and 1,717 controls (813 males, 904 females; mean age 55.3 years; SD ± 12.3) ascertained through NSCCG post 2005.
  • Panel D 966 CRC cases collected through VICTOR - a Phase III randomised double-blind placebo controlled study of rofecoxib (VIOXX) in colorectal cancer patients (Dukes stage B or C disease) following potentially curative therapy.
  • Controls - 344 derived from CORGI and population volunteer blood donors. All cases and controls were White Caucasians.
  • CRC International Classification of Diseases
  • DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). A genome-wide scan of 550,163 tag SNPs was conducted using the Illumina Hap550 Bead Arrays according to the manufacturer's protocols. DNA samples with GenCall scores <0.25 at any locus were considered "no calls". A DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci. A SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To ensure quality of genotyping, a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches.
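By way of illustration only, the quality-control thresholds described above (a GenCall score below 0.25 treated as a "no call", and a 95% call rate required for both samples and SNPs) may be sketched as follows. The data layout and function name are assumptions made for this example, and both call rates are computed on the unfiltered data, which is one possible reading of the procedure.

```python
def apply_qc(scores, min_score=0.25, min_call_rate=0.95):
    """Return (failed_samples, failed_snps).

    scores maps sample -> {snp: GenCall score}; a missing entry or a score
    below min_score counts as a no-call. Both call rates are computed on the
    full, unfiltered matrix (an assumption of this sketch).
    """
    samples = sorted(scores)
    snps = sorted({snp for calls in scores.values() for snp in calls})
    called = {
        (sample, snp)
        for sample in samples
        for snp in snps
        if scores[sample].get(snp, 0.0) >= min_score
    }
    failed_samples = [
        s for s in samples
        if sum((s, m) in called for m in snps) / len(snps) < min_call_rate
    ]
    failed_snps = [
        m for m in snps
        if sum((s, m) in called for s in samples) / len(samples) < min_call_rate
    ]
    return failed_samples, failed_snps


# Hypothetical scores for two samples at two SNPs.
toy = {"s1": {"a": 0.9, "b": 0.9}, "s2": {"a": 0.9, "b": 0.1}}
print(apply_qc(toy))
```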
  • Genotyping of rs4939827, rsl2953717 and rs4464148 was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK) ; primers and probes used are available on request. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99% concordant results were obtained.
  • Microsatellite instability in CRCs was determined using the following methodology: 10 µm sections were cut from formalin-fixed paraffin-embedded tumours, lightly stained with toluidine blue, and regions containing at least 60% tumour micro-dissected. Tumour DNA was extracted using the QIAamp DNA Mini kit (Qiagen, Crawley, UK) according to the manufacturer's instructions and genotyped for the mononucleotide microsatellite loci BAT25 and BAT26, which are highly sensitive markers of MSI 14. Samples showing novel alleles at either BAT26 or BAT25 or both markers were assigned as MSI (corresponding to a high level of instability, MSI-H 15).
  • Using the Margarita program 16, we inferred ancestral recombination graphs (ARGs) for 46-SNP haplotypes spanning SMAD7 and its flanking regions (from 44,613,022 to 44,780,189 on NCBI build 35). For every ARG, a putative risk mutation was placed on the marginal genealogy at each SNP position by maximizing the association between the mutation and disease status. We evaluated the significance of this observed association through 10⁶ permutations on the phenotypes.
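By way of illustration only, the principle of assessing significance by permuting phenotypes may be sketched as follows. This is not the Margarita implementation: the test statistic (absolute difference in carrier frequency between cases and controls), the function names and the toy data are assumptions made for this example, and far fewer permutations are used than the 10⁶ described above.

```python
import random


def association_stat(carrier, is_case):
    """Absolute difference in carrier frequency between cases and controls."""
    cases = [c for c, y in zip(carrier, is_case) if y]
    controls = [c for c, y in zip(carrier, is_case) if not y]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(carrier, is_case, n_perm=10_000, seed=0):
    """Empirical P value obtained by shuffling the phenotype labels."""
    rng = random.Random(seed)
    observed = association_stat(carrier, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute phenotypes, keep genotypes fixed
        if association_stat(carrier, labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: every case carries the putative mutation, no control does.
carrier = [1] * 10 + [0] * 10
is_case = [True] * 10 + [False] * 10
print(permutation_p_value(carrier, is_case, n_perm=2000))
```

The smallest P value that can be reported is 1/(n_perm + 1), which is why a large number of permutations is needed to resolve small P values.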
  • the risks associated with each SNP were estimated by allelic odds ratio (OR) using unconditional logistic regression, and associated 95% confidence intervals (CIs) were calculated in each case. Associations by site (colon/rectum), MSI status, family history status (at least one first-degree relative with CRC) and age at diagnosis (stratifying into two groups by the median age at diagnosis, 61) were examined by logistic regression in case-only analyses. Haplotypes were inferred using an MCMC method implemented in the program PHASE 17 .
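By way of illustration only, an allelic odds ratio and its 95% confidence interval may be computed from allele counts as sketched below. The sketch uses the simple 2x2 cross-product ratio with a Woolf (log-based) interval rather than the unconditional logistic regression described above; without covariates the two point estimates coincide. The function name and counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Return (OR, lower, upper) from allele counts in cases and controls.

    Uses the cross-product ratio and a Woolf (log-scale) confidence
    interval; z=1.96 corresponds to a 95% interval.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
print(allelic_odds_ratio(200, 100, 100, 100))
```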
  • Meta-analysis was conducted using standard methods for combining raw data based on the Mantel-Haenszel method 18.
  • Cochran's Q statistic to test for heterogeneity and the I² statistic 19 to quantify the proportion of the total variation due to heterogeneity were calculated.
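By way of illustration only, a Mantel-Haenszel pooled odds ratio, Cochran's Q and the I² statistic may be computed from per-study 2x2 tables as sketched below. The sketch assumes that Q is computed from inverse-variance-weighted log odds ratios, which is one common formulation; the function names and the example tables are hypothetical.

```python
import math


def mantel_haenszel_or(tables):
    """Pooled OR from a list of (a, b, c, d) tables.

    a, b = risk / other allele counts in cases; c, d = the same in controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def heterogeneity(tables):
    """Return (Q, I2 in percent) using inverse-variance weights on log ORs."""
    log_ors = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * t for w, t in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, log_ors))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Two hypothetical studies with opposite effects.
studies = [(20, 10, 10, 20), (10, 20, 20, 10)]
print(mantel_haenszel_or(studies), heterogeneity(studies))
```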
  • the sibling relative risk attributable to a given SNP was calculated using the formula 20 :
  • CRC cell lines were analysed: C106, C125, C32, C70, C84, C99, CACO2, CCK81, CL-40, COLO205, COLO320DM, COLO5, COLO678, COLO741, CX-I, DLD1/HCT15, GB126, GP5D, HCA46, HCA7, HCT116, HIW1772, HRA19, HT1115, HT29, HT55, HUTU80, LOVO, LS1034, LS123, LS125, LS174T, LS411, LS513, DMS8, PC/JW, RKO, RS1013, SCKOl, SW1417, SW1463, SW403, SW480, SW620, SW837, SW948, T84, VACO4S and VACO5.
  • Mutational and other analyses
  • Table 4 Description of the candidate SNPs genotyped in cases and controls
  • All SNPs display LD (r² > 0.50) with rs4939827, rs12953717 and rs4464148.
  • AAATACCCCA 0.03 0.026 1.674 0.1957
  • Genotyping was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK) or Illumina iSelect Arrays. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99.9% concordant results were obtained.
  • haplotypes are estimated using an accelerated EM algorithm similar to the partition/ligation method described in Qin et al (Am J Hum Genet 71: 1242-1247) and tested for association via a likelihood ratio test. Linkage disequilibrium statistics were calculated using Haploview software (v4.0).
  • Xenopus laevis transgenic embryos were generated using the I-SceI method recently described (Ogino H, McConnell WB, Grainger RM (2006) Nat Protoc 1: 1703-1710).
  • A 0.7 kb fragment containing the conserved region from the Protective or Risk human samples was amplified with the following primers: 5'-GCTACCTTAACAAAGCTTCCTCC-3' and 5'-CGCCTGTAAAAGTTGGAGC-3'.
  • HMPS/CRAC1 high-penetrance gene for colorectal cancer (CRC)
  • HMPS hereditary mixed polyposis syndrome
  • Using microsatellites in five families allowed us subsequently to demonstrate that the disease locus resides between bases 29,775,416 and 34,124,377 on chromosome 15. In order to fine-map the location of HMPS/CRAC1, we extended our patient set. Since our previous study, seven previously unaffected individuals from our five families had developed characteristic serrated adenomas. We also identified two further individuals with HMPS based on a phenotype of multiple (>5) adenomas including at least three reported serrated lesions, dominant family history of colorectal tumours, self-reported Ashkenazi ancestry and absence of germline mutations in APC, MYH or the mismatch repair genes.
  • To further refine the location of HMPS/CRAC1, we genotyped 8 selected, affected individuals (probands and other family members previously shown to have critical recombination events) and one unaffected, non-carrier mother of a patient using the Illumina Hap550 SNP array. We refined the locations of recombinations and searched for a minimal shared haplotype in the HMPS/CRAC1 region. Using these data, we restricted the location of the gene to 30,735,098-31,369,755 bases.
  • This region contains three known genes: the 3' part of SGNE1/SCG5 (chr15:30,721,252-30,776,590); GREM1/DRM/CKTSF1B1 (chr15:30,797,497-30,814,158); and FMN1 (chr15:30,846,102-31,147,525).
  • The hypothetical genes C15orf45, AX747968 and DKFZp686C2281 map to the region. Despite sequencing all coding sequences, introns, promoter regions and other highly conserved sequences within the minimal region, no mutations unique to HMPS patients were identified.
  • HMPS/CRAC1 locus might harbour not only high-penetrance mutations that cause colorectal tumour in Ashkenazi Jews, but also variants that increase the risk of CRC in the general UK population.
  • An alternative hypothesis was that the same variant might cause disease in multiple ethnic groups, but that some unknown factor in the Ashkenazi genetic background greatly increased the penetrance of the variant.
  • Genotyping rs4779584 and rs10318 in Stages 1-4. The numbers of cases and controls by genotype in each Stage are shown, together with: identities of the major (A) and minor (B) alleles; χ² value for the allele test of association; corresponding P value; odds ratio for the susceptibility (minor) allele under the allelic test; odds ratios under dominant and recessive tests; 95% confidence intervals for each test; and Hardy-Weinberg equilibrium P values. Note a probable dosage effect of the high-risk allele.
  • Table 9 shows the alleles for rs10318 as G/A.
  • The skilled person would be aware that this substitution corresponds to a C/T substitution on the opposite strand. If the rs10318 sequence is represented by the plus strand CAAGATATTTGTGGTCTTGATCATAC[C/T]TATTAAAATAATGCCAAACACCAAA, then the skilled person would recognise that the substitution is C/T. This is also consistent with the ancestral allele at position 30,813,271 being C.
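By way of illustration only, the correspondence between a G/A substitution reported on one strand and a C/T substitution on the opposite strand may be checked by taking complements, as sketched below. The function names are hypothetical; the flanking sequence is the plus-strand 5' flank given above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the opposite-strand sequence, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def alleles_on_opposite_strand(alleles):
    """Complement each single-base allele, e.g. ("G", "A") -> ("C", "T")."""
    return tuple(allele.translate(COMPLEMENT) for allele in alleles)


# Alleles as listed in Table 9, converted to the opposite strand.
print(alleles_on_opposite_strand(("G", "A")))

# Plus-strand 5' flank of the polymorphic site, as given in the text.
flank = "CAAGATATTTGTGGTCTTGATCATAC"
print(reverse_complement(flank))
```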
  • Table 10 shows genotyping of SNPs within and close to the HMPS/CRAC1 region in our stage 1 CRC samples from the general UK population. The columns show: SNP ID; location on chromosome 15; identity of the major allele (A); identity of the minor allele (B); total number of cases with each genotype; total number of controls with each genotype; χ² (1 d.f.) statistic; corresponding P value; odds ratio for the minor allele (since this is the susceptibility allele at rs4779584 and rs10318); lower 95% confidence interval; and higher 95% confidence interval.
  • HMPS/CRAC1 haplotype in the Ashkenazi HMPS patients contained the high-risk alleles at both rs4779584 and rs10318, but there were no apparent phenotypic differences between HMPS patients who were homozygous and heterozygous for the high-risk alleles at these SNPs (details not shown).
  • We found no association between genotype at rs4779584 or rs10318 and any of the clinico-pathological variables (see Methods).
  • Table 11 SNPs with putative functional importance close to rs4779584 and rs10318, their LD relationships and their associations with CRC.
  • SNPs are present in dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/), except for rs8034965ins/del, which is an unreported C/- polymorphism at chr15:30,799,068 close to the SNP rs8034965, and NFN28, which is a previously unreported A/G SNP at chr15:30,813,548 bases. All SNPs were initially tested in a set of 96 UK cases and LD relationships with rs4779584 and rs10318 were assessed.
  • rs12594235 was not assessed further owing to very low LD with rs4779584 and rs10318.
  • the remaining SNPs were then typed in the Stage 1 samples.
  • AIC Akaike's information criterion
  • the relative likelihood (Akaike weight) of the model was compared with the model for rs4779584 by re-scaling and normalising the AIC.
  • Akaike weights estimated a relative likelihood of 0.80 that the model with rs4779584 was the best single-SNP explanation for the data.
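By way of illustration only, Akaike weights may be obtained by re-scaling each model's AIC against the best (lowest) AIC and normalising, as sketched below. The function name and the AIC values are hypothetical and are not taken from Table 11.

```python
import math


def akaike_weights(aic_values):
    """Return the relative likelihood (Akaike weight) of each model.

    Each AIC is re-scaled to its difference from the minimum AIC, converted
    to exp(-delta / 2), and the results are normalised to sum to 1.
    """
    best = min(aic_values)
    relative = [math.exp(-(aic - best) / 2) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical AIC values for three single-SNP models.
print(akaike_weights([10.0, 12.0, 20.0]))
```

A weight close to 1 for one model indicates that, among the models compared, it is the most likely to be the best explanation for the data.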
  • A locus within the HMPS/CRAC1 region is associated with low-penetrance predisposition to CRC.
  • The SNP rs10318 is located within the 3' UTR of GREM1, a secreted bone morphogenetic protein (BMP) antagonist.
  • rs4779584 lies between GREM1 and SGNE1. Although SGNE1 is genetically and functionally a slightly worse candidate than GREM1, neuroendocrine signalling involving SGNE1 36 could influence cellular proliferation in the large bowel through, for example, signalling of nutrient availability or through systemic hormonal effects.
  • The position of the HMPS/CRAC1 locus was initially refined by identifying and genotyping 35 microsatellites within the region in 35 Ashkenazi cases with HMPS and 100 Ashkenazi controls. Details of PCR primers and conditions are available from the authors. A sub-set of the most informative HMPS cases, based on their shared haplotype region from the microsatellite analysis, was subsequently genotyped using the Illumina Hap550 BeadArrays using the manufacturer's standard protocols. SNP calls from the region around HMPS/CRAC1 were manually inspected to confirm failure to share genotypes (essentially discordant homozygotes) in the set of affected individuals and to identify the locations of critical recent or ancestral recombination events.
  • Stage 2 comprised 4,500 CRC cases (mean age at diagnosis 60 years, male:female ratio 1.17:1), plus 3,860 healthy control individuals ascertained between 1999 and 2005 through the National Study of Colorectal Cancer Genetics (NSCCG) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. Controls (mean age 60 years) were the spouses or unrelated friends of patients with malignancies. No control had a personal history of malignancy at time of ascertainment. Stage 3 comprised 2,000 CRC cases ascertained through the NSCCG post-2005 (mean age at diagnosis 60 years, male:female ratio 1.55:1) and 1,650 healthy controls (mean age 56 years).
  • Stage 4 consisted of 313 additional cases from CORGI (mean age at diagnosis 63 years, male:female ratio 0.92:1) and 550 additional cases from VICTOR with age of presentation >60 years (mean age at diagnosis 70 years, male:female ratio 1.05:1).
  • CRC International Classification of Diseases
  • DNA samples with GenCall scores <0.25 at any locus were considered "no calls".
  • a DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci.
  • a SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus.
  • the KASPar genotyping system was used (http://www.kbioscience.co.uk/).
  • Common and allele-specific PCR primers were as follows: for rs4779584, the common primer CCAGTAGAACTTGTTGATAAGCCATTCTT was used together with allele-specific primers GAAGGTGACCAAGTTCATGCTCCTGTGTATATAGTTATGGTTTCTGTTCA and GAAGGTCGGAGTCAACGGATTCTGTGTATATAGTTATGGTTTCTGTTCG; and for rs10318, the common primer TAATGCCAAGCACAAAGTGTACATCATAAA was used, together with allele-specific primers GAAGGTGACCAAGTTCATGCTGCAAGATATTTGTGGTCTTGATCATACT and GAAGGTCGGAGTCAACGGATTCAAGATATTTGTGGTCTTGATCATACC. Genotyping quality was tested using duplicate DNA samples within each assay, together with direct sequencing of subsets of 20 samples to confirm genotyping accuracy. For all SNPs,
  • $\lambda^{*} = \dfrac{p(pr_2 + qr_1)^2 + q(pr_1 + q)^2}{\left[p^2 r_2 + 2pq r_1 + q^2\right]^2}$
  • $p$ is the population frequency of the minor allele
  • $q = 1 - p$
  • $r_1$ and $r_2$ are the relative risks (estimated as OR) for heterozygotes and rare homozygotes, relative to common homozygotes.
  • proportion of the familial risk attributable to a SNP was calculated as $\log(\lambda^{*})/\log(\lambda_0)$, where $\lambda_0$ is the overall familial relative risk estimated from epidemiological studies, assumed to be 2.2.
  • Akaike information criterion analysis (Table 11) was performed as described 34.
  • Phase 1 we genotyped 555,352 tagSNPs in 940 individuals with colorectal neoplasia and 965 controls (Panel A) using the Illumina Hap550 BeadChips. To maximise power to identify associations, each case had at least one first-degree relative affected with CRC, thereby genetically enriching for susceptibility alleles 37,38. Of the 1,905 DNA samples submitted for genotyping, 1,890 samples were successfully processed, generating in excess of 1,000 million genotypes. Genotyping failed in only 15 individuals, leaving genotype data for 930 cases (620 with CRC and 310 with high-risk colorectal adenomas) and 960 controls.
  • Table 12 Summary of results for eleven SNPs selected for Phase 3 , together with selected SNPs described in examples 1 to 3
  • VIOXX® colorectal cancer trial, United Kingdom 50-75 1 10 Illumina Infinium, KASPar patients following potentially curative therapy (VICTORr)
  • EPICOLON The EPICOLON project 8 515 515 27 - 101 24 - 93 1.45 1.29 Sequenom iPLEX Barcelona, Spain
  • POPGENSHIP The POPGEN 9,10 and SHIP 11 projects in Northern Germany 2,569 2,699 29 - 90 21 - 81 1.16 0.92 KASPar
  • rs16892766 and rs10795668 are within regions of fairly extensive LD.
  • rs16892766 maps to 8q23.3 and lies in a 220Kb LD block (117.65Mb-117.87Mb) that encompasses both EIF3S3 and predicted transcript C8orf53 (figure 12).
  • the SNP rs10795668 maps to an 82Kb LD block (8.73Mb-8.81Mb) within 10p14 (figure 13).
  • genotype-specific ORs were most compatible with a multiplicative model (table 15).
  • Table 18 shows odds ratios corresponding to increasing numbers of risk alleles in rs6983267, rs4779584, rs4939827, rs10795668 and rs16892766.
  • Table 13 provides a summary of all cases and controls in the study.
  • participants for Phase 1 are set out in example 1, panel A.
  • NSCCG 3,036 CRC cases (1,629 males, 1,407 females; mean age at diagnosis 59.4 years; SD ± 8.2) and 2,944 controls (1,183 males, 1,753 females; mean age 55.2 years; SD ± 12.3) ascertained through NSCCG post 2005.
  • VCQ 202 additional individuals with colorectal carcinoma from the CORGI study; 910 patients from the VICTOR study, a randomised trial of VIOXX in patients with stage B and C colorectal cancer (Kerr et al, N Engl J Med. 2007); and 139 patients from the QUASAR2 clinical trial, a study that compares standard chemotherapy of capecitabine against capecitabine plus bevacizumab.
  • the controls were made up of: 250 unaffected spouses or partners from the CORGI study; 376 human random controls from ECACC; and 173 population blood donors. Overall, 53% of the cases and 58% of the controls were female. All cases and controls were of white UK origin.
  • COGS 1,012 CRC cases (518 males, 494 females; mean age at diagnosis 49.6 years; SD ± 6.1) and 1,012 age- and gender-matched cancer-free population controls (518 males, 494 females; mean age 51.0 years; SD ± 5.9). Cases were enriched for genetic aetiology by early age at onset (≤55 years). Known dominant polyposis syndromes, HNPCC or bi-allelic MYH mutation carriers were excluded. Control subjects were population controls, matched by age (+/- 2 years), gender and area of residence within Scotland.
  • DFCCS 783 CRC cases (370 males, 413 females; mean age at diagnosis 53.4 years; SD ± 13.4) and 664 controls (251 males, 413 females; mean age 51.1 years; SD ± 11.3) ascertained at a clinically based genetic reference centre, Leiden, the Netherlands. This cohort consists of familial cases.
  • EPICOLON 515 CRC cases (305 males, 210 females; mean age at diagnosis 70.6 years; SD ± 11.3) and 515 controls (290 males, 225 females; mean age 69.8 years; SD ± 11.7) ascertained through the EPICOLON initiative, a prospective, multi-centre, nationwide study aimed at compiling prominent epidemiological and clinical data with respect to hereditary non-polyposis colorectal cancer and other familial colorectal cancer forms in Spain. This cohort consists of an incident series collected in Barcelona.
  • FCCPS 1,001 CRC cases (509 males, 492 females; mean age at diagnosis 67.4 years; SD ± 11.8) and 1,034 controls (randomly selected anonymous Finnish blood donors) ascertained in southeastern Finland.
  • MCCS 515 CRC cases (270 males, 245 females; mean age at diagnosis 66.2 years; SD ± 7.7) and 709 controls (352 males, 357 females; mean age 57.9 years; SD ± 7.0) ascertained in Melbourne, Australia.
  • POPGENSHIP 2,569 CRC cases (1,382 males, 1,187 females; mean age at diagnosis 62.4 years; SD ± 9.9) and 2,699 controls (1,296 males, 1,395 females; mean age 53.4 years; SD ± 15.8) ascertained through the POPGEN and SHIP population-based biobank projects based in Kiel and Greifswald, Germany.
  • SEARCH 2,253 CRC cases (1,287 males, 966 females; mean age at diagnosis 59.1 years; SD ± 8.1) and 2,262 controls (949 males, 1,313 females; mean age 53.39 years; SD ± 7.61).
  • Samples were ascertained through the SEARCH (Studies of Epidemiology and Risk Factors in Cancer Heredity) study based in Cambridge, UK. Recruitment of colorectal cancers started in 2000; initial patient contact was through the general practitioner (GP). Control samples were collected post-2003. Eligible individuals were sex and frequency matched in five year age bands to cases. The study has been approved by the Eastern Multi-Centre Research Ethics Committee (Eastern MREC).
  • CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153-154 47 and all cases had pathologically proven adenocarcinoma or adenomas.
  • Phase 1 genotyping was as described in example 1.
  • Phase 2 genotyping was conducted using Illumina Infinium custom arrays according to the manufacturer's protocols.
  • a DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci.
  • a SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus.
  • To ensure quality of genotyping a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches in both Phases 1 and 2.
  • Phase 3 genotyping was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK); primers and probes used are available on request. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99.9% concordant results were obtained.
  • Phase 4 genotyping used the same method as Phase 3 or standard alternatives depending upon facilities available locally. For all Phase 4 series typed other than by KASPar, local genotyping quality was confirmed by undertaking KASPar genotyping in a random set of 48 samples and found >98% concordance for all series.
  • Microsatellite instability (MSI) in CRCs was determined using the following methodology: 10 µm sections were cut from formalin fixed paraffin embedded tumours, lightly stained with toluidine blue, and regions containing at least 60% tumour micro-dissected.
  • Tumour DNA was extracted using the QIAamp DNA Mini kit (Qiagen, Crawley, UK) according to the manufacturer's instructions and genotyped for the mononucleotide microsatellite loci BAT25 and BAT26 which are highly sensitive markers of MSI 48. Samples showing novel alleles at either BAT26 or BAT25 or both markers were assigned as MSI (corresponding to a high level of instability, MSI-H 49).
  • the adequacy of the case-control matching and possibility of differential genotyping of cases and controls was formally evaluated using Q-Q plots of test statistics.
  • the inflation factor λ was calculated by dividing the mean of the lower 90% of the test statistics by the mean of the lower 90% of the expected values from a χ² distribution with 1 d.f. Deviation of the genotype frequencies in the controls from those expected under Hardy-Weinberg Equilibrium (HWE) was assessed by χ² test (1 d.f.), or Fisher's exact test where an expected cell count was <5.
  • SNP genotype and disease status were primarily assessed using the allelic 1 d.f. test or Fisher's exact test where an expected cell count was <5.
  • the risks associated with each SNP were estimated by allele, heterozygous and homozygous odds ratios (OR) using unconditional logistic regression, and associated 95% confidence intervals (CIs) were calculated in each case.
  • Patterns of risk for associated SNPs were investigated by logistic regression, coding the SNP genotypes according to additive, dominant and recessive models. Models were then compared by calculating the Akaike information criterion (AIC) and Akaike weights for each mode of inheritance. Associations by site (colon/rectum), MSI status, family history status (at least one first-degree relative with CRC) and age at diagnosis (stratifying into two groups by the median age at diagnosis) were examined by logistic regression in case-only analyses. The combined effect of each pair of loci identified as associated with CRC risk was investigated by logistic regression modelling and evidence for interactive effects between SNPs assessed by likelihood ratio test. The OR and trend test for increasing numbers of deleterious alleles was estimated based on the Phase 2 data by counting two for a homozygote and one for a heterozygote.
  • the GWA studies were both conducted in samples from UK populations (co-ordinated by centres in London and Edinburgh) and both were based on designs involving two-phase strategies,
  • the London Phase 1 was based on genotyping 940 cases with familial colorectal neoplasia and 965 controls ascertained through the CORGI consortium for 555,352 SNPs using the Illumina HumanHap550 BeadChip Array.
  • Phase 1 in the Edinburgh study consisted of genotyping 1,012 early-onset (aged ≤55 years) Scottish CRC cases and 1,012 controls for 555,510 SNPs using the Illumina HumanHap300 and HumanHap240S arrays.
  • London Phase 1 547,487 polymorphic SNPs in 922 familial neoplasia cases (614 with CRC and 308 with high-risk colorectal adenomas) and 927 controls; Edinburgh Phase 1: 548,586 polymorphic SNPs in 980 CRC cases and 1,002 controls. London Phase 2 was based on genotyping 2,873 CRC cases and 2,871 controls ascertained through the National Study of Colorectal Cancer Genetics (NSCCG), while Edinburgh Phase 2 was based on genotyping 2,057 cases and 2,111 controls.
  • NSCCG National Study of Colorectal Cancer Genetics
  • Phase 2 the London and Edinburgh samples were genotyped for a common set of SNPs: the 14,982 SNPs most strongly associated with colorectal neoplasia from London Phase 1; the 14,972 most strongly associated SNPs from Edinburgh Phase 1 (432 of these SNPs were common to both the London and Edinburgh lists of most strongly associated SNPs); and 13,186 SNPs showing the strongest association with CRC risk from a joint analysis of all CRC cases and controls from both Phase 1 datasets (that were not already included in any of the preceding categories) . Therefore Phase 2 was based on genotyping 42,708 SNPs in total.
  • HapMap: http://www.hapmap.org/ http://pipeline.lbl.gov/cgi-bin/gateway2
  • Haploview: http://www.broad.mit.edu/personal/jcbarret/haploview/
  • Haiman, C.A. et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39, 638-44 (2007).
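The familial relative risk calculation summarised in the items above (the quantity written λ*, with p, q, r1, r2 and an overall familial relative risk of 2.2) can be sketched numerically as follows. This is a minimal illustration only: it assumes the commonly used sibling relative risk expression, with the squared mean genotype risk as denominator, and the function names and the example allele frequency and odds ratios are hypothetical rather than values taken from the study.

```python
import math


def sibling_relative_risk(p, r1, r2):
    """Familial (sibling) relative risk attributable to one SNP.

    p  : population frequency of the minor allele
    r1 : relative risk for heterozygotes (estimated as OR)
    r2 : relative risk for rare homozygotes (estimated as OR)
    Assumes the standard expression with the squared mean risk as denominator.
    """
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p * p * r2 + 2.0 * p * q * r1 + q * q) ** 2
    return numerator / denominator


def proportion_of_familial_risk(lambda_star, lambda_0=2.2):
    """log(lambda*) / log(lambda_0); lambda_0 = 2.2 is the value assumed in the text."""
    return math.log(lambda_star) / math.log(lambda_0)


# Hypothetical inputs for illustration only (not study results).
lam = sibling_relative_risk(p=0.3, r1=1.2, r2=1.44)
share = proportion_of_familial_risk(lam)
print(f"lambda* = {lam:.4f}, share of familial risk = {share:.2%}")
```

With no genotype effect (r1 = r2 = 1) the function returns 1, which is a convenient sanity check on the expression.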

Abstract

The present invention relates to the identification of genetic loci associated with susceptibility to tumour and cancer development, particularly risk of colorectal cancer and tumour. The invention relates to the analysis of these loci in samples obtained from individuals, and kits for use in such analysis.

Description

Cancer Susceptibility Loci
Field of the Invention
The present invention relates to the identification of genetic loci associated with susceptibility to tumour and cancer development.
Background to the Invention
In order to enhance the likelihood of early detection of cancers, many countries operate screening programs. However, for economic reasons and to enhance uptake, it would be preferable to focus these programs on individuals who are more likely to be at risk. Moreover, knowing that they are more at risk of developing a disease can make individuals more likely to follow lifestyle advice and to reduce their exposure to environmental risk factors for that disease state.
Much research into the development of cancers focuses on high-penetrance mutations in known genes.
However, such mutations often account for only a very small percentage of inherited susceptibility to cancer. For example, germ-line mutations in known genes (APC, mismatch repair genes, MUTYH/MYH, SMAD4, ALK3 and STK11/LKB1) account for less than 5% of colorectal cancers, whereas inherited susceptibility is believed to underlie around 30% of colorectal cancers.
Thus, identifying further mutations accounting for variance in genetic risk is likely to be of value.
Summary of the Invention
The present inventors have identified new susceptibility loci associated with the risk of tumour and/or cancer development.
In one aspect, the present invention provides a method of determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region selected from:
Chromosome 8: 117650000-117870000;
Chromosome 10: 8730000-8810000;
Chromosome 10: 76789000-76811000;
Chromosome 4: 63108000-63170000; and
Chromosome 1: 231840000-231868000.
Position numbering in the above refers to position numbering in the stated chromosome as shown in UCSC March 2006 reference assembly; NCBI build 36.1. This applies to all other position numbering given in the present application unless expressly indicated otherwise.
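By way of illustration only, the regions listed above can be held in a simple lookup so that a site of polymorphism can be checked against them by chromosome and position. The sketch below assumes coordinates on the stated assembly (NCBI build 36.1) and treats the region bounds as inclusive; the function name and the example positions are hypothetical.

```python
# Regions as listed above (UCSC March 2006 assembly; NCBI build 36.1).
REGIONS = {
    "8": [(117650000, 117870000)],
    "10": [(8730000, 8810000), (76789000, 76811000)],
    "4": [(63108000, 63170000)],
    "1": [(231840000, 231868000)],
}


def in_listed_region(chrom, pos):
    """Return True if (chrom, pos) falls inside one of the listed regions.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    key = str(chrom)
    if key.lower().startswith("chr"):
        key = key[3:]
    return any(start <= pos <= end for start, end in REGIONS.get(key, []))


# Hypothetical positions for illustration.
print(in_listed_region("chr8", 117700000))  # inside the chromosome 8 region
print(in_listed_region("10", 9000000))      # outside both chromosome 10 regions
```

Positions reported on a different assembly would need to be converted to build 36.1 before such a check is meaningful.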
Additionally or alternatively, the present invention provides a method of determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in the region of the SMAD7 gene (e.g., 18q21.1) and/or in the region of the HMPS/CRAC1 locus (e.g., 15q13.3).
In another aspect, the present invention provides a method of assessing an individual for a cancer condition or tumour, comprising: providing a sample obtained from said individual, and determining in said sample the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region as set out above (i.e., a region selected from chr8:117650000-117870000, chr10:8730000-8810000, chr10:76789000-76811000, chr4:63108000-63170000 and chr1:231840000-231868000, the region of the SMAD7 gene, e.g., 18q21.1, and/or the region of the HMPS/CRAC1 locus, e.g., 15q13.3).
The assessment of the individual may be for diagnostic or prognostic purposes. Preferably, the assessment is of the risk of that individual to tumour or cancer, e.g., colorectal tumour or cancer.
Accordingly, the invention also provides a method for identifying an individual who is at risk or likely to be at risk of tumour or cancer, the method comprising: determining in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region as set out above.
The regions identified by the present inventors as being associated with risk may contain transcribed sequences, e.g., encoding RNA and/or proteins. In another aspect, the present invention provides a method of assessing an individual for a cancer condition or tumour or a method of identifying a patient who is at risk or likely to be at risk of tumour or cancer, the method comprising determining in a sample obtained from an individual the transcription level of a gene located in a region as set out above.
Other aspects of the invention relate to kits for determining in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region as set out above. The kits may comprise reagents for determining in a genomic sample obtained from the individual, the allele present at said sites of polymorphism.
For example, the kit may comprise amplification reagents for amplifying all or part of the regions described above, from a genomic sample obtained from an individual. Amplification reagents may include buffers, nucleotides, taq or other polymerase and/or one or more oligonucleotide primers which bind specifically within a region as described above and are suitable for amplifying a region containing one or more sites of polymorphism, for example by PCR.
The kit may comprise a labelled oligonucleotide probe which binds to an allele at a site of polymorphism in a region as described above.
Optionally, the kit may comprise multiple such probes, e.g., for detection of multiple polymorphisms, which may be supported on a solid substrate. The kit may comprise a microarray.
In another aspect, the invention relates to a kit for detecting, in a sample obtained from an individual, the transcription level of a gene located in a region as described above. The kit may comprise reagents for the detection of mRNA and/or protein, e.g., an oligonucleotide probe or an antibody. The reagents may be detectably labelled (directly or indirectly). The reagents may be immobilised on a solid support or may comprise a tag which is suitable for immobilisation on a solid support.
Generally, a kit as described herein may comprise one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).
The kit may further comprise instructions for using the kit in accordance with a method described herein.
A kit may further comprise control nucleic acid.
The kits may be used for identifying a patient who is at risk of or likely to be at risk of cancer or tumour.
SMAD7 and the HMPS/CRAC1 locus
As mentioned above, the present inventors have shown that polymorphisms in the SMAD7 gene region are associated with cancer, and particularly colorectal cancer. Thus, in some aspects, this invention relates to the identification of SMAD7 as a susceptibility gene in cancer.
The present inventors have shown a large number of polymorphisms in the SMAD7 locus to be associated with a significant increase in CRC risk. This association is clearly of value, e.g., in assessing an individual's likely risk. Moreover, the present inventors have identified a novel polymorphism in the SMAD7 locus, and provided evidence that this polymorphism is located in a gut enhancer region and can affect the enhancer potential of this region. This causative role is unexpected, as many SNPs are not themselves candidates for causality.
As further mentioned above, the present inventors have also shown that polymorphisms in the HMPS/CRAC1 locus are associated with cancer, and particularly colorectal cancer.
A high penetrance gene associated with colorectal cancer (HMPS/CRAC1) has been previously mapped to a region of around 2Mb on chromosome 15q13.3 in the Ashkenazi population (individuals of Ashkenazi Jewish descent) (Jaeger et al Am J Hum Genet 72, 1261-7, 2003; Tomlinson I et al, Gastroenterology 116, 789-95, 1999). However, no causative mutations were identified in the Ashkenazi population.
The present inventors have surprisingly found that polymorphisms in the HMPS/CRAC1 locus are associated with cancer, and particularly colorectal cancer, in the general population. In other words, the HMPS/CRAC1 locus contains variants that increase cancer susceptibility in the general population.
Thus, in one aspect, the present invention provides a method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene.
In another aspect, the present invention provides a method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
In another aspect, the present invention provides a method of assessing an individual for a cancer condition or an adenoma, comprising: providing a sample obtained from said individual, and determining in said sample the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus.
The assessment of said individual may be for diagnostic or prognostic purposes. Preferably, the assessment is of the risk of that individual to adenoma or cancer, e.g., colorectal cancer or adenoma. Accordingly, in another aspect, the present invention provides a method of identifying a patient who is at risk of or likely to be at risk of colorectal adenomas or colorectal cancer, the method comprising: determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus; the presence of a variant allele at the one or more sites being indicative that the individual is at risk of colorectal cancer or colorectal adenoma.
Other aspects of the invention relate to kits for determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene and/or in the region of the HMPS/CRAC1 locus. The kits may comprise reagents for determining in a genomic sample obtained from the individual, the presence or absence of a variant allele at one or more sites of polymorphism in the genomic region of the SMAD7 gene, and/or in the region of the HMPS/CRAC1 locus.
For example, the kit may comprise amplification reagents for amplifying all or part of the SMAD7 gene, and/or the HMPS/CRAC1 locus, from a genomic sample obtained from an individual. Amplification reagents may include buffers, nucleotides, taq or other polymerase and/or one or more oligonucleotide primers which bind specifically to the SMAD7 gene and/or the HMPS/CRAC1 locus and are suitable for amplifying a region of the gene containing one or more sites of polymorphism, for example by PCR.
The kit may comprise a labelled oligonucleotide probe which binds to an allelic variant at a site of polymorphism in the genomic region of the SMAD7 gene, or in the region of the HMPS/CRAC1 locus. Optionally, the kit may comprise multiple such probes, e.g., for detection of multiple polymorphisms, which may be supported on a solid substrate. The kit may comprise a microarray.
The kit may comprise one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).
The kit may further comprise instructions for using the kit in accordance with a method described herein.
A kit may further comprise control nucleic acid.
The kits may be used for identifying a patient who is at risk of or likely to be at risk of colorectal adenoma or cancer, wherein the presence of a variant allele at the one or more sites is indicative that the individual is at risk of colorectal adenoma or cancer.
In the above aspects, where the presence of a variant allele in the genomic region of the HMPS/CRAC1 locus is to be determined, it may be preferred that the individual is not a member of the ethnically Ashkenazi Jewish population: that is, is not an individual of Ashkenazi Jewish descent. In some embodiments, the methods of assessing cancer may be methods of assessing the risk of a colorectal cancer other than that forming part of or arising from HMPS. In some embodiments, the individuals may be individuals not having the clinical phenotype of multiple polyps in the large bowel, characteristic of HMPS. It may be preferred that the cancer to be assessed is of non-polyposis origin. Variant alleles of SMAD7 or of the HMPS/CRAC1 locus may alter (i.e., increase or decrease) the risk of cancer in an individual.
A SMAD7 variant allele may contain one or more mutations relative to a reference sequence, which may be a sequence as given below. For example, a SMAD7 variant allele may contain one or more mutations relative to the sequence of the SMAD7 genomic region as set out below, e.g., between bases 44,700,221 and 44,731,079 of the Human Genome 2006 Build, and the surrounding region. The mutation may be a substitution, deletion, copy number change or insertion. Most preferably, the variation may be one or more single nucleotide polymorphisms.
One or more of the mutations may be contained within intron 3 of SMAD7.
The presence of a variant allele may be identified at one, two, three or more sites of polymorphism.
For instance, the sites of polymorphism assessed may include one, two or three of rs4939827, rs12953717 and rs4464148 (described further below) optionally in combination with one, two, three, four, five or more other sites of polymorphism known to be associated with cancer or adenoma risk, e.g., colorectal cancer risk. For instance, a method described herein may comprise determining the presence or absence of a T at SNP rs4939827 in the genomic nucleic acid sample obtained from the individual.
Optionally, polymorphism may be assessed at rs4939827 plus one or more additional sites. Said one or more additional sites may, for example, include rs4464148 and/or rs12953717.
A HMPS/CRAC1 variant allele may contain one or more polymorphisms relative to a reference sequence, which may be the sequence residing between bases 29,775,416 and 34,124,337 on chromosome 15 of the Human Genome 2006 Build (http://genome.ucsc.edu). The mutation may be a substitution, copy number change, deletion or insertion. Most preferably, the variant may be one or more single nucleotide polymorphisms.
The sites of polymorphism assessed may include one or both of rs4779584 or rs10318 (details of which are given below), optionally in combination with one, two, three, four, five or more other sites of polymorphism known to be associated with cancer risk, e.g., colorectal cancer risk, which may for instance be sites in the SMAD7 gene region or other sites known to be associated with colorectal cancer risk. For instance, one such other site may be rs6983267, details of which are given below. In one exemplary embodiment, the sites of polymorphism assessed may include rs4779584 and rs6983267.
In any of the above aspects of the invention, the presence or absence of the variant allele at the site may be determined in one or both copies of the region in the genome of the individual. In some embodiments, homozygosity of the risk allele may be associated with a higher risk of the disease than heterozygosity.
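The determination of an allele in one or both copies of a region can be illustrated by counting copies of a designated allele in a diploid genotype (two for a homozygote, one for a heterozygote, none otherwise). The sketch below is an assumption-laden illustration: the genotype encoding as a two-letter string, the function names, the example genotypes and the alleles designated as risk alleles are all hypothetical, and are not asserted to be the risk alleles at these sites.

```python
def count_risk_alleles(genotype, risk_allele):
    """Count copies (0, 1 or 2) of risk_allele in a diploid genotype such as 'CT'."""
    alleles = genotype.upper()
    if len(alleles) != 2:
        raise ValueError("expected a two-allele genotype string, e.g. 'CT'")
    return alleles.count(risk_allele.upper())


def total_risk_alleles(genotypes, risk_alleles):
    """Sum risk-allele copies over all SNPs present in both mappings."""
    return sum(
        count_risk_alleles(genotypes[snp], allele)
        for snp, allele in risk_alleles.items()
        if snp in genotypes
    )


# Hypothetical allele designations and genotypes, for illustration only.
example_risk_alleles = {"rs4939827": "T", "rs4779584": "T"}
example_genotypes = {"rs4939827": "TT", "rs4779584": "CT"}
print(total_risk_alleles(example_genotypes, example_risk_alleles))
```

A simple additive count of this kind weights every site equally; a per-site weighting by estimated odds ratio would be an alternative design.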
The presence of a variant allele at the one or more sites of polymorphism may be determined by any convenient technique, including amplification of all or part of the genomic region of the SMAD7 gene, including the SMAD7 gene itself, or of the HMPS/CRAC1 locus; sequencing all or part of the genomic region of the SMAD7 gene, including the SMAD7 gene itself, or of the HMPS/CRAC1 locus; and/or hybridisation of a probe which is specific for a variant allele. Suitable methods are described in more detail below (see Methods of determining an allele present at a site of polymorphism). Moreover, the present inventors have noted that certain SNPs in the SMAD7 sequence alter the mRNA expression levels. Therefore, in a further aspect, the present invention provides a method of assessing an individual for a cancer condition, comprising: determining the expression level of SMAD7 in a sample obtained from said individual.
The method may be a method of assessing the risk of cancer in said patient, e.g., the risk of colorectal cancer. A lower SMAD7 expression may be associated with an increased risk of cancer.
The expression level of a gene may be measured by measuring the level of an expression product such as mRNA or protein. Methods of quantitatively measuring mRNA and protein are well known in the art.
Detailed Description
The term "allele" refers to one of several alternative forms of a given DNA sequence (which may or may not be present in an exon or in a gene).
The term "haplotype" refers to the identity of two or more polymorphic variants occurring within genomic DNA on the same strand of DNA.
A cancer or cancer condition as described herein may include any type of solid cancer and malignant lymphoma and especially leukaemia, sarcomas, skin cancer, bladder cancer, breast cancer, uterus cancer, ovary cancer, prostate cancer, lung cancer, colorectal cancer, cervical cancer, liver cancer, head and neck cancer, oesophageal cancer, pancreas cancer, renal cancer, stomach cancer and cerebral cancer. A tumour may be a tumour of any of the above tissues or organs. It may be an adenoma. Methods of the invention may be particularly useful in assessing the risk of colorectal cancer or tumour, e.g., adenoma. Colorectal cancer may in some embodiments be further divided into colon cancer or rectal cancer.
In some embodiments, assessment of the risk or susceptibility of an individual may comprise assessing the risk of an early or late onset of the cancer or tumour, e.g., an onset at earlier or later than 60 years old.
The sample obtained from the individual may be a nucleic acid sample, e.g., a sample of genomic nucleic acid. In instances where the mutations are located in a coding region, and the mutation is non-silent, the sample may be a protein sample, and the presence of the variant allele may be determined by changes in the amino acid sequence. Where the mutation is located in a region which is transcribed into RNA, e.g., mRNA, the sample may be an RNA, e.g., mRNA sample.
Suitable nucleic acid- or protein- containing samples may include a tissue or cell sample, such as a biopsy, or a biological fluid sample, such as a blood sample or a swab.
It may be preferred that the individual or patient referred to herein is an individual or patient showing no symptoms of cancer or tumour. Risk assessment may take place for individuals who are considered to be free of the disease/condition at the time the sample is taken. The individual may be a healthy individual, having no symptoms of colorectal disease.
Regions of polymorphism
As explained above, known germline mutations in known genes account for a very low percentage of inherited risk of cancer. Much of the remaining variation of genetic risk may be attributable to a large number of susceptibility loci, some of which will be common, each exerting a small influence on risk. An individual having disease-associated alleles, particularly at several positions, may be considered at a higher risk, and may be subject to an appropriate regime of monitoring and/or may take precautionary steps to reduce that risk. Thus, it is of value to identify further risk markers.
The present inventors have identified novel susceptibility loci associated with cancer/tumour susceptibility.
The inventors have identified single nucleotide polymorphisms within these loci, associated with susceptibility. Of course, these SNPs may not themselves be directly causative of the increased susceptibility. As a result of linkage disequilibrium, polymorphisms can be associated with a causative change, so as to serve as useful markers of risk without being directly causative themselves. The region of linkage disequilibrium with the SNP is likely to contain both the causative gene/mutation and other useful markers.
For instance, the inventors have identified the SNP rs16892766 in the linkage disequilibrium block chr8:117.65Mb-117.87Mb. In addition to rs16892766, two other SNPs in the region - rs11986063 and rs6983626 - were associated with CRC risk at P<10"'1 in Phases 1 and 2 (as detailed in example 3); both of these are correlated with rs16892766 (r2=0.78 and r2=0.43 respectively, where r2 denotes the squared correlation coefficient between the two loci).
In some embodiments the region containing the site of polymorphism may be chromosome 8: 117690773-117712909.
Similarly, the inventors have identified the SNP rs10795668 in the linkage disequilibrium block chr10:8.73-8.81Mb. Three additional SNPs in this LD block - rs706771, rs7898455 and rs827405 - showed evidence of association (P<10^-3) with CRC risk in the Phase 1 and 2 joint analysis; two of these were strongly correlated (rs706771, rs7898455) and one weakly correlated (rs827405) with rs10795668 (r2=0.90, r2=0.89 and r2=0.13 respectively).
Suitable markers/polymorphisms may be a substitution, copy number change, deletion or insertion. Most preferably, the marker/polymorphism may be one or more single nucleotide polymorphisms.
Thus, in some embodiments, a preferred polymorphism may be one or more of:
rs16892766 (chr8:117699864)
CAGACGCAAACAGTTTCAAGACTATT [A/C] GCTGTTAAAGGTTATGCCTTATGTC
rs2488704 (chr10:76808630)
CTGTTTTCCCTAGAGGTGCCAGGAAG [A/G] AAGAGAATAAATTGCCCCCAAAATC
rs4355419 (chr4:63165287)
CTTTATCTGCAGCATGAAAATGAGCT [A/C] ATACAGTTTCTTAGAGGCAAATTTT
rs2282428 (chr1:231852793)
CTTAAAGGACTTGGGGAATATTAAAA [C/T] GTGTGCAGAGAACAGAGGGGAGCAT
rs10795668 (chr10:8741225)
CAGAAAGAGAAAAAGTTAGATTCTTA [A/G] ATTCCATGATTTTATATTTCCCACC
An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs16892766.
An indication of higher risk in an individual may in some embodiments be associated with detection of an A at rs2488704.
An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs4355419.
An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs2282428.
An indication of higher risk in an individual may in some embodiments be associated with detection of a G at rs10795668.
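The bracketed notation used for the polymorphisms listed above, with flanking sequence on either side of the alternative alleles, can be handled programmatically. The following is a minimal Python sketch for illustration only and is not part of the methods described; the function names `parse_snp_context` and `allele_sequences` are hypothetical, and the sketch assumes a single polymorphic site written in the form "LEFT [X/Y] RIGHT".

```python
import re


def parse_snp_context(context: str):
    """Split 'LEFT [X/Y] RIGHT' into (left_flank, alleles, right_flank).

    Hypothetical helper: assumes exactly one bracketed site and flanks
    made up only of the letters A, C, G and T (upper or lower case).
    """
    match = re.fullmatch(
        r"\s*([ACGTacgt]+)\s*\[([^\]]+)\]\s*([ACGTacgt]+)\s*", context
    )
    if match is None:
        raise ValueError(f"unrecognised SNP context: {context!r}")
    left, alleles, right = match.groups()
    allele_tuple = tuple(a.strip().upper() for a in alleles.split("/"))
    return left.upper(), allele_tuple, right.upper()


def allele_sequences(context: str):
    """Return a dict mapping each allele to the full sequence carrying it."""
    left, alleles, right = parse_snp_context(context)
    return {allele: left + allele + right for allele in alleles}
```

For example, passing the flanking sequence given for the chr8:117699864 polymorphism to `allele_sequences` would return one full-length sequence for the A allele and one for the C allele, which could then serve as a starting point for probe or primer design.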
SMAD7
Some embodiments of the invention may comprise determining the allele present at one or more sites of polymorphism in the region of the SMAD7 gene.
SMAD7 (Mothers against decapentaplegic homolog 7) acts as an intracellular antagonist of TGFβ signalling by binding stably to the receptor complex and blocking activation of downstream signalling events. The human SMAD7 protein sequence has an exemplary database entry AAL68977, gi: 18418630. Exemplary human nucleic acid sequences for the four exons are given in database entries AF026556.1, GI:18418626, GI:18418627, GI: 18418628 and GI: 1841829. The human SMAD7 gene is located at 18q21.1, and an exemplary sequence of the human SMAD7 gene is set out between bases 44,700,221 and 44,731,079 of the human Genome 2006 Build (http://genome.ucsc.edu) for chromosome 18. In some embodiments, one or more sites of polymorphism may be located in intron 3 of the SMAD7 gene. In some embodiments it may be preferred that one or more sites of polymorphism is located in the region Chr 18: 44,700,221-44,716,898.
Exemplary polymorphisms located in the region of SMAD7 include :
rs4939827: (Chr18: position 44707461) CTCACAGCCTCATCCAAAAGAGGAAA [C/T] AGGACCCCAGAGCTCCCTCAGACTC
rs12953717: (Chr18: position 44707927) GCATTTCACACCAACCTCGCATGCAG [C/T] CTCCCGGTAAGTTCAGCTCATCCCT
rs4464148: (Chr18: position 44713030) CGGGGGAACAGACAGAGAAGGATGAA [C/T] GTGAAAAGGAAACACCCTGGTAACT
An indication of higher risk in an individual may in some embodiments be associated with detection of a T at rs4939827, a T at rs12953717 and/or a C at rs4464148.
For example, when the three polymorphisms rs4939827, rs12953717 and rs4464148 are assessed, exemplary haplotypes associated with risk are: TTC, TTT and TCT.
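By way of illustration only, the check for the exemplary risk haplotypes TTC, TTT and TCT can be sketched in Python as follows. The sketch assumes that phased alleles are already available for each of the two chromosome copies, reported in the order rs4939827, rs12953717, rs4464148 and on the same strand as the sequences above; the names `RISK_HAPLOTYPES`, `haplotype_string` and `count_risk_haplotypes` are hypothetical.

```python
# Exemplary risk haplotypes taken from the text, in the order
# rs4939827, rs12953717, rs4464148.
RISK_HAPLOTYPES = {"TTC", "TTT", "TCT"}
SNP_ORDER = ("rs4939827", "rs12953717", "rs4464148")


def haplotype_string(phased_alleles: dict) -> str:
    """Concatenate the alleles of one chromosome copy in SNP_ORDER."""
    return "".join(phased_alleles[snp].upper() for snp in SNP_ORDER)


def count_risk_haplotypes(copy_a: dict, copy_b: dict) -> int:
    """Count how many of the two chromosome copies carry a risk haplotype."""
    return sum(
        haplotype_string(copy) in RISK_HAPLOTYPES for copy in (copy_a, copy_b)
    )
```

An individual whose two copies read TTC and CCT would thus be counted as carrying one risk haplotype. Where only unphased genotypes are available, haplotypes would first have to be inferred, which this sketch does not attempt.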
Other possible SNPs in the SMAD7 region are shown in tables 3 and 5.
In particular, a polymorphism in the SMAD7 gene region may be a SNP newly identified by the present inventors. This SNP is a C/G polymorphism at 44703563bp of chromosome 18.
An indication of higher risk in an individual may in some embodiments be associated with detection of a G at position 44703563bp.
Other exemplary polymorphisms in the SMAD7 region, shown in table 5, are rs8085824, rs34007497, rs4044177 and rs12953717. An indication of higher risk in an individual may in some embodiments be associated with detection of a C at rs8085824, a G at rs34007497, AAGAA at rs4044177 and a T at rs12953717.
Methods of the invention may involve analysis of haplotypes comprising more than one SNP. An exemplary haplotype is between markers rs6507874 and rs36025258. Another exemplary haplotype is between rs9946510 and rs12967711. A risk haplotype may include a G at position 44703563bp.
For example, the method may comprise determining the presence or absence of a haplotype shown in table 7, where the allele is shown for each marker (in order) within the specified region (see figure 2c for the order of the markers).
In some embodiments, methods of the invention may comprise assessing the allele present at at least one site of polymorphism located in "block 1" (from rs6507874 to rs36025258) and/or at least one site of polymorphism located in "block 2" (rs9946510 to rs12967711). Optionally, the assessment may be of a haplotype in block 1 and/or a haplotype in block 2.
HMPS/CRAC1
Some embodiments of the invention may comprise determining the allele present at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
The HMPS/CRAC1 locus referred to herein is located at 15q13.3-q14, and an exemplary sequence is given between bases 29,775,416 and 34,124,377 on chromosome 15 of the human genome 2006 build (http://genome.ucsc.edu). In some embodiments, one or more sites of polymorphism may be located in the region chr15: 30,782,050-30,841,010.
Exemplary polymorphisms located in the region of the HMPS/CRAC1 locus include:
rs4779584: TAGAACTTGTTGATAAGCCATTCTTC [C/T] GAACAGAAACCATAACTATATACAC
rs10318: CAAGATATTTGTGGTCTTGATCATAC [C/T] TATTAAAATAATGCCAAACACCAAA
In some embodiments, a higher risk of colorectal cancer or adenoma may be associated with a T at rs10318 and/or a T at rs4779584. Other possible polymorphisms in the region of the HMPS/CRAC1 locus may include: rs12906413; rs11853552; rs11857190; rs12148790; rs11857997; rs8034965ins/del (a C/- polymorphism at chr15:30,799,068); rs3743103; NFN28; and/or rs1129456.
rs12906413:
TCACAACTCAAACCACATTG [T/C] GGTATTTAAGTAAAGGGGAGCATT
rs11853552:
ATTCAATATATTATAGGAAGGAAA [G/A] GATCAAGGAAGAGCTGGTCTTAAA
rs11857190:
AAGGAAAACAGAGCAGTTGGAAATGC [G/A] AAGATAAAAGAACTGGATCCAG
rs12148790:
TATACATTAAATGTTGAGCTTT [T/C] CTTGCAGCAATGCACTTGGAAGGCTC
rs11857997:
AACACATGGCATAATGGTATTATGA [T/C] TCATCTACATATGGCCTGATCCT
rs3743103:
CTGCTGCCAACTTAAGTCATCTCC [G/A] TTAACGAAATTGCATTCTTGTGGC
NFN28:
AAGACTCTAACAACATACGGTGTGTCAG [T/C] TTCTTCTTTGCC
rs1129456:
GCATTATGTATTATGTCTGCTT [T/A] AATCATTTAAAAACGGCAAAG
In some embodiments, a higher risk of colorectal cancer or adenoma may be associated with a T at rs12906413, a G at rs11853552, an A at rs11857190, a T at rs11857997, and/or an A at rs1129456.
Polymorphisms in linkage disequilibrium, transcribed sequences and panels
Additional polymorphisms may be polymorphisms in linkage disequilibrium with any of the above (e.g., having an r2 greater than 0.3, more preferably greater than 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9, and/or a value of D' of greater than 0.5, more preferably greater than 0.6, 0.7, 0.8 or 0.9). E.g., other polymorphisms may have an r2 greater than 0.3, more preferably greater than 0.4, 0.5, 0.6, 0.7, 0.8 or 0.9, and/or a value of D' of greater than 0.5, more preferably greater than 0.6, 0.7, 0.8 or 0.9, with rs16892766, rs10795668, rs2488704, rs4355419, rs2282428, rs4939827, rs12953717, rs4464148, rs4779584 and/or rs10318.
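As an illustration of how the r2 and D' measures referred to above are conventionally computed for a pair of biallelic sites, the following Python sketch derives both from counts of the four possible two-site haplotypes. It uses the standard textbook definitions and is not taken from the present disclosure; the function name `ld_stats` and its argument names are hypothetical, and phased haplotype counts are assumed to be available.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (r2, D') for two biallelic sites from haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are counts of the four two-site haplotypes,
    where A/a are the alleles at the first site and B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at site 2

    d = p_AB - p_A * p_B  # raw disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    r2 = d * d / denom

    # D' rescales |D| by its maximum possible value given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime
```

A candidate marker could then be compared against thresholds of the kind given above, for example r2 greater than 0.3 and/or D' greater than 0.5.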
It is noted that table 5 lists 22 SNPs in addition to rs4939827, rs12953717, rs4464148: all 22 of these SNPs have an r2 of at least 0.5 with one or more of rs4939827, rs12953717, rs4464148 and all have an association with risk of developing CRC at the 5% statistical level.
For example, a method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a C at rs16892766, a G at rs10795668, an A at rs2488704, a C at rs4355419 or a C at rs2282428.
A method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a T at rs4939827, a T at rs12953717, a C at rs4464148, a T at rs10318, and/or a T at rs4779584.
A method of the invention may comprise determining the presence or absence of a risk-associated allele which is in linkage disequilibrium with a G at position 44703563bp of chromosome 18.
As mentioned above, exemplary sites in linkage disequilibrium with rs16892766 are rs11986063 and rs6983626.
rs11986063:
AGAAAGAGCAGAGAAAACACTCAATG [C/T] GAATACAGCCTTAGACAAATTCTAA
rs6983626:
gaggtactactttgaaagataacatt [C/T] gcttggTGTTTGCTCATATATGTGT
As mentioned above, exemplary sites in linkage disequilibrium with rs10795668 are rs706771, rs7898455 and rs827405.
rs706771:
CTTTTTGCTTGCCACAAGTCTTGAAT [A/G] ACAAGTAACAAACATCTATTTAGAA
rs7898455:
tgacagcttcattgcaggcatatgaa [G/T] ttccaggcagaagacccagataagc
rs827405:
ATGGTGTGTTTGAGATGAGTAAGAGA [C/T] ACATTAATTAGTTAAGGCTTTCTGG
Moreover, the regions in linkage disequilibrium with the identified SNPs may contain transcribed sequences, e.g., coding for proteins and/or RNAs. Thus, the site of polymorphism may be located in one of these transcribed sequences .
As mentioned above, in some aspects, the present invention relates to methods comprising determining the transcription level from one of these genes. The expression level of a gene may be measured by measuring the level of an expression product such as mRNA or protein. Methods of quantitatively measuring mRNA and protein are well known in the art.
Exemplary genes located in the regions set out above are EIF3S3 (NM_003756), C8orf53 (NM_032334), BC031880, LOC389936, FLJ3802842, KCNK1 (NM_002245) and SMAD7 (as previously described).
The gene EIF3S3 is known to regulate cell growth and viability, and its overexpression is a feature of breast, prostate and hepatocellular cancers. The work of the present inventors supports this gene as a causative gene in colorectal cancer/adenoma also. Hence, preferred embodiments may comprise assessing the transcription level of EIF3S3 in order to assess colorectal cancer/tumours in an individual or to identify an individual at risk of colorectal cancer/tumours. Additionally, EIF3S3 may be a therapeutic target in the treatment of colorectal cancer or tumours, and thus may be used in methods of screening for therapeutic compounds for the treatment of these conditions. Modulators of EIF3S3 may be useful as therapeutics in the treatment (including preventative treatment) of these conditions, e.g., antibodies against EIF3S3, nucleic acid inhibitors having a sequence complementary to EIF3S3 (such as antisense, RNAi molecules such as siRNA and miRNA, ribozymes and the like) and vectors encoding nucleic acid inhibitors.
In some embodiments of the present invention, it may be preferred that the alleles present at a plurality of sites of polymorphism are assessed.
In some embodiments, at least one site of polymorphism is located in a region as set out above, and one or more additional polymorphisms known to be associated with the disease/condition are assessed. For instance, one such other site may be rs6983267, details of which are given below.
rs6983267:
CGTCCTTTGAGCTCAGCAGATGAAAG [G/T] CACTGAGAAAAGTACAAAGAATTTT
In some embodiments, a higher risk in an individual (e.g., of adenoma or colorectal cancer) may be associated with a G at this position.
Additionally or alternatively, a plurality of sites may be located in one or more regions as set out herein. The plurality of polymorphisms may comprise a plurality of polymorphisms within a given region as described herein, and/or may include polymorphisms in a plurality of different regions . By "plurality" herein is meant at least 2, and in some embodiments 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 100 or more.
The more risk-associated alleles an individual has, the greater their risk of developing a condition may be. For instance, counting two for a homozygote, the risk of CRC increases with increasing numbers of variant alleles for the five loci rs6983267, rs4779584, rs4939827, rs10795668 and rs16892766 (as discussed further in example 7).
Thus, one or more sites of polymorphism according to the present invention may be assessed as part of a panel.
For instance, the panel may comprise (but is not limited to):
rs16892766 and rs10795668;
rs16892766 and rs4939827, and optionally rs12953717 and/or rs4464148;
rs10795668 and rs4939827, and optionally rs12953717 and/or rs4464148;
rs16892766 and one or more SNPs selected from rs4779584, rs10318, rs12906413, rs11853552, rs11857190, rs12148790, rs11857997, rs8034965ins/del, rs3743103, NFN28 and rs1129456;
rs10795668 and one or more SNPs selected from rs4779584, rs10318, rs12906413, rs11853552, rs11857190, rs12148790, rs11857997, rs8034965ins/del, rs3743103, NFN28 and rs1129456;
rs16892766, rs10795668, rs4939827 and rs4779584;
rs4939827 and either rs12953717 or rs4464148;
rs4939827, rs12953717 and rs4464148;
rs4939827 and either or both of rs4779584 and rs10318;
rs4779584 and rs10318;
the newly identified polymorphism at chr18:44703563bp, with one or more of rs16892766, rs11986063, rs6983626, rs10795668, rs4779584 and rs10318.
Any of the above combinations, and any of the individual sites of polymorphism taught herein, or combinations thereof, may also be assessed together with rs6983267. (As just one example, the sites of polymorphism assessed may include rs4779584 and rs6983267. As another example, the sites may include rs16892766, rs10795668, rs4939827, rs4779584 and rs6983267.)
Moreover, in the present invention, the allele present at the site may be determined in one or both copies of the region in the genome of the individual. In some embodiments, homozygosity of the risk allele may be associated with a higher risk of the disease than heterozygosity. Optionally, in assessing the number of risk alleles that an individual has, one will be counted for a heterozygote comprising one copy of the risk-associated allele, and two for a homozygote comprising two copies of the risk-associated allele at a given site of polymorphism.
Accordingly, the present invention additionally comprises methods of assigning a regime of treatment and/or monitoring to an individual based on the number of risk-associated alleles they have at a plurality of sites of polymorphism as taught herein. For instance, said plurality of sites of polymorphism may be or comprise any of the combinations set out above. In some embodiments, said polymorphisms may be or comprise rs16892766, rs10795668, rs4939827, rs6983267 and rs4779584 or a site in linkage disequilibrium therewith. For instance, a regime of monitoring may be assigned (e.g., of regular colonoscopic examination) if an individual has seven or more risk-associated alleles at said sites, where homozygous alleles are counted as two and heterozygous alleles as one.
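The counting scheme described above can be sketched in Python as follows, purely by way of illustration. The risk alleles and the threshold of seven are those stated in the text for the five sites; the names `RISK_ALLELES`, `count_risk_alleles` and `monitoring_indicated` are hypothetical, and the sketch assumes each genotype is supplied as a two-letter string (e.g., "CT") on the same strand as the sequences given herein.

```python
# Risk-associated allele at each of the five sites, as stated in the text.
RISK_ALLELES = {
    "rs16892766": "C",
    "rs10795668": "G",
    "rs4939827": "T",
    "rs6983267": "G",
    "rs4779584": "T",
}


def count_risk_alleles(genotypes: dict) -> int:
    """Count risk alleles: two for a homozygote, one for a heterozygote."""
    total = 0
    for snp, risk_allele in RISK_ALLELES.items():
        genotype = genotypes[snp].upper()
        if len(genotype) != 2:
            raise ValueError(f"expected a two-letter genotype for {snp}")
        total += genotype.count(risk_allele)
    return total


def monitoring_indicated(genotypes: dict, threshold: int = 7) -> bool:
    """True if the individual has at least `threshold` risk alleles."""
    return count_risk_alleles(genotypes) >= threshold
```

With five sites the count ranges from zero to ten, so a threshold of seven selects individuals who are homozygous for the risk allele at two or more of the sites.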
Methods of determining an allele present at a site of polymorphism
Determining the allele present at a particular site of polymorphism may in some embodiments comprise determining the nucleotide or sequence of nucleotides at that site. In other embodiments, determining the allele present at a particular site of polymorphism may comprise determining the presence or absence of the disease-associated allele.
The allele at the one or more sites of polymorphism may be determined by any convenient technique, including amplification of all or part of the region containing the site of polymorphism, sequencing all or part of the region containing the site of polymorphism, and/or hybridisation of a probe specific for an allele at the site of polymorphism.
A specific amplification reaction such as PCR using one or more pairs of primers may conveniently be employed to amplify all or part of the region of interest, for example, the portion of the sequence containing or suspected of containing the one or more sites of polymorphism.
In some embodiments, the amplification may be allele specific, such that the presence or absence of amplification product is indicative of the presence of that allele (e.g., the risk- associated allele) . In other embodiments, the amplified nucleic acid may be sequenced as above, and/or tested in any other way to determine the presence or absence of an allele (e.g., the risk-associated allele) at the one or more sites of polymorphism.
For instance, in some embodiments the method may be a ligation-based method. Such a method may comprise hybridising a first and second probe to a first target domain comprising a polymorphism of interest (e.g., a SNP) at an interrogation position. Either the end of the first probe or the beginning of the second probe contains a nucleotide or nucleotides at a detection position, which aligns with the interrogation position on the target. The two probes can be ligated (optionally, following filling of a gap between them) only if there is complementarity between the detection position and the interrogation position. Each probe has a primer sequence for amplification, such that the ligated probe comprises both an upstream and downstream primer and can be amplified, e.g., by PCR. Thus, the presence of the amplification product indicates a match between the interrogation and the detection position. In some embodiments, different probes comprising different nucleotides at the detection position can be used (optionally, in the same reaction), each comprising a label which allows for its detection. For example, the label may be a sequence of nucleotides which allows the amplified ligation product derived from the probe to be directed to a particular site on an array.
Suitable amplification reactions include the polymerase chain reaction (PCR). PCR comprises repeated cycles of denaturation of template nucleic acid, annealing of primers to template, and elongation of the primers along the template. PCR is well-known in the art and is described for example in "PCR protocols; A Guide to Methods and Applications", Eds. Innis et al, 1990, Academic Press, New York; Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263, (1987); Ehrlich (ed), PCR technology, Stockton Press, NY, 1989; and Ehrlich et al, Science, 252:1643-1650, (1991). The number of cycles, the respective conditions of the individual steps, the composition of reagents within the reaction tube, or any other parameter of the reaction set-up may be varied or adjusted by the skilled person, depending on the circumstances. Additional steps (such as initial denaturing, hot-start, touchdown, enzyme time release PCR, replicative PCR) may also be employed.
Numerous variations and modifications of PCR are known in the art and may be employed by the skilled person in performing the present methods. Chemicals, kits, materials and reagents are commercially available to perform PCR reactions.
Other specific nucleic acid amplification techniques include strand displacement activation, the Qβ replicase system, the repair chain reaction, the ligase chain reaction, ligation activated transcription, SDA (strand displacement amplification) and TMA (transcription mediated amplification). For convenience, and because it is generally preferred, the term PCR is used herein in contexts where other nucleic acid amplification techniques may be applied by those skilled in the art. Unless the context requires otherwise, reference to PCR should be taken to cover use of any suitable nucleic acid amplification reaction available in the art.
In some embodiments, the binding of a probe to genomic nucleic acid in the sample, or amplification products thereof, may be determined. The probe may comprise a nucleotide sequence which binds specifically to a nucleic acid sequence which contains a particular allele (e.g., the risk-associated allele, or a non-risk-associated allele) at one or more sites of polymorphism and does not bind specifically to the nucleic acid sequence which does not contain that allele at the one or more polymorphic sites. The oligonucleotide probe may comprise a label and binding of the probe may be determined by detecting the presence of the label. One or more (e.g. two) oligonucleotide probes or primers may be hybridised to the region of interest in the sample nucleic acid. Hybridisation will generally be preceded by denaturation to produce single-stranded DNA. The hybridisation may be part of an amplification procedure such as PCR, or may be part of a probing procedure not involving amplification. An example procedure would be a combination of PCR and low stringency hybridisation.
Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently or enzymatically labelled. Other methods not employing labelling of probe include examination of restriction fragment length polymorphisms, amplification using PCR, RNase cleavage and allele specific oligonucleotide probing. Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the DNA fragments on the filter and binding determined.
Those skilled in the art are well-able to employ suitable conditions of the desired stringency for selective hybridisation, taking into account factors such as oligonucleotide length and base composition, temperature and so on. Suitable selective hybridisation conditions for oligonucleotides of 17 to 30 bases include hybridisation overnight at 42°C in 6X SSC and washing in 6X SSC at a series of increasing temperatures from 42°C to 65°C. Other suitable conditions and protocols are described in Molecular Cloning: a Laboratory Manual: 3rd edition, Sambrook & Russell (2001) Cold Spring Harbor Laboratory Press NY and Current Protocols in Molecular Biology, Ausubel et al. eds. John Wiley & Sons (1992).
In some embodiments, genomic nucleic acid may be analysed using a nucleic acid array.
A nucleic acid array comprises a population of nucleic acid sequences immobilised on a support. Each sequence in the population has a particular defined position on the support. Nucleic acid arrays are well known in the art and may be produced in a number of ways. For example, the nucleic acid sequence may be amplified using the polymerase chain reaction from a cell or library of sequences, or synthesized ex situ using an oligonucleotide synthesis device, and subsequently deposited using a microarraying apparatus. Alternatively, the nucleic acid sequence may be synthesized in situ on the microarray using a method such as piezoelectric deposition of nucleotides .
The number of sequences deposited on the array generally may vary upwards from at least 10, 100, 1000, or 10,000 to between 10,000 and several million depending on the technology employed.
In some embodiments, it may be preferred that the array is a specialised, small array, e.g., comprising nucleic acids capable of hybridising to no more than 1000 different sequences, optionally no more than 500, 400, 300, 200 or 100 different sequences.
In some embodiments, the kit or array may comprise a nucleic acid capable of hybridising to each of the possible alleles at the site of polymorphism to be analysed (i.e., each of the alleles which may be found in the population) . Moreover, the kit or array may comprise one or more controls.
In some embodiments, the nucleic acid array is a genomic array comprising a population of genomic sequences from an individual having a cancer, e.g., colorectal cancer. In some embodiments, a genomic tiling path array that covers the regions of interest (e.g., in some embodiments the SMAD7 gene locus or the HMPS/CRAC1 locus) may be employed. In a tiling array, every immobilised nucleic acid, typically each of the same size, corresponds to a specific genomic region, with different immobilised nucleic acids containing nucleotide sequences corresponding to shifts of one or more nucleotides relative to each other along the genomic region. For example, a tiling array may be designed such that each nucleic acid from a stretch of genomic sequence that is on the array differs from its adjacent nucleic acid by a shift of a single base pair, so that a series of nucleic acids will represent a moving window across the stretch of genomic sequence. Thus, an array may comprise overlapping immobilised nucleic acid sequences with shifts as small as one nucleotide and as large as the entire size of the nucleic acid, as well as non-overlapping nucleic acids.
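The moving-window layout of a tiling array described above can be illustrated with a short Python sketch. This is a simplified example under stated assumptions, not a probe design procedure: the function name `tiling_probes` is hypothetical, all probes are given the same length, and no account is taken of melting temperature, repeats or cross-hybridisation.

```python
def tiling_probes(region: str, probe_length: int, shift: int = 1):
    """Return (start, sequence) pairs tiling `region` with a fixed shift.

    A shift of 1 gives a moving window advancing one base at a time;
    a shift equal to probe_length gives non-overlapping probes.
    """
    if probe_length <= 0 or shift <= 0:
        raise ValueError("probe_length and shift must be positive")
    last_start = len(region) - probe_length
    return [
        (start, region[start:start + probe_length])
        for start in range(0, last_start + 1, shift)
    ]
```

Calling `tiling_probes(sequence, 25, 1)` on a stretch of genomic sequence would give one 25-base probe per starting position, each offset from its neighbour by a single base, whereas a shift of 25 would give adjacent, non-overlapping probes.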
Genomic sequences immobilised on an array may be hybridised with a labelled oligonucleotide probe using standard techniques .
In other embodiments, the nucleic acid array may comprise a population of oligonucleotide sequences which correspond to alleles at sites of polymorphism in the genome. The immobilised oligonucleotide probes may then be hybridised with labelled genomic nucleic acid, for example restriction fragments or amplification products, comprising all or part of the region of interest from an individual.
The nucleic acid sequences on the array to which a labelled probe or nucleic acid hybridises may be determined, for example by measuring and recording the label intensity at each position in the array, for example, using an automated DNA microarray reader. These sequences correspond to the sequence which is present at the site of polymorphism in the individual, and allow the presence of the allele at the site of polymorphism to be determined.
Nucleic acid or an amplified region thereof may be sequenced to identify or determine the presence of a particular allele at one or more sites of polymorphism in the genomic region of interest. An allele may be identified by comparing the sequence obtained with a reference genomic sequence, as described above.
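As a simple illustration of identifying an allele by comparison with reference sequence, the following Python sketch looks for the reference flanking sequence on either side of a site within a sequence read and reports which allele lies between the flanks. It is a naive exact-match sketch under stated assumptions: the function name `call_allele` is hypothetical, the read is assumed to be on the same strand as the flanks, and sequencing errors, reverse-complement reads and alignment are not handled.

```python
def call_allele(read: str, left_flank: str, right_flank: str, alleles):
    """Return the allele found between the flanks in `read`, or None.

    Each candidate allele is tried in turn by searching the read for
    left_flank + allele + right_flank as an exact substring.
    """
    read = read.upper()
    left_flank = left_flank.upper()
    right_flank = right_flank.upper()
    for allele in alleles:
        if left_flank + allele.upper() + right_flank in read:
            return allele
    return None
```

For a heterozygous individual, reads supporting each allele would be expected, so in practice the call would be made over many reads rather than a single one.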
Sequencing may be performed using any one of a range of standard techniques. Sequencing of an amplified product may, for example, involve precipitation with isopropanol, resuspension and sequencing using a TaqFS+ Dye terminator sequencing kit. Extension products may be electrophoresed on an ABI 377 DNA sequencer and data analysed using Sequence Navigator software.
Having sequenced nucleic acid of an individual or sample, the sequence information can be retained and subsequently searched without recourse to the original nucleic acid itself. Thus, for example, scanning a database of sequence information using sequence analysis software may identify a sequence alteration or mutation.
Treatment/Monitoring Programs
As a result of the assessment of tumour (e.g., adenoma) or cancer in an individual (e.g., of the risk of tumour or cancer in the individual), or the identification of an individual who is at risk or likely to be at risk of tumour or cancer (e.g., colorectal adenoma or cancer), the individual may be assigned an appropriate program of treatment (e.g., preventative treatment) and/or monitoring. For example, the assessment of risk may affect whether monitoring is offered, and/or may influence decisions on the frequency of monitoring or the age at which monitoring begins. Monitoring methods include analysis of blood samples (e.g., for molecular cancer markers in blood serum), analysis of stool or urine samples (e.g., for blood or molecular cancer markers), endoscopy, imaging methods, physical examination, biopsy and other methods that will be apparent to the skilled person. For example, in the case of colorectal cancer or adenoma, individuals at higher risk may be assigned to a program of monitoring, e.g., by endoscopy such as flexible colonoscopy and/or fecal occult blood tests.
Additionally or alternatively, as a result of the assessment, individuals may be advised on lifestyle factors (including diet, weight management, smoking) that may affect the risk of developing a tumour/cancer.
Aspects of the present invention will now be illustrated with reference to the following experimental exemplification, by way of example and not limitation. Further aspects and embodiments will be apparent to those of ordinary skill in the art. All documents mentioned herein are specifically incorporated by reference in their entirety.
Figures
Figure 1. The SMAD7 locus. (a) SNP single marker-association results. This panel shows P values for association testing drawn from the GWA study covering SMAD7 and 100kb of sequence upstream and downstream of the gene. The analysis was based on the test allele. All known genes and transcripts in the area are shown (University of California Santa Cruz March 2006 assembly; National Centre for Biotechnology Information Build 36.1). (b) Recombination rate (cM/Mb) across the region derived from HapMap project data (release 21a). (c) The interval situated in SMAD7, between B and C, for targeted resequencing.
Figure 2. (a) LD structure of SMAD7. Shown in each box are estimated statistics of the square of the correlation coefficient (r2), derived from Haploview software (v3.2). The values indicate the LD relationship between each pair of SNPs; the darker the shading, the greater the extent of LD. The SMAD7 exons have been redrawn to show the relative SNP positions in the gene; therefore, the map is not to physical scale. (b) Single marker association statistics (as -log10 values) for each of the 25 SNPs mapping to the 17Kb region sequenced. The 5 SNPs with the strongest evidence for an association with colorectal cancer are denoted in blue (rs8085824, Novel 1, rs34007497, rs4044177 and rs12953717). (c) Pair-wise linkage disequilibrium (r2) metrics of the 25 SNPs calculated in Haploview (v4.0) software. The values indicate the LD relationship between each pair of SNPs; the darker the shading, the greater the extent of LD. Shown are the two haplotype blocks defined within the region.
Figure 3. (a) The tested regions contain an enhancer that promotes reporter gene expression in the rectal region of Xenopus tadpoles. The bright field image above shows a 5-day-old tadpole embryo. The rectal region is indicated by an arrow. The fluorescent image below shows a detail of the rectal region of a Xenopus transgenic embryo in which GFP expression is promoted by the enhancer. The intensity of the rectal expression promoted by the enhancer from the Protective or the Risk haplotypes was measured relative to the signal observed in a fixed area in the muscle region (boxed, no arrow), which was considered as 100%. The DNA tested contains either the protective or risk variants of both rs8085824 and Novel 1 (1 and 3) or solely Novel 1 (2 and 4); (e) Box-whisker plot of the relative expression observed in transgenic embryos harboring the Protective or the Risk DNA promoting GFP expression. The enhancer from the risk haplotype/allele shows significantly decreased enhancer activity.
Figure 4. SMAD7 expression in 36 rectal adenomas and 43 carcinomas. The vertical axis presents normalized relative SMAD7 gene expression (log2 scale). Expression of SMAD7 was significantly lower in carcinomas than adenomas irrespective of 18q21 copy number status. (Difference of expression between tumor groups: 1 vs 2 P = 0.524; 3 vs 4 P = 0.34; 1 vs 3 P = 0.13; 2 vs 4 P = 4.0 x 10^-4; 1,3 vs 2,4 P = 0.06; 1,2 vs 3,4 P = 5.1 x 10^-6)
Figure 5. SNP Novel 1. (a) Mutation Surveyor output of SNP Novel 1. (b) Genomic sequence surrounding SNPs rs8085824 and Novel 1 corresponding to position Chr18: 44,703,059-44,703,778; UCSC March 2006 assembly (NCBI build 36.1). Shown in blue is the sequence corresponding to mod052296. Primer sequences used to generate the Xenopus laevis reporter gene expression construct are shown in bold. The region conserved between Homo sapiens and Canis familiaris (dog) is underlined; however, the immediate sequence encompassing Novel 1 is conserved in all primates.
Figure 6. Pairwise linkage disequilibrium of all SNPs identified through the re-sequencing of 90 unrelated control individuals.
Figure 7.
Figure 8. P values throughout the region of study. log10(Pallele) values from the Stage 1 genotyping are shown, together with the locations of SGNE1, GREM1 and FMN1.
Figure 9. Meta-analyses of associations between CRC risk and (i) T allele at rs4779584 and (ii) A allele at rs10318.
Plots of allelic odds ratios (ORs) for CRC from each Stage (see Methods) are shown. Horizontal lines represent 95% confidence intervals. Each box represents the OR point estimate and its area is proportional to the weight of the study Stage. The diamond (and broken line) represents the overall summary estimate for all four Stages, with confidence interval given by its width. The unbroken vertical line is at the null value (OR = 1.0).
Figure 10. Linkage disequilibrium relationships between rs4779584, rs10318 and the additional SNPs typed from sites of putative functional importance in this region. Genotypes are taken from Stage 1 samples. Haploview v3.2 was used to calculate LD values. The pairwise r^2 values are shown in each block.
Figure 11. Forest plots of per-allele odds ratios (ORs) for (A) rs16892766 and (B) rs10795668.
The x-axis corresponds to the trend OR. Each row corresponds to one sample series. Boxes denote OR point estimates, their areas being proportional to the inverse variance weight of the estimate. Horizontal lines represent 95% confidence intervals. The diamond (and broken line) represents the summary OR, with 95% confidence interval given by its width. The unbroken vertical line is at the null value (OR= 1.0).
Figure 12. The 8q23.3 locus. (a) SNP single marker association results. This panel shows P values from the joint analysis of Phases 1 and 2. All known genes (EIF3S3) and predicted transcripts (c8orf53) in the local area are shown. Positions are those of the UCSC March 2006 assembly (NCBI build 36.1). The top SNP rs16892766 (red) was followed up in the additional phases.
(b) LD structure at 8q23.3. Shown in each box are estimated statistics of the square of the correlation coefficient (r^2), derived from Phase 1 genotypes in Haploview software (v3.2). The values indicate the LD relationship between each pair of SNPs; the darker the shading, the greater the extent of LD.
Figure 13. The 10p14 locus. (a) SNP single marker association results. This panel shows P values from the joint analysis of Phases 1 and 2. No genes (predicted or otherwise) reside in the local area. Positions are those of the UCSC March 2006 assembly (NCBI build 36.1). The top SNP rs10795668 (red) was followed up in the additional phases.
(b) LD structure at 10p14. Shown in each box are estimated statistics of the square of the correlation coefficient (r^2), derived from Phase 1 genotypes in Haploview software (v3.2). The values indicate the LD relationship between each pair of SNPs; the darker the shading, the greater the extent of LD.
Figure 14. Quantile-quantile (Q-Q) plot of allele test statistics for (A) Phase 1; (B) Phase 2. The darker line represents the null hypothesis of no true association. The lighter line with gradient λ is fitted to the lower 90% of the distribution of observed test statistics.
Example 1
Colorectal cancer (CRC) displays familial aggregation, consistent with there being a substantial genetic component. To identify novel CRC risk variants we are conducting a multistage genome-wide association (GWA) study; the first stage is based on genotyping 550,163 tagSNPs in 930 familial colorectal tumour cases and 960 controls. To fast-track replication of the most significantly associated loci, we have evaluated selected SNPs in three replication sample sets comprising a total of 7,473 affected individuals and 5,984 controls. Three SNPs mapping to intron 3 of SMAD7 - which is involved in TGF-β and Wnt signalling - were consistently associated with CRC risk. Across the four sample sets, the association with all three SNPs was highly statistically significant (Ptrend for the most strongly associated SNP, rs4939827 = 1.0 x 10^-12; population attributable risk = 15%). Our results provide evidence implicating variants of SMAD7 as common, low penetrance alleles for CRC.
Inherited susceptibility underlies ~30% of all colorectal cancer (CRC)1. High-penetrance germline mutations in APC, the mismatch repair (MMR) genes, MUTYH/MYH, SMAD4, ALK3 and STK11/LKB1, however, account for only <5% of disease incidence2, with much of the variation in genetic risk likely to be a consequence of combinations of less penetrant variants that individually may be common and detectable through genome-wide association (GWA).
Using the Illumina Hap550 BeadChips (Supplementary Methods online) we have generated genotypes on 547,647 polymorphic tagging SNPs in 930 individuals with familial colorectal neoplasia and 960 controls (Panel A)3. The GWA identified several genomic locations as potentially associated with disease risk. In the second stage of our multistage design we are genotyping the 5% of SNPs with the most extreme P values in the initial GWA study. In concert with this analysis we have sought to fast track replication of the most extreme initial associations in the GWA in additional case-control datasets .
The most significant P values in the GWA study were found at two polymorphic sites. The first SNP, rs6983267, mapping to 8q24.21, has previously been implicated as a risk factor for prostate cancer4,5. We and others have recently shown that this locus is also associated with CRC risk3,6,7. The second SNP, rs4939827, maps to SMAD7, a gene that encodes a component of the TGF-β signalling pathway. An additional 2 SNPs in SMAD7 (rs12953717 and rs4464148) were among the most extreme P values from the unadjusted analysis. The strength of the association at rs4939827 reached P = 3.07 x 10^-7 under the trend test (Table 1). We confirmed 180 genotyping results at this SNP by direct sequencing and obtained perfect concordance, thereby excluding a technical artefact.
To interpret the association signal observed for SMAD7 we generated ancestral recombination graphs (ARGs) using data from our GWA using the Margarita program 8, inferring ARGs for 46-SNP haplotypes spanning SMAD7 and its flanking regions. The strongest evidence for association was found for the 8 kb segment between 44,707,461 and 44,715,784, containing 9 SNPs genotyped in our GWA study in intron 3 of SMAD7. Within this region only the marginal trees for rs4939827, rs12953717 and rs4464148 displayed association with permuted P < 10^-4. For all SNPs outside this region, the permutation P value was greater than 0.01. These results are entirely consistent with there being a single risk locus in the SMAD7 region.
To replicate findings with respect to SMAD7 we evaluated rs4939827, rs12953717 and rs4464148 in three independent CRC case-control series by allele-specific PCR, confirming genotyping calls by direct sequencing of 5% of samples. In each of the additional case-control analyses there was evidence of an association between genotype and risk (Table 1). Pooling genotype data for CRC cases and controls from the four panels provided unequivocal evidence for a relationship between the 3 SNPs and risk, with associations remaining statistically significant after adjustment for multiple testing employing a conservative Bonferroni correction for the 547,647 tests in the original array. For the SNP in SMAD7 most strongly associated with CRC in the GWA, rs4939827, the pooled P value across all studies was 1.00 x 10^-12 (Table 1) with no statistical evidence of heterogeneity between the four series. Pooled odds ratios (OR) were 0.86 (95% CI: 0.79-0.92; P = 6.07 x 10^-5; Phet = 0.80 and I^2 = 0.0%) and 0.73 (95% CI: 0.66-0.80; P = 1.95 x 10^-11; Phet = 0.26, I^2 = 25.2%) for heterozygotes and rare homozygotes, respectively. Since cases in Panels A and D were enriched for familial CRC, the estimate of the risk ratio is biased away from 1.0. Restricting analysis to data from Panels B and C provides a less biased estimate of risk associated with hetero- and homozygosity (OR 0.87, 95% CI: 0.80-0.95 and OR 0.75, 95% CI: 0.67-0.83, respectively).
Table 1: Risk of colorectal neoplasia associated with the SMAD7 SNPs rs4939827, rs12953717 and rs4464148.
SNP (position)            Panel      Cases  Controls  MAF cases  MAF controls  ORhom (95% CI)    ORhet (95% CI)    ORtrend (95% CI)  Ptrend
rs4939827 (44707461 bp)   A1         930    960       0.406      0.489         0.52 (0.39-0.70)  0.77 (0.61-0.98)  0.71 (0.63-0.81)  3.07 x 10^-7
                          B          4422   3844      0.439      0.469         0.79 (0.70-0.90)  0.87 (0.78-0.96)  0.89 (0.84-0.94)  1.42 x 10^-?
                          C          1992   1680      0.441      0.494         0.65 (0.54-0.78)  0.88 (0.75-1.03)  0.81 (0.74-0.89)  7.72 x 10^-6
                          D          963    343       0.449      0.471         0.85 (0.60-1.22)  0.82 (0.61-1.09)  0.91 (0.77-1.09)  0.280
                          Combined2                                            0.73 (0.66-0.80)  0.86 (0.79-0.92)  0.85 (0.81-0.89)  1.00 x 10^-12
rs12953717 (44707927 bp)  A1         929    960       0.496      0.417         1.89 (1.41-2.54)  1.26 (1.00-1.60)  1.38 (1.21-1.57)  1.07 x 10^-6
                          B          4424   3868      0.469      0.432         1.35 (1.20-1.53)  1.10 (1.00-1.22)  1.16 (1.09-1.23)  2.69 x 10^-?
                          C          1995   1704      0.460      0.428         1.29 (1.07-1.56)  1.13 (0.97-1.31)  1.14 (1.04-1.24)  6.74 x 10^-3
                          D          943    341       0.458      0.443         1.16 (0.80-1.68)  0.94 (0.70-1.25)  1.06 (0.89-1.27)  0.481
                          Combined2                                            1.37 (1.25-1.50)  1.11 (1.03-1.20)  1.17 (1.12-1.22)  9.10 x 10^-?
rs4464148 (44713030 bp)   A1         930    960       0.353      0.288         1.72 (1.21-2.45)  1.35 (1.09-1.67)  1.35 (1.18-1.56)  1.51 x 10^-5
                          B          4441   3849      0.326      0.300         1.32 (1.14-1.54)  1.08 (0.99-1.19)  1.13 (1.05-1.20)  4.52 x 10^-?
                          C          1960   1669      0.314      0.296         1.17 (0.94-1.47)  1.09 (0.95-1.25)  1.09 (0.98-1.20)  0.103
                          D          929    335       0.326      0.285         2.00 (1.20-3.33)  1.01 (0.78-1.31)  1.22 (1.00-1.48)  4.84 x 10^-2
                          Combined2                                            1.35 (1.20-1.51)  1.10 (1.03-1.18)  1.15 (1.09-1.21)  6.66 x 10^-8
1 Includes 320 individuals with high-risk adenomas.
2 Combined analysis based on cancer cases only.
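The multiple-testing and heterogeneity figures quoted in the text preceding Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the family-wise alpha of 0.05 is an assumption (the text does not state it), the function names are hypothetical, and the Q value in the usage note is back-calculated for illustration rather than taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def i_squared(q, df):
    """I^2 statistic: share of total variation attributed to heterogeneity.

    q  -- Cochran's Q statistic
    df -- degrees of freedom (number of studies minus one)
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# 547,647 tests is the figure given in the text; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 547_647)

# Pooled P value reported in the text for rs4939827.
pooled_p = 1.0e-12
survives_correction = pooled_p < threshold
```

With four case-control series there are 3 degrees of freedom, so a Q of roughly 4.0 would correspond to an I^2 near the 25% reported for rare homozygotes.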
rs4939827, rs12953717 and rs4464148 all map to the block of LD within intron 3 of SMAD7 (Figure 1). Of the additional 13 SNPs we genotyped in this intron, 4 were also nominally associated with risk (P < 0.05; Figure 1). Modelling pairwise combinations of the rs4939827, rs12953717 and rs4464148 SNPs and their interactions, as well as haplotypes of the three SNPs, suggested that rs4939827, rs12953717 and rs4464148 were similar with respect to their association with cancer risk, consistent with the very high degree of linkage disequilibrium (each pairwise D' > 0.93 and r^2 > 0.60). Inclusion of rs12953717 did not provide a significantly improved fit over a model solely incorporating rs4939827; however, inclusion of rs4464148 did generate a marginally superior disease risk model (P = 0.03). Computational analysis of haplotypes indicated four common haplotypes; the TTC risk haplotype was present in 29.3% of chromosomes from affected individuals and 26.3% of control chromosomes (Table 2). On the basis of the findings from logistic regression analyses and the observation that the second most strongly associated haplotype was TTT (P = 9.9 x 10^-6), a single disease-causing haplotype as inferred by ARG analysis could not be fully defined.
Table 2. Risk of colorectal neoplasia associated with the haplotypes formed by SMAD7 SNPs rs4939827, rs12953717 and rs4464148.
Haplotype   Case frequency   Control frequency   OR     95% CI        P
Stage 1
CCT         0.383            0.459               1.0
TTC         0.324            0.252               1.54   1.31-1.80     5.4 x 10^-8
TTT         0.161            0.150               1.29   1.06-1.56     8.9 x 10^-3
TCT         0.102            0.103               1.18   0.94-1.48     0.15
CCC         0.012            0.016               0.95   0.52-1.70     0.85
CTC         0.097            0.014               0.82   0.42-1.57     0.53
Rare <1%    0.081            0.065               1.55   0.67-3.64     0.26
Stage 2
CCT         0.409            0.438               1.0
TTC         0.293            0.268               1.17   1.09-1.26     2.5 x 10^-5
TTT         0.161            0.151               1.14   1.04-1.25     3.7 x 10^-3
TCT         0.103            0.109               1.02   0.92-1.13     0.75
CCC         0.016            0.018               0.96   0.75-1.22     0.71
CTC         0.013            0.011               1.19   0.89-1.59     0.23
Rare <1%    0.061            0.048               1.35   0.88-2.11     0.15
Stage 3
CCT         0.409            0.459               1.0
TTC         0.281            0.258               1.22   1.09-1.37     5.0 x 10^-4
TTT         0.167            0.156               1.21   1.06-1.38     5.5 x 10^-3
TCT         0.106            0.088               1.35   1.15-1.60     2.6 x 10^-4
CCC         0.020            0.020               1.14   0.81-1.61     0.43
CTC         0.010            0.013               0.87   0.55-1.37     0.53
Rare <1%    0.060            0.064               1.05   0.56-1.96     0.88
Stage 4
CCT         0.410            0.444               1.0
TTC         0.285            0.262               1.18   0.95-1.47     0.13
TTT         0.154            0.170               0.98   0.76-1.27     0.87
TCT         0.108            0.096               1.22   0.89-1.68     0.21
CCC         0.022            0.016               1.49   0.74-3.24     0.25
CTC         0.013            0.087               1.59   0.63-4.77     0.31
Rare <1%    0.072            0.029               2.67   0.61-24.28    0.18
Pooled
CCT                                              1.0
TTC                                              1.23   1.16-1.30     5.6 x 10^-?
TTT                                              1.16   1.09-1.24     9.9 x 10^-6
TCT                                              1.12   1.04-1.21     4.2 x 10^-3
CCC                                              1.04   0.87-1.24     0.67
CTC                                              1.07   0.87-1.32     0.52
Rare <1%                                         1.34   0.99-1.81     0.06
1 Includes 320 individuals with high-risk adenomas.
2 Combined analysis based on cancer cases only.
Although 18q21.1 contains another protein-coding gene (CR621005) and a predicted gene of unknown function (KIAA0427), the decay in LD away from SMAD7 intron 3 incorporating rs4939827, rs12953717 and rs4464148 provides little support for either being the location of a causal variant. We searched for mutations and additional polymorphisms in SMAD7 by re-sequencing the coding region of the gene and its associated 5' and 3' UTRs in the genomic DNA of 65 individuals (35 carrying high-risk haplotypes and 30 non-carriers). We identified 6 known SNPs (rs7236774 [5'UTR], rs3736242, rs3764482, rs34151545 and rs3809923 [intronic], and rs16950113 [3'UTR]) and 6 previously unreported SNPs. None were significantly associated with disease risk. A novel G39R substitution in exon 1 was identified in two individuals carrying the high-risk SMAD7 haplotype; however, this change is outside the two SMAD7 functional domains MH1 (MAD homology 1 region) and MH2 (MAD homology 2 region), in so-called "low complexity protein sequence".
SMAD7 or Mothers against decapentaplegic homolog 7 belongs to the SMAD family of proteins, which belong to the TGFβ superfamily of ligands9. Like many other TGFβ family members SMAD7 is involved in cell signalling as a TGFβ type 1 receptor antagonist, blocking TGFβ1 and activin associating with the receptor and blocking access to SMAD2. It is an inhibitory SMAD (I-SMAD) and is enhanced by SMURF2. Perturbation of SMAD7 expression has been documented to influence the progression of CRC10. We therefore looked for a relationship among allelic imbalance, mRNA expression and genotype. Loss of chromosome 18q is very common in individuals with CRC11, but we did not observe any association between SMAD7 genotype and allelic loss, or the alleles affected by such loss, in 248 individuals with CRC and 49 CRC cell lines (P > 0.45 for rs4939827, rs12953717 and rs4464148). Although CRC cell lines expressed a high level of SMAD7 mRNA, comparing genotype with expression phenotype is inherently problematic owing to loss of heterozygosity on 18q, usually accompanied by aneusomy and/or polysomy. We therefore focused on SMAD7 expression in 101 lymphoblastoid cell lines. Lower median mRNA expression was associated with CRC risk alleles at rs12953717 and rs4464148 (P = 0.02 and P = 0.06, respectively). As subtle changes in SMAD7 expression can affect β-catenin levels while having few effects on TGFβ signalling12, it is possible that the observed differences in SMAD7 levels lead to increased Wnt signalling.
Here we have focused on one of the most highly statistically significant associations from our GWA study, identifying variants in SMAD7 as reproducibly associated with CRC. On the basis of the allele frequency and unbiased estimates of genotypic risks we estimate the locus should contribute to ~15% of all CRC and ~0.8% of the familial risk (genotypic risk around 1.4), which, although modest, has the potential through interaction with other common alleles to substantially increase an individual's risk. The contribution of the locus to CRC risk is highly significant.
There was no significant difference in effect size of rs4939827 on CRC risk after stratifying by tumour site in the large bowel (P = 0.06), age at diagnosis (P = 0.23) or microsatellite instability (P = 0.06), although the borderline associations leave open the possibility that the impact of the disease-causing variant is stronger in microsatellite stable rectal cancer.
These results, coupled with the recent finding that loci mapping to 8q24.21 act as genetic risk factors for CRC, provide unambiguous evidence for a "common-disease common-variant" model of CRC predisposition and justification for a continuing search for low penetrance susceptibility alleles.
SUPPLEMENTARY METHODS
Participants
Panel A: 940 cases with colorectal neoplasia (443 males, 497 females) ascertained through the Colorectal Tumour Gene Identification (CORGI) consortium. All had at least one first-degree relative affected by CRC and one or more of the following phenotypes: CRC at age 75 or less; any colorectal adenoma at age 45 or less; ≥ 3 colorectal adenomas at age 75 or less; or a large (>1 cm diameter) or aggressive (villous and/or severely dysplastic) adenoma at age 75 or less. Controls (n=965; 439 males, 526 females) were spouses or partners unaffected by cancer and without a personal family history (to 2nd degree relative level) of colorectal neoplasia. All cases and controls were of white UK ethnic origin.
Panel B: 4,495 CRC cases (2,424 males, 2,071 females; mean age at diagnosis 59.5 years; SD ± 8.7) ascertained through two ongoing initiatives at the Institute of Cancer Research/Royal Marsden Hospital NHS Trust (RMHNHST) - The National Study of Colorectal Cancer Genetics (NSCCG) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. A total of 3,923 healthy individuals were recruited as part of ongoing National Cancer Research Network genetic epidemiological studies: NSCCG (n=1,357), the Genetic Lung Cancer Predisposition Study (GELCAPS) (1999-2004; n=1,565) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (1999-2004; n=1,001). These controls (1,344 males, 2,579 females; mean age 60.0 years; SD ± 10.8) were the spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at time of ascertainment. All cases and controls were British Caucasians, and there were no obvious differences in the demography of cases and controls in terms of place of residence within the UK.
Panel C: 2,012 CRC cases (1,218 males, 794 females; mean age at diagnosis 59.0 years; SD ± 8.2) and 1,717 controls (813 males, 904 females; mean age 55.3 years; SD ± 12.3) ascertained through NSCCG post 2005.
Panel D: 966 CRC cases collected through VICTOR - a Phase III randomised double-blind placebo controlled study of rofecoxib (VIOXX) in colorectal cancer patients (Dukes stage B or C disease) following potentially curative therapy. Controls - 344 derived from CORGI and population volunteer blood donors. All cases and controls were White Caucasians.
In all cases CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153-154 (ref. 12) and all cases had pathologically proven adenocarcinoma or adenomas. Collection of blood samples and clinicopathological information from patients and controls was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki.
Genotyping
DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). A genome-wide scan of 550,163 tag SNPs was conducted using the Illumina Hap550 Bead Arrays according to the manufacturer's protocols. DNA samples with GenCall scores <0.25 at any locus were considered "no calls". A DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci. A SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To ensure quality of genotyping, a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches. We have previously shown that there was no evidence of differential genotyping between cases and controls or of population sub-structure in the CORGI samples as assessed by the STRUCTURE program and a Q-Q plot of the genotype test statistics13. Genotyping of rs4939827, rs12953717 and rs4464148 was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK); primers and probes used are available on request. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99% concordant results were obtained.
Microsatellite instability in CRCs was determined using the following methodology: 10 µm sections were cut from formalin-fixed paraffin-embedded tumours, lightly stained with toluidine blue, and regions containing at least 60% tumour micro-dissected. Tumour DNA was extracted using the QIAamp DNA Mini kit (Qiagen, Crawley, UK) according to the manufacturer's instructions and genotyped for the mononucleotide microsatellite loci BAT25 and BAT26, which are highly sensitive markers of MSI14. Samples showing novel alleles at either BAT26 or BAT25 or both markers were assigned as MSI (corresponding to a high level of instability, MSI-H)15.
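The call-rate rules described above (GenCall score below 0.25 treated as a no call; samples and SNPs failing below a 95% call rate) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the pipeline actually used: the nested-list data layout and the function names are hypothetical, and both call rates are computed over the full matrix because the text does not say whether failed samples were removed before SNP call rates were evaluated.

```python
def call_rate(calls):
    """Fraction of entries that carry a genotype (None marks a no call)."""
    return sum(c is not None for c in calls) / len(calls)


def apply_qc(genotypes, scores, min_score=0.25, min_rate=0.95):
    """Mask low-score calls, then list samples and SNPs meeting the call rate.

    genotypes[s][k] -- genotype of sample s at SNP k (hypothetical layout)
    scores[s][k]    -- GenCall-style quality score for that genotype
    Returns (masked matrix, passing sample indices, passing SNP indices).
    """
    # A genotype whose score is below min_score becomes a no call.
    masked = [
        [g if sc >= min_score else None for g, sc in zip(g_row, s_row)]
        for g_row, s_row in zip(genotypes, scores)
    ]
    n_snps = len(masked[0])
    # A sample passes if it has genotypes at min_rate or more of the loci.
    passing_samples = [
        s for s, row in enumerate(masked) if call_rate(row) >= min_rate
    ]
    # A SNP passes if min_rate or more of the samples have a genotype there.
    passing_snps = [
        k for k in range(n_snps)
        if call_rate([row[k] for row in masked]) >= min_rate
    ]
    return masked, passing_samples, passing_snps
```

The thresholds are exposed as parameters so that the same function can be exercised on small toy matrices with a looser call-rate cut-off.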
Statistical analysis
Using the Margarita program16 we inferred ARGs for 46-SNP haplotypes spanning SMAD7 and its flanking regions (from 44,613,022 to 44,780,189 on NCBI build 35). For every ARG, a putative risk mutation was placed on the marginal genealogy at each SNP position by maximizing the association between the mutation and disease status. We evaluated the significance of this observed association through 10^6 permutations on the phenotypes.
The risks associated with each SNP were estimated by allelic odds ratio (OR) using unconditional logistic regression, and associated 95% confidence intervals (CIs) were calculated in each case. Associations by site (colon/rectum), MSI status, family history status (at least one first-degree relative with CRC) and age at diagnosis (stratifying into two groups by the median age at diagnosis, 61) were examined by logistic regression in case-only analyses. Haplotypes were inferred using an MCMC method implemented in the program PHASE17.
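As a rough illustration of an allelic odds ratio with a 95% confidence interval, the sketch below uses the closed-form 2 x 2 estimate with a Wald (Woolf) interval rather than fitting a logistic regression model; for a single binary allele covariate the two approaches give the same point estimate. The function names are hypothetical, and the allele counts are approximations rebuilt from the rounded Panel A sample sizes and minor allele frequencies given for rs4939827 in Table 1, not the study's raw data.

```python
import math


def allele_counts(n_individuals, maf):
    """Approximate (minor, major) allele counts from a sample size and MAF."""
    total = 2 * n_individuals
    minor = round(total * maf)
    return minor, total - minor


def allelic_odds_ratio(case_minor, case_major, control_minor, control_major, z=1.96):
    """Allelic OR for the minor allele with a Wald confidence interval."""
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / control_minor + 1 / control_major
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Panel A, rs4939827: 930 cases (MAF 0.406) and 960 controls (MAF 0.489).
case_minor, case_major = allele_counts(930, 0.406)
control_minor, control_major = allele_counts(960, 0.489)
odds_ratio, lower, upper = allelic_odds_ratio(
    case_minor, case_major, control_minor, control_major
)
```

Under these approximations the estimate lands close to the trend odds ratio tabulated for Panel A, which is a useful sanity check on the reconstructed counts.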
Meta-analysis was conducted using standard methods for combining raw data based on the Mantel-Haenszel method18. Cochran's Q statistic to test for heterogeneity and the I^2 statistic19 to quantify the proportion of the total variation due to heterogeneity were calculated. The population attributable fraction was estimated by (x-1)/x, where x = (1-p)^2 + 2p(1-p)OR1 + p^2 OR2, p is the population allele frequency, and OR1 and OR2 are the ORs associated with hetero- and homozygosity respectively. The sibling relative risk attributable to a given SNP was calculated using the formula20:
\[
\lambda^{*} = \frac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left[\,p^2 r_2 + 2pq\,r_1 + q^2\,\right]^2}
\]
where p is the population frequency of the minor allele, q = 1-p, and r1 and r2 are the relative risks (estimated as OR) for heterozygotes and rare homozygotes, relative to common homozygotes. Assuming a multiplicative interaction, the proportion of the familial risk attributable to a SNP was calculated as log(λ*)/log(λ0), where λ0 is the overall familial relative risk estimated from epidemiological studies, assumed to be 2.2 (ref. 21).
Cell lines
In addition to lymphoblastoid cell lines from 69 healthy individuals the following CRC cell lines were analysed: C106, C125, C32, C70, C84, C99, CACO2, CCK81, CL-40, COLO205, COLO320DM, COLO5, COLO678, COLO741, CX-I, DLD1/HCT15, GB126, GP5D, HCA46, HCA7, HCT116, HIW1772, HRA19, HT1115, HT29, HT55, HUTU80, LOVO, LS1034, LS123, LS125, LS174T, LS411, LS513, DMS8, PC/JW, RKO, RS1013, SCKO1, SW1417, SW1463, SW403, SW480, SW620, SW837, SW948, T84, VACO4S and VACO5.
Mutational and other analyses
The search for pathogenic sequence changes in exons, intron-exon boundaries and untranslated regions (UTRs) of each gene was performed by direct sequencing. This sample cohort consisted of 65 individuals: 35 individuals homozygous for the high-risk SMAD7 haplotype (TTC, Table 2) and 30 individuals with the protective haplotype, both defined by the associated SMAD7 SNPs rs4939827, rs12953717 and rs4464148.
Expression assays on the 69 lymphoblastoid and 49 CRC cell lines were conducted by RTQ-PCR with Taqman probes specific to each of the two SMAD7 isoforms using GAPDH as control. mRNA expression was measured relative to the mean of the whole group. Details of these assays and allele loss analysis on 18q are available on request.
Example 2
We have defined a 17 kb high-LD interval of association (Chr18 position: 44,700,221-44,716,898; UCSC May 2006 assembly, NCBI build 36.1) encompassing the most significant 18q21 SNPs rs4939827, rs12953717 and rs4464148, which were represented on the Illumina 550 HapMap Bead Array. These 3 SNPs map to the interval encompassing the 3' region of intron 3, exon 4 and the 3'UTR of SMAD7 (between B and C in Figure 1a-c). Through re-sequencing we excluded the possibility that the 18q21 association signal is a consequence of a coding sequence change in SMAD7.
To comprehensively interrogate the 17 kb interval for all genetic variation we re-sequenced the region in 90 healthy unrelated individuals. Only 722 bp (4%) of the 17 kb was refractory to re-sequencing owing to low complexity. In total, we identified 55 variants (Table 3); these included 50 SNPs and 5 insertion/deletion polymorphisms. Of the 55 variants 43 were common (minor allele frequency [MAF] ≥ 5%). Of these, 33 common variants had not been genotyped by HapMap. In addition a SNP caused by a C to G change at 44703563 bp (henceforth referred to as Novel 1) with a MAF of 0.47 was unlisted by dbSNP (Build 128; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=snp; Figure 5a).
We calculated pair-wise LD statistics between each of the 52 SNPs and rs4939827, rs12953717 and rs4464148; 22 of these showed evidence of high LD with one or more of the original 3 SNPs (r^2 > 0.50; Figure 6 and Table 4). These 22 SNPs together with rs4939827, rs12953717 and rs4464148 were genotyped in a series of 2,532 CRC cases and 2,607 controls. All showed evidence of an association with risk of developing CRC at the 5% statistical threshold (Table 5). The strongest evidence for an association between 18q21 variation and CRC risk was provided by Novel 1 (P = 5.98 x 10^-7; Table 5).
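The pairwise LD statistics referred to above (D' and r^2) have simple definitions once haplotype frequencies are known. The sketch below computes them from a two-SNP haplotype frequency and the two allele frequencies. It is an illustration of the definitions only: the function name is hypothetical, and it does not reproduce Haploview, which must first estimate haplotype frequencies from unphased genotypes.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A at the first SNP
            and allele B at the second SNP
    p_a  -- frequency of allele A at the first SNP
    p_b  -- frequency of allele B at the second SNP
    Both SNPs are assumed to be polymorphic (0 < p_a, p_b < 1).
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it can take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

For example, two SNPs with allele frequencies of 0.5 and a joint haplotype frequency of 0.4 give D = 0.15, D' = 0.6 and r^2 = 0.36, which would fall below the r^2 > 0.50 cut-off used in the text.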
Table 3
ID  Chr18 position  Variant  dbSNP ID    Minor allele  Minor allele frequency  pHWE
1   44702803        C/T      rs6507874   C             0.429                   1.00
2   44702817        C/G      rs6507875   G             0.335                   1.00
3   44702873        A/G      rs8098041   A             0.065                   1.00
4   44702875        -/CTCT   rs10640406  A             0.429                   1.00
5   44703030        A/G      rs17186485  A             0.098                   1.00
6   44703109        C/T      rs8085824   C             0.489                   1.00
7   44703193        C/T                  T             0.08                    1.00
8   44703563        C/G                  G             0.488                   1.00
9   44703716        C/T                  T             0.012                   1.00
10  44704642        A/G                  A             0.007                   1.00
11  44704781        C/T                  C             0.006                   1.00
12  44704974        A/G      rs7229639   A             0.069                   1.00
13  44705071        C/G      rs34007497  C             0.494                   1.00
14  44705144        A/G      rs12956924  A             0.262                   1.00
15  44705804        C/G      rs4939825   C             0.458                   1.00
16  44705871        A/G      rs4939567   A             0.447                   1.00
17  44706635        A/G      rs4991143   A             0.071                   1.00
18  44706700        -/AAGAA  rs4044177   AAGAA         0.495                   1.00
19  44707154        A/T      rs11874392  T             0.445                   1.00
20  44707404        A/C      rs4939826   A             0.065                   1.00
21  44707461        C/T      rs4939827   C             0.445                   1.00
22  44707752        C/T      rs35352860  T             0.222                   1.00
23  44707853        C/G      rs12968048  G             0.006                   1.00
24  44707927        C/T      rs12953717  T             0.494                   0.79
25  44708046        A/G      rs7226855   G             0.449                   1.00
26  44708137        A/G      rs7227023   A             0.062                   1.00
27  44708552        G/T                  G             0.005                   1.00
28  44709469        -/A      rs36025258  -             0.428                   0.62
29  44709688        A/G                  G             0.006                   1.00
30  44710128        C/T      rs884013    T             0.056                   1.00
31  44710223        G/A      rs4503867   A             0.044                   1.00
32  44710514        A/T                  A             0.006                   1.00
33  44711086        -/C      rs33922711  C             0.482                   0.62
34  44711094        C/T      rs12456328  T             0.137                   0.95
35  44711172        C/T      rs4939828   C             0.269                   0.60
36  44711356        A/G                  G             0.028                   1.00
37  44711375        C/A                  A             0.139                   0.94
38  44712111        C/T                  T             0.091                   1.00
39  44712225        A/C      rs9946510   C             0.352                   0.89
40  44712335        A/C                  A             0.012                   1.00
41  44712349        C/T                  T             0.029                   1.00
42  44712355        -/C      rs11453375  C             0.383                   0.53
43  44712669        T/G      rs6507876   G             0.052                   1.00
44  44712947        A/G                  A             0.414                   1.00
45  44713030        C/T      rs4464148   C             0.362                   1.00
46  44713321        C/T      rs2337107   T             0.354                   0.86
47  44713907        C/T                  T             0.017                   1.00
48  44714383        A/G      rs4939829   G             0.367                   1.00
49  44714654        G/A      rs7351039   A             0.067                   1.00
50  44714703        C/T      rs12967477  T             0.354                   1.00
51  44714901        C/G      rs2337106   C             0.461                   0.64
52  44715010        A/G      rs17186877  A             0.369                   1.00
53  44715485        C/T                  T             0.073                   1.00
54  44715784        C/T      rs7238442   C             0.472                   0.44
55  44716181        A/G      rs12967711  G             0.343                   0.81
Table 3: All identified SNPs
Summary of all SNPs identified through re-sequencing the candidate interval (Chr18: 44,700,221-44,716,898; UCSC May 2006 assembly, NCBI build 36.1) in intron 3 of SMAD7 in the SNP discovery panel. This sample population consisted of 90 unrelated individuals from the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (50 males, 40 females; mean age at diagnosis 59.3 years; SD ± 11.45). None had a personal history of malignancy at time of ascertainment and all were British Caucasians. Shown are the positions, alleles, minor allele frequencies in this population and P values for the test of fit to Hardy-Weinberg equilibrium (HWE).
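The pHWE column reports a test of fit to Hardy-Weinberg equilibrium. The text does not say which test was used, so the sketch below assumes the simplest option, a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; an exact test would usually be preferred for rare alleles. The function name and the genotype counts in the usage note are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    n_aa, n_ab, n_bb -- observed counts of the three genotypes
    Returns (chi-square statistic, P value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Genotype counts of 25, 50 and 25 match Hardy-Weinberg proportions exactly and return a P value of 1, whereas counts of 30, 40 and 30 show a heterozygote deficit with a chi-square statistic of 4.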
Table 4
ID  Chr18 position  Variant  dbSNP ID    Minor allele  Minor allele frequency  pHWE
1   44702803        C/T      rs6507874   C             0.43                    1.00
2   44702817        C/G      rs6507875   G             0.34                    1.00
4   44702875        -/CTCT   rs10640406  -             0.43                    1.00
6   44703109        C/T      rs8085824   C             0.49                    1.00
8   44703563        C/G      Novel 1     G             0.49                    1.00
13  44705071        C/G      rs34007497  C             0.49                    1.00
14  44705144        A/G      rs12956924  A             0.5                     1.00
15  44705804        C/G      rs4939825   C             0.26                    1.00
16  44705871        A/G      rs4939567   A             0.45                    1.00
18  44706700        -/AAGAA  rs4044177   AAGAA         0.50                    1.00
19  44707154        A/T      rs11874392  T             0.45                    1.00
21  44707461        C/T      rs4939827   C             0.45                    1.00
24  44707927        C/T      rs12953717  T             0.49                    0.79
25  44708046        A/G      rs7226855   G             0.45                    1.00
28  44709469        -/A      rs36025258  -             0.43                    0.62
39  44712225        A/C      rs9946510   C             0.35                    0.89
42  44712355        -/C      rs11453375  C             0.38                    0.53
44  44712947        A/G      rs6507877   A             0.41                    1.00
45  44713030        C/T      rs4464148   C             0.36                    1.00
46  44713321        C/T      rs2337107   T             0.41                    0.86
48  44714383        A/G      rs4939829   G             0.36                    1.00
50  44714703        C/T      rs12967477  T             0.35                    1.00
52  44715010        A/G      rs17186877  A             0.36                    1.00
54  44715784        C/T      rs7238442   C             0.47                    0.44
55  44716181        A/G      rs12967711  G             0.34                    0.81
Table 4 : Description of the candidate SNPs genotyped in cases and controls
All SNPs display LD (r2 >0.50) with rs4939827, rs12953717 and rs4464148.
Table 5: Association between 25 SNPs and risk of colorectal cancer
Columns: SNP; position (bp); alleles; risk allele; MAF in cases (%); MAF in controls (%); Pallele; ORallele (95% CI); Ptrend; log-likelihood; difference in log-likelihood with Novel 1; Akaike weight; ratio with Novel 1.
rs6507874 44702803 C/T T 43.91 47.16 9.462E-04 0.877 (0.811-0.949) 9.826E-04 -3030.13 4.917 0.021 11.688
rs6507875 44702817 C/G C 37.70 40.54 3.691E-03 0.887 (0.818-0.963) 3.612E-03 -3031.18 5.968 0.012 19.764
rs10640406 44702875 -/CTCT CTCT 43.99 47.16 1.523E-03 0.880 (0.813-0.953) 1.595E-03 -3030.27 5.057 0.019 12.537
rs8085824 44703109 C/T C 46.86 42.18 2.684E-06 1.209 (1.116-1.309) 2.695E-06 -3025.97 0.760 0.167 1.463
Novel 1 44703563 C/G G 47.67 42.69 5.976E-07 1.223 (1.129-1.324) 5.466E-07 -3025.21 0.000 0.244 1.000
rs34007497 44705071 C/G G 46.96 42.47 6.283E-06 1.199 (1.107-1.299) 6.267E-06 -3026.44 1.229 0.132 1.849
rs12956924 44705144 A/G G 28.37 30.74 8.765E-03 0.892 (0.819-0.973) 9.172E-03 -3031.61 6.404 0.010 24.582
rs4939825 44705804 C/G C 42.80 46.17 7.188E-04 0.872 (0.806-0.945) 7.684E-04 -3029.93 4.723 0.023 10.607
rs4939567 44705871 A/G G 42.76 46.12 7.188E-04 0.873 (0.806-0.945) 7.553E-04 -3030 4.786 0.022 10.944
rs4044177 44706700 -/AAGAA AAGAA 46.80 42.58 2.390E-05 1.186 (1.095-1.285) 2.384E-05 -3026.89 1.676 0.106 2.312
rs11874392 44707154 A/T T 42.87 46.23 7.188E-04 0.873 (0.806-0.945) 6.427E-04 -3030.2 4.994 0.020 12.144
rs4939827 44707461 C/T T 44.33 47.26 2.883E-03 0.888 (0.821-0.961) 2.996E-03 -3031.26 6.045 0.012 20.547
rs12953717 44707927 C/T T 46.81 42.61 2.031E-05 1.185 (1.095-1.282) 2.033E-05 -3026.73 1.518 0.114 2.136
rs7226855 44708046 A/G A 42.77 46.06 8.224E-04 0.875 (0.809-0.947) 8.665E-04 -3030.34 5.129 0.019 12.994
rs36025258 44709469 -/A A 43.98 47.47 4.653E-04 0.869 (0.802-0.941) 4.807E-04 -3030.16 4.953 0.021 11.897
rs9946510 44712225 A/C C 32.28 30.05 1.480E-02 1.110 (1.020-1.208) 1.461E-02 -3032.85 7.635 0.005 45.484
rs11453375 44712355 -/C C 36.35 33.27 1.274E-03 1.146 (1.054-1.244) 1.378E-03 -3031.91 6.699 0.009 28.486
rs6507877 44712947 A/G A 42.62 40.45 2.609E-02 1.093 (1.001-1.184) 2.719E-02 -3033.93 8.722 0.003 78.351
rs4464148 44713030 C/T C 32.28 30.00 1.285E-02 1.112 (1.022-1.210) 1.273E-02 -3032.74 7.532 0.006 43.198
rs2337107 44713321 C/T A 42.57 40.33 2.170E-02 1.097 (1.013-1.188) 2.263E-02 -3033.78 8.572 0.003 72.679
rs4939829 44714383 A/G G 35.56 33.01 7.290E-03 1.120 (1.030-1.217) 7.644E-03 -3032.32 7.111 0.007 35.007
rs12967477 44714703 C/T T 32.14 30.02 2.072E-02 1.100 (1.010-1.201) 2.039E-02 -3032.95 7.740 0.005 47.938
rs17186877 44715010 A/G A 32.34 30.11 1.506E-02 1.109 (1.109-1.208) 1.501E-02 -3032.67 7.457 0.006 41.621
rs7238442 44715784 C/T C 46.71 44.48 2.406E-02 1.093 (1.011-1.182) 2.510E-02 -3033.39 8.182 0.004 59.797
rs12967711 44716181 A/G G 30.04 27.30 2.515E-03 1.140 (1.047-1.248) 2.559E-03 -3031.82 6.614 0.009 27.296
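The allelic odds ratios in Table 5 can be checked against the tabulated allele frequencies. The following is a minimal sketch, not the authors' code; the function name is hypothetical, and it assumes the OR compares the odds of carrying the listed allele in cases with the odds in controls. Confidence intervals are not reproduced here because the underlying allele counts are not given.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for an allele from its frequency in cases and in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Novel 1: allele frequency 47.67% in cases and 42.69% in controls (Table 5).
novel1_or = allelic_odds_ratio(0.4767, 0.4269)
print(round(novel1_or, 3))
```

Under this assumption the Novel 1 frequencies give a value close to the tabulated OR of 1.223.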
Table 6: Logistic regression results
Model 1 Model 2 Log-likelihood (Model 1) Log-likelihood (Model 2) Log-likelihood Difference P
Novel 1 Novel 1 + rs6507874 25.13 25.63 0.500 0.480
Novel 1 Novel 1 + rs6507875 25.13 24.47 0.660 0.417
Novel 1 Novel 1 + rs10640406 25.13 25.03 0.100 0.752
Novel 1 Novel 1 + rs8085824 25.13 26.6 1.470 0.225
Novel 1 Novel 1 + Novel 1 25.13 25.13 0.000 1.000
Novel 1 Novel 1 + rs34007497 25.13 25.53 0.400 0.527
Novel 1 Novel 1 + rs12956924 25.13 25.12 0.010 0.920
Novel 1 Novel 1 + rs4939825 25.13 24.82 0.310 0.578
Novel 1 Novel 1 + rs4939567 25.13 25.31 0.180 0.671
Novel 1 Novel 1 + rs4044177 25.13 23.87 1.260 0.262
Novel 1 Novel 1 + rs11874392 25.13 24.57 0.560 0.454
Novel 1 Novel 1 + rs4939827 25.13 25.74 0.610 0.435
Novel 1 Novel 1 + rs12953717 25.13 25.27 0.140 0.708
Novel 1 Novel 1 + rs7226855 25.13 25.42 0.290 0.590
Novel 1 Novel 1 + rs36025258 25.13 23.9 1.230 0.267
Novel 1 Novel 1 + rs9946510 25.13 24.26 0.870 0.351
Novel 1 Novel 1 + rs11453375 25.13 26.34 1.210 0.271
Novel 1 Novel 1 + rs6507877 25.13 24.85 0.280 0.597
Novel 1 Novel 1 + rs4464148 25.13 25.23 0.100 0.752
Novel 1 Novel 1 + rs2337107 25.13 24.85 0.280 0.597
Novel 1 Novel 1 + rs4939829 25.13 25.66 0.530 0.467
Novel 1 Novel 1 + rs12967477 25.13 25.15 0.020 0.888
Novel 1 Novel 1 + rs17186877 25.13 25.28 0.150 0.699
Novel 1 Novel 1 + rs7238442 25.13 25.06 0.070 0.791
Novel 1 Novel 1 + rs12967711 25.13 24.44 0.690 0.406
Comparison of the log likelihood of Novel 1, the most associated SNP, with log likelihoods of Novel 1 and the other SNPs.
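The P values in Table 6 appear consistent with referring the tabulated log-likelihood difference to a chi-square distribution with 1 degree of freedom. The sketch below illustrates that calculation using only the Python standard library; the function name is hypothetical and the interpretation of the tabulated difference as the test statistic is an assumption inferred from the table, not a statement of the authors' procedure.

```python
import math


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Differences taken from Table 6 (Novel 1 + rs6507874, Novel 1 + rs8085824).
for difference in (0.500, 1.470):
    print(difference, round(chi2_sf_1df(difference), 3))
```

A difference of zero gives P = 1, matching the row in which Novel 1 is added to itself.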
We performed logistic regression comparing the log-likelihoods of models based on the maximally associated SNP (Novel 1) and models incorporating Novel 1 and each of the other 24 candidate SNPs. A model based solely on Novel 1 was sufficient to capture all variation at the locus (P>0.2 for the addition of each SNP to the model; Table 6). Computational analysis of the 5 most significantly associated SNPs showed that rs8085824, rs34007497, rs4044177 and rs12953717 are strongly correlated with Novel 1 (r2>0.94; Figure 2b and 2c) and that together they constitute a single risk haplotype in block 1 (Figure 2c and Table 7). To further explore the association signal at 18q21 we inferred ancestral recombination graphs (ARGs) for the 25-SNP haplotypes that span the SMAD7 interval using the Margarita program (Minichiello MJ, Durbin R (2006) Am J Hum Genet 79: 910-922). The strongest evidence for association with CRC risk was provided by Novel 1 (P = 5.2 x 10^-6). SNPs rs8085824, rs34007497, rs4044177 and rs12953717 also showed evidence for association (P<10^-4; Table 8). For all other SNPs the permutation P values were >10^-4. These observations are consistent with one of these 5 SNPs being the causal variant. We also calculated Akaike weights to determine the weight of evidence in favour of each variant relative to Novel 1 (Table 5). Novel 1 was 1.5 times more likely to be causal than rs8085824, 1.8 times more likely than rs34007497, 2.1 times more likely than rs12953717, 2.3 times more likely than rs4044177 and more than 10 times more likely to be causal than the other variants. Collectively these data strongly support Novel 1 being the variant responsible for the association between 18q21 variation and CRC risk.
Haplotype (All Markers) Case Frequency Control Frequency Chi-Square P
CTCTCCAGACTCCGCACCTCACCTA 0.246 0.272 8.029 0.0046
TCACGGGCGAATTAACAACACTACC 0.27 0.242 9.623 0.0019
TCACGGGCGAATTAAACCTCACCTA 0.151 0.133 6.223 0.0126
CTCTCCGGACTCCGCACCTCACCCA 0.05 0.054 0.86 0.3537
TCATCCGCGCATCAAACATAACCCA 0.042 0.047 1.492 0.222
CCCTCCGGACTCCGCACCTCACCTA 0.036 0.043 3.556 0.0593
TCATCCGCGCATCAAACCTCACCTA 0.029 0.038 5.613 0.0178
CTCTCCGGACTCCGCACCTCACCTA 0.024 0.026 0.381 0.5368
CTCTCCAGACTCCGCAAATACCCCA 0.026 0.023 0.78 0.3773
TCACGGGCGAATTACCAACACTACA 0.019 0.021 0.806 0.3692
TCATCCGCGCATCAAACATAACCTA 0.017 0.02 1.202 0.2729
TCACGGGCGAACTAACAACACTACC 0.013 0.009 3.836 0.0502
Haplotype (Block 1) Case Frequency Control Frequency Chi-Square P
TCACGGGCGAATTAA 0.428 0.383 20.985 4.63E-06
CTCTCCAGACTCCGC 0.28 0.304 7.161 0.0075
TCATCCGCGCATCAA 0.093 0.107 5.603 0.0179
CTCTCCGGACTCCGC 0.082 0.087 1.012 0.3144
CCCTCCGGACTCCGC 0.042 0.049 3.197 0.0738
TCACGGGCGAATTAC 0.025 0.028 0.662 0.4157
TCACGGGCGAACTAA 0.015 0.011 2.83 0.0925
CCCTCCGGACTCCGA 0.011 0.01 0.065 0.7986
Haplotype (Block 2) Case Frequency Control Frequency Chi-Square P
ACCTCACCTA 0.514 0.536 4.635 0.0313
CAACACTACC 0.296 0.267 10.056 0.0015
ACCTCACCCA 0.057 0.059 0.32 0.5716
ACATAACCCA 0.048 0.052 0.707 0.4004
AAATACCCCA 0.03 0.026 1.674 0.1957
CAACACTACA 0.025 0.028 0.9 0.3427
ACATAACCTA 0.018 0.021 0.944 0.3313
Table 7: Association between haplotypes and colorectal cancer
Shown are the haplotypes, their estimated frequencies and tests for association for all markers and for blocks 1 (rs6507874-rs36025258) and 2 (rs9946510-rs12967711). Risk haplotypes are emboldened.
SNP Position ARG Map Score (Chi-Square) P
rs6507874 44702803 15.71875488 0.00141111
rs6507875 44702817 16.70062443 0.003240563
rs10640406 44702875 17.04070525 0.001516681
rs8085824 44703109 21.63743625 8.24E-06
Novel 1 44703563 21.76344364 5.17E-06
rs34007497 44705071 20.32246916 1.38E-05
rs12956924 44705144 17.80880331 0.01124182
rs4939825 44705804 18.58402895 6.48E-04
rs4939567 44705871 18.90305002 7.43E-04
rs4044177 44706700 19.5723315 2.15E-05
rs11874392 44707154 18.0238829 8.62E-04
rs4939827 44707461 15.97608495 0.003449092
rs12953717 44707927 19.20619179 1.82E-05
rs7226855 44708046 15.83846846 0.001058751
rs36025258 44709469 16.18888388 2.49E-04
rs9946510 44712225 13.10262544 0.017125409
rs11453375 44712355 14.58870762 0.003082808
rs6507877 44712947 13.12594782 0.030562398
rs4464148 44713030 13.31307595 0.014386405
rs2337107 44713321 13.30270259 0.02635127
rs4939829 44714383 13.61331259 0.006037852
rs12967477 44714703 13.4025643 0.020385593
rs17186877 44715010 13.14393646 0.015198113
rs7238442 44715784 13.04382183 0.025759679
rs12967711 44716181 13.89839806 0.00304056
Table 8 : Results of ancestral recombination graph analysis using the program Margarita
Shown are the ancestral recombination graph (ARG) map scores (a χ2 test score) and associated P values.
We have previously examined the relationship between rs12953717, rs4939827 and rs4464148 genotype and SMAD7 expression in lymphoblastoid cell lines. Lower median SMAD7 mRNA expression was significantly and most strongly associated with CRC risk alleles at rs12953717. (Real-time quantitative PCR was performed using the ABI SMAD7 assays on demand (Applied Biosystems, Foster City, US). The two major isoforms of SMAD7 were assayed separately in comparison with a GAPDH control and the mean of the differences between levels (DCt) was taken in 97 lymphoblastoid cell lines. The T risk allele of rs12953717 (TT: n=19; TC: n=57; CC: n=21) was associated with decreased expression of SMAD7 (Ptrend = 0.02).)
Of the 3 SNPs, rs12953717 is in highest LD with Novel 1 (r2=0.94). Predicated on the assertion that the causal variant influences SMAD7 expression in the colorectum, we examined the influence of Novel 1 on expression using an in vivo model system.
DNA sequence conservation in non-coding regions has been shown to be a good predictor of cis-regulatory sequences (Gomez Skarmeta JL, Lenhard B, Becker TS (2006) Dev Dyn 235: 870-885). Sequence comparison of intron 3 of SMAD7 across several vertebrates revealed the presence of a number of highly conserved non-coding regions (HCNRs) in the area. Of particular interest is one HCNR, present in all tetrapods including Xenopus, that is flanked by Novel 1 and rs8085824.
We first tested the enhancer potential of a DNA fragment spanning the nucleotide positions where these two SNPs map and the HCNR in Xenopus transgenic assays. For this we generated a construct in which this fragment, obtained from a control genomic human DNA sample, was located 5' of a minimal promoter driving GFP expression (Protective construct). Transgenic Xenopus embryos for this construct showed GFP expression in the muscles and in the colorectum at tadpole stages (Figure 3a). No expression was observed in control embryos transgenic for a similar construct that lacks the human DNA. We next examined the enhancer potential of the same human fragment containing Novel 1 and rs8085824 (Risk construct). As shown in Figure 3b, comparison of the Protective and the Risk enhancer activities indicated that the latter drove GFP expression at significantly reduced levels in the colorectum (median difference in expression levels 10%; P=0.0025). These results demonstrate that the risk variant maps to a gut enhancer. In addition, they indicate that the risk haplotype is detrimental to the enhancer's function.
Summary
To identify the causal basis of the association between 18q21 variation and CRC we resequenced the 17 kb region of linkage disequilibrium, and evaluated all variants in 2,532 CRC cases and 2,607 controls. A novel C to G single nucleotide polymorphism (SNP) at 44703563bp was maximally associated with CRC risk (P = 5.98 x 10^-7; ≥ 1.5-fold more likely to be causal than other variants). Using transgenic assays in Xenopus laevis as a functional model, we demonstrated that the disease haplotype capturing the G risk allele leads to reduced expression of SMAD7 in the colorectum (P=0.0025). We propose that this novel SNP is the functional change leading to CRC predisposition through differential SMAD7 expression and hence aberrant TGF-beta signaling.
The data presented herein show that possession of the G risk allele of Novel 1 is associated with reduced expression of SMAD7 in the colorectum. The magnitude of the effect observed in the model system is in keeping with the risk of CRC associated with 18q21 variation. Perturbation of SMAD7 expression has previously been documented to influence the development of CRC (Levy L, Hill CS (2006) Cytokine Growth Factor Rev 17: 41-58), hence SNP-mediated differential expression is likely to be the biological basis for CRC predisposition. In addition to the BMP/TGFbeta pathway, the FGF, WNT and Notch signalling pathways constitute the major stem cell signalling network, all of which play a role in the development of CRC (Katoh Y, Katoh M (2006) Int J Mol Med 18: 1019-1023). The SMAD7 colorectal enhancer within which Novel 1 maps harbors the predicted control module mod052296 (Figure 5b). Intriguingly, this domain contains putative transcription-factor binding sites for SRY, AML (RUNX3) and Pax4. Given that SOX17 (a SRY-box containing gene) is a negative regulator of beta-catenin signalling (Paoni NF, Feldman MW, Gutierrez LS, Ploplis VA, Castellino FJ (2003) Physiol Genomics 15: 228-235) and that inactivation of RUNX3 is a feature of CRC (Ku JL, Kang SB, Shin YK, Kang HC, Hong SH, et al. (2004) Oncogene 23: 6736-6742), there is strong biological plausibility to the mechanism of association we have identified.
Here we have shown that at least one mechanism by which such variation influences CRC risk is through differential expression. Regulatory polymorphisms affecting gene expression have long been postulated to contribute to disease predisposition (Hudson TJ (2003) Nat Genet 33: 439-440); however, for the most part knowledge of regulatory networks remains sparse and thus far only a restricted number of such polymorphisms have been identified (De Gobbi M, Viprakasit V, Hughes JR, Fisher C, Buckle VJ, et al. (2006) Science 312: 1215-1217; Steidl U, Steidl C, Ebralidze A, Chapuy B, Han HJ, et al. (2007) J Clin Invest 117: 2611-2620). On the basis of sequence data relating to the other CRC loci 8q23.3, 8q24, 10p14, 11q23 and 15q13, it is probable that some of the causal variants underlying these associations will also influence CRC risk through differential gene expression.
MATERIALS AND METHODS
PARTICIPANTS
Re-sequencing/SNP discovery panel
90 unrelated individuals from the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (50 males, 40 females; mean age at diagnosis 59.3 years; SD ± 11.45). None had a personal history of malignancy at time of ascertainment and all were British Caucasians.
For genotyping
2,532 CRC cases (1,336 males, 1,196 females; mean age at diagnosis 58.2 years; SD ± 7.97) were ascertained from 1999 onwards through two ongoing initiatives at the Institute of Cancer Research/Royal Marsden Hospital NHS Trust (RMHNHST): the National Study of Colorectal Cancer Genetics (NSCCG) (Penegar S, Wood W, Lubbe S, Chandler I, Broderick P, et al. (2007) Br J Cancer 97: 1305-1309) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. A total of 2,607 healthy individuals were recruited as part of ongoing National Cancer Research Network genetic epidemiological studies: NSCCG (n=1,200), the Genetic Lung Cancer Predisposition Study (GELCAPS) (1999-2004; n=771) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (1999-2004; n=636). These controls (1,074 males, 1,533 females; mean age 56.7 years; SD ± 8.95) were the spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at time of ascertainment. All cases and controls were British Caucasians, and there were no obvious differences in the demography of cases and controls in terms of place of residence within the UK.
In all cases CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153-154 and all cases had pathologically proven colorectal adenocarcinoma. Collection of blood samples and clinicopathological information from patients and controls was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki.
Re-sequencing
Sequence changes in the interval (Chromosome 18: 44,700,221-44,716,898; UCSC May 2006 assembly, NCBI build 36.1), encompassing the 3' end of SMAD7 intron 3, exon 4 and the 3'UTR, were identified by direct sequencing. PCR and sequencing primers were designed using Primer3 software. Amplicons were sequenced using ABI chemistry (BigDye v3.1; Applied Biosystems, Foster City, US) and platform (ABI 3730xl DNA analyzer; Applied Biosystems, Foster City, US). Sequence reads were analyzed using Mutation Surveyor software (v3.10; Softgenetics, State College, PA 16803, US).
Genotyping
DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). Genotyping was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK) or Illumina iSelect Arrays. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99.9% concordant results were obtained.
Statistical analysis
Statistical analyses were undertaken in Stata v8 (College Station, TX, US) or R v2.6.2 software.
Genotype data were used to search for duplicates and closely related individuals amongst all samples. Identity by state values were calculated for each pair of individuals, and for any pair with allele sharing >80%, the sample generating the lowest call rate was removed from further analysis (n=0) . Deviation of the genotype frequencies in the controls from those expected under Hardy-Weinberg Equilibrium (HWE) was assessed by χ2 test (1 d.f.). The risks associated with each SNP were estimated by per allele, heterozygous and homozygous odds ratios (OR) and associated 95% confidence intervals (CIs) were calculated in each case. Haplotype analysis was conducted in Haploview software (v4.0). The haplotypes are estimated using an accelerated EM algorithm similar to the partition/ligation method described in Qin et al (Am J Hum Genet 71: 1242-1247) and tested for association via a likelihood ratio test. Linkage disequilibrium statistics were calculated using Haploview software (v4.0).
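The Hardy-Weinberg check described above can be sketched as a 1 d.f. χ2 goodness-of-fit test on genotype counts. This is an illustrative sketch only: the function name and the genotype counts are hypothetical, monomorphic sites are not handled, and no continuity correction is applied.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square statistic and 1 d.f. P value for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
print(hwe_chi_square(30, 40, 30))
```

Counts that match Hardy-Weinberg proportions exactly, such as 25/50/25, give a statistic of zero and P = 1.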
Relationships between multiple SNPs showing association with CRC risk in the region were investigated using logistic regression analysis, and the impact of additional SNPs from the same region was assessed by a likelihood-ratio test.
The weight of evidence in favour of each SNP being causal was quantified by calculating Akaike weights for each SNP model:
Figure imgf000064_0001
where Δi is the difference in log-likelihood between model i and the best fitting model.
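Because the formula itself is not reproduced in this text, the following sketch assumes the commonly used form of Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), applied to the tabulated differences in log-likelihood. That assumption appears consistent with the "Ratio with Novel 1" column of Table 5, but it is an inference rather than the authors' stated formula; the function names are hypothetical.

```python
import math


def akaike_weights(deltas):
    """Normalised weights from differences relative to the best fitting model."""
    relative = [math.exp(-d / 2.0) for d in deltas]
    total = sum(relative)
    return [r / total for r in relative]


def evidence_ratio(delta):
    """How many times more likely the best model is than a model at distance delta."""
    return math.exp(delta / 2.0)


# Differences for rs8085824 and rs34007497 relative to Novel 1 (Table 5).
print(round(evidence_ratio(0.760), 3), round(evidence_ratio(1.229), 3))
```

The weights for a full set of models sum to 1, so each weight can be read as the relative support for that model within the set considered.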
Using the Margarita program (Minichiello MJ, Durbin R (2006) Am J Hum Genet 79: 910-922) we inferred ARGs for the 25-SNP haplotypes spanning the SMAD7 interval. For every ARG, a putative risk mutation was placed on the marginal genealogy at each SNP position by maximizing the association between the mutation and disease status. The significance of associations was evaluated through 10^6 permutations.
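The idea of assessing significance by permutation can be illustrated with a generic label-shuffling test. This sketch does not reproduce Margarita's ARG-based statistic: the statistic used here (difference in carrier frequency), the function names and the toy data are all hypothetical, and far fewer permutations are run than the 10^6 used in the analysis.

```python
import random


def carrier_frequency_difference(labels, carriers):
    """Absolute difference in carrier frequency between cases (1) and controls (0)."""
    cases = [c for lab, c in zip(labels, carriers) if lab == 1]
    controls = [c for lab, c in zip(labels, carriers) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(labels, carriers, n_permutations=2000, seed=0):
    """Shuffle case/control labels and count statistics at least as extreme as observed."""
    rng = random.Random(seed)
    observed = carrier_frequency_difference(labels, carriers)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if carrier_frequency_difference(shuffled, carriers) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy data: 10 cases that all carry the allele, 10 controls that do not.
labels = [1] * 10 + [0] * 10
carriers = [1] * 10 + [0] * 10
print(permutation_p_value(labels, carriers))
```

The smallest P value such a test can report is bounded by the number of permutations, which is why a large number of permutations is needed to resolve P values of the order of 10^-6.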
Xenopus transgenic assays
Xenopus laevis transgenic embryos were generated using the I-SceI method recently described (Ogino H, McConnell WB, Grainger RM (2006) Nat Protoc 1: 1703-1710). A 0.7 kb fragment containing the conserved region was amplified from the Protective or Risk human samples with the following primers: 5'-GCTACCTTAACAAAGCTTCCTCC-3' and 5'-CGCCTGTAAAAGTTGGAGC-3' (Figure 5b). The PCR products were cloned in a TOPO T/A vector (Invitrogen, High Wycombe, UK), sequenced and then inserted, using the Gateway technology, 5' of a Xenopus 0.6 kb Gata2 minimal promoter driving GFP in a transgenesis vector (Invitrogen, High Wycombe, UK). This vector contains two I-SceI sites flanking the expression cassette. 35 and 42 transgenic embryos were generated for the Risk and Protective constructs, respectively. In each embryo, using the histogram function of Adobe Photoshop (Adobe, San Jose, CA 95110-2704, US), the intensity of the rectal expression was measured relative to the signal in a fixed area in the muscles, which served as the internal reference. Differences in the distribution of levels of expression between genotypes were compared using the Mann-Whitney test.
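A comparison of relative expression levels between two groups of embryos with the Mann-Whitney test can be sketched as follows. This is a minimal illustration using the normal approximation without tie or continuity correction; the function name and the intensity values are hypothetical and are not data from the experiment.

```python
import math


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic and two-sided P value (normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values and give them the average rank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Hypothetical relative intensities (rectal signal / muscle signal).
risk = [0.41, 0.38, 0.45, 0.36, 0.40]
protective = [0.52, 0.49, 0.47, 0.55, 0.50]
print(mann_whitney_u(risk, protective))
```

A rank-based test is a natural choice here because relative intensities measured from images need not be normally distributed.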
Example 3
We had previously mapped a high-penetrance gene (HMPS/CRAC1) for colorectal cancer (CRC) to a ~2Mb region on chromosome 15q13.3 in the Ashkenazi population. Affected individuals shared an ancestral haplotype. We reduced the HMPS/CRAC1 region to 0.6Mb, a segment containing SGNE1, GREM1 and FMN1. No causative germline mutations were identified in the Ashkenazi patients despite intensive screening. We hypothesised that, even if the high-penetrance mutation were refractory to identification, either the HMPS/CRAC1 locus might harbour low-penetrance variants that increased the risk of CRC in the general UK population, or the Ashkenazi genetic background might convert a generally low-penetrance variant to one of high penetrance. We analysed 718 selected UK CRC cases and 960 controls at 145 tagging SNPs in the HMPS/CRAC1 minimal region, followed by validation in 7,243 cases and 5,843 controls. Two SNPs close to GREM1 and SGNE1, rs4779584 and rs10318, were associated with increased CRC risk (allelic odds ratios 1.26, P=4.44x10^-14, and 1.19, P=7.93x10^-9 respectively). The rs4779584 genotype was sufficient to account for all the excess risk. The Ashkenazi susceptibility allele was on the high-risk haplotype. Individuals from the UK who are homozygous for the high-risk allele at rs4779584 are at about 1.7-fold increased risk of CRC.
Inherited factors are known to contribute to the development of many of the common cancers. Recent findings in prostate, breast and colorectal cancer (CRC) have shown that common genetic variants underlie at least some of this inherited risk. For CRC, candidate gene analysis has shown CASP8 to be a low-penetrance susceptibility gene (22) and SNP-based genome screens have identified a locus close to POU5F1P1 on chromosome 8q24 that may be involved in predisposition to prostate (23-26) and breast cancers (27) as well as CRCs (28-30). The identification of these susceptibility variants has demonstrated the principle of the 'common variant-common disease' model, but it is likely that several or many more similar variants remain to be found.
We have previously used classical linkage analysis to study families with hereditary mixed polyposis syndrome (HMPS), a Mendelian condition characterised by multiple colorectal polyps and CRC (MIM #601228). The polyps are often dysplastic, but typically do not have a classical adenomatous morphology and are frequently classified as mixed hyperplastic/adenomatous lesions or serrated adenomas (31, 32). To date, HMPS has only convincingly been described in individuals of Ashkenazi Jewish descent. We mapped the disease locus, HMPS/CRAC1, in three multi-case Ashkenazi families to chromosome 15q13.3-q14 (32, 33). Using microsatellites in five families allowed us subsequently to demonstrate that the disease locus resides between bases 29,775,416 and 34,124,377 on chromosome 15. In order to fine-map the location of HMPS/CRAC1, we extended our patient set. Since our previous study, seven previously unaffected individuals from our five families had developed characteristic serrated adenomas. We also identified two further individuals with HMPS based on a phenotype of multiple (>5) adenomas including at least three reported serrated lesions, dominant family history of colorectal tumours, self-reported Ashkenazi ancestry and absence of germline mutations in APC, MYH or the mismatch repair genes. To further refine the location of HMPS/CRAC1, we genotyped 8 selected, affected individuals (probands and other family members previously shown to have critical recombination events) and one unaffected, non-carrier mother of a patient using the Illumina Hap550 SNP array. We refined the locations of recombinations and searched for a minimal shared haplotype in the HMPS/CRAC1 region. Using these data, we restricted the location of the gene to 30,735,098-31,369,755 bases. This region contains three known genes: the 3' part of SGNE1/SCG5 (chr15:30,721,252-30,776,590); GREM1/DRM/CKTSF1B1 (chr15:30,797,497-30,814,158); and FMN1 (chr15:30,846,102-31,147,525).
In addition, hypothetical genes C15orf45, AX747968 and DKFZp686C2281 map to the region. Despite sequencing all coding sequences, introns, promoter regions and other highly conserved sequences within the minimal region, no mutations unique to HMPS patients were identified.
The pathogenic change in HMPS patients remained undetected, but we hypothesised that the HMPS/CRAC1 locus might harbour not only high-penetrance mutations that cause colorectal tumours in Ashkenazi Jews, but also variants that increase the risk of CRC in the general UK population. An alternative hypothesis was that the same variant might cause disease in multiple ethnic groups, but that some unknown factor in the Ashkenazi genetic background greatly increased the penetrance of the variant. We therefore analysed 145 SNPs within and flanking SGNE1, GREM1 and FORMIN1 (Table 10) from the Illumina Hap300 tagSNP panel in 718 CRC cases selected for family history and/or early onset and 960 unaffected controls. The most significant association found was at rs4779584
Figure imgf000067_0001
odds ratio [OR]=1.35, 95%CI 1.14-1.60), a SNP mapping to chr15:30,782,048 bases. Associations between CRC and other SNPs within the HMPS/CRAC1 region are shown in Figure 8 and Table 10. The second most strongly associated SNP, rs10318, mapped 31kb distal to rs4779584 (Pallele = 6.97x10^-3; OR=1.26, 95% CI 1.06-1.50).
Although the association between rs4779584 genotype and CRC did not attain formal significance after adjustment for multiple testing (threshold P=3.45x10^-4 with Bonferroni correction for 145 SNPs), the data were suggestive of a true association. We therefore genotyped rs4779584 and rs10318 in a second, larger set of CRC cases and controls (Stage 2). After Stage 2, both SNPs were significantly associated with disease (Table 9) and we confirmed the association in two further Stages, providing a total of 7,961 successfully typed CRC cases and 6,803 controls (Table 9). A meta-analysis showed very strong association of both SNPs with CRC risk (Figure 9); the overall Pallele for rs4779584 was 4.44x10^-14 and that for rs10318 was 7.93x10^-9, with corresponding ORs of 1.26 (95%CI 1.19-1.34) and 1.19 (95%CI 1.12-1.26) respectively. There was moderate-to-strong LD between alleles at the two SNPs (D'=0.77 and r2=0.57). Consistent with this, we found no evidence that each SNP contributed independently to disease risk: in a logistic regression model, inclusion of rs10318 genotypes did not provide a superior model to one based solely on rs4779584 (P=0.47).
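The Bonferroni figure quoted for 145 SNPs can be reproduced with a one-line calculation. The sketch below assumes a family-wise significance level of 0.05, which is not stated in the text; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 across the 145 tagging SNPs.
threshold = bonferroni_threshold(0.05, 145)
print(f"{threshold:.2e}")
```

With that assumed alpha, the per-SNP threshold agrees with the value of 3.45x10^-4 given above.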
Figure imgf000069_0001
Table 9
Genotyping rs4779584 and rs10318 in Stages 1-4. The numbers of cases and controls by genotype in each Stage are shown, together with: identities of the major (A) and minor (B) alleles; χ2 value for the allele test of association; corresponding P value; odds ratio for the susceptibility (minor) allele under the allelic test; odds ratios under dominant and recessive tests; 95% confidence intervals for each test; and Hardy-Weinberg equilibrium P values. Note a probable dosage effect of the high-risk allele.
Table 9 shows the alleles for rs10318 as G/A. The skilled person would be aware that this substitution corresponds to a C/T substitution on the opposite strand. If the rs10318 sequence is represented by the plus strand CAAGATATTTGTGGTCTTGATCATAC[C/T]TATTAAAATAATGCCAAACACCAAA then the skilled person would recognise that the substitution is C/T. This is also consistent with the ancestral allele at position 30,813,271 being C.
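The correspondence between alleles reported on opposite strands follows from Watson-Crick base pairing and can be sketched as below. The helper names are hypothetical and only unambiguous A/C/G/T bases are handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_alleles(alleles: str) -> str:
    """Convert an allele pair such as 'G/A' to the opposite strand."""
    return "/".join(COMPLEMENT[a] for a in alleles.split("/"))


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence, read 5' to 3' on the other strand."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


print(complement_alleles("G/A"))   # alleles as given in Table 9
print(reverse_complement("ATAC"))  # last four bases 5' of the variant site
```

Applied to the G/A alleles in Table 9, the complement is C/T, as stated above.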
Table 10
Figure imgf000071_0001
Figure imgf000072_0001
Table 10 shows genotyping of SNPs within and close to the HMPS/CRAC1 region in our Stage 1 CRC samples from the general UK population. The columns show: SNP ID; location on chromosome 15; identity of the major allele (A); identity of the minor allele (B); total number of cases with each genotype; total number of controls with each genotype; χ2 statistic (1 d.f.); corresponding P value; odds ratio for the minor allele (since this is the susceptibility allele at rs4779584 and rs10318); lower 95% confidence limit; and upper 95% confidence limit.
We found that the HMPS/CRAC1 haplotype in the Ashkenazi HMPS patients contained the high-risk alleles at both rs4779584 and rs10318, but there were no apparent phenotypic differences between HMPS patients who were homozygous and heterozygous for the high-risk alleles at these SNPs (details not shown). In the full patient data set, we found no association between genotype at rs4779584 or rs10318 and any of the clinicopathological variables (see Methods). Specifically, synchronous or metachronous serrated polyps were no more common in carriers of the high-risk alleles than the low-risk alleles (for rs4779584, P=0.36, Fisher's exact test).
We determined linkage disequilibrium (LD) structure across the region containing HMPS/CRAC1 using our own genotyping data and data from the HapMap CEU samples. On the basis of these data, there was no evidence for strong or moderate LD (defined as r2>0.3) with rs4779584 or rs10318 outside the region chr15:30,782,050-30,841,010. As an initial screen for possible disease-causing variants, we therefore focussed on sites with potential functional importance within this region. In order to detect unreported variants, we sequenced coding regions, UTRs, splice junctions, reported transcripts, reported control regions and other highly conserved sequences in 92 UK familial CRC cases ascertained through genetics clinics in which mutations in the known CRC genes had been excluded. From these searches and public databases, we identified nine polymorphisms (rs12906413, rs11853552, rs11857190, rs12148790, rs11857997, rs8034965ins/del, rs3743103, NFN28 and rs1129456) that were prioritised for further investigation in Stage 1 cases and controls.
However, all of these polymorphisms were much less strongly associated with CRC risk than was rs4779584 (Table 11 and Figure 10). We had available a panel of 39 lymphoblastoid cell lines from cases and controls with known genotypes at rs4779584 and rs10318. Using standard Taqman assays, we searched for a relationship between genotype and the expression levels of SGNE1 and GREM1 mRNA. We found no significant association between SGNE1 or GREM1 expression and the genotype at rs4779584 or rs10318 (P greater than or equal to 0.1 in all cases). Both genes were, however, much less strongly expressed in lymphocytes than in colorectal tissues (data not shown), and findings must therefore be interpreted with a degree of caution.
Table 11: SNPs with putative functional importance close to rs4779584 and rs10318, their LD relationships and their associations with CRC.
SNPs are present in dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/) , except for rs8034965ins/del which is an unreported C/- polymorphism at chrl5: 30, 799, 068 close. to the SNP rs8034965, and NFN28 which is a previously unreported A/G SNP at chrl5 : 308, 813, 548 bases. All SNPs were initially tested in a set of 96 UK cases and LD relationships with rs4779584 and rslO318 were assessed. rsl2594235 was not assessed further owing to very low LD with rs4779584 and rslO318. The remaining SNPs were then typed in the Stage 1 samples. For each SNP, the maximum likelihood of the model for each SNP was calculated and then used to determine Akaike's information criterion (AIC), defined by 2 * (-maximum log-likelihood + number of parameters estimated) . For each SNP, the relative likelihood (Akaike weight) of the model (excepting rs4779584) was compared with the model for rs4779584 by re-scaling and normalising the AIC. Akaike JF weights estimated a relative likelihood of 0.80 that the model with rs4779584 was the best single-SNP explanation for the data.
[Table 11 appears as an image in the original publication.]
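For reference, the AIC and the Akaike weights described in the legend to Table 11 can be written out as below. This is a sketch using the standard textbook definitions; the symbols k_i (number of estimated parameters in model i), \hat{L}_i (maximised likelihood of model i), \Delta_i and w_i are notation introduced here for illustration and do not appear in the original text.

```latex
\mathrm{AIC}_i = 2\,\bigl(k_i - \ln \hat{L}_i\bigr), \qquad
\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j, \qquad
w_i = \frac{\exp(-\Delta_i/2)}{\sum_j \exp(-\Delta_j/2)}
```

Under these definitions the weights sum to 1 across the models compared, so a weight of 0.80 for the rs4779584 model means that it carries 80% of the total relative likelihood among the single-SNP models considered.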
We have shown that a locus within the HMPS/CRAC1 region is associated with low-penetrance predisposition to CRC. The SNP rs10318 is located within the 3' UTR of GREM1, a secreted bone morphogenetic protein (BMP) antagonist. Although we found no obviously pathogenic variant in GREM1, the TGF-beta/BMP pathway is known to play an important role in colorectal tumorigenesis; BMPR1A mutations, for example, cause juvenile polyposis, a disease typified by lesions that resemble HMPS polyps. It is therefore entirely plausible that GREM1 may increase tumour proliferation, for example through its expression in the stroma35. rs4779584 lies between GREM1 and SGNE1. Although genetically and functionally a slightly worse candidate than GREM1, neuroendocrine signalling involving SGNE1 36 could influence cellular proliferation in the large bowel through, for example, signalling of nutrient availability or through systemic hormonal effects.
Although not formally proven, because the causal variant for HMPS remains unidentified, it is evidently most likely that the same gene is involved in both Mendelian predisposition to CRC in HMPS patients and common allele/low-penetrance predisposition in the general UK population. In a similar vein, high-penetrance APC mutations cause familial adenomatous polyposis, but a low-penetrance variant, APC1307K, predisposes to colorectal tumours in the general Ashkenazi population. Restricting analysis to data from Stages 2 and 3, which represent CRC cases unselected for family history, provides estimates of risk associated with hetero- and homozygosity at rs4779584 of OR=1.23 (95% CI 1.13-1.33) and OR=1.70 (95% CI 1.41-2.04) respectively. Hence we estimate that the low-penetrance variant on chromosome 15q13.3 underlies about 15% of CRCs. Although this locus may only account for ~1.5% of the excess familial risk of CRC, it has the potential to be an important contributor to the overall risk. For example, we estimate from our series that individuals homozygous for the risk alleles at rs4779584 and at another low-penetrance CRC locus on chromosome 8q24.21 (SNP rs6983267) have a 2- to 3-fold increased CRC risk.
Methods
Genotyping to refine the HMPS/CRAC1 haplotype in Ashkenazi Jews
The position of the HMPS/CRAC1 locus was initially refined by identifying and genotyping 35 microsatellites within the region in 35 Ashkenazi cases with HMPS and 100 Ashkenazi controls. Details of PCR primers and conditions are available from the authors. A sub-set of the most informative HMPS cases, based on their shared haplotype region from the microsatellite analysis, was subsequently genotyped on Illumina Hap550 BeadArrays using the manufacturer's standard protocols. SNP calls from the region around HMPS/CRAC1 were manually inspected to confirm failure to share genotypes (essentially discordant homozygotes) in the set of affected individuals and to identify the locations of critical recent or ancestral recombination events.
Study participants from the general UK population
Our initial screen (Stage 1) comprised CRC cases enriched for a possible genetic origin. We did not enrich for serrated neoplasia, but we excluded individuals with advanced classical adenomas and no CRC, since the conventional adenoma-carcinoma sequence may not apply in HMPS. Germline mismatch repair and mutation carriers were excluded and no patient had a polyposis phenotype. The cases included individuals from the CORGI study (CRC at age 75 or less and at least one first-degree relative affected by CRC) and individuals from the VICTOR Trial with CRC at age 60 or less. CORGI was based on recruitment from Clinical Genetics Centres in the UK (see Supplementary Information) from 1999 to date. VICTOR was a Phase III randomised, double-blind, placebo-controlled study of rofecoxib (VIOXX) in colorectal cancer patients (Dukes stage B or C disease) following potentially curative therapy (http://www.octo-oxford.org.uk/alltrials/infollowup.vie.html). From these two sources, 730 familial and/or early-onset CRC cases were identified (mean age at diagnosis 61 years, male:female ratio 0.99:1). Controls (N=960) were of similar age to the cases, were unaffected by cancer and had no personal or family history (to 2nd degree relative level) of colorectal neoplasia. All cases and controls were of white UK ethnic origin for both Stage 1 and subsequent Stages.
Stage 2 comprised 4,500 CRC cases (mean age at diagnosis 60 years, male:female ratio 1.17:1), plus 3,860 healthy control individuals ascertained between 1999 and 2005 through the National Study of Colorectal Cancer Genetics (NSCCG) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. Controls (mean age 60 years) were the spouses or unrelated friends of patients with malignancies. No control had a personal history of malignancy at time of ascertainment. Stage 3 comprised 2,000 CRC cases ascertained through the NSCCG post-2005 (mean age at diagnosis 60 years, male:female ratio 1.55:1) and 1,650 healthy controls (mean age 56 years). Stage 4 consisted of 313 additional cases from CORGI (mean age at diagnosis 63 years, male:female ratio 0.92:1) and 550 additional cases from VICTOR with age of presentation >60 years (mean age at diagnosis 70 years, male:female ratio 1.05:1). Stage 4 controls (N=340) were white UK population blood donors without cancer, but, for reasons of confidentiality, without recorded sex or age.
The following clinico-pathological data were collected from Stage 1: sex; site of tumour; Dukes stage; age of onset; and presence and type of synchronous or metachronous polyps. In all cases, CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153-154 and all cases had pathologically proven adenocarcinoma. Collection of blood samples and clinico-pathological information from patients and controls was undertaken with informed consent and full UK ethical review board approval .
SNP Genotyping
DNA was extracted from samples using conventional methods and quantified using PicoGreen (Invitrogen). Selected Ashkenazi samples were typed using the Illumina Hap550 Bead Arrays according to the manufacturer's instructions. For the initial stage of the association study, some samples were analysed using the Illumina Hap300 Bead Arrays and others using the Hap550 Arrays, in both cases according to the manufacturer's protocols. Almost all Hap300 SNPs are present on the Hap550 and the Hap300 is designed to provide near-identical haplotype tagging efficiency to the Hap550 in the European population; we therefore based our analysis on Hap300 SNPs within the HMPS/CRAC1 region. DNA samples with GenCall scores <0.25 at any locus were considered "no calls". A DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci. A SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To ensure quality of genotyping, a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches. No SNP genotype frequencies within the region deviated significantly from Hardy-Weinberg equilibrium at P=0.05. We had previously shown that there was no evidence of differential genotyping between cases and controls or of population sub-structure in the sample set as assessed by the STRUCTURE program and a Q-Q plot of the genotype test statistics (details not shown).
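The two call-rate thresholds described above can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the use of None for a "no call" are assumptions made for illustration; the order of filtering (samples first, then SNPs among the remaining samples) is also an assumption, since the text does not state it.

```python
def filter_by_call_rate(genotypes, min_rate=0.95):
    """Apply sample-level and SNP-level call-rate thresholds.

    genotypes: dict mapping a sample ID to a list of genotype calls,
    one per SNP, where None marks a "no call" (hypothetical layout).
    Returns (kept sample IDs, kept SNP indices).
    """
    n_snps = len(next(iter(genotypes.values()))) if genotypes else 0

    # Step 1: a sample fails if it has genotypes at fewer than min_rate of loci.
    kept_samples = {
        sample: calls
        for sample, calls in genotypes.items()
        if n_snps and sum(c is not None for c in calls) / n_snps >= min_rate
    }

    # Step 2: a SNP fails if fewer than min_rate of the remaining samples
    # produced a genotype at that locus.
    n_samples = len(kept_samples)
    kept_snps = [
        j
        for j in range(n_snps)
        if n_samples
        and sum(calls[j] is not None for calls in kept_samples.values()) / n_samples
        >= min_rate
    ]
    return sorted(kept_samples), kept_snps
```

The threshold is exposed as a parameter so that the same function can be exercised on small toy data sets with a lower cut-off.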
For genotyping in subsequent Stages, the KASPar genotyping system was used (http://www.kbioscience.co.uk/). Common and allele-specific PCR primers were as follows: for rs4779584, the common primer CCAGTAGAACTTGTTGATAAGCCATTCTT was used together with allele-specific primers GAAGGTGACCAAGTTCATGCTCCTGTGTATATAGTTATGGTTTCTGTTCA and GAAGGTCGGAGTCAACGGATTCTGTGTATATAGTTATGGTTTCTGTTCG; and for rs10318, the common primer TAATGCCAAGCACAAAGTGTACATCATAAA was used, together with allele-specific primers GAAGGTGACCAAGTTCATGCTGCAAGATATTTGTGGTCTTGATCATACT and GAAGGTCGGAGTCAACGGATTCAAGATATTTGTGGTCTTGATCATACC. Genotyping quality was tested using duplicate DNA samples within each assay, together with direct sequencing of subsets of 20 samples to confirm genotyping accuracy. For all SNPs, a minimum of 99% concordant results were obtained.
Statistical analysis
Statistical analyses were undertaken using STATA Software. Deviation of the genotype frequencies from those expected under Hardy-Weinberg Equilibrium (HWE) was assessed by the chi-squared test (1 degree of freedom (d.f.)), or Fisher's exact test where an expected cell count was <5. In the absence of known mode of inheritance of alleles, we primarily based our analyses on the difference between allelic frequencies in cases and controls using the chi-squared test, or Fisher's exact test if an expected cell count was <5. The risks associated with each SNP were estimated by the allelic odds ratio (OR) using unconditional logistic regression. LD calculations and haplotype block structures were inferred using Haploview v3.2 (http://www.broad.mit.edu/mpg/haploview/).
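As an illustration of the HWE check described above, the following Python sketch computes the 1 d.f. chi-squared statistic and its P value from the three genotype counts at a biallelic SNP. The function name and argument names are hypothetical, and the sketch does not implement the Fisher's exact test fallback that the text specifies for expected cell counts below 5.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-squared test (1 d.f.) for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi-squared statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-squared with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Using the complementary error function keeps the example within the standard library; a statistics package would normally be used in practice.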
Meta-analysis was conducted using standard methods for combining raw data based on the Mantel-Haenszel method. The sibling relative risk attributable to a given SNP was calculated using the formula:
\[
\lambda^{*} = \frac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left[\,p^2 r_2 + 2 p q r_1 + q^2\,\right]^2}
\]
where p is the population frequency of the minor allele, q=1-p, and r1 and r2 are the relative risks (estimated as OR) for heterozygotes and rare homozygotes, relative to common homozygotes. Assuming a multiplicative interaction, the proportion of the familial risk attributable to a SNP was calculated as log(λ*)/log(λ0), where λ0 is the overall familial relative risk estimated from epidemiological studies, assumed to be 2.2. The population attributable fraction associated with a SNP was estimated by (x-1)/x, where x = (1-p)^2 + 2p(1-p)OR1 + p^2 OR2 and OR1 and OR2 are the ORs associated with hetero- and homozygosity respectively. Akaike information criterion analysis (Table 11) was performed as described 34.
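The three quantities defined above (sibling relative risk, proportion of familial risk, and population attributable fraction) can be sketched in Python as follows. The function and parameter names are introduced here for illustration only. The inputs in the closing example are taken from elsewhere in this document for rs4779584 (MAF 0.19, heterozygote OR 1.23, homozygote OR 1.70); the output is not guaranteed to reproduce the estimates quoted in the text, which may rest on different inputs.

```python
import math


def sibling_relative_risk(p, r1, r2):
    """Sibling relative risk attributable to one SNP.

    p: minor allele frequency; r1, r2: relative risks for heterozygotes
    and rare homozygotes relative to common homozygotes.
    """
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p * p * r2 + 2 * p * q * r1 + q * q) ** 2
    return numerator / denominator


def familial_risk_fraction(p, r1, r2, lambda_0=2.2):
    """Proportion of the familial risk attributable to the SNP,
    log(lambda*) / log(lambda_0), with lambda_0 = 2.2 as assumed in the text."""
    return math.log(sibling_relative_risk(p, r1, r2)) / math.log(lambda_0)


def population_attributable_fraction(p, or1, or2):
    """Population attributable fraction, (x - 1) / x."""
    x = (1 - p) ** 2 + 2 * p * (1 - p) * or1 + p * p * or2
    return (x - 1) / x


# Illustrative inputs for rs4779584 (values quoted elsewhere in this document).
example_fraction = familial_risk_fraction(0.19, 1.23, 1.70)
example_paf = population_attributable_fraction(0.19, 1.23, 1.70)
```

A useful sanity check is that a SNP with no effect (r1 = r2 = 1) gives a sibling relative risk of exactly 1 and a population attributable fraction of 0.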
Mutational and other analyses
The search for undescribed and potentially pathogenic sequence changes was performed by direct sequencing. Expression analysis assays were conducted by real-time quantitative (RTQ)-PCR.
Example 4
In Phase 1, we genotyped 555,352 tagSNPs in 940 individuals with colorectal neoplasia and 965 controls (Panel A) using the Illumina Hap550 BeadChips. To maximise power to identify associations, each case had at least one first-degree relative affected with CRC, thereby genetically enriching for susceptibility alleles 37,38. Of the 1,905 DNA samples submitted for genotyping, 1,890 samples were successfully processed, generating in excess of 1,000 million genotypes. Genotyping failed in only 15 individuals, leaving genotype data for 930 cases (620 with CRC and 310 with high-risk colorectal adenomas) and 960 controls. A total of 550,163 SNPs were satisfactorily genotyped (99.1%), with mean individual sample call rates (the percentage of samples for which a genotype was obtained for each SNP) of 99.7% and 99.8% in cases and controls, respectively. Of the SNPs satisfactorily genotyped, 2,516 were monomorphic, leaving 547,647 SNPs for which genotype data were informative. Comparison of the observed and expected distributions showed little evidence for an inflation of the test statistics (inflation factor λ=1.02, based on the 90% least significant SNPs; figure 14), thereby excluding the possibility of significant hidden population substructure, cryptic relatedness amongst subjects or differential genotype calling between cases and controls39.
We prioritised 42,708 of the SNPs typed in Phase 1 for genotyping in Phase 2. Phase 2 data were derived from 2,864 CRC cases (of 2,873 submitted for typing) and 2,855 controls (of 2,871 submitted), these samples being typed using customised Illumina iSelect Bead arrays. 38,733 SNPs were satisfactorily genotyped (90.7%), with mean individual sample call rates of 99.9% and 99.9% in cases and controls, respectively. A comparison of the observed and expected distributions showed only a small inflation of the test statistics (inflation factor λ=1.05; figure 14).
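One way to estimate an inflation factor from the least significant 90% of SNPs is sketched below in Python. The text does not state which estimator was used, so the choice here (the slope, through the origin, of observed against expected 1 d.f. chi-squared quantiles for the lowest 90% of test statistics) is an assumption, as are the function names.

```python
from statistics import NormalDist


def chi2_1df_quantile(u):
    """Quantile of the chi-squared distribution with 1 d.f. at probability u (0 < u < 1)."""
    # A chi-squared(1) variable is the square of a standard normal variable.
    z = NormalDist().inv_cdf((1.0 + u) / 2.0)
    return z * z


def inflation_factor(test_statistics, keep_fraction=0.9):
    """Slope through the origin of observed vs expected chi-squared quantiles,
    using only the smallest keep_fraction of the test statistics."""
    observed = sorted(test_statistics)
    n = len(observed)
    m = int(n * keep_fraction)
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(m)]
    numerator = sum(o * e for o, e in zip(observed[:m], expected))
    denominator = sum(e * e for e in expected)
    return numerator / denominator
```

Restricting the fit to the least significant statistics keeps genuinely associated SNPs in the upper tail from inflating the estimate.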
After a joint analysis of Phases 1 (restricted to cancer phenotype) and 2, we prioritised 11 SNPs (P<10^-4) from novel regions of association for further analysis in our Phase 3, which comprised 4,287 CRC cases and 3,743 controls. Of these 11 SNPs, two were independently and consistently associated with CRC risk in Phase 3 at P<0.05 and were taken forward for genotyping in Phase 4, comprising 8 additional, independent CRC case-control series (10,731 CRC cases and 10,961 controls; table 13). For the remaining SNPs, P was greater than 0.05 (Table 12). However, a P value above 0.05 does not exclude a real association. A P value of 0.12, for instance, means that data at least this extreme would be expected by chance in about 12% of studies if there were no association; it is not the probability that the result is due to chance. These remaining SNPs may therefore have a real association with disease, and the evidence for this is stronger at lower P values. For rs2488704, rs4355419 and rs2282428, the evidence for a real association with disease is considered strong, particularly when taken together with the combined results shown in Table 12 below, and hence it is considered that these sites will be useful as risk indicators of disease. The SNPs rs16892766 and rs10795668 showed associations in Phase 4 that were significant in this Phase alone at P ≤ 10^-4, with effects in the same direction as in Phases 1, 2 and 3 (Table 12). Pooling genotype data for CRC cases and controls from all of the case-control series provided unequivocal evidence for a relationship between these SNPs and CRC risk (figure 11), with combined significance levels of P(allele test)=8.7 x 10^-18 and P(allele test)=1.3 x 10^-13, respectively. There was little evidence of heterogeneity between studies (I²=11.15, Phet=0.43 and I²=14.81, Phet=0.10 respectively).
Table 12: Summary of results for eleven SNPs selected for Phase 3, together with selected SNPs described in Examples 1 to 3

SNP | Gene | Chr | Position (bp) | MAF¹ | Allelic OR² (95% CI) | Heterozygote OR² (95% CI) | Homozygote OR² (95% CI) | P value, Phases 1 & 2³ | P value, Phase 3 | P value, Phase 4 | P value, Combined
rs6983267 | - | 8q24.21 | 128482487 | 0.48 | 1.24 (1.17-1.33) | 1.35 (1.20-1.53) | 1.57 (1.38-1.80) | 7.0 x 10^-11 | - | - | -
rs4939827 | SMAD7 | 18q21.1 | 44707461 | 0.47 | 0.85 (0.80-0.91) | 0.84 (0.75-0.94) | 0.73 (0.64-0.83) | 1.7 x 10^-6 | - | - | -
rs4779584 | - | 15q13.3 | 30782048 | 0.19 | 1.23 (1.14-1.34) | 1.17 (1.06-1.30) | 1.70 (1.35-2.14) | 4.7 x 10^-7 | - | - | -
Systematically evaluated loci
rs16892766 | EIF3S3 | 8q23.3 | 117699864 | 0.07 | 1.25 (1.19-1.32) | 1.27 (1.20-1.34) | 1.43 (1.13-1.82) | 7.4 x 10^-? | 0.02 | 2.9 x 10^-11 | 3.3 x 10^-18
rs4841306 | MSRA | 8q23.10 | 10159725 | 0.24 | 1.04 (1.00-1.08) | 1.01 (0.96-1.06) | 1.14 (1.04-1.26) | 1.3 x 10^-6 | 0.75 | - | 0.03
rs4822442 | SLC2A11 | 22q11.23 | 76808630 | 0.29 | 0.95 (0.91-0.99) | 1.00 (0.94-1.05) | 0.82 (0.75-0.91) | 1.0 x 10^-5 | 0.78 | - | 0.01
rs12957142 | - | 18q12.3 | 36599267 | 0.20 | 1.10 (1.04-1.16) | 1.08 (1.01-1.16) | 1.29 (1.10-1.51) | 1.4 x 10^-5 | 0.54 | - | 5.6 x 10^-4
rs2488704 | - | 10q22.1 | 76808630 | 0.33 | 1.11 (1.05-1.16) | 1.11 (1.03-1.19) | 1.22 (1.10-1.36) | 1.5 x 10^-5 | 0.12 | - | 3.8 x 10^-5
rs4355419 | - | 4q13.1 | 63165287 | 0.41 | 1.10 (1.05-1.16) | 1.08 (1.01-1.16) | 1.22 (1.11-1.34) | 1.6 x 10^-5 | 0.09 | - | 2.4 x 10^-5
rs2989734 | FCN1 | 9q34.30 | 136941135 | 0.36 | 1.04 (1.00-1.08) | 1.05 (0.99-1.11) | 1.07 (0.98-1.16) | 2.2 x 10^-5 | 0.41 | - | 0.07
rs11590577 | NPH4 | 1p36.31 | 5824610 | 0.22 | 1.07 (1.01-1.13) | 1.07 (1.00-1.14) | 1.14 (0.99-1.32) | 3.8 x 10^-5 | 0.55 | - | 0.02
rs2164182 | MAML2 | 11q21 | 95620677 | 0.05 | 0.89 (0.80-0.99) | 0.87 (0.78-0.98) | 1.00 (0.53-1.88) | 5.8 x 10^-5 | 0.37 | - | 0.03
rs2282428 | KCNK1 | 1q42.2 | 231852793 | 0.32 | 1.10 (1.05-1.16) | 1.08 (1.01-1.15) | 1.24 (1.12-1.38) | 9.2 x 10^-5 | 0.07 | - | 6.3 x 10^-5
rs10795668 | - | 10p14 | 8741225 | 0.33 | 0.89 (0.86-0.91) | 0.87 (0.83-0.91) | 0.80 (0.74-0.86) | 9.8 x 10^-5 | 7.4 x 10^-7 | 1.8 x 10^-5 | 2.5 x 10^-13

¹ MAF in controls in Phase 2; ² ORs estimated using data from Phases 1-4; ³ Combined analysis based on cancer cases only (i.e. 320 cases with adenoma excluded from Phase 1)
Table 13

Cohort | Study name | General setting | Cases | Controls | Age range, cases | Age range, controls | Male/female ratio, cases | Male/female ratio, controls | Genotyping platform
Phase 1
- | Colorectal Tumour Gene Identification Consortium (CORGI) | Samples collected through CORGI consortium, United Kingdom | 940 | 965 | n/a | n/a | 0.89 | 0.83 | Illumina Hap550
Phase 2
- | National Study of Colorectal Cancer Genetics (NSCCG)¹ | Population based study, United Kingdom | 2,873 | 1,235 | 17-69 | 22-73 | 0.71 | 0.67 | Illumina Infinium
- | Genetic Lung Cancer Predisposition Study (GELCAPS) | Population based study, United Kingdom | - | 917 | - | 21-69 | - | 0.61 | Illumina Infinium
- | Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (FH) | Population based study, United Kingdom | - | 719 | - | 26-69 | - | 0.81 | Illumina Infinium
Phase 3
NSCCG | National Study of Colorectal Cancer Genetics (NSCCG)¹ | Population based study, United Kingdom. Samples collected through NSCCG post-2005 | 3,036 | 2,944 | 20-69 | 16-87 | 1.16 | 0.68 | KASPar
VCQ | Post-treatment stage of a Phase III, randomised, double blind, placebo-controlled study of rofecoxib (VIOXX®) in colorectal cancer patients following potentially curative therapy (VICTOR)² | Samples from a closed clinical trial, United Kingdom | 910 | - | 50-75 | - | 1.10 | - | Illumina Infinium, KASPar
VCQ | Colorectal Tumour Gene Identification Consortium (CORGI) | Samples collected through CORGI consortium, United Kingdom | 202 | 250 | 21-75 | 25-70 | 1.12 | 1.05 | KASPar
VCQ | A multicentre international study of capecitabine ± bevacizumab as adjuvant treatment of colorectal cancer (QUASAR2)³ | Samples from an ongoing clinical trial, United Kingdom | 139 | - | 50-74 | - | 1.07 | - | KASPar
VCQ | European Collection of Cell Cultures (ECACC)⁴ | Population-based controls | - | 376 | - | 20-63 | - | 0.95 | KASPar
VCQ | UK blood donors | Population-based controls | - | 173 | - | 21-37 | - | 0.87 | KASPar
Phase 4
COGS | Colorectal Cancer Genetics Study | Population-based incident case series, Scotland, United Kingdom | 1,012 | 1,012 | 18-56 | 21-60 | 1.05 | 1.06 | Illumina
DFCCS | Dutch Familial Colorectal Cancer Study⁵ | Pathology based genetic reference centre cohort of familial cases, Leiden, The Netherlands | 783 | 664 | 14-90 | 18-93 | 0.90 | 0.67 | KASPar
EPICOLON | The EPICOLON project⁶ | Hospital based case-control study, Barcelona, Spain | 515 | 515 | 27-101 | 24-93 | 1.45 | 1.29 | Sequenom iPLEX
FCCPS | Finnish Colorectal Cancer Predisposition Study⁷ | Population based case-control study, South-eastern Finland | 1,001 | 1,034 | 24-94 | unknown | 1.03 | unknown | KASPar
MCCS | Melbourne Collaborative Cohort Study⁸ | Case-cohort design from a prospective cohort study, Melbourne, Australia | 515 | 709 | 52-83 | 53-87 | 1.10 | 0.99 | High Resolution Melt (HRM) Curve analysis
POPGENSHIP | The POPGEN⁹,¹⁰ and SHIP¹¹ projects | Population-based biobank projects in Northern Germany | 2,569 | 2,699 | 29-90 | 21-81 | 1.16 | 0.92 | KASPar
SEARCH | Studies of Epidemiology and Risk Factors in Cancer Heredity | Population based case-control study, Cambridge, England | 2,253 | 2,262 | 19-73 | 26-72 | 1.33 | 0.72 | TaqMan
SOCCS | Scottish Colorectal Cancer Study | Population-based incident case series, Scotland, United Kingdom | 2,057 | 2,111 | 24-85 | 21-83 | 1.53 | 1.43 | Illumina

¹ Penegar et al. (2007) Br J Cancer, 97:1305-9.
² Kerr et al. (2007) N Engl J Med, 357:360-9.
³ http://www.octo-oxford.org.uk/alltrials/trials/Q2.html
⁴ http://www.ecacc.org.uk/default.asp?Reload=detail2.asp?itemid=92962
⁵ De Jong et al. (2004) Clin Cancer Res, 10:972-80.
⁶ Piñol et al. (2005) JAMA, 293:1986-94.
⁷ Salovaara et al. (2000) J Clin Oncol, 18:2193-200.
⁸ Giles et al. (2002) IARC Sci Publ, 156:69-70.
⁹ Krawczak et al. (2006) Community Genet, 9:55-61.
¹⁰ Schafmayer et al. (2007) Int J Cancer, 121:555-8.
¹¹ Volzke et al. (2005) J Clin Endocrinol Metab, 90:4587-92.
Table 14 - Odds ratios and 95% confidence intervals by study and pooled, for the eleven SNPs genotyped in Phase 3

SNP | OR (95% CI) in NSCCG | OR (95% CI) in VCQ | OR (95% CI) in Phase 3 | Phase 3 P-value
rs16892766 | 1.16 (1.01-1.32) | 1.13 (0.89-1.43) | 1.15 (1.03-1.29) | 0.02
rs4841306 | 1.02 (0.94-1.11) | 0.89 (0.77-1.02) | 0.99 (0.92-1.06) | 0.75
rs4822442 | 0.98 (0.90-1.06) | 1.03 (0.89-1.19) | 0.99 (0.92-1.06) | 0.78
rs12957142 | 1.02 (0.93-1.11) | 1.05 (0.90-1.23) | 1.02 (0.95-1.11) | 0.54
rs2488704 | 1.06 (0.98-1.14) | 1.05 (0.91-1.20) | 1.05 (0.99-1.13) | 0.12
rs4355419 | 1.07 (0.99-1.15) | 1.03 (0.91-1.17) | 1.06 (0.99-1.13) | 0.09
rs2989734 | 0.96 (0.89-1.03) | 1.02 (0.89-1.16) | 0.97 (0.91-1.04) | 0.41
rs11590577 | 1.02 (0.94-1.11) | 0.85 (0.73-1.00) | 0.98 (0.91-1.05) | 0.55
rs2164182 | 0.97 (0.82-1.15) | 1.54 (1.11-2.15) | 1.07 (0.92-1.24) | 0.37
rs2282428 | 1.05 (0.97-1.13) | 1.11 (0.97-1.27) | 1.06 (0.99-1.14) | 0.07
rs10795668 | 0.84 (0.77-0.90) | 0.86 (0.75-0.99) | 0.84 (0.79-0.90) | 7.4 x 10^-7
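To illustrate how per-study odds ratios such as those in Table 14 combine into a pooled estimate, the Python sketch below applies inverse-variance fixed-effect pooling to the published ORs and confidence limits, and also returns Cochran's Q and the I² heterogeneity statistic. This is a swapped-in approximation: the pooled values in this document were obtained from raw data by the Mantel-Haenszel method, so small differences are to be expected. All function names are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, lower, upper):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z975)


def pool_fixed_effect(studies):
    """studies: list of (OR, lower 95% limit, upper 95% limit).

    Returns (pooled OR, (lower, upper), Cochran's Q, I-squared in percent).
    """
    estimates = [log_or_and_se(*study) for study in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    interval = (
        math.exp(pooled - Z975 * se_pooled),
        math.exp(pooled + Z975 * se_pooled),
    )
    return math.exp(pooled), interval, q, i_squared


# rs16892766 in the NSCCG and VCQ series (values from Table 14).
pooled_or, pooled_ci, q_stat, i2 = pool_fixed_effect(
    [(1.16, 1.01, 1.32), (1.13, 0.89, 1.43)]
)
```

For rs16892766 the pooled odds ratio from this approximation rounds to 1.15, in line with the Phase 3 value in Table 14.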
Example 5
Both rs16892766 and rs10795668 are within regions of fairly extensive LD. rs16892766 maps to 8q23.3 and lies in a 220Kb LD block (117.65Mb-117.87Mb) that encompasses both EIF3S3 and predicted transcript C8orf53 (figure 12). In addition to rs16892766, two other SNPs in the region - rs11986063 and rs6983626 - were associated with CRC risk at P<10^-4 in Phases 1 and 2; both of these are correlated with rs16892766 (r²=0.78 and r²=0.43 respectively) and in a logistic regression analysis, the inclusion of rs11986063 and rs6983626 did not significantly improve the fit of the model compared with rs16892766 alone (P=0.43 and P=0.57, respectively), providing no evidence for more than one disease locus in this region.
We assessed in more detail the pattern of the risks associated with rs16892766. The control MAF was similar in all populations (P=0.66), except for the Finnish population (FCCPS) where it was significantly higher than that in all other cohorts (P=2.6 x 10^-7). For rs16892766, the minor allele was associated with an increased risk of CRC in a dose-dependent manner, with a higher risk in homozygous than heterozygous carriers (ORhet=1.27, 95% CI: 1.20-1.34 and ORhom=1.44, 95% CI: 1.13-1.82). There was little difference in the fit provided by multiplicative and dominant models (most likely due to the very low frequency of rare homozygotes), although a recessive model could be excluded.
Since the cases in Phase 1 and one cohort (LFCCS) within Phase 4 were enriched for familial CRC, the estimate of the risk ratio may be biased away from 1.0. We therefore also computed ORs restricting analysis to data from those series unselected for family history. Odds ratios were marginally closer to unity; for rs16892766, ORhet=1.26 (95% CI: 1.19-1.34) and ORhom=1.40 (95% CI: 1.10-1.79). We assessed associations between clinico-pathological variables and genotypes at rs16892766. The effect of rs16892766 was significantly stronger in younger cases (<60 years; P=0.01). No other significant associations between molecular and clinico-pathological data were found for rs16892766. Specifically, stratifying CRC cases by microsatellite instability or family history provided no evidence that risks were different according to MMR or family history status (Table 16).
Example 6
The SNP rs10795668 maps to an 82Kb LD block (8.73Mb-8.81Mb) within 10p14 (figure 13). Three additional SNPs in this LD block - rs706771, rs7898455 and rs827405 - showed evidence of association (P<10^-3) with CRC risk in the Phase 1 and 2 joint analysis; two of these were strongly correlated (rs706771, rs7898455) and one weakly correlated (rs827405) with rs10795668 (r²=0.90, r²=0.89 and r²=0.13 respectively). Nevertheless, in a logistic regression analysis the inclusion of each of these additional SNPs did not significantly improve the fit of the model compared with rs10795668 alone (P=0.96, P=0.92 and P=0.06 respectively), again providing no evidence for more than one disease locus in the region. There are no proven protein-coding transcripts in the vicinity of the marker SNPs that we tested, and no predicted genes within 0.4Mb of rs10795668. The nearest predicted genes are BC031880, located 0.4Mb proximal to rs10795668, and LOC389936, located 0.7Mb distally.
We assessed in more detail the pattern of the risks associated with rs10795668. For rs10795668, there was no evidence for differences in control allele frequencies between the populations studied (P=0.40).
For rs10795668, the minor allele was associated with a decreased risk of CRC in a dose-dependent manner, with a lower risk in homozygous than in heterozygous carriers (ORhet=0.87, 95% CI: 0.83-0.91 and ORhom=0.80, 95% CI: 0.74-0.86).
For rs10795668, genotype-specific ORs were most compatible with a multiplicative model (table 15).
Since the cases in Phase 1 and one cohort (LFCCS) within Phase 4 were enriched for familial CRC, the estimate of the risk ratio may be biased away from 1.0. We therefore also computed ORs restricting analysis to data from those series unselected for family history, as discussed above. For rs10795668, ORhet=0.87 (95% CI: 0.83-0.91) and ORhom=0.81 (95% CI: 0.75-0.87).
We also assessed associations between clinico-pathological variables and genotypes at rs10795668. There was some evidence of an association with site of CRC (P=0.04), with the susceptibility allele more common in rectal than colonic tumours (Table 16).
Table 15 - Risk models for the two associated SNPs rs10795668 and rs16892766 based on Phase 2-4 data

SNP | Model | -2*log-likelihood | χ² compared with base model | Akaike weight
rs16892766 | Base | 126.55 | 0 | -
rs16892766 | Multiplicative | 196.07 | 69.52 | 0.51
rs16892766 | Dominant | 195.97 | 69.42 | 0.49
rs16892766 | Recessive | 133.04 | 6.49 | 0
rs10795668 | Base | 62.78 | 0 | -
rs10795668 | Multiplicative | 109.97 | 47.19 | 0.77
rs10795668 | Dominant | 107.57 | 44.79 | 0.23
rs10795668 | Recessive | 78.14 | 15.36 | 0
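The Akaike weights in Table 15 can be approximately recovered from the χ² column with a short Python sketch. Two assumptions are made here and are not stated in the source: that the χ² value is the likelihood-ratio improvement of each model over the base model, and that each of the three genetic models adds exactly one parameter to the base model. The function name and dictionary keys are hypothetical.

```python
import math


def akaike_weights(chi2_vs_base, extra_params):
    """Akaike weights for a set of models compared with a common base model.

    chi2_vs_base: dict model -> likelihood-ratio chi-squared vs the base model.
    extra_params: dict model -> number of parameters added to the base model.
    """
    # AIC relative to the base model: a larger chi-squared lowers the AIC,
    # each extra parameter raises it by 2.
    relative_aic = {m: -chi2_vs_base[m] + 2 * extra_params[m] for m in chi2_vs_base}
    best = min(relative_aic.values())
    relative_likelihood = {
        m: math.exp(-(aic - best) / 2.0) for m, aic in relative_aic.items()
    }
    total = sum(relative_likelihood.values())
    return {m: value / total for m, value in relative_likelihood.items()}


one_param = {"multiplicative": 1, "dominant": 1, "recessive": 1}
weights_rs16892766 = akaike_weights(
    {"multiplicative": 69.52, "dominant": 69.42, "recessive": 6.49}, one_param
)
weights_rs10795668 = akaike_weights(
    {"multiplicative": 47.19, "dominant": 44.79, "recessive": 15.36}, one_param
)
```

Under these assumptions the weights round to 0.51, 0.49 and 0 for rs16892766 and to 0.77, 0.23 and 0 for rs10795668, matching the Akaike weight column of Table 15.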
Table 16 - Clinico-pathological association testing for rs16892766 and rs10795668

rs10795668
Covariate | Group | Rare homozygotes | Heterozygotes | Common homozygotes | Total | χ² adjusted for phase | P
Site¹ | Colon | 622 | 2684 | 3138 | 6444 | 4.27 | 0.04
Site¹ | Rectum | 359 | 1726 | 2115 | 4200 | |
Age² | <=60 | 609 | 2740 | 3283 | 6632 | 1.02 | 0.31
Age² | >60 | 872 | 3972 | 4508 | 9352 | |
MSI status³ | MSI positive | 23 | 98 | 107 | 228 | 0.48 | 0.49
MSI status³ | MSI negative | 142 | 681 | 785 | 1608 | |
FH status⁴ | FH positive | 89 | 517 | 612 | 1218 | 0.39 | 0.53
FH status⁴ | FH negative | 537 | 2476 | 3047 | 6060 | |
Gender⁵ | Male | 799 | 3617 | 4172 | 8588 | 0.07 | 0.79
Gender⁵ | Female | 693 | 3150 | 3678 | 7521 | |

rs16892766
Covariate | Group | Rare homozygotes | Heterozygotes | Common homozygotes | Total | χ² adjusted for phase | P
Site¹ | Colon | 61 | 1229 | 5523 | 6813 | 0.01 | 0.92
Site¹ | Rectum | 46 | 787 | 3567 | 4400 | |
Age² | <=60 | 74 | 1337 | 5876 | 7287 | 6.57 | 0.01
Age² | >60 | 84 | 1732 | 8258 | 10074 | |
MSI status³ | MSI positive | 1 | 39 | 191 | 231 | 0.42 | 0.52
MSI status³ | MSI negative | 16 | 285 | 1312 | 1613 | |
FH status⁴ | FH positive | 21 | 352 | 1631 | 2004 | 0.21 | 0.65
FH status⁴ | FH negative | 57 | 1109 | 4938 | 6104 | |
Gender⁵ | Male | 84 | 1633 | 7551 | 9268 | 0.01 | 0.92
Gender⁵ | Female | 76 | 1462 | 6679 | 8217 | |

¹ based on cases from Phase 2, NSCCG, VCQ, COGS, FCCPS, MCCS and SOCCS (66.0% and 64.1% complete data from Phases 2-4 for rs10795668 and rs16892766 respectively)
² based on cases from all case-control series in Phases 2-4 (99.2% and 99.2% complete data)
³ based on cases from Phase 2 and NSCCG (11.4% and 10.5% complete data)
⁴ based on cases from Phase 2, NSCCG, FCCPS and EPICOLON (45.2% and 46.3% complete data)
⁵ based on cases from all case-control series in Phases 2-4 (100% and 99.9% complete data)
Example 7
Modelling pairwise combinations of rs16892766 and rs10795668 and the previously identified CRC variants rs6983267, rs4779584, and rs4939827 showed no evidence of interactive effects between any of the CRC disease loci identified thus far (P>0.2 for all pairwise interactions; table 17), suggesting an independent role for each locus in CRC development. Counting two for a homozygote, the risk of CRC increased with increasing numbers of variant alleles for the five loci (Ptrend = 4.8 x 10^-4; table 18).
Table 18 shows odds ratios corresponding to increasing numbers of risk alleles in rs6983267, rs4779584, rs4939827, rs10795668 and rs16892766.
Table 17 - Pairwise analysis of rs6983267, rs4939827, rs4779584, rs16892766 and rs10795668. For each row-column combination, numbers show the Chi-square, P value and number of samples (N) the result is based on, for inclusion of an interaction term between the two SNPs. Interactions involving rs6983267, rs4939827 and rs4779584 are based on Phase 1 and Phase 2 data. Interactions between rs10795668 and rs16892766 are based on data from all phases.

SNP | rs6983267 | rs4939827 | rs4779584 | rs16892766
rs4939827 | 0.08 (P=0.78, N=7,523) | - | - | -
rs4779584 | 0.09 (P=0.76, N=7,521) | 0.65 (P=0.42, N=7,523) | - | -
rs16892766 | 0.31 (P=0.58, N=7,523) | 0.01 (P=0.92, N=7,525) | 0.68 (P=0.41, N=7,523) | -
rs10795668 | 1.11 (P=0.29, N=7,516) | 0.09 (P=0.76, N=7,518) | 0.72 (P=0.40, N=7,518) | 0.84 (P=0.36, N=33,665)
Table 18 - Odds ratios corresponding to increasing numbers of risk alleles in rs6983267, rs4779584, rs4939827, rs10795668 and rs16892766 based on Phase 2 data.

Number of risk alleles | Cases (%) | Controls (%) | OR (95% CI)¹
0 | 8 (0.28) | 12 (0.43) | 1.00 (ref)
1 | 71 (2.49) | 80 (2.84) | 1.33 (0.51-3.44)
2 | 295 (10.34) | 326 (11.57) | 1.36 (0.55-3.37)
3 | 672 (23.55) | 683 (24.24) | 1.48 (0.60-3.63)
4 | 801 (28.08) | 801 (28.42) | 1.50 (0.61-3.69)
5 | 604 (21.17) | 596 (21.15) | 1.52 (0.62-3.75)
6 | 287 (10.06) | 256 (9.08) | 1.68 (0.68-4.18)
7 | 100 (3.51) | 57 (2.02) | 2.63 (1.02-6.82)
8+ | 15 (0.53) | 7 (0.25) | 3.21 (0.91-11.41)
Total | 2853 (100) | 2818 (100) | 1.07 (1.03-1.11) per allele

Ptrend = 7.0 x 10^-4
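The odds ratios in Table 18 appear consistent with unadjusted odds ratios against the zero-risk-allele reference group, with Woolf (log-scale) 95% confidence intervals. That reading is an inference from the numbers rather than something the text states, and the Python sketch below is offered on that assumption; the function and variable names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

# Case and control counts by number of risk alleles (from Table 18).
TABLE_18_COUNTS = {
    "0": (8, 12), "1": (71, 80), "2": (295, 326), "3": (672, 683),
    "4": (801, 801), "5": (604, 596), "6": (287, 256), "7": (100, 57),
    "8+": (15, 7),
}


def odds_ratio_vs_reference(cases, controls, ref_cases, ref_controls):
    """Unadjusted odds ratio and Woolf 95% CI against a reference group."""
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    se_log = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - Z_95 * se_log)
    upper = math.exp(math.log(odds_ratio) + Z_95 * se_log)
    return odds_ratio, lower, upper


ref_cases, ref_controls = TABLE_18_COUNTS["0"]
odds_ratios = {
    group: odds_ratio_vs_reference(cases, controls, ref_cases, ref_controls)
    for group, (cases, controls) in TABLE_18_COUNTS.items()
    if group != "0"
}
```

For example, the entry for one risk allele evaluates to an odds ratio of about 1.33 with limits of about 0.51 and 3.44, and the entry for seven risk alleles to about 2.63 (1.02-6.82), as tabulated. The wide intervals reflect the small reference group (8 cases and 12 controls).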
Summary
Although loss of heterozygosity involving chromosome 10p14 is observed in CRC41, the underlying basis of the association identified at rs10795668 is presently unclear and there is no evidence to implicate the predicted gene FLJ3802842. For rs16892766, amplification and over-expression of EIF3S3, which regulates cell growth and viability, are features of breast, prostate and hepatocellular cancers 42,43. Most Mendelian cancer predisposition genes influence the risk of more than one tumour type and pleiotropic effects are also a feature of 8q24 variants such as rs6983267, which affects the risk of both CRC44 and prostate carcinoma45,46. It is therefore entirely plausible that the variants we have identified will influence the risk of not only CRC, but also other cancers.
On the basis of the allele frequencies and genotypic risks, we estimate the five loci set out above account for ~3% of the excess familial CRC risk. However, irrespective of the nature of the causal variants, a high proportion of the population are carriers of at-risk genotypes. Moreover, our data are compatible with a polygenic model, in which individual alleles, each of which has a small effect, combine either additively or multiplicatively to produce much larger risks in carriers of multiple risk alleles. Based on our data, the SNPs identified thus far have potential to be clinically useful, given that the ~3% of individuals carrying seven or more deleterious alleles have a risk of CRC sufficient to warrant regular colonoscopic surveillance under current guidelines. As further susceptibility loci are identified, panels of low penetrance alleles are likely to be increasingly important clinically.
In addition to identifying novel disease loci for CRC, our GWAS analysis provides insight into the nature of low penetrance susceptibility to CRC in general. Given our staged design, we estimate that the power of our Phases 1 and 2 to identify the five loci mapping to 8q24, 18q21, 15q13, 8q23.3, and 10p14 with a stipulated statistical threshold of P<10^-7 was 99%, 87%, 81%, 11% and 12% respectively. Thus, the power of our study to detect the major common loci conferring risks of 1.2 or greater was high (8q24, 18q21, 15q13 variants). In contrast, we had low power to detect alleles with smaller effects and/or MAFs <10%, as evidenced by the 8q23.3 and 10p14 variants. Such variants may represent a much larger class of susceptibility locus for CRC. The 550K tagging SNPs we have employed for the GWAS capture on average ~80% of common SNPs in the European population (defined by r² >0.8), but only ~12% of SNPs with MAFs of 5-10% are tagged at this level, limiting power to detect this class of susceptibility allele.
METHODS (examples 4-7)
Participants
Table 13 provides a summary of all cases and controls in the study.
Phase 1
Participants for phase 1 are set out in example 1, panel A'.
Phase 2
2,873 CRC cases (1,199 males, 1,674 females; mean age at diagnosis 59.3 years; SD ± 8.7) were ascertained through two ongoing initiatives at the Institute of Cancer Research/Royal Marsden Hospital NHS Trust (RMHNHST) from 1999 onwards: the National Study of Colorectal Cancer Genetics (NSCCG) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. A total of 2,871 healthy individuals were recruited as part of ongoing National Cancer Research Network genetic epidemiological studies: NSCCG (n=1,235), the Genetic Lung Cancer Predisposition Study (GELCAPS) (1999-2004; n=917), and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (1999-2004; n=719). These controls (1,164 males, 1,707 females; mean age 59.8 years; SD ± 10.8) were the spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at time of ascertainment. All cases and controls were British Caucasians, and there were no obvious differences in the demography of cases and controls in terms of place of residence within the UK.
Phase 3
NSCCG: 3,036 CRC cases (1,629 males, 1,407 females; mean age at diagnosis 59.4 years; SD ± 8.2) and 2,944 controls (1,183 males, 1,753 females; mean age 55.2 years; SD ± 12.3) ascertained through NSCCG post 2005.
VCQ: 202 additional individuals with colorectal carcinoma from the CORGI study; 910 patients from the VICTOR study, a randomised trial of VIOXX in patients with stage B and C colorectal cancer (Kerr et al, N Engl J Med. 2007); and 139 patients from the QUASAR2 clinical trial, a study that compares standard chemotherapy of capecitabine against capecitabine plus bevacizumab. The controls were made up of: 250 unaffected spouses or partners from the CORGI study; 376 human random controls from ECACC; and 173 population blood donors. Overall, 53% of the cases and 58% of the controls were female. All cases and controls were of white UK origin.
Phase 4
COGS: 1,012 CRC cases (518 males, 494 females; mean age at diagnosis 49.6 years; SD ± 6.1) and 1,012 age- and gender-matched cancer-free population controls (518 males, 494 females; mean age 51.0 years; SD ± 5.9). Cases were enriched for genetic aetiology by early age at onset (<55 years). Known dominant polyposis syndromes, HNPCC or bi-allelic MYH mutation carriers were excluded. Control subjects were population controls, matched by age (+/- 2 years), gender and area of residence within Scotland.
DFCCS: 783 CRC cases (370 males, 413 females; mean age at diagnosis 53.4 years; SD ± 13.4) and 664 controls (251 males, 413 females; mean age 51.1 years; SD ± 11.3) ascertained at a clinically based genetic reference centre, Leiden, the Netherlands. This cohort consists of familial cases.
EPICOLON: 515 CRC cases (305 males, 210 females; mean age at diagnosis 70.6 years; SD ± 11.3) and 515 controls (290 males, 225 females; mean age 69.8 years; SD ± 11.7) ascertained through the EPICOLON initiative, a prospective, multi-centre, nationwide study aimed at compiling prominent epidemiological and clinical data with respect to hereditary non-polyposis colorectal cancer and other familial colorectal cancer forms in Spain. This cohort consists of an incident series collected in Barcelona.
FCCPS: 1,001 CRC cases (509 males, 492 females; mean age at diagnosis 67.4 years; SD ± 11.8) and 1,034 controls (randomly selected anonymous Finnish blood donors) ascertained in southeastern Finland.
MCCS: 515 CRC cases (270 males, 245 females; mean age at diagnosis 66.2 years; SD ± 7.7) and 709 controls (352 males, 357 females; mean age 57.9 years; SD ± 7.0) ascertained in Melbourne, Australia. A random sample selected from the MCCS (Melbourne Collaborative Cohort Study) cohort.
POPGENSHIP: 2,569 CRC cases (1,382 males, 1,187 females; mean age at diagnosis 62.4 years; SD ± 9.9) and 2,699 controls (1,296 males, 1,395 females; mean age 53.4 years; SD ± 15.8) ascertained through the POPGEN and SHIP population-based biobank projects based in Kiel and Greifswald, Germany.
SEARCH: 2,253 CRC cases (1,287 males, 966 females; mean age at diagnosis 59.1 years; SD ± 8.1) and 2,262 controls (949 males, 1,313 females; mean age 53.39 years; SD ± 7.61). Samples were ascertained through the SEARCH (Studies of Epidemiology and Risk Factors in Cancer Heredity) study based in Cambridge, UK. Recruitment of colorectal cancers started in 2000; initial patient contact was through the general practitioner (GP). Control samples were collected post-2003. Eligible individuals were sex and frequency matched in five year age bands to cases. The study has been approved by the Eastern Multi-Centre Research Ethics Committee (Eastern MREC).
SOCCS: 2,057 CRC cases (1,249 males, 808 females; mean age at diagnosis 65.8 years; SD ± 8.4) and 2,111 population controls (1,257 males, 854 females; mean age 67.9 years; SD ± 9.0) ascertained in Scotland. Cases were taken from an independent, prospective, incident colorectal cancer case series and aged <80 years at diagnosis.
In all cases CRC was defined according to the ninth revision of the International Classification of Diseases (ICD) by codes 153-154 (ref. 47), and all cases had pathologically proven adenocarcinoma or adenomas.
Collection of blood samples and clinicopathological information from patients and controls was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki.
Genotyping
DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). Phase 1 genotyping was as described in example 1. In Phase 2 genotyping was conducted using Illumina Infinium custom arrays according to the manufacturer's protocols. A DNA sample was deemed to have failed if it generated genotypes at fewer than 95% of loci. For both Phases 1 and 2, a SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To ensure quality of genotyping, a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches in both Phases 1 and 2.
Phase 3 genotyping was conducted by competitive allele-specific PCR KASPar chemistry (KBiosciences Ltd, Hertfordshire, UK); primers and probes used are available on request. Genotyping quality control was tested using duplicate DNA samples within studies and SNP assays, together with direct sequencing of subsets of samples to confirm genotyping accuracy. For all SNPs, >99.9% concordant results were obtained.
Phase 4 genotyping used the same method as Phase 3 or standard alternatives depending upon facilities available locally. For all Phase 4 series typed other than by KASPar, local genotyping quality was confirmed by undertaking KASPar genotyping in a random set of 48 samples, which showed >98% concordance for all series.
Microsatellite instability (MSI) in CRCs was determined using the following methodology: 10 µm sections were cut from formalin-fixed, paraffin-embedded tumours, lightly stained with toluidine blue, and regions containing at least 60% tumour were micro-dissected. Tumour DNA was extracted using the QIAamp DNA Mini kit (Qiagen, Crawley, UK) according to the manufacturer's instructions and genotyped for the mononucleotide microsatellite loci BAT25 and BAT26, which are highly sensitive markers of MSI48. Samples showing novel alleles at either BAT26 or BAT25 or both markers were assigned as MSI (corresponding to a high level of instability, MSI-H49).
Statistical analysis
The adequacy of the case-control matching and possibility of differential genotyping of cases and controls was formally evaluated using Q-Q plots of test statistics. The inflation factor λ was calculated by dividing the mean of the lower 90% of the test statistics by the mean of the lower 90% of the expected values from a χ² distribution with 1 d.f. Deviation of the genotype frequencies in the controls from those expected under Hardy-Weinberg Equilibrium (HWE) was assessed by χ² test (1 d.f.), or Fisher's exact test where an expected cell count was <5.
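The two quality checks described above can be sketched as follows. This is an illustrative reading of the text, not the code used in the study: the function names are hypothetical, the expected order statistics are placed at plotting positions (i + 0.5)/n (an assumption), and Fisher's exact test for small expected counts is omitted.

```python
import math
from statistics import NormalDist, mean


def chi2_1df_quantile(u):
    """Quantile of a chi-square (1 d.f.) variable: the square of a normal quantile."""
    return NormalDist().inv_cdf((1.0 + u) / 2.0) ** 2


def inflation_factor(test_statistics, fraction=0.9):
    """Mean of the lowest `fraction` of observed statistics divided by the mean
    of the matching expected chi-square (1 d.f.) order statistics."""
    observed = sorted(test_statistics)
    n = len(observed)
    k = int(n * fraction)
    # Plotting positions (i + 0.5) / n are an assumption, not taken from the text.
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(n)]
    return mean(observed[:k]) / mean(expected[:k])


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 d.f.) and P value for departure from HWE,
    given the three genotype counts observed in controls."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 d.f. expressed through erfc.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

A value of λ close to 1 indicates that the bulk of the test statistics follows the null distribution.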
Association between SNP genotype and disease status was primarily assessed using the allelic 1 d.f. test or Fisher's exact test where an expected cell count was <5. The risks associated with each SNP were estimated by allele, heterozygous and homozygous odds ratios (OR) using unconditional logistic regression, and associated 95% confidence intervals (CIs) were calculated in each case.
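As a hedged illustration of the allelic analysis, the sketch below computes the 1 d.f. allelic χ² statistic, the allelic odds ratio and a Wald 95% confidence interval from a 2x2 table of allele counts. It assumes alleles can be treated as independent observations; for a single binary covariate the cross-product ratio coincides with the unconditional logistic regression estimate. The function name and the example counts are hypothetical.

```python
import math


def allelic_test(case_counts, control_counts):
    """case_counts and control_counts are (risk_allele_count, other_allele_count)."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    # Pearson chi-square for a 2x2 table (1 d.f., no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, (lower, upper), chi2, p_value


# Hypothetical counts: 600 risk / 400 other alleles in cases, 500 / 500 in controls.
odds_ratio, ci, chi2, p_value = allelic_test((600, 400), (500, 500))
```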
Joint analysis of data generated from multiple phases was conducted using standard methods for combining raw data based on the Mantel-Haenszel method50. Cochran's Q statistic to test for heterogeneity and the I² statistic51 to quantify the proportion of the total variation due to heterogeneity were calculated. Differences between allele frequencies in controls from the different populations in phase 4 were assessed using a χ² test.
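A minimal sketch of the pooling step, assuming each phase contributes a 2x2 table of allele counts. The Mantel-Haenszel pooled odds ratio is computed directly; Cochran's Q and I² are computed here with inverse-variance weights on the log odds ratios, which is one common choice and an assumption, since the exact weighting is not stated. Function names are hypothetical.

```python
import math


def mantel_haenszel_or(tables):
    """tables: list of (a, b, c, d) = (case risk, case other, control risk, control other)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def heterogeneity(tables):
    """Cochran's Q and I2 (as a percentage) across the per-study log odds ratios."""
    log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    weights = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    df = len(tables) - 1
    # I2 is truncated at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

Under this definition I² is the share of the total variation across studies that is attributed to heterogeneity rather than chance.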
We used Haploview software (v3.2) to infer the LD structure of the genome in the regions containing loci associated with disease risk. Relationships between multiple SNPs showing association with CRC risk in the same region were investigated using logistic regression analysis, and the impact of additional SNPs from the same region was assessed by a likelihood-ratio test.
Patterns of risk for associated SNPs were investigated by logistic regression, coding the SNP genotypes according to additive, dominant and recessive models. Models were then compared by calculating the Akaike information criterion (AIC) and Akaike weights for each mode of inheritance. Associations by site (colon/rectum), MSI status, family history status (at least one first-degree relative with CRC) and age at diagnosis (stratifying into two groups by the median age at diagnosis) were examined by logistic regression in case-only analyses. The combined effect of each pair of loci identified as associated with CRC risk was investigated by logistic regression modelling, and evidence for interactive effects between SNPs was assessed by likelihood ratio test. The OR and trend test for increasing numbers of deleterious alleles were estimated based on the Phase 2 data by counting two for a homozygote and one for a heterozygote.
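The genotype codings and the Akaike weights used for model comparison can be sketched as below. The coding table follows the usual convention (number of risk alleles mapped to a covariate value); the AIC values in the example are hypothetical and do not come from the study.

```python
import math

# Covariate value for 0, 1 or 2 copies of the risk allele under each model.
GENOTYPE_CODING = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def akaike_weights(aic_by_model):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_by_model.values())
    relative = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical AIC values for one SNP fitted under the three models.
weights = akaike_weights({"additive": 100.0, "dominant": 102.0, "recessive": 110.0})
```

The model with the largest weight is the best supported of those compared; a weight can be read as the relative support for that mode of inheritance.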
The population attributable fraction was estimated by (x-1)/x, where x = (1-p)² + 2p(1-p)OR1 + p²OR2, p is the population allele frequency, and OR1 and OR2 are the ORs associated with hetero- and homozygosity respectively. The sibling relative risk attributable to a given SNP was calculated using the formula (ref. 52):

$$\lambda^{*} = \frac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left[p^2 r_2 + 2pq\,r_1 + q^2\right]^2}$$

where p is the population frequency of the minor allele, q = 1-p, and r1 and r2 are the relative risks (estimated as OR) for heterozygotes and rare homozygotes, relative to common homozygotes. Assuming a multiplicative interaction, the proportion of the familial risk attributable to a SNP was calculated as log(λ*)/log(λ0), where λ0 is the overall familial relative risk estimated from epidemiological studies, assumed to be 2.2 (ref. 53).
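The three quantities defined above translate directly into code. The sketch below is a worked illustration only; the function names are hypothetical and the example inputs (p = 0.5, r1 = 2, r2 = 4) are chosen for easy arithmetic rather than taken from the data.

```python
import math


def attributable_fraction(p, or_het, or_hom):
    """Population attributable fraction (x - 1) / x."""
    x = (1 - p) ** 2 + 2 * p * (1 - p) * or_het + p ** 2 * or_hom
    return (x - 1) / x


def sibling_relative_risk(p, r1, r2):
    """Relative risk lambda* attributable to one SNP, as defined in the text."""
    q = 1 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p ** 2 * r2 + 2 * p * q * r1 + q ** 2) ** 2
    return numerator / denominator


def familial_risk_share(lambda_star, lambda_0=2.2):
    """Proportion of the familial risk explained, log(lambda*) / log(lambda0)."""
    return math.log(lambda_star) / math.log(lambda_0)


# Illustrative inputs: numerator 5.625, denominator 5.0625, so lambda* is about 1.111.
lam = sibling_relative_risk(0.5, 2.0, 4.0)
share = familial_risk_share(lam)
```

With r1 = r2 = 1 the SNP carries no risk, so λ* is 1 and both the attributable fraction and the share of familial risk are 0.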
Example 8
In addition to the above, SNPs in the SMAD7 region (18q21), the 8q23 region and the HMPS/CRAC1 region (15q13) have been shown to be associated with CRC risk with a P value of less than 10⁻⁵, a very high significance level, in a meta-analysis of two different studies.
The GWA studies were both conducted in samples from UK populations (co-ordinated by centres in London and Edinburgh) and both were based on designs involving two-phase strategies. The London Phase 1 was based on genotyping 940 cases with familial colorectal neoplasia and 965 controls ascertained through the CORGI consortium for 555,352 SNPs using the Illumina HumanHap550 BeadChip Array. Phase 1 in the Edinburgh study consisted of genotyping 1,012 early-onset (aged ≤ 55 years) Scottish CRC cases and 1,012 controls for 555,510 SNPs using the Illumina HumanHap300 and HumanHap240S arrays.
After applying quality control filters, the following data were available: London Phase 1: 547,487 polymorphic SNPs in 922 familial neoplasia cases (614 with CRC and 308 with high-risk colorectal adenomas) and 927 controls; Edinburgh Phase 1: 548,586 polymorphic SNPs in 980 CRC cases and 1,002 controls. London Phase 2 was based on genotyping 2,873 CRC cases and 2,871 controls ascertained through the National Study of Colorectal Cancer Genetics (NSCCG), while Edinburgh Phase 2 was based on genotyping 2,057 cases and 2,111 controls. For Phase 2, the London and Edinburgh samples were genotyped for a common set of SNPs: the 14,982 SNPs most strongly associated with colorectal neoplasia from London Phase 1; the 14,972 most strongly associated SNPs from Edinburgh Phase 1 (432 of these SNPs were common to both the London and Edinburgh lists of most strongly associated SNPs); and 13,186 SNPs showing the strongest association with CRC risk from a joint analysis of all CRC cases and controls from both Phase 1 datasets (that were not already included in any of the preceding categories). Therefore Phase 2 was based on genotyping 42,708 SNPs in total. After applying quality control filters the following data were available: London Phase 2: 38,715 polymorphic SNPs in 2,854 cases and 2,822 controls; and Edinburgh Phase 2: 38,710 polymorphic SNPs in 2,024 cases and 2,092 controls. Overall, there were 38,710 polymorphic SNPs common to all four data sets (Phases 1 and 2 in London and Edinburgh).
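The Phase 2 panel size stated above follows from the three SNP lists once the overlap between the London and Edinburgh lists is counted only once. A short check of that arithmetic, with hypothetical variable names:

```python
def phase2_panel_size(london_top, edinburgh_top, shared, joint_extra):
    """Union of the two ranked lists plus SNPs added from the joint analysis."""
    return london_top + edinburgh_top - shared + joint_extra


# Counts quoted in the text.
total = phase2_panel_size(14_982, 14_972, 432, 13_186)

# Fraction of the panel that remained polymorphic in all four data sets.
retained = 38_710 / total
```

The result, 42,708, matches the total given in the text; about 91% of these SNPs were retained in all four data sets after quality control.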
Prior to undertaking the Phase 1+2 meta-analysis, we searched for potential errors and biases in the 4 case-control series. Comparison of the observed and expected distributions showed little evidence for an inflation of the test statistics in any of the datasets (inflation factor λ = 1.02 and 1.05 for London Phases 1 and 2 and 1.02 and 1.08 for Edinburgh Phases 1 and 2, based on the 90% least significant SNPs), thereby excluding the possibility of significant hidden population substructure, cryptic relatedness amongst subjects or differential genotype calling. Using data on all CRC cases and controls from the 4 series we derived joint odds ratios (ORs) and confidence intervals under a fixed effects model for each SNP, and associated P values under the allelic test from the standard normal distribution. The distribution of the association P values was significantly skewed from the null distribution with 76 of the SNPs having a P value < 10⁻⁴, greater than the 39 expected under the null hypothesis (P = 1.2 × 10⁻⁷, binomial test).
Results for SNPs in the 18q21, 8q23 and 15q13 regions are shown below. (For rs11986063 (C/T), the risk allele is T; for rs6983626 (C/T), the risk allele is T; other risk alleles are indicated above).
| SNP | Chr | Position (bp) | London Phase 1 OR (95% CI) | London Phase 1 P | Edinburgh Phase 1 OR (95% CI) | Edinburgh Phase 1 P | London Phase 2 OR (95% CI) | London Phase 2 P | Edinburgh Phase 2 OR (95% CI) | Edinburgh Phase 2 P | Pooled OR (95% CI) | Pooled P | Phet | I² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs12953717 | 18 | 44707927 | 1.36 (1.17,1.57) | 3.5E-05 | 1.18 (1.04,1.34) | 8.1E-03 | 1.17 (1.09,1.26) | 3.7E-05 | 1.15 (1.06,1.26) | 1.1E-03 | 1.19 (1.13,1.25) | 4.7E-12 | 0.27 | 24.2% |
| rs4939827 | 18 | 44707461 | 0.73 (0.64,0.85) | 3.3E-05 | 0.83 (0.73,0.94) | 4.1E-03 | 0.88 (0.82,0.95) | 1.1E-03 | 0.85 (0.78,0.93) | 2.2E-04 | 0.85 (0.81,0.89) | 2.2E-11 | 0.15 | 43.9% |
| rs16892766 | 8 | 117699864 | 1.35 (1.06,1.73) | 1.7E-02 | 1.22 (0.99,1.51) | 6.7E-02 | 1.38 (1.21,1.58) | 1.3E-06 | 1.28 (1.10,1.49) | 1.1E-03 | 1.32 (1.21,1.44) | 1.1E-10 | 0.77 | 0% |
| rs11986063 | 8 | 117709496 | 1.30 (1.03,1.64) | 2.6E-02 | 1.25 (1.02,1.53) | 3.4E-02 | 1.35 (1.19,1.54) | 2.3E-06 | 1.23 (1.06,1.42) | 4.9E-03 | 1.29 (1.19,1.40) | 5.7E-10 | 0.81 | 0% |
| rs4779584 | 15 | 30782048 | 1.35 (1.13,1.61) | 8.8E-04 | 1.31 (1.12,1.53) | 7.0E-04 | 1.20 (1.10,1.32) | 7.4E-05 | 1.06 (0.95,1.19) | 0.29 | 1.19 (1.12,1.26) | 1.7E-08 | 0.06 | 60.0% |
| rs10318 | 15 | 30813271 | 1.26 (1.06,1.51) | 1.6E-03 | 1.34 (1.15,1.57) | 9.8E-04 | 1.15 (1.05,1.26) | 3.7E-03 | 1.11 (1.00,1.24) | 5.7E-02 | 1.18 (1.11,1.25) | 1.2E-07 | 0.20 | 35.1% |
| rs6983626 | 8 | 117871329 | 1.36 (1.08,1.72) | 9.3E-03 | 1.12 (0.92,1.37) | 0.27 | 1.23 (1.09,1.40) | 8.3E-04 | 1.16 (1.00,1.34) | 4.5E-02 | 1.21 (1.11,1.31) | 4.4E-06 | 0.59 | 0% |
References
URLs
R: http://www.r-project.org/
Detailed information on the tag SNP panel can be found at http://www.illumina.com/
dbSNP- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=sn
E
HapMap- http://www.hapmap.org/
http://pipeline.lbl.gov/cgi-bin/gateway2
Margarita- http://www.sanger.ac.uk/Software/analysis/margarita/
KBioscience- http://kbioscience.co.uk/
Haploview- http://www.broad.mit.edu/personal/jcbarret/haploview/
1. Lichtenstein, P. et al. N Engl J Med 343, 78-85 (2000).
2. Aaltonen, L. et al. Clin Cancer Res 13, 356-61 (2007).
3. Tomlinson, I. et al. Nat Genet 39, 984-8 (2007).
4. Gudmundsson, J. et al. Nat Genet 39, 631-7 (2007).
5. Haiman, C.A. et al. Nat Genet 39, 638-44 (2007).
6. Zanke, B.W. et al. Nat Genet 39, 989-94 (2007).
7. Haiman, C.A. et al. Nat Genet 39, 954-6 (2007).
8. Minichiello, M.J. & Durbin, R. Am J Hum Genet 79, 910-22 (2006).
9. ten Dijke, P. & Hill, C.S. Trends Biochem Sci 29, 265-73 (2004).
10. Levy, L. & Hill, C.S. Cytokine Growth Factor Rev 17, 41-58 (2006).
11. Gaasenbeek, M. et al. Cancer Res 66, 3471-9 (2006).
12. Diseases, I. S. Co. 1975 Revision, (WHO Geneva, 1977).
13. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet (2007).
14. Zhou, X.P. et al. Determination of the replication error phenotype in human tumors without the requirement for matching normal DNA by analysis of mononucleotide repeat microsatellites. Genes Chromosomes Cancer 21, 101-7 (1998).
15. Boland, C.R. et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res 58, 5248-57 (1998).
16. Minichiello, M.J. & Durbin, R. Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 79, 910-22 (2006).
17. Stephens, M., Smith, N.J. & Donnelly, P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68, 978-89 (2001).
18. Petitti, D. Meta-analysis Decision Analysis and Cost-Effectiveness Analysis, (Oxford, New York, Oxford, 1994).
19. Higgins, J.P. & Thompson, S.G. Quantifying heterogeneity in a meta-analysis. Stat Med 21, 1539-58 (2002).
20. Cox, A. et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 39, 352-8 (2007).
21. Johns, L.E. & Houlston, R.S. A systematic review and meta-analysis of familial colorectal cancer risk. Am J Gastroenterol 96, 2992-3003 (2001).
22. Sun, T. et al. A six-nucleotide insertion-deletion polymorphism in the CASP8 promoter is associated with susceptibility to multiple cancers. Nat Genet 39, 605-13 (2007).
23. Amundadottir, L.T. et al. A common variant associated with prostate cancer in European and African populations. Nat Genet 38, 652-8 (2006).
24. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet 39, 631-7 (2007).
25. Haiman, C.A. et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39, 638-44 (2007).
26. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 39, 645-9 (2007).
27. Easton, D.F. et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447, 1087-93 (2007).
28. Haiman, C.A. et al. A common genetic risk factor for colorectal and prostate cancer. Nat Genet (2007).
29. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet (2007).
30. Zanke, B.W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet (2007).
31. Whitelaw, S.C. et al. Clinical and molecular features of the hereditary mixed polyposis syndrome. Gastroenterology 112, 327-34 (1997).
32. Jaeger, E.E. et al. An ancestral Ashkenazi haplotype at the HMPS/CRAC1 locus on 15q13-q14 is associated with hereditary mixed polyposis syndrome. Am J Hum Genet 72, 1261-7 (2003).
33. Tomlinson, I. et al. Inherited susceptibility to colorectal adenomas and carcinomas: evidence for a new predisposition gene on 15q14-q22. Gastroenterology 116, 789-95 (1999).
34. Burnham, K.P. & Anderson, D.R. Model Selection and Multimodel Inference. A Practical Information Theoretic Approach, (Springer, New York, 2002).
35. Sneddon, J.B. et al. Bone morphogenetic protein antagonist gremlin 1 is widely expressed by cancer-associated stromal cells and can promote tumor cell proliferation. Proc Natl Acad Sci U S A 103, 14842-7 (2006).
36. Seidah, N.G. & Chretien, M. Proprotein and prohormone convertases: a family of subtilases generating diverse bioactive polypeptides. Brain Res 848, 45-62 (1999).
37. Antoniou, A.C. & Easton, D.F. Polygenic inheritance of breast cancer: Implications for design of association studies. Genet Epidemiol 25, 190-202 (2003).
38. Houlston, R.S. & Peto, J. The future of association studies of common cancers. Hum Genet 112, 434-5 (2003).
39. Clayton, D.G. et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 37, 1243-6 (2005).
40. Jaeger, E.E. et al. An ancestral Ashkenazi haplotype at the HMPS/CRAC1 locus on 15q13-q14 is associated with hereditary mixed polyposis syndrome. Am J Hum Genet 72, 1261-7 (2003).
41. Shima, H. et al. Loss of heterozygosity on chromosome 10p14-p15 in colorectal carcinoma. Pathobiology 72, 220-4 (2005).
42. Savinainen, K.J. et al. Expression and copy number analysis of TRPS1, EIF3S3 and MYC genes in breast and prostate cancer. Br J Cancer 90, 1041-6 (2004).
43. Okamoto, H., Yasui, K., Zhao, C., Arii, S. & Inazawa, J. PTK2 and EIF3S3 genes may be amplification targets at 8q23-q24 and are associated with large hepatocellular carcinomas. Hepatology 38, 1242-9 (2003).
44. Zanke, B.W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet (2007).
45. Haiman, C.A. et al. Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39, 638-44 (2007).
46. Haiman, C.A. et al. A common genetic risk factor for colorectal and prostate cancer. Nat Genet (2007).
47. Diseases, I. S. Co. 1975 Revision, (WHO Geneva, 1977).
48. Zhou, X.P. et al. Determination of the replication error phenotype in human tumors without the requirement for matching normal DNA by analysis of mononucleotide repeat microsatellites. Genes Chromosomes Cancer 21, 101-7 (1998).
49. Boland, C.R. et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res 58, 5248-57 (1998).
50. Petitti, D. Meta-analysis Decision Analysis and Cost-Effectiveness Analysis, (Oxford, New York, Oxford, 1994).
51. Higgins, J.P. & Thompson, S.G. Quantifying heterogeneity in a meta-analysis. Stat Med 21, 1539-58 (2002).
52. Cox, A. et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 39, 352-8 (2007).
53. Johns, L.E. & Houlston, R.S. A systematic review and meta-analysis of familial colorectal cancer risk. Am J Gastroenterol 96, 2992-3003 (2001).

Claims

1. A method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene.
2. The method of claim 1, which is a method of assessing an individual for a cancer condition or tumour.
3. The method of claim 2, wherein the assessment is of the risk of that individual to tumour or cancer.
4. A method of identifying an individual who is at increased risk of colorectal adenoma or colorectal cancer, the method comprising: determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene; the presence of a variant allele at the one or more sites being indicative that the individual is at increased risk of colorectal cancer or colorectal adenoma.
5. The method of any one of the preceding claims, wherein said site of polymorphism is located between bases 44,700,221 and 44,731,079 of human chromosome 18.
6. The method of claim 5, wherein said site of polymorphism is located between bases 44,700,221-44,716,898 of human chromosome 18.
7. The method of any one of the preceding claims, wherein said site of polymorphism is a single nucleotide polymorphism (SNP).
8. The method according to claim 7, wherein said SNP is selected from the group consisting of rs4939827, rs12953717, rs4464148, or is a site in linkage disequilibrium therewith.
9. The method according to claim 8, wherein the site in linkage disequilibrium has an r² value of at least 0.5 with rs4939827, rs12953717 and/or rs4464148.
10. The method according to claim 9, wherein the site in linkage disequilibrium is located at position 44703563 of chromosome 18.
11. The method according to claim 7, wherein said SNP is located at position 44703563 of chromosome 18, or is a site in linkage disequilibrium therewith.
12. A method of assessing an individual for a cancer condition, comprising: determining the expression level of SMAD7 in a sample obtained from said individual.
13. A method of determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
14. The method of claim 13, which is a method of assessing an individual for a cancer condition or tumour.
15. The method of claim 13, wherein the assessment is of the risk of that individual to tumour or cancer.
16. A method of identifying an individual who is at increased risk of colorectal adenoma or colorectal cancer, the method comprising: determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus; the presence of a variant allele at the one or more sites being indicative that the individual is at increased risk of colorectal cancer or colorectal adenoma.
17. The method of any one of claims 13 to 16 wherein said site of polymorphism is located between bases 29,775,416 and 34,124,377 of human chromosome 15.
18. The method of any one of claims 13 to 17, wherein said site of polymorphism is a SNP.
19. The method according to claim 18, wherein said SNP is selected from the group consisting of rs4779584 and rs10318, or is a site in linkage disequilibrium therewith.
20. The method according to claim 19, wherein the site in linkage disequilibrium has an r² value of at least 0.5 with rs4779584 and rs10318.
21. A method of determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region selected from:
Chromosome 8: 117650000-117870000; Chromosome 10: 8730000-8810000; Chromosome 10: 76789000-76811000; Chromosome 4: 63108000-63170000; and Chromosome 1: 231840000-231868000.
22. The method of claim 21, which is a method of assessing an individual for a cancer condition or tumour.
23. The method of claim 22, wherein the assessment is of the risk of that individual to tumour or cancer.
24. A method for identifying an individual who is at increased risk of tumour or cancer, the method comprising determining, in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region selected from:
Chromosome 8: 117650000-117870000;
Chromosome 10: 8730000-8810000;
Chromosome 10: 76789000-76811000;
Chromosome 4: 63108000-63170000; and
Chromosome 1: 231840000-231868000.
25. The method of any one of claims 21 to 24, wherein the polymorphism is one or more single nucleotide polymorphisms.
26. The method according to claim 25, wherein at least one of said polymorphisms is selected from rs16892766, rs2488704, rs4355419, rs2282428 and/or rs10795668, or a site in linkage disequilibrium therewith.
27. The method of claim 26, wherein said site in linkage disequilibrium is in linkage disequilibrium with a value of r² greater than 0.5.
28. The method of claim 26, wherein said site in linkage disequilibrium is rs11986063 or rs6983626.
29. The method of claim 26 wherein said site in linkage disequilibrium is rs706771, rs7898455 or rs827405.
30. The method of any one of claims 21 to 29, comprising determining the allele present at a site of polymorphism in a plurality of said regions.
31. The method of any one of claims 21 to 30, wherein said method comprises additionally determining the allele present at one or more sites of polymorphism in the region of the SMAD7 gene and/or in the region of the HMPS/CRAC1 locus.
32. The method of claim 31, wherein said method comprises additionally determining the allele present at rs4939827, rs12953717 and/or rs4464148, or a site in linkage disequilibrium therewith.
33. The method of claim 32, wherein said site in linkage disequilibrium therewith is position 44703563 of chromosome 18.
34. The method of any one of claims 31 to 33, wherein said method comprises additionally determining the allele present at rs4779584 and/or rs10318, or a site in linkage disequilibrium therewith.
35. The method of any one of claims 21 to 34, wherein said method comprises additionally determining the allele present at rs6983267.
36. The method of any one of claims 21 to 35, wherein said method comprises determining the allele present at rs16892766, rs10795668, rs4939827, rs12953717 and rs4464148.
37. The method of any one of the preceding claims, wherein the cancer or tumour is colorectal tumour or cancer.
38. The method of claim 37, wherein said tumour is an adenoma.
39. The method of any one of the preceding claims, wherein determining the allele present at said one or more sites of polymorphism comprises amplifying a region containing the site of polymorphism.
40. The method of any one of the preceding claims, wherein determining the allele present at said one or more sites of polymorphism comprises sequencing the site of polymorphism.
41. The method of any one of claims 1 to 38, wherein determining the allele present at one or more sites of polymorphism comprises hybridisation of an allele-specific probe.
42. The method of any one of the preceding claims, wherein the method comprises determining the allele present in both copies of the DNA sequence of said individual.
43. A kit for determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the SMAD7 gene.
44. A kit for determining in a sample obtained from an individual, the presence or absence of a variant allele at one or more sites of polymorphism in the region of the HMPS/CRAC1 locus.
45. The kit of claim 43 or 44, which comprises a labelled oligonucleotide probe which binds to an allelic variant at a site of polymorphism.
46. The kit of any one of claims 43 to 45, which comprises a microarray.
47. The kit of claim 43, which comprises one or more oligonucleotide primers which bind specifically to the SMAD7 gene and are suitable for amplifying a region of the gene containing one or more sites of polymorphism.
48. The kit of claim 44, which comprises one or more oligonucleotide primers which bind specifically to the HMPS/CRAC1 locus and are suitable for amplifying a region of the locus containing one or more sites of polymorphism.
49. A kit for determining in a sample obtained from an individual, the allele present at one or more sites of polymorphism, wherein said one or more sites of polymorphism are associated with tumour and/or cancer susceptibility, and wherein at least one site of polymorphism is located in a region selected from:
Chromosome 8: 117650000-117870000; Chromosome 10: 8730000-8810000; Chromosome 10: 76789000-76811000; Chromosome 4: 63108000-63170000; and Chromosome 1: 231840000-231868000.
50. The kit of claim 49, comprising one or more nucleic acid molecules capable of hybridising within said region.
51. The kit of claim 50, wherein said nucleic acid molecule hybridises to a region comprising the site of polymorphism.
52. The kit of claim 51, wherein said nucleic acid molecule is immobilised on a solid support.
53. The kit of claim 51, wherein the kit comprises nucleic acids capable of hybridising to each of the possible alleles at the site of polymorphism.
54. The kit of claim 49, wherein said kit comprises a pair of nucleic acid primers hybridising to sequences flanking or comprising said site of polymorphism and suitable for amplification thereof.
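By way of illustration only, the chromosomal regions recited in claim 49 can be expressed as a simple lookup that tests whether a given site of polymorphism lies within one of them. The following is a minimal sketch and not part of the claimed subject matter: the names `Region`, `CLAIM_49_REGIONS` and `in_claimed_region` are hypothetical, the bounds are assumed to be inclusive, and the claim does not state a genome build, so any position passed in is assumed to use the same coordinate system as the claim.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A chromosomal interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int
    end: int


# Intervals copied from claim 49; the genome build is not stated there.
CLAIM_49_REGIONS = [
    Region("8", 117650000, 117870000),
    Region("10", 8730000, 8810000),
    Region("10", 76789000, 76811000),
    Region("4", 63108000, 63170000),
    Region("1", 231840000, 231868000),
]


def in_claimed_region(chrom: str, pos: int) -> bool:
    """Return True if (chrom, pos) falls inside any listed interval.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    return any(
        r.chrom == chrom and r.start <= pos <= r.end
        for r in CLAIM_49_REGIONS
    )
```

A position would typically be looked up from a variant annotation source before being passed to `in_claimed_region`; no positions for specific rs identifiers are assumed here.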
PCT/GB2008/003454 2007-10-12 2008-10-13 Cancer susceptibility loci WO2009047532A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US97976607P 2007-10-12 2007-10-12
US60/979,766 2007-10-12
US3929408P 2008-03-25 2008-03-25
US61/039,294 2008-03-25

Publications (2)

Publication Number Publication Date
WO2009047532A2 WO2009047532A2 (en) 2009-04-16
WO2009047532A3 WO2009047532A3 (en) 2009-07-02

Family

ID=40347966

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/003454 WO2009047532A2 (en) 2007-10-12 2008-10-13 Cancer susceptibility loci

Country Status (1)

Country Link
WO (1) WO2009047532A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108531601A (en) * 2018-05-29 2018-09-14 成都中创清科医学检验所有限公司 A kind of primer and detection method for detecting the relevant SNP site of liver cancer susceptibility
WO2023154770A1 (en) * 2022-02-09 2023-08-17 9 Meters Biopharma, Inc. Compositions and methods for inhibiting rho kinase
CN116949179A (en) * 2023-07-04 2023-10-27 中国医学科学院北京协和医院 Colorectal tumor polygene genetic risk scoring system, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1314787A2 (en) * 2001-11-26 2003-05-28 Kudoh, Norio The Midkine gene is associated with cancer
WO2003070082A2 (en) * 2002-02-21 2003-08-28 Idgene Pharmaceuticals Ltd. Association of snps in the comt locus and neighboring loci with schizophrenia, bipolar disorder, breast cancer and colorectal cancer
US20030215528A1 (en) * 2000-04-26 2003-11-20 Queens University At Kingston Formulations and methods of using nitric oxide mimetics against a malignant cell phenotype
WO2005014782A2 (en) * 2003-06-13 2005-02-17 Alnylam Europe Ag., Double-stranded ribonucleic acid with increased effectiveness in an organism
WO2006012361A2 (en) * 2004-07-01 2006-02-02 University Of Southern California Genetic markers for predicting disease and treatment outcome
WO2008046964A2 (en) * 2006-10-16 2008-04-24 Panu Jaakkola Novel useful inhibitors
WO2008106785A1 (en) * 2007-03-05 2008-09-12 Cancer Care Ontario Assessment of risk for colorectal cancer

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
BOULAY JEAN-LOUIS ET AL: "Combined copy status of 18q21 genes in colorectal cancer shows frequent retention of SMAD7" GENES CHROMOSOMES AND CANCER, vol. 31, no. 3, July 2001 (2001-07), pages 240-247, XP002516294 ISSN: 1045-2257 *
BRODERICK ET AL.: "Lack of a relationship between the common 18q24 variant rs12953717 and risk of chronic lymphocytic leukemia." 19000101, February 2008 (2008-02), XP008102551 *
BRODERICK PETER ET AL: "A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk" NATURE GENETICS, vol. 39, no. 11, November 2007 (2007-11), pages 1315-1317, XP002516278 ISSN: 1061-4036 *
CARVAJAL-CARMONA: "A genome-wide association study of tag SNPs identify five novel colorectal cancer susceptibility loci." EUROPEAN JOURNAL OF CANCER. SUPPLEMENT, PERGAMON, OXFORD, GB, vol. 6, no. 9, 1 July 2008 (2008-07-01), pages 10-11, XP022833561 ISSN: 1359-6349 [retrieved on 2008-07-01] *
GUDMUNDSSON JULIUS ET AL: "Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24" NATURE GENETICS, NATURE PUBLISHING GROUP, NEW YORK, US, vol. 39, no. 5, 1 May 2007 (2007-05-01), pages 631-637, XP002469972 ISSN: 1061-4036 cited in the application *
HAIMAN CHRISTOPHER A ET AL: "A common genetic risk factor for colorectal and prostate cancer" NATURE GENETICS, vol. 39, no. 8, August 2007 (2007-08), pages 954-956, XP002516282 ISSN: 1061-4036 cited in the application *
HALDER S K ET AL: "Smad7 induces tumorigenicity by blocking TGF-beta-induced growth inhibition and apoptosis" EXPERIMENTAL CELL RESEARCH, ACADEMIC PRESS, US, vol. 307, no. 1, 1 July 2005 (2005-07-01), pages 231-246, XP004908935 ISSN: 0014-4827 *
HE WEI ET AL: "Smads mediate signaling of the TGFbeta superfamily in normal keratinocytes but are lost during skin chemical carcinogenesis" ONCOGENE, vol. 20, no. 4, 25 January 2001 (2001-01-25), pages 471-483, XP002516279 ISSN: 0950-9232 *
KLEEFF J ET AL: "The TGF-beta signaling inhibitor Smad7 enhances tumorigenicity in pancreatic cancer" ONCOGENE, vol. 18, no. 39, 23 September 1999 (1999-09-23), pages 5363-5372, XP002516280 ISSN: 0950-9232 *
KORCHYNSKYI OLEXANDER ET AL: "Expression of Smad proteins in human colorectal cancer" INTERNATIONAL JOURNAL OF CANCER, vol. 82, no. 2, 19 July 1999 (1999-07-19), pages 197-202, XP002516295 ISSN: 0020-7136 *
LANDSTROM M ET AL: "Smad7 mediates apoptosis induced by transforming growth factor beta in prostatic carcinoma cells" CURRENT BIOLOGY, CURRENT SCIENCE, GB, vol. 10, no. 9, 4 May 2000 (2000-05-04), pages 535-538, XP002252297 ISSN: 0960-9822 *
LITTLE J ET AL: "Family history, metabolic gene polymorphism, diet and risk of colorectal cancer." EUROPEAN JOURNAL OF CANCER PREVENTION : THE OFFICIAL JOURNAL OF THE EUROPEAN CANCER PREVENTION ORGANISATION (ECP) DEC 1999, vol. 8 Suppl 1, December 1999 (1999-12), pages S61-S72, XP008102586 ISSN: 0959-8278 *
MONTELEONE G ET AL: "BLOCKING SMAD7 RESTORES TGF-BETA1 SIGNALING IN CHRONIC INFLAMMATORY BOWEL DISEASE" JOURNAL OF CLINICAL INVESTIGATION, AMERICAN SOCIETY FOR CLINICAL INVESTIGATION, US, vol. 108, no. 4, 1 August 2001 (2001-08-01), pages 601-609, XP001152527 ISSN: 0021-9738 *
TOMLINSON IAN ET AL: "A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21" NATURE GENETICS, vol. 39, no. 8, August 2007 (2007-08), pages 984-988, XP002516281 ISSN: 1061-4036 cited in the application *
WATANABE YUKIO ET AL: "A catalog of 106 single-nucleotide polymorphisms (SNPs) and 11 other types of variations in genes for transforming growth factor-beta1 (TGF-beta1) and its signaling pathway" JOURNAL OF HUMAN GENETICS, vol. 47, no. 9, 2002, pages 478-483, XP002516284 ISSN: 1434-5161 *
XIE ET AL.: "Loss of Smad Signaling in Human Colorectal Cancer Is associated with Advanced Disease and Poor Prognosis" 19000101, vol. 9, no. 4, 1 July 2003 (2003-07-01), pages 302-312, XP008102550 *
ZANKE BRENT W ET AL: "Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24" NATURE GENETICS, NATURE PUBLISHING GROUP, NEW YORK, US, vol. 39, no. 8, 1 August 2007 (2007-08-01), pages 989-994, XP002509358 ISSN: 1061-4036 [retrieved on 2007-07-08] cited in the application *

Also Published As

Publication number Publication date
WO2009047532A3 (en) 2009-07-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08806586; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase in: (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 08806586; Country of ref document: EP; Kind code of ref document: A2)