GB2549763A - Biomarkers for early diagnosis of ovarian cancer - Google Patents

Biomarkers for early diagnosis of ovarian cancer

Publication number
GB2549763A
Authority
GB
United Kingdom
Prior art keywords
ovarian cancer
mutation
seq
subject
expression level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1607393.4A
Other versions
GB201607393D0 (en
Inventor
Ashour Ahmed Ahmed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oxford University Innovation Ltd
Original Assignee
Oxford University Innovation Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oxford University Innovation Ltd
Priority to GB1607393.4A
Publication of GB201607393D0
Publication of GB2549763A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G01N33/57449 Specifically defined cancers of ovaries
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00 Antineoplastic agents
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/50 Determining the risk of developing a disease
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis

Abstract

One or more biomarkers (as listed in Table A at pages 4-22) are useful for predicting or identifying an increased risk of ovarian cancer. For instance, there is an expansion of SOX2 expressing cells in the normal fallopian tubes of subjects with ovarian cancer. The expression of other genes listed in Table A is found to be responsive to SOX2 ectopic expression. SOX2 and the other genes in Table A can act as biomarkers for the prediction and early diagnosis of ovarian cancer. A method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer in a subject comprises determining the expression level of one or more biomarkers from Table A. The determined expression level is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer.

Description

BIOMARKERS FOR EARLY DIAGNOSIS OF OVARIAN CANCER
FIELD OF THE INVENTION
The present invention relates to ovarian cancer. Provided are methods for predicting and diagnosing ovarian cancer which rely upon biomarkers including SOX2. Systems, devices or kits useful in the methods are also envisaged, together with (prophylactic) methods of treatment.
BACKGROUND OF THE INVENTION
Ovarian cancer is one of the most fatal malignancies because of late presentation and chemotherapy resistance. While emerging evidence suggests extensive tumour heterogeneity as an underlying mechanism of resistance, the clinical significance of subclonal mutations in determining tumour behaviour has been difficult to assess.
With the exception of ubiquitous mutations of TP53 (Ahmed, A. A. et al. J Pathol 221, 49-56, doi:10.1002/path.2696 (2010)), driver coding mutations of individual genes in high grade serous ovarian cancers (HGSOCs) are relatively uncommon. Moreover, whether non-coding genomic variations or mutations contribute to the genesis of HGSOCs has remained unknown. Such non-coding alterations have been shown to drive tumour initiation and progression. For example, recurring mutations in the promoter of telomerase reverse transcriptase (TERT) have been previously described in many solid tumours (Huang, F. W. et al. Science 339, 957-959, doi:10.1126/science.1229259 (2013); Horn, S. et al. Science 339, 959-961, doi:10.1126/science.1230062 (2013); Vinagre, J. et al. Nat Commun 4, 2185, doi:10.1038/ncomms3185 (2013)). A plausible approach to identify such driver non-coding mutations is to perform multidimensional analysis of large cohorts of samples from individual tumour types (Nature 474, 609-615, doi:10.1038/nature10166 (2011)). Alternatively, recent studies have suggested that focussed analyses of ultra-deep sequence data (Nik-Zainal, S. et al. Cell 149, 994-1007, doi:10.1016/j.cell.2012.04.023 (2012)), multiregion sampling (Gerlinger, M. et al. Nat Genet 46, 225-233, doi:10.1038/ng.2891 (2014)) or temporal sampling (Schuh, A. et al. Blood 120, 4191-4196, doi:10.1182/blood-2012-05-433540 (2012)) may reveal insights into the evolution of mutations and identify core mutations that are shared spatially and temporally. However, because of the challenges in obtaining clinical data that describe biological behaviour (e.g. response to chemotherapy) at individual sites within the same tumour, it has remained difficult to evaluate the relationship between genomic alteration and tumour biology in patients.
DESCRIPTION OF THE INVENTION
The present invention is based upon utilization of a novel strategy for evaluating the genetic basis of ovarian cancer chemotherapy response and tumour initiation. Using digital analysis of extensive whole genome sequencing (WGS), the present inventors compared extreme chemotherapy-resistant and chemotherapy-sensitive sites from a single tumour. By tracking mutations to a pre-neoplastic lesion in this tumour, they identified a SOX2 distal repressor 40 kb region that is mutated in 23% of high-grade serous ovarian cancers (HGSOCs). Under physiological conditions EZH2 and MYC occupy elements in this region and are required for suppression of SOX2 expression in the fallopian tube epithelium (FTE), a common site of origin of HGS pelvic cancers. Mutating a repressor element in that region interfered with EZH2 and MYC occupancy and induced SOX2 expression. The present inventors have shown that consequent expansion of SOX2-expressing FTE cells is a ubiquitous feature of HGSOCs. In addition, the present inventors have identified genes that are differentially expressed following SOX2 ectopic expression. These results have important implications for early detection and for circumventing chemotherapy resistance in ovarian cancer.
Thus, in a first aspect the invention provides a method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer in a subject comprising determining the expression level of one or more biomarkers from Table A wherein the determined expression level is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer.
By predicting is meant making a determination that a subject who at the time of testing does not have ovarian cancer is at an increased risk of developing ovarian cancer. The increased risk may be a risk higher than the average risk for the population. The increased risk may be a risk above a pre-calculated threshold level. The threshold level may be the point above which the benefits of increased monitoring and/or prophylactic treatment outweigh the negatives of potentially unnecessary intervention. The increased risk may be a percentage lifetime risk of greater than 1.5%, greater than 2%, greater than 5%, greater than 10%, greater than 50% or greater than 75%. The subject with an increased risk of developing ovarian cancer may have pre-cancerous (pre-neoplastic) lesions.
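By way of illustration only, the lifetime-risk cut-offs listed above could be checked along the following lines. The function names and the representation of risk as a percentage are assumptions of this sketch and form no part of the described method.

```python
# Lifetime-risk cut-offs (percent) taken from the passage above.
RISK_THRESHOLDS_PERCENT = (1.5, 2.0, 5.0, 10.0, 50.0, 75.0)


def exceeded_risk_thresholds(lifetime_risk_percent):
    """Return the listed cut-offs that the estimated risk strictly exceeds."""
    if not 0.0 <= lifetime_risk_percent <= 100.0:
        raise ValueError("lifetime risk must be a percentage between 0 and 100")
    return [t for t in RISK_THRESHOLDS_PERCENT if lifetime_risk_percent > t]


def is_increased_risk(lifetime_risk_percent, threshold_percent=1.5):
    """True when the estimated risk is greater than the chosen cut-off.

    The default of 1.5% is simply the lowest cut-off in the list; which
    cut-off is appropriate is a clinical choice, not decided here.
    """
    return lifetime_risk_percent > threshold_percent
```

For example, an estimated lifetime risk of 6% exceeds the 1.5%, 2% and 5% cut-offs but not the 10% cut-off.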
In a further aspect the invention provides a method for diagnosing ovarian cancer in a subject comprising: determining the expression level of one or more biomarkers from Table A wherein the determined expression level is used to diagnose ovarian cancer in the subject.
By diagnosing is meant determining that a subject has ovarian cancer at the time of testing.
In certain embodiments the ovarian cancer is at a (very) early stage. The ovarian cancer may be stage 1 (stage 1a, 1b and/or 1c), stage 2 (2a, 2b and/or 2c), stage 3 (3a, 3b and/or 3c) or stage 4 by the FIGO system. The cancer may be Grade 1, 2 or 3. By very early is meant stage 1 and by early is meant stage 1 or 2.
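The definitions of "very early" and "early" given above can be expressed as a small lookup. This is a sketch only; the function name and the convention of writing a FIGO stage as a digit followed by an optional sub-stage letter (for example "1b") are assumptions.

```python
def classify_stage(figo_stage):
    """Map a FIGO stage label such as '1a' or '3c' to the terms defined above."""
    label = figo_stage.strip().lower()
    if not label or label[0] not in "1234":
        raise ValueError(f"unrecognised FIGO stage: {figo_stage!r}")
    main_stage = int(label[0])
    return {
        "very_early": main_stage == 1,    # stage 1 only
        "early": main_stage in (1, 2),    # stage 1 or 2
    }
```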
In yet a further aspect, the present invention relates to a method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject comprising: a. applying a specific binding agent that can specifically bind to a biomarker from Table A to a sample obtained from the subject; b. applying a detection agent that detects the specific binding agent-biomarker complex; and c. using the detection agent to determine the number of cells that express the biomarker; wherein the determined number of cells is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer or diagnose ovarian cancer in the subject.
According to all aspects of the invention the specific binding agent may be an antibody or an aptamer or a protein or a peptide. Preferably, the specific binding agent is an antibody or an aptamer.
Table A provides a list of differentially expressed genes following SOX2 ectopic expression.
Any one or more of these genes may be used in the present invention because they reflect expression of SOX2 which has been shown experimentally to be a useful predictive and/or diagnostic marker.
The biomarkers from Table A may be grouped into functional categories as shown in Table B:
Table B
In certain embodiments the one or more biomarkers from Table A are one or more of PECAM1, FBN2, KLKB1, EMILIN1, MMP7, JAM2, ITGA7, FBLN5, LTBP1, COL4A2, LAMA3, PLEC, PDGFB, ACTN1, ITGB4, SDC4, FBLN2, SPARC, COL4A1, FGF2, COL5A1, CDH1, LAMB3, ITGB3, COL1A1, ITGA3, COL7A1, FN1, TGFB2, SERPINE1 and TNC.
In certain embodiments the one or more biomarkers from Table A are one or more of COL4A2, LAMA3, PDGFB, ACTN1, SDC4, COL4A1, FGF2, COL5A1, ITGB3, COL1A1, FN1 and TNC.
In certain embodiments the one or more biomarkers from Table A are one or more of ACTN1, SDC4, FGF2, COL5A1, ITGB3, COL1A1, FN1 and TNC.
In certain embodiments the one or more biomarkers from Table A are one or more of ITGA7, COL4A2, LAMA3, SPARC, COL4A1, COL5A1, ITGB3, COL1A1, FN1, TGFB2, SERPINE1 and TNC.
In certain embodiments the one or more biomarkers from Table A are one or more of IGF2, CD36, PECAM1, SCG3, PROS1, PDGFB, ACTN1, SPARC, FLNA, ITGB3, FN1, TGFB2 and SERPINE1.
In certain embodiments the one or more biomarkers from Table A are one or more of ITGA7, COL4A2, LAMA3, ITGB4, COL4A1, LAMB3, ITGA3 and COL7A1.
In certain embodiments the one or more biomarkers from Table A are one or more of COL4A2, LAMA3, COL4A1, LAMB3, COL1A1 and COL7A1.
In certain embodiments the one or more biomarkers from Table A are one or more of IGF2, CD36, PECAM1, SCG3, PROS1, PDGFB, ACTN1, SPARC, FLNA, ITGB3, FN1, TGFB2 and SERPINE1.
In certain embodiments the one or more biomarkers from Table A are one or more of IGF2, CD36, PECAM1, SCG3, VAV3, PROS1, PDGFB, ACTN1, SPARC, F2R, FLNA, MGLL, RAPGEF3, F2RL2, ITGB3, COL1A1, FN1, TGFB2, APBB1IP and SERPINE1.
In certain embodiments the one or more biomarkers from Table A are one or more of COL4A2, LAMA3, PLEC, ITGB4, COL4A1, COL5A1, LAMB3, COL1A1 and COL7A1.
In certain embodiments the one or more biomarkers from Table A are one or more of FBN2, EMILIN1, FBLN5, LTBP1, FBLN2, ITGB3, FN1 and TGFB2.
In certain embodiments the one or more biomarkers from Table A are one or more of IGF2, CD36, SELE, GATA3, PECAM1, SCG3, PDE3A, MYB, KLKB1, JAM2, VAV3, PROS1, PDGFB, ACTN1, SPARC, F2R, THBD, FLNA, SERPINE2, KCNMA1, MGLL, EHD2, TFPI, RAPGEF3, F3, F2RL2, ITGB3, COL1A1, ITGA3, FN1, TGFB2, APBB1IP and SERPINE1.
In certain embodiments the one or more biomarkers from Table A are one or more of PECAM1, JAM2, COL4A2, COL4A1, COL5A1, CDH1, ITGB3, COL1A1, ITGA3, COL7A1, FN1 and TNC.
In certain embodiments the one or more biomarkers from Table A are one or more of FBN2, EMILIN1, FBLN5, LTBP1, FBLN2, ITGB3, FN1 and TGFB2.
In certain embodiments the one or more biomarkers from Table A are one or more of KLKB1, PROS1, F2R, THBD, SERPINE2, TFPI and F3.
In certain embodiments the one or more biomarkers from Table A are one or more of LAMA3, PLEC, ITGB4 and LAMB3.
In certain embodiments the one or more biomarkers from Table A are one or more of FBN2, KLKB1, MMP7, COL4A2, LAMA3, COL4A1, COL5A1, CDH1, LAMB3, COL1A1, COL7A1 and FN1.
In certain embodiments the one or more biomarkers from Table A are one or more of ACVR1C, FST, FSTL3 and INHBA.
In certain embodiments the one or more biomarkers from Table A are one or more of COL4A2, LAMA3, PLEC, ITGB4, COL4A1, COL5A1, LAMB3, COL1A1 and COL7A1.
In certain embodiments the one or more biomarkers from Table A are one or more of CDH10, CLDN20, LAMA3, PLEC, ACTN1, ITGB4, FLNA, CDH1 and LAMB3.
In certain embodiments the one or more biomarkers from Table A are located at the cell membrane and/or cell junction and/or are secreted as shown in Table C:
Table C
Ovarian cancer may arise from the fallopian tube. Accordingly, the expression level may be determined in fallopian tube tissue and/or fallopian tube cells and/or material derived therefrom. In certain embodiments the fallopian tube tissue and/or fallopian tube cells and/or material derived therefrom comprise normal (i.e. non-cancerous) and/or pre-cancerous and/or cancerous cells. By pre-cancerous is meant a (morphologically) atypical state (such as disordered tissue organization or disrupted cell morphology) that is associated with an increased risk of cancer. The expression level may be determined in the fallopian tube epithelium. In specific embodiments the fallopian tube cells comprise or are ciliated fallopian tube cells. According to all aspects of the invention ciliated fallopian tube cells may be identified using a marker that allows ciliated (fallopian tube) cells to be discriminated from other cell types. Such a marker may be any appropriate specific binding agent, such as an antibody that binds specifically to ciliated fallopian tube cells or an antibody that allows the cilia to be visualized. In some embodiments, the specific binding agent targets beta tubulin or CRMP2 in order to identify ciliated cells. In further embodiments the fallopian tube tissue and/or fallopian tube cells are derived from a specific segment of the fallopian tube. This may include or be the isthmus (the narrower part of the tube that links to the uterus). In certain embodiments the expression level is determined in ovarian tissue and/or ovarian cells and/or material derived therefrom.
According to all aspects of the invention the expression level may be determined in a sample obtained from the subject and thus performed entirely in vitro. The method may further comprise obtaining a sample from the subject. The sample may thus comprise, consist essentially of or consist of ovarian and/or fallopian tube tissue and/or ovarian and/or fallopian tube cells and/or material derived therefrom.
According to all aspects of the invention the sample may be obtained by any suitable technique. Examples include a biopsy procedure, optionally a fine needle aspirate biopsy procedure. In certain embodiments (where the sample comprises fallopian tube cells) the sample is obtained using a tool for acquiring fallopian tube cells in vivo by exfoliative cytology. The tool may be a cytology brush (Haeusler et al., Fertility and Sterility Vol. 67 No. 3 580 (1997); WO2012/170380). In further embodiments the sample is obtained by washing out the lumen of the fallopian tube(s) (Matsushima et al., J Nippon Med Sch 69 5 445 (2002)). Physiological saline may be injected into a fallopian tube using a catheter passed through a hysteroscope. The saline solution containing released fallopian tube cells may then be retrieved, for example using a pouch.
Alternatively, according to all aspects of the invention, the expression level may be determined in situ. By in situ is meant that the expression level is determined in the subject without extracting a sample. The expression level may be determined in ovarian and/or fallopian tube tissue.
The methods may comprise comparing the expression level to a reference value or to a control. In certain embodiments the control represents the level of one or more biomarkers from Table A in a comparable (sample from a) subject with no ovarian cancer or no increased risk of developing ovarian cancer. The comparable subject with no increased risk of developing ovarian cancer may be negative for risk factors for ovarian cancer, for example they may not have mutations in either of the genes BRCA1 or BRCA2. One or more positive controls (subjects or samples from subjects with ovarian cancer or an increased risk of developing ovarian cancer) may be used and/or one or more negative controls (subjects or samples from subjects with no ovarian cancer or no increased risk of developing ovarian cancer). Methods for identifying a comparable subject are known in the art. Matching may be on the basis of known parameters such as sex, age etc.
The reference value may be a threshold level of expression of one or more biomarkers from Table A set by determining the level or levels in a range of (samples from) subjects with and without ovarian cancer or an increased risk of developing ovarian cancer. The range of (samples from) subjects may be representative of the population as a whole. Suitable methods for setting a threshold are well known to those skilled in the art. The threshold may be mathematically derived from a training set of subject data. The score threshold thus separates the test samples according to presence or absence of the particular condition.
The interpretation of this quantity, i.e. the cut-off threshold, may be derived in a development or training phase from a set of subjects with known outcome. The threshold may therefore be fixed, prior to performance of the claimed methods, from training data by methods known to those skilled in the art.
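One possible way of deriving such a cut-off from training data is sketched below. The text does not specify a particular method; maximising Youden's J statistic (sensitivity + specificity - 1) is a common choice and is used here purely as an assumption, as are the function name and the convention that a level at or above the cut-off is called positive (appropriate for markers whose expression increases with the condition).

```python
def derive_cutoff(levels, has_condition):
    """Pick the cut-off that maximises sensitivity + specificity - 1 (Youden's J).

    levels        -- expression levels measured in the training subjects
    has_condition -- known outcomes (True/False), in the same order
    A sample is called positive when its level is >= the cut-off.
    Returns (cutoff, youden_j).
    """
    if len(levels) != len(has_condition):
        raise ValueError("levels and outcomes must have the same length")
    n_pos = sum(1 for y in has_condition if y)
    n_neg = len(has_condition) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("training data must contain both outcomes")

    best_cutoff, best_j = None, -1.0
    # Every observed level is tried as a candidate cut-off.
    for cutoff in sorted(set(levels)):
        true_pos = sum(1 for x, y in zip(levels, has_condition) if y and x >= cutoff)
        true_neg = sum(1 for x, y in zip(levels, has_condition) if not y and x < cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

The cut-off returned from the training phase would then be held fixed and applied unchanged to test samples.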
In certain embodiments the reference value represents a threshold level of expression set by determining the level of expression at a first time point. The determined levels of expression at later time points for the same subject are then compared to the threshold level. Thus, the methods of the invention may be used in order to monitor progression in a subject, namely to provide an ongoing assessment of risk of developing ovarian cancer in the subject. For example, the methods may be used to identify a level of risk that requires that (preventative) treatment be considered. This may be used to guide treatment decisions as discussed in further detail herein. If the determined levels of expression at later time points show a trend that is indicative of an increasing level of risk but has not yet reached the level when treatment should be considered, the subject may have expression levels measured at increasingly frequent time points.
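The monitoring scheme described above can be sketched as follows. The function name, the status labels, the definition of a "trend" as strictly increasing consecutive measurements, and the halving of the monitoring interval are all assumptions made for illustration; the text only states that measurements may become increasingly frequent.

```python
def assess_follow_up(levels, treatment_level, base_interval_months=12.0):
    """Compare serial measurements (oldest first) with the first time point.

    levels          -- expression levels for one subject, baseline first
    treatment_level -- level at which (preventative) treatment is considered
    Returns a (status, next_interval_months) tuple.
    """
    if len(levels) < 2:
        raise ValueError("need a baseline and at least one later measurement")
    baseline, latest = levels[0], levels[-1]
    if latest >= treatment_level:
        return "consider_treatment", None
    # A trend is taken here to mean every measurement exceeds the previous one.
    rising = all(later > earlier for earlier, later in zip(levels, levels[1:]))
    if rising and latest > baseline:
        # Rising trend below the treatment level: halve the interval.
        return "monitor_more_often", base_interval_months / 2.0
    return "routine_monitoring", base_interval_months
```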
In certain embodiments an increased expression level of one or more of genes 1 to 338 from Table A (relative to subjects without ovarian cancer or to a reference value) indicates that the subject has ovarian cancer. In further embodiments an increased expression level of one or more of genes 1 to 338 from Table A (relative to subjects with no prediction of ovarian cancer or no increased risk of developing ovarian cancer or to a reference value) predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer. The increased expression level may be an increased number of cells that (strongly) express one or more of genes 1 to 338 from Table A and/or an increased level of expression within individual cells. Accordingly, in certain embodiments an increased number of cells that express one or more of genes 1 to 338 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer. The cells may be fallopian tube cells, optionally ciliated fallopian tube cells.
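A minimal sketch of the cell-count comparison is given below. It applies the rule stated above for genes 1 to 338 (an increased number of expressing cells) and the mirrored rule that the description gives for genes 339 to 628 (a decreased number). The function names, the use of a fraction of positive cells rather than a raw count, and the simple greater-than or less-than comparison against a reference fraction are assumptions.

```python
def expected_direction(table_a_index):
    """Direction of change associated with ovarian cancer for a Table A gene."""
    if 1 <= table_a_index <= 338:
        return "increased"
    if 339 <= table_a_index <= 628:
        return "decreased"
    raise ValueError("Table A index must be between 1 and 628")


def supports_cancer_call(table_a_index, positive_cells, total_cells, reference_fraction):
    """True when the fraction of expressing cells moves in the expected direction."""
    if total_cells <= 0 or not 0 <= positive_cells <= total_cells:
        raise ValueError("cell counts are inconsistent")
    fraction = positive_cells / total_cells
    if expected_direction(table_a_index) == "increased":
        return fraction > reference_fraction
    return fraction < reference_fraction
```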
In certain embodiments a decreased expression level of one or more of genes 339 to 628 from Table A (relative to subjects without ovarian cancer or to a reference value) indicates that the subject has ovarian cancer. In further embodiments a decreased expression level of one or more of genes 339 to 628 from Table A (relative to subjects with no prediction of ovarian cancer or no increased risk of developing ovarian cancer or to a reference value) predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer. The decreased expression level may be a decreased number of cells that (strongly) express one or more of genes 339 to 628 from Table A and/or a decreased level of expression within individual cells. Accordingly, in certain embodiments a decreased number of cells that express one or more of genes 339 to 628 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer. The cells may be fallopian tube cells, optionally ciliated fallopian tube cells.
Since MYC/EZH2 act as repressors of SOX2, both markers (i.e. one or both of MYC and EZH2 together with SOX2) may be measured according to the methods. An imbalance in the ratio or proportions of MYC/EZH2 to SOX2 may provide an indication or prediction of ovarian cancer, or may indicate an increased risk of developing ovarian cancer.
SOX2 is a transcription factor and thus there are a number of biomarkers whose expression may be used as a surrogate of SOX2 expression. Examples are listed in Table A. The biomarker indicative of the expression level of SOX2 may be a surface biomarker to facilitate detection. For example, the biomarker may comprise or be CD133. CD133 is a downstream marker of SOX2 expression (Cox JL, Wilder PJ, Desler M, Rizzino A. PLoS One. 2012;7(8):e44087. doi: 10.1371/journal.pone.0044087. Epub 2012 Aug 28. PubMed PMID: 22937156; PubMed Central PMCID: PMC3429438; Iida H, Suzuki M, Goitsuka R, Ueno H. Int J Oncol. 2012 Jan;40(1):71-9. doi: 10.3892/ijo.2011.1207. Epub 2011 Sep 22. PubMed PMID: 21947321). The biomarker indicative of the expression level of SOX2 may also be a repressor of SOX2 expression. In certain embodiments the biomarker may comprise or be EZH2 or KDM5B.
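The ratio of MYC/EZH2 to SOX2 mentioned above could, for instance, be computed as in the following sketch. How the ratio is formed and what counts as an imbalance are not specified in the text; averaging the available repressor levels and flagging a two-fold departure from a reference ratio are assumptions chosen only to make the example concrete.

```python
def repressor_to_sox2_ratio(sox2_level, myc_level=None, ezh2_level=None):
    """Ratio of the mean repressor level (MYC and/or EZH2) to the SOX2 level."""
    repressors = [v for v in (myc_level, ezh2_level) if v is not None]
    if not repressors:
        raise ValueError("provide at least one of MYC or EZH2")
    if sox2_level <= 0:
        raise ValueError("SOX2 level must be positive")
    return (sum(repressors) / len(repressors)) / sox2_level


def is_imbalanced(ratio, reference_ratio, fold_tolerance=2.0):
    """Flag ratios more than fold_tolerance away from the reference, either way.

    fold_tolerance=2.0 is an arbitrary illustrative value.
    """
    return (ratio > reference_ratio * fold_tolerance
            or ratio < reference_ratio / fold_tolerance)
```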
According to all aspects of the invention the expression level may be measured by any suitable method. In certain embodiments the expression level is determined at the level of protein or RNA.
The expression level may be determined by immunocytochemistry or immunohistochemistry. These techniques involve the detection of proteins in cells or tissue by using a binding reagent such as an antibody, peptide or aptamer that binds specifically to the protein.
In certain embodiments the expression level is determined using an antibody, peptide or aptamer conjugated to a label. By label is meant a component that permits detection, directly or indirectly. For example, the label may be an enzyme, optionally a peroxidase, or a fluorophore. A label is an example of a detection agent. By detection agent is meant an agent that may be used to assist in the detection of the antibody-protein or aptamer-protein complex. Where the antibody or aptamer is conjugated to an enzyme the detection agent may be or further comprise a chemical composition such that the enzyme catalyses a chemical reaction to produce a detectable product. The products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers. In certain embodiments the detection agent may comprise a secondary antibody. The expression level is then determined using an unlabeled primary antibody that binds to the target protein and a secondary antibody conjugated to a label, wherein the secondary antibody binds to the primary antibody.
Additional techniques for determining expression level at the level of protein include, for example, Western blot, immunoprecipitation, mass spectrometry, ELISA, flow cytometry and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition). To improve specificity and sensitivity of an assay method based on immunoreactivity, monoclonal antibodies are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
Measuring mRNA in a biological sample may be used as a surrogate for detection of the level of the corresponding protein in the biological sample. Thus, the expression level can also be determined at the level of RNA.
Accordingly, in specific embodiments the expression level is determined by microarray, northern blotting, sequencing (such as next generation sequencing, including RNAseq) or nucleic acid amplification. Nucleic acid amplification includes PCR and all variants thereof such as real-time and end point methods and quantitative PCR (qPCR). Other nucleic acid amplification techniques are well known in the art, and include methods such as NASBA, 3SR and Transcription Mediated Amplification (TMA). Other suitable amplification methods include the ligase chain reaction (LCR), selective amplification of target polynucleotide sequences (US Patent No. 6,410,276), consensus sequence primed polymerase chain reaction (US Patent No 4,437,975), arbitrarily primed polymerase chain reaction (WO 90/06995), invader technology, strand displacement technology, and nick displacement amplification (WO 2004/067726). This list is not intended to be exhaustive; any nucleic acid amplification technique may be used provided the appropriate nucleic acid product is specifically amplified. Design of suitable primers and/or probes is within the capability of one skilled in the art. Various primer design tools are freely available to assist in this process such as the NCBI Primer-BLAST tool. Primers and/or probes may be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 (or more) nucleotides in length.
mRNA expression levels may be measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004.
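The standard-curve comparison mentioned above can be illustrated as follows. In this sketch a straight line is fitted to the threshold cycle (Ct) of a dilution series of known copy number against log10(copies), and the fit is inverted to estimate the copy number of an unknown sample. The function names and the example dilution series are hypothetical; dividing the result by the number of cells assayed would give copies per cell.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(ct_values) or len(copies) < 2:
        raise ValueError("need at least two matched standards")
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series of a quantified standard.
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [30.0, 26.68, 23.36, 20.04]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)
```

A slope close to -3.32 corresponds to an efficiency close to 1.0, i.e. near-perfect doubling of product in each cycle.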
The methods described herein may further comprise extracting total nucleic acid or RNA from the sample. Suitable methods are known in the art and include use of commercially available kits such as the RNeasy and GeneJET RNA purification kits.
The expression level may be determined using an (in vivo) imaging technique. In certain embodiments the imaging technique comprises an implantable biosensor or photonic device. The implantable biosensor or photonic device may be operated remotely (extra-corporal operation). It may remain long term in a patient. At regular intervals, a labelled (for example fluorescently-tagged) antibody, peptide or aptamer may be administered (for example, intravenously) and the implantable biosensor or photonic device operated (remotely) before and at specific times after injection. The images may then be transferred outside the body to be analysed on a computer. In certain embodiments, the labelled antibody, peptide or aptamer may be specific for a surface biomarker reflective of SOX2 expression, preferably CD133. Implantable devices are discussed, for example, in Thomas Daniel O’Sullivan, Implantable Fluorescence Sensor for Continuous Molecular Monitoring in Live Animals (doctoral dissertation, Department of Electrical Engineering, Stanford University); O’Sullivan et al., Optics Express Vol. 18 No. 12 12513 (2010); and Ohta et al., Sensors and Materials, Vol. 23, No. 7 369-379 (2011).
In further embodiments the imaging technique comprises Positron Emission Tomography (PET). PET involves detection of pairs of gamma rays emitted indirectly by a positron-emitting radionuclide (tracer), which is introduced into the body on a biologically active molecule (van Kruchten et al., The Journal of Nuclear Medicine Vol. 53 No. 2 182 (2012); van Kruchten et al., Lancet Oncol 14 e465-75 (2013)). Images of tracer concentration within the body are then constructed by computer analysis. Thus, analysis of PET data is carried out by computer away from the body. The biologically active molecule may be an antibody, peptide or aptamer. In certain embodiments, the antibody, peptide or aptamer may be specific for a surface biomarker reflective of SOX2 expression, preferably CD133. Any suitable tracer may be utilized; for example, the tracer may comprise or be Technetium.
In further embodiments the imaging technique comprises falloposcopy. By falloposcopy is meant the inspection of the fallopian tubes through a micro-endoscope (Kerin et al., Journal of Laparoendoscopic Surgery Vol. 1 No. 1 47 (1990); Bauer et al., Human Reproduction vol. 7 suppl. 1 p 7-11 (1992)). This may be combined with use of a labelled (for example fluorescently-tagged) antibody, peptide or aptamer. In certain embodiments, the labelled antibody, peptide or aptamer may be specific for a surface biomarker reflective of SOX2 expression, preferably CD133.
The expression level may be determined by testing the immune response of the subject to the one or more biomarkers. The immune response may be tested by obtaining a blood sample from the subject and incubating lymphocytes from the blood sample with synthetic peptides covering the length of the one or more biomarkers. Lymphocytes from a subject with increased expression of the one or more biomarkers may show reactivity (increased proliferation or activation) to one or more of the peptides when lymphocytes from a control subject without an increased expression of the one or more biomarkers do not. This is particularly relevant where the one or more biomarkers is not normally expressed during adult life and an immune reaction to the increased expression is developed. If an immune response is detected, further tests (such as the imaging techniques described above or analysis of samples from the fallopian tube) may then be used to determine the location of the increased expression of the one or more biomarkers.
BRCA1 and BRCA2 are human tumor suppressor genes. Mutations in BRCA1 and BRCA2 may result in damaged DNA not being repaired properly, and this increases the risk of breast, ovarian, fallopian tube, and prostate cancers. Women with such mutated BRCA1 or BRCA2 genes have an increased risk of developing ovarian cancer: about 55% for women with BRCA1 mutations and about 25% for women with BRCA2 mutations (“Genetics”. Breastcancer.org. 2012-09-17). Usually the mutated genes make a protein product that does not function properly. For women with mutated BRCA1 and BRCA2 genes, a method of predicting which will develop ovarian cancer or have a particularly heightened risk of developing ovarian cancer is, therefore, particularly important. Likewise, if such women do develop ovarian cancer and it is diagnosed at an early stage then their chances of long-term survival are much improved.
Accordingly, in certain embodiments the subject has a mutation in either of the genes BRCA1 or BRCA2 that gives them an increased risk of developing ovarian cancer relative to a subject without such a mutation in BRCA1 or BRCA2. The methods of the invention may also be used to identify subjects that have an increased risk of ovarian cancer or have or will develop ovarian cancer, wherein the cancer is unconnected to mutation in BRCA1 or BRCA2. Such subjects may have no reason to suspect that they have an increased risk and may, therefore, be less aware of symptoms. Accordingly, in further embodiments the subject does not have a mutation in either of the genes BRCA1 or BRCA2 that gives them an increased risk of developing ovarian cancer.
According to all aspects of the invention the ovarian cancer may be high grade serous ovarian cancer.
In a further aspect, the present invention relates to an antibody, aptamer or peptide that binds specifically to a biomarker from Table A for use in a method of predicting ovarian cancer, identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject. In certain embodiments the antibody, aptamer or peptide is conjugated to a label.
According to all aspects of the invention the antibody may be of monoclonal or polyclonal origin. Typically, the antibody is an IgG immunoglobulin isotype. Antigen-binding fragments and antibody derivatives may also be utilised, to include without limitation Fab fragments, ScFv, single domain antibodies, nanoantibodies, heavy chain antibodies, chimeric antibody fusions etc. which retain antigen-specific binding function and these are included in the definition of “antibody”. Methods for generating specific antibodies are known to those skilled in the art. Antibodies may be of human or non-human origin (e.g. rodent, such as rat or mouse) and be humanized etc. according to known techniques (Jones et al., Nature (1986) May 29-Jun. 4;321(6069):522-5; Roguska et al., Protein Engineering, 1996, 9(10):895-904; and Studnicka et al., Humanizing Mouse Antibody Frameworks While Preserving 3-D Structure. Protein Engineering, 1994, Vol.7, pg 805).
In all aspects of the invention the aptamer may be a nucleic acid molecule, to include natural nucleotides and derivatives/analogues and non-natural nucleotides, or a peptide molecule, to include natural amino acids and derivatives/analogues and non-natural amino acids bonded with standard or non-standard peptide bonds, or one or more operably connected nucleic acid molecules and/or peptide molecules. In specific embodiments the aptamer is a DNA, RNA or XNA aptamer. Nucleic acid aptamer selection can be made using methods known to those skilled in the art, for example using in vitro selection or SELEX. In certain embodiments the aptamer is a peptide aptamer comprising a variable peptide domain, attached at both ends to a protein scaffold. The variable peptide domain may be up to 30 amino acids long, preferably 5 to 25 amino acids long, more preferably 10 to 20 amino acids long. The scaffold may be any protein which has appropriate solubility and compacity properties, for example the bacterial protein Thioredoxin-A. Peptide aptamer selection can be made using methods known to those skilled in the art, for example using the yeast two-hybrid system.
By “specifically bind” it is meant that an antibody binds to an epitope via its antigen binding domain, and that the binding entails some complementarity between the antigen binding domain and the epitope. The term is well known in the art. Accordingly, an antibody specifically binds to an epitope when it binds to that epitope via its antigen binding domain more readily than it would bind to a random, unrelated epitope. Likewise, a specific binding agent such as an aptamer, protein or peptide specifically binds to a target molecule when it binds to that molecule more readily than it would to a random, unrelated molecule. The dissociation constant or Kd may be less than 5×10⁻² M, 10⁻² M, 5×10⁻³ M, 10⁻³ M, 5×10⁻⁴ M, 10⁻⁴ M, 5×10⁻⁵ M, 10⁻⁵ M, 5×10⁻⁶ M, 10⁻⁶ M, 5×10⁻⁷ M, 10⁻⁷ M, 5×10⁻⁸ M, 10⁻⁸ M, 5×10⁻⁹ M, 10⁻⁹ M, 5×10⁻¹⁰ M, 10⁻¹⁰ M, 5×10⁻¹¹ M, 10⁻¹¹ M, 5×10⁻¹² M, 10⁻¹² M, 5×10⁻¹³ M, 10⁻¹³ M, 5×10⁻¹⁴ M, 10⁻¹⁴ M, 5×10⁻¹⁵ M, or 10⁻¹⁵ M.
The methods of the invention may be used to inform treatment decisions. Thus, according to a further aspect of the invention there is provided a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer comprising one or more of: a. removing one or more (affected) fallopian tubes from the subject b. treatment with a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor) wherein the subject is selected for treatment on the basis of a method as described herein.
According to a further aspect of the invention there is provided a therapeutic agent (such as a biologic, optionally an antibody and/or vaccine, and/or a small molecule inhibitor) for use in a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject is selected for treatment on the basis of a method as claimed in any previous claim.
In a further aspect the invention provides a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject has an increased expression level of one or more of genes 1 to 338 from Table A and/or a decreased expression level of one or more of genes 339 to 628 from Table A comprising one or more of: a. removing one or more (affected) fallopian tubes from the subject b. treatment with a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor)
As an alternative subjects may be treated with an epigenetic modifier that re-induces SOX2 repression. The epigenetic modifier may be USP22 (Sussman et al., J Biol Chem 288(33) 24234-46 (2013)). Subjects may also be treated with MYC and/or EZH2. Subjects may also be treated with antagonists of SOX2, preferably an antibody that binds specifically to SOX2 and interferes with its function or antisense RNA, siRNA or RNAi targeting SOX2. In certain embodiments the therapeutic agent, epigenetic modifier or antagonist is targeted to fallopian tube cells, optionally the fallopian tube epithelium or ciliated fallopian tube cells. Targeting may be achieved by fusion to a specific binding agent specific for these cells.
In a further aspect the invention provides a therapeutic agent (such as a biologic, optionally an antibody and/or vaccine, and/or a small molecule inhibitor) for use in a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject has an increased expression level of one or more of genes 1 to 338 from Table A and/or a decreased expression level of one or more of genes 339 to 628 from Table A.
According to a further aspect of the invention there is provided a method for selecting a treatment for a subject comprising a. determining the expression level of one or more biomarkers from Table A in a sample from the subject wherein the determined expression level is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer or diagnose ovarian cancer b. selecting a treatment appropriate to the prediction, increased risk or diagnosis of ovarian cancer and c. treating the subject with the selected treatment.
If the subject has a prediction of ovarian cancer, an increased risk of developing ovarian cancer or a diagnosis of ovarian cancer the treatment selected may be one or more of: a. removing one or more (affected) fallopian tubes from the subject b. a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor).
In certain embodiments if the expression level of IGF2 is increased the treatment selected is a small molecule inhibitor of the interaction between IGF2 and the IGF1 receptor.
In certain embodiments the treatment selected is an antagonist or inhibitor of one or more of the (protein products of the) biomarkers in Table A. The treatment selected may be an antagonist or inhibitor of one or more of the (protein products of the) biomarkers 1 to 338 from Table A. In certain embodiments the treatment selected is a (small molecule) inhibitor of the kinase SIK2. This treatment may be administered if the expression level of SIK2 is increased.
In certain embodiments the antibody can specifically bind to one of the biomarkers from Table A.
By biologic is meant a pharmaceutical drug product manufactured in, extracted from, or semi-synthesized from a biological source. Examples include vaccines, blood, or blood components, therapeutic cells, nucleic acids, sugars, tissues, and recombinant proteins. In certain embodiments the vaccine may be a (recombinant) protein product or fragment of a protein product of one of the genes in Table A. Therapeutic cells may include lymphocytes engineered to kill cells in the subject that are (over-)expressing the one or more biomarkers.
The expression level of SOX2 may be affected by mutations within the regulatory region of SOX2. According to a further aspect of the invention there is provided a method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer comprising in a sample obtained from a subject identifying a mutation within the regulatory region of the SOX2 gene wherein identification of the mutation predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer or is used to diagnose ovarian cancer.
The mutation may affect MYC, KDM5B or EZH2 occupancy of the regulatory region (resulting in a loss of repression of SOX2 by MYC or EZH2).
In certain embodiments the sample comprises, consists essentially of or consists of fallopian tube tissue and/or fallopian tube cells and/or material derived therefrom.
The fallopian tube cells may comprise ciliated fallopian tube cells.
The regulatory region is defined to include any portion of the genome operably linked to SOX2 expression. Typically, the regulatory region includes one or more binding sites for proteins that control SOX2 expression. Thus, the regulatory region may incorporate one or more MYC, KDM5B and/or EZH2 binding sites. The regulatory region may comprise repressor sites. In specific embodiments, the regulatory region may comprise, consist essentially of or consist of SEQ ID NO. 1. SEQ ID NO. 1 is the sequence of a 2 Mb region flanking SOX2. In certain embodiments the mutation comprises, consists essentially of or consists of any one or more of a mutation from C to any other nucleotide at position 1788388 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1791114 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1779360 of SEQ ID NO. 1, a mutation from T to any other nucleotide at position 1768318 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1769826 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1793202 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1771041 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1764947 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1776714 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1793607 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1765207 of SEQ ID NO. 1, a mutation from T to any other nucleotide at position 377489 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1270829 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1492887 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1583019 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1788388 of SEQ ID NO. 1, and a mutation from C to any other nucleotide at position 1940505 of SEQ ID NO. 1.
Corresponding genomic locations are described below against the relevant human genome build as indicated. The mutation may comprise, consist essentially of or consist of any one or more of M1-M11 and BB1-BB6 (as defined herein).
The position of each of these mutations in the genome and in relation to the start of the 2 Mb region (SEQ ID NO. 1) is shown below. In addition, the nature of each mutation in terms of the nucleotide found in the reference allele and the mutant allele is also shown.
Description of mutations:
In relation to the genome (HG19 build), in the format chromosome:position_reference allele/mutant allele:
M1 - chr3:182218102_C/G
M2 - chr3:182220828_G/A
M3 - chr3:182209074_G/C
M4 - chr3:182198032_T/G
M5 - chr3:182199540_G/A
M6 - chr3:182222916_A/G
M7 - chr3:182200755_C/A
M8 - chr3:182194661_A/T
M9 - chr3:182206428_C/A
M10 - chr3:182223321_C/T
M11 - chr3:182194921_G/C
In relation to the start of the 2 Mb region at chr3:180429714 (i.e. within SEQ ID NO. 1):
M1 - 1788388_C/G
M2 - 1791114_G/A
M3 - 1779360_G/C
M4 - 1768318_T/G
M5 - 1769826_G/A
M6 - 1793202_A/G
M7 - 1771041_C/A
M8 - 1764947_A/T
M9 - 1776714_C/A
M10 - 1793607_C/T
M11 - 1765207_G/C
BB mutations, genome (HG19 build) coordinates:
BB1 - chr3:180807203_T/G
BB2 - chr3:181700543_C/G
BB3 - chr3:181922601_G/T
BB4 - chr3:182012733_A/T
BB5 - chr3:182218102_C/G
BB6 - chr3:182370219_C/G
In relation to the start of the 2 Mb region (SEQ ID NO. 1):
BB1 - 377489_T/G
BB2 - 1270829_C/G
BB3 - 1492887_G/T
BB4 - 1583019_A/T
BB5 - 1788388_C/G
BB6 - 1940505_C/G
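The two coordinate systems listed above differ by a fixed offset: each position within SEQ ID NO. 1 equals the chr3 genomic position minus 180429714, the stated start of the 2 Mb region. The following sketch illustrates the conversion. The function and variable names are hypothetical; the coordinates and alleles are those given above.

```python
# Start of the 2 Mb region (SEQ ID NO. 1) on chr3, as stated in the text.
SEQ1_START = 180429714

# Genomic coordinates and alleles as listed in the text.
GENOMIC = {
    "M1": "chr3:182218102_C/G",
    "M2": "chr3:182220828_G/A",
    "M3": "chr3:182209074_G/C",
    "M4": "chr3:182198032_T/G",
    "M5": "chr3:182199540_G/A",
    "M6": "chr3:182222916_A/G",
    "M7": "chr3:182200755_C/A",
    "M8": "chr3:182194661_A/T",
    "M9": "chr3:182206428_C/A",
    "M10": "chr3:182223321_C/T",
    "M11": "chr3:182194921_G/C",
    "BB1": "chr3:180807203_T/G",
    "BB2": "chr3:181700543_C/G",
    "BB3": "chr3:181922601_G/T",
    "BB4": "chr3:182012733_A/T",
    "BB5": "chr3:182218102_C/G",
    "BB6": "chr3:182370219_C/G",
}

def parse_mutation(text):
    """Split 'chr3:182218102_C/G' into (chromosome, position, ref, alt)."""
    chromosome, rest = text.split(":")
    position, alleles = rest.split("_")
    ref, alt = alleles.split("/")
    return chromosome, int(position), ref, alt

def to_seq1_position(genomic_position, region_start=SEQ1_START):
    """Convert a chr3 genomic position to a position within SEQ ID NO. 1."""
    return genomic_position - region_start

seq1_positions = {
    name: to_seq1_position(parse_mutation(text)[1])
    for name, text in GENOMIC.items()
}
```

Applying the conversion reproduces the SEQ ID NO. 1 positions listed above and shows that M1 and BB5 denote the same position and the same C to G change.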
The regulatory region may comprise, consist essentially of or consist of SEQ ID NO. 2. SEQ ID NO. 2 is a 40 kb region flanking the BB5 nucleotide. M1 to M11 are mutations in the SEQ ID NO. 2 region. In certain embodiments the mutation comprises, consists essentially of or consists of any one or more of M1 (a mutation from C to G at position 1788388 of SEQ ID NO. 1), M2 (a mutation from G to A at position 1791114 of SEQ ID NO. 1), M3 (a mutation from G to C at position 1779360 of SEQ ID NO. 1), M4 (a mutation from T to G at position 1768318 of SEQ ID NO. 1), M5 (a mutation from G to A at position 1769826 of SEQ ID NO. 1), M6 (a mutation from A to G at position 1793202 of SEQ ID NO. 1), M7 (a mutation from C to A at position 1771041 of SEQ ID NO. 1), M8 (a mutation from A to T at position 1764947 of SEQ ID NO. 1), M9 (a mutation from C to A at position 1776714 of SEQ ID NO. 1), M10 (a mutation from C to T at position 1793607 of SEQ ID NO. 1), and M11 (a mutation from G to C at position 1765207 of SEQ ID NO. 1). In further embodiments the mutation comprises, consists essentially of or consists of BB5 (a mutation from C to G at position 1788388 of SEQ ID NO. 1).
In certain embodiments the mutation is identified by a method selected from nucleic acid sequencing, probe hybridization, nucleic acid amplification and mass spectrometric detection. The mutation may be identified using a high-throughput sequencing technique. In certain embodiments the high-throughput sequencing technique is selected from pyrosequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, Illumina dye sequencing, single-molecule real-time sequencing or DNA nanoball sequencing. The mutation may be identified using one, two or more of the primers listed in the Supplementary Tables, preferably Supplementary Tables 8 and 9.
The invention also relates to a system or device for performing a method as described herein. In a further aspect, the present invention relates to a system or test kit for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject comprising: a. one or more testing devices for determining the expression level of one or more biomarkers from Table A in a sample from the subject b. a processor; and c. a storage medium comprising a computer application that, when executed by the processor, is configured to: i. access and/or calculate the determined expression level of one or more biomarkers from Table A in the sample on the one or more testing devices ii. calculate whether there is an increased or decreased level of the one or more biomarkers from Table A in the sample; and iii. output from the processor the prediction or risk of developing ovarian cancer or diagnosis of ovarian cancer.
By testing device is meant a combination of components that allows the expression level of a gene to be determined. The components may include any of those described above with respect to the methods for determining expression level at the level of protein or RNA. For example the components may be antibodies, primers, detection agents and so on. Components may also include one or more of the following: microscopes, microscope slides, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
In certain embodiments the system or test kit further comprises a display for the output from the processor. The biomarker may be IGF2.
The invention also relates to a computer application or storage medium comprising a computer application as defined above.
In certain example embodiments, provided is a computer-implemented method, system, and a computer program product for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer (or diagnosing ovarian cancer) in a subject, in accordance with the methods described herein. For example, the computer program product may comprise a non-transitory computer-readable storage device having computer-readable program instructions embodied thereon that, when executed by a computer, cause the computer to predict ovarian cancer or identify an increased risk of developing ovarian cancer (or diagnose ovarian cancer) in a subject as described herein. For example, the computer executable instructions may cause the computer to: (i) access and/or calculate the determined expression level of one or more biomarkers from Table A in a sample on one or more testing devices; (ii) calculate whether there is an increased or decreased level of one or more biomarkers from Table A in the sample; and, (iii) provide an output regarding the prediction of ovarian cancer or identification of an increased risk of developing ovarian cancer (or a diagnosis of ovarian cancer).
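Steps (i) to (iii) can be sketched, purely by way of illustration, as follows. This is a minimal sketch under stated assumptions: the gene identifiers, the reference levels, the two-fold threshold and the rule that a single concordant biomarker flags the subject are all hypothetical and are not specified in the present disclosure. The expected direction of change stands in for the division of Table A into genes with increased expression and genes with decreased expression.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    gene: str
    fold_change: float
    status: str  # "increased", "decreased" or "unchanged"

def call_level(gene, measured, reference, threshold=2.0):
    """Step (ii): compare a measured level with a reference level.

    The two-fold threshold is an illustrative assumption.
    """
    fold = measured / reference
    if fold >= threshold:
        status = "increased"
    elif fold <= 1.0 / threshold:
        status = "decreased"
    else:
        status = "unchanged"
    return Call(gene, fold, status)

def assess_subject(measured, reference, expected_direction, threshold=2.0):
    """Steps (i) to (iii): read levels, call changes, produce an output.

    A subject is flagged when at least one biomarker changes in the
    expected direction (a hypothetical decision rule).
    """
    calls = [
        call_level(gene, level, reference[gene], threshold)
        for gene, level in measured.items()
        if gene in reference and gene in expected_direction
    ]
    concordant = [c.gene for c in calls if c.status == expected_direction[c.gene]]
    return {"calls": calls, "concordant": concordant, "flagged": bool(concordant)}

# Hypothetical data: placeholder gene names, arbitrary expression units.
expected = {"GENE_A": "increased", "GENE_B": "decreased", "GENE_C": "increased"}
reference_levels = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 4.0}
measured_levels = {"GENE_A": 8.0, "GENE_B": 1.0, "GENE_C": 5.0}

result = assess_subject(measured_levels, reference_levels, expected)
```

In this hypothetical example GENE_A and GENE_B change in the expected direction while GENE_C remains within the threshold, so the subject is flagged on the basis of two biomarkers.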
In certain example embodiments, the computer-implemented method, system, and computer program product may be embodied in a computer application, for example, that operates and executes on a computing machine and a module. When executed, the application may predict ovarian cancer or identify an increased risk of developing ovarian cancer in a subject, in accordance with the example embodiments described herein.
As used herein, the computing machine may correspond to any computers, servers, embedded systems, or computing systems. The module may comprise one or more hardware or software elements configured to facilitate the computing machine in performing the various methods and processing functions presented herein. The computing machine may include various internal or attached components such as a processor, system bus, system memory, storage media, input/output interface, and a network interface for communicating with a network, for example. The computing machine may be implemented as a conventional computer system, an embedded controller, a laptop, a server, a customized machine, any other hardware platform, such as a laboratory computer or device, for example, or any combination thereof. The computing machine may be a distributed system configured to function using multiple computing machines interconnected via a data network or bus system, for example.
The processor may be configured to execute code or instructions to perform the operations and functionality described herein, manage request flow and address mappings, and to perform calculations and generate commands. The processor may be configured to monitor and control the operation of the components in the computing machine. The processor may be a general purpose processor, a processor core, a multiprocessor, a reconfigurable processor, a microcontroller, a digital signal processor (“DSP”), an application specific integrated circuit (“ASIC”), a graphics processing unit (“GPU”), a field programmable gate array (“FPGA”), a programmable logic device (“PLD”), a controller, a state machine, gated logic, discrete hardware components, any other processing unit, or any combination or multiplicity thereof. The processor may be a single processing unit, multiple processing units, a single processing core, multiple processing cores, special purpose processing cores, co-processors, or any combination thereof. According to certain example embodiments, the processor, along with other components of the computing machine, may be a virtualized computing machine executing within one or more other computing machines.
The system memory may include non-volatile memories such as read-only memory (“ROM”), programmable read-only memory (“PROM”), erasable programmable read-only memory (“EPROM”), flash memory, or any other device capable of storing program instructions or data with or without applied power. The system memory may also include volatile memories such as random access memory (“RAM”), static random access memory (“SRAM”), dynamic random access memory (“DRAM”), and synchronous dynamic random access memory (“SDRAM”). Other types of RAM also may be used to implement the system memory. The system memory may be implemented using a single memory module or multiple memory modules. While the system memory may be part of the computing machine, one skilled in the art will recognize that the system memory may be separate from the computing machine without departing from the scope of the subject technology. It should also be appreciated that the system memory may include, or operate in conjunction with, a non-volatile storage device such as the storage media. The storage media may include a hard disk, a floppy disk, a compact disc read only memory (“CD-ROM”), a digital versatile disc (“DVD”), a Blu-ray disc, a magnetic tape, a flash memory, other non-volatile memory device, a solid state drive (“SSD”), any magnetic storage device, any optical storage device, any electrical storage device, any semiconductor storage device, any physical-based storage device, any other data storage device, or any combination or multiplicity thereof. The storage media may store one or more operating systems, application programs and program modules such as the module, data, or any other information. The storage media may be part of, or connected to, the computing machine. The storage media may also be part of one or more other computing machines that are in communication with the computing machine, such as servers, database servers, cloud storage, network attached storage, and so forth.
The module may comprise one or more hardware or software elements configured to facilitate the computing machine with performing the various methods and processing functions presented herein. The module may include one or more sequences of instructions stored as software or firmware in association with the system memory, the storage media, or both. The storage media may therefore represent examples of machine or computer readable media on which instructions or code may be stored for execution by the processor. Machine or computer readable media may generally refer to any medium or media used to provide instructions to the processor. Such machine or computer readable media associated with the module may comprise a computer software product. It should be appreciated that a computer software product comprising the module may also be associated with one or more processes or methods for delivering the module to the computing machine via a network, any signal-bearing medium, or any other communication or delivery technology. The module may also comprise hardware circuits or information for configuring hardware circuits such as microcode or configuration information for an FPGA or other PLD.
The input/output (“I/O”) interface may be configured to couple to one or more external devices, to receive data from the one or more external devices, and to send data to the one or more external devices. Such external devices along with the various internal devices may also be known as peripheral devices. The I/O interface may include both electrical and physical connections for operably coupling the various peripheral devices to the computing machine or the processor. The I/O interface may be configured to communicate data, addresses, and control signals between the peripheral devices, the computing machine, or the processor. The I/O interface may be configured to implement any standard interface, such as small computer system interface (“SCSI”), serial-attached SCSI (“SAS”), fiber channel, peripheral component interconnect (“PCI”), PCI express (PCIe), serial bus, parallel bus, advanced technology attached (“ATA”), serial ATA (“SATA”), universal serial bus (“USB”), Thunderbolt, FireWire, various video buses, and the like. The I/O interface may be configured to implement only one interface or bus technology.
Alternatively, the I/O interface may be configured to implement multiple interfaces or bus technologies. The I/O interface may be configured as part of, all of, or to operate in conjunction with, the system bus. The I/O interface may include one or more buffers for buffering transmissions between one or more external devices, internal devices, the computing machine, or the processor.
The I/O interface may couple the computing machine to various input devices including mice, touch-screens, scanners, electronic digitizers, sensors, receivers, touchpads, trackballs, cameras, microphones, keyboards, any other pointing devices, or any combinations thereof. The I/O interface may couple the computing machine to various output devices including video displays, speakers, printers, projectors, tactile feedback devices, automation control, robotic components, actuators, motors, fans, solenoids, valves, pumps, transmitters, signal emitters, lights, and so forth.
The computing machine may operate in a networked environment using logical connections through the network interface to one or more other systems or computing machines across the network. The network may include wide area networks (WAN), local area networks (LAN), intranets, the Internet, wireless access networks, wired networks, mobile networks, telephone networks, optical networks, or combinations thereof. The network may be packet switched, circuit switched, of any topology, and may use any communication protocol. Communication links within the network may involve various digital or analog communication media such as fiber optic cables, free-space optics, waveguides, electrical conductors, wireless links, antennas, radio-frequency communications, and so forth.
The processor may be connected to the other elements of the computing machine or the various peripherals discussed herein through the system bus. It should be appreciated that the system bus may be within the processor, outside the processor, or both. According to some embodiments, any of the processor, the other elements of the computing machine, or the various peripherals discussed herein may be integrated into a single device such as a system on chip (“SOC”), system on package (“SOP”), or ASIC device.
Embodiments may comprise a computer program that embodies the functions described and illustrated herein, wherein the computer program is implemented in a computer system that comprises instructions stored in a machine-readable medium and a processor that executes the instructions. However, it should be apparent that there could be many different ways of implementing embodiments in computer programming, and the embodiments should not be construed as limited to any one set of computer program instructions. Further, a skilled programmer would be able to write such a computer program to implement one or more of the disclosed embodiments described herein. Therefore, disclosure of a particular set of program code instructions is not considered necessary for an adequate understanding of how to make and use embodiments. Further, those skilled in the art will appreciate that one or more aspects of embodiments described herein may be performed by hardware, software, or a combination thereof, as may be embodied in one or more computing systems.
Moreover, any reference to an act being performed by a computer should not be construed as being performed by a single computer as more than one computer may perform the act.
The example embodiments described herein can be used with computer hardware and software that perform the methods and processing functions described previously. The systems, methods, and procedures described herein can be embodied in a programmable computer, computer-executable software, or digital circuitry. The software can be stored on computer-readable media. For example, computer-readable media can include a floppy disk, RAM, ROM, hard disk, removable media, flash memory, memory stick, optical media, magneto-optical media, CD-ROM, etc. Digital circuitry can include integrated circuits, gate arrays, building block logic, field programmable gate arrays (FPGA), etc.
Reagents, tools, and/or instructions for performing the methods described herein can be provided in a kit. Such a kit can include reagents for collecting a tissue or cell sample from a patient, such as by biopsy, and reagents for processing the tissue or cells. The kit can also include one or more reagents for performing an expression level analysis, such as reagents for performing nucleic acid amplification, including RT-PCR and qPCR, NGS, northern blot, proteomic analysis, or immunohistochemistry to determine expression levels in a sample of a patient. For example, primers for performing RT-PCR, probes for performing northern blot analyses, and/or antibodies or aptamers, as discussed herein, for performing proteomic analysis such as Western blot, immunohistochemistry and ELISA analyses can be included in such kits. Appropriate buffers for the assays can also be included. Detection reagents required for any of these assays can also be included. The kits may be array or PCR based kits, for example, and may include additional reagents, such as a polymerase and/or dNTPs. The kits featured herein can also include an instruction sheet describing how to perform the assays for measuring expression levels. The kit may include one or more primer pairs complementary to one or more biomarkers from Table A. The kit may also include one or more primer pairs complementary to a reference gene.
Informational material included in the kits can be descriptive, instructional, marketing or other material that relates to the methods described herein and/or the use of the reagents for the methods described herein. For example, the informational material of the kit can contain contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about performing a gene expression analysis and interpreting the results.
The kit may further comprise a computer application or storage medium as described above.
The example systems, methods, and acts described in the embodiments presented previously are illustrative, and, in alternative embodiments, certain acts can be performed in a different order, in parallel with one another, omitted entirely, and/or combined between different example embodiments, and/or certain additional acts can be performed, without departing from the scope and spirit of various embodiments. Accordingly, such alternative embodiments are included in the examples described herein.
Although specific embodiments have been described above in detail, the description is merely for purposes of illustration. It should be appreciated, therefore, that many aspects described above are not intended as required or essential elements unless explicitly stated otherwise.
Modifications of, and equivalent components or acts corresponding to, the disclosed aspects of the example embodiments, in addition to those described above, can be made by a person of ordinary skill in the art, having the benefit of the present disclosure, without departing from the spirit and scope of embodiments defined in the following claims, the scope of which is to be accorded the broadest interpretation so as to encompass such modifications and equivalent structures.
DESCRIPTION OF THE FIGURES
Figure 1:
Mutations are not required for extreme primary chemotherapy resistance. a. Diagram showing the sites of laparoscopic-guided biopsies obtained before and after chemotherapy and the corresponding laparoscopic images of the biopsy sites. Number of tumour islets sequenced is shown above each image. Note the complete macroscopic resolution of the tumor following chemotherapy. p53 immunohistochemical staining of MRCD is also shown. b. A diagram showing the analysis strategy. c. A Venn diagram showing the number of mutations per location. d. Allele counts (top), allele fractions (middle) and copy number alterations (bottom) identified from supergenome A/B. Black lines indicate total copy number, blue lines indicate minor allele copy number and red lines indicate areas of loss of heterozygosity (LOH).
Figure 2:
Core mutations cluster at potential enhancers of stem cell differentiation genes. a. LOH events in the indicated samples are presented in orange. Note the ubiquitous pattern of LOH across 16 tumour islets. b. Log number of mutations (y-axis) shared between genomes (x-axis). c. The distribution of the core mutations (n = 750) across chromosomes is shown. d. The 750 variants shared by more than 90% of tumor islets, and hence called ancestor variants, were mapped to the potential gene regulatory regions. Initially, to test for enrichment, somatic variants that were called in any 2 tumor islets (n = 330,811) were used as a background comparator. a. The fold enrichment and the log number of variants that supported any significantly enriched ontologies are shown. Also indicated are the top 5 ontologies that had more than 2 fold enrichment over the background.
Figure 3:
A 40Kb region downstream of SOX2 is frequently mutated in HGSOCs. a. Deep targeted sequencing was performed for 33 HGSOCs and 861 rare variants (not previously reported in the 1K genomes) were identified. Shown in the upper panel is the ratio of observed number of variants in 40Kb “moving” windows in the cancer set to the expected number of variants in the equivalent windows based on 1000 permutations of simulated 1KG data. The lower panel represents statistical significance (log10 1/p-value) of enrichment of rare variants that fall in known biochemically active sites (orange bars). The locations of the BB nucleotides are shown and the BB5 region is indicated by a horizontal magenta bar. b-e. Immunohistochemical staining of p53 and SOX2 at low (b and d, measure bar = 300 μm) and high (c and e, measure bar = 50 μm) magnification in normal FTE and the p53 signature. Note the strong focal p53 staining at the multilayered epithelium (p53 signature). f. Fraction of mutant reads compared to total number of reads of the BB5 and TP53 mutations in germline DNA, FTE, the p53 signature (p53 sig) and tumor (n = 3). g. Sequencing trace showing the BB5 mutation.
Figure 4:
Clonal expansion of SOX2-expressing cells in the normal FTE is ubiquitous in HGSOC. a. Representative SOX2 immunohistochemistry (IHC) images for the FTE of the indicated samples are presented. Scale bars = 100 μm. b. Percentages (y-axis), median percentage (horizontal black bars) and intensity (x-axis) of SOX2 staining in the normal FTE of women with benign conditions, endometrial cancer or HGSOCs and in the paired HGSOC tumours. Solid circles represent the FTE from cancers harbouring rare variants and mutations in the BB5 40 kb region. The black arrow indicates a case of high-grade serous endometrial cancer. c. Power calculations were used to determine the required case number for the validation set based on the data from the discovery set (see Suppl. Methods). Data from an independent set of 88 cases as well as an additional 48 BRCA1 or BRCA2 mutation carriers who underwent prophylactic excision of the FTs are presented. d. A Receiver Operating Characteristics curve is presented for the combined data presented in panels b and c.
Figure 5:
The FTE expression of SOX2 and MYC/EZH2 is mutually exclusive.
a. CpG dinucleotides were identified in the reference genome and the number of dinucleotides in moving (every 100 bp) 40Kb windows flanking SOX2 is shown in the top panel. In the lower panel, targeted ChIP-sequencing was performed on chromatin extracted from FTE cells and the number of significant peaks within moving 40Kb windows is shown. The BB5 region is indicated using a magenta bar. b. The number of CpG dinucleotides per 40Kb window is plotted against the number of significant H3K27Ac peaks for the same windows. The Pearson correlation coefficient is also shown. c. and d. show representative IHC images of FTE double staining using the indicated antibodies. Measure bars = 10 μm. Arrows indicate cells that are shown in higher magnifications. EZH2 single staining of FTE and a corresponding tumour is also shown in d. e. Representation of two hypothetical models where in one (left), a single SOX2-expressing cell or at least one in a chain of SOX2-expressing cells (not shown) is immediately adjacent to a non-ciliated cell while in the other (right), this condition is not necessary. A barplot representation of the mean counts of FTs from 4 subjects +/- sd. f. Representative images of HGSOCs each stained for SOX2 and MYC. Also shown are the summary table for intensity of staining and the mean (+/- sem) intensity values of MYC expression for SOX2 high expressing versus SOX2 low expressing tumours.
Figure 6:
EZH2 and MYC are required for suppression of SOX2 in the FTE cells. a. A diagram showing the process of extracting non-ciliated fallopian tube epithelial cells. Also shown are immunofluorescence images stained using the indicated antibodies. b. and c. Freshly obtained primary cultured FTE cells were transfected with either non-targeting siRNA or MYC siRNA as indicated for 48h to 72h, then cells were harvested and used either for western blots using the indicated antibodies (b) or for quantitative real-time PCR to measure the expression of the indicated genes (c). d. Capture ChIP-seq was performed on FTE cells using the indicated antibodies to detect areas of occupancy and histone modifications. Shown is the number of reads per nucleotide normalized to the maximum number of reads in that region. Also shown is the pattern of sequence conservation across vertebrate species. The locations of the 11 identified mutations are also shown.
Figure 7:
The BB5 repressor element is required for MYC, EZH2 and KDM5B occupancy and suppression of SOX2 expression. a. Logo for sequence motif UW.0169. The letter height corresponds to the probability of occurrence of a nucleotide. b-d. Activity of the BB5 wild type and mutant enhancer region in chicken embryos (n = 20). e. Transfection of wild type and mutant plasmids on either side of the same embryo to show differences in activity. A typical image of 10 repeats is presented. f. Co-localization of endogenous nuclear Sox2 expression with wild type BB5 region enhancer activity at the neural tube but not at the neural crest is shown using white arrows. g. CRISPR-mediated deletion of 6 nucleotides (magenta) downstream of BB5 nucleotide (see Supplementary Table 1). Also shown is the protospacer adjacent motif (red). h. ChIP-qPCR results showing significant reduction of occupancy of the BB5 element by the indicated proteins and significant reduction of H3K27Ac following CRISPR-mediated deletion. Also shown is ChIP-qPCR of FTE for comparison. Shown is the mean and sd of triplicate values. i. DNA electrophoresis and western blots for wild type HEK293 cells and mutant clone SH26A. Note the loss of overexpression that coincided with overgrowth of the wild-type subclone as indicated by the DNA electrophoresis. j. ChIP-qPCR results showing significant reduction of MYC and KDM5B occupancy of the BB5 element following siRNA-mediated depletion of MYC in FTE cells. Shown is the mean and sd of triplicate values.
Figure 8:
Strategy for laser capture microdissection. A total of 84 tumor islets and 41 stroma islets were microdissected using individual caps for each islet. For each microdissected islet, at least 5 images were obtained for documentation. Islets were subjected to LFR WGS. Shown are images of each of the 16 islets mentioned in main text at low magnification (mag), high magnification, after microdissection and on the collection cap. Also shown is the genome call rate (percentage of positions called in relation to all bases in a genome). Measure bars = 150 μm.
Figure 9:
Digital analysis of LFR whole genome sequence data faithfully represents allele fractions. a. Dot plot representation of allele specific read and well data as indicated. Note the tendency for over-amplification when using read counts following extensive MDA-based amplification. b. A Bland-Altman plot comparing read fractions as measured by well counts from LFR-WGS data from all genomes in positions A and B and the read counts measured by Illumina targeted sequencing of 750 core mutations from the same positions. Allele fractions from targeted sequencing were adjusted to account for normal cell contamination. The X-axis represents the mean of the two measurements for each mutation while the Y-axis represents the difference. Note that the mean difference (shown by the golden line) is not significantly distant from 0, indicating that the measurements obtained by the two methods were highly similar. Also shown as blue horizontal lines is the SD of the differences.
Figure 10:
The total read counts and read fractions of germline SNPs obtained from standard whole genome sequencing at 80x depth are presented in the upper two panels. Note that calling of chromosomal numerical variations was not possible because of the noise introduced by contamination from normal cells. For comparison, the mean well counts and well fractions for a supergenome that comprised 10 genomes from LCM tumour islets are presented in the middle two panels. The high resolution and precision of these data allowed accurate calling of copy number alterations as shown in the lower panel (black lines indicate total copy number, blue lines indicate minor allele copy number and red lines indicate areas of loss of heterozygosity (LOH)).
Figure 11:
Analysis strategy and validation of the 750 variants. A diagram showing the validation strategy of the identified ancestor variants. Further validation using: 1) WGS of unamplified lymphocyte DNA, 2) LFR sequencing of a further stroma sample (85% genome coverage) and 3) targeted sequencing (50X read depth) of all variants in germline lymphocyte DNA excluded the possibility of these being germline variants. Repeat deep targeted sequencing (450X) of all variants on lymphocyte DNA identified two that were compatible with mosaic state (chr21:9,833,828_C/A; 204 out of 2542 reads and chr7:61,874,043_C/A; 273 out of 1115 reads). Other variants (n = 7) were either present at very low frequency (<1:100) or were not confirmed by subsequent standard sequencing. Furthermore, extensive validation using: 1) LFR WGS of 3 additional tumor islets, 2) Standard WGS of bulk omental tumor and 3) deep targeted sequencing of all variants on DNA from 2 bulk tumor samples (positions B and C) confirmed the presence of all 750 mutations.
Figure 12:
Analysis strategy for identifying enriched gene ontologies.
The 750 variants shared by more than 90% of tumor islets, and hence called ancestor variants, were mapped to the potential gene regulatory regions. Initially, to test for enrichment, somatic variants that were called in any 2 tumor islets (n = 330,811) were used as a background comparator. Also shown is the validation strategy for the stem cell differentiation ontology enrichment (see methods for analysis details).
Figure 13:
Validation of enhancer activity in chicken embryos. a. The epiblasts of chicken embryos were electroporated at the 4-somite stage with a citrine fluorescent protein plasmid downstream of a basal promoter as shown in the diagram with or without an upstream putative 1 Kb enhancer region containing the nucleotide that we previously identified as being mutated in tumor samples of patient 11152. b. Shown are the patterns of enhancer activity for the indicated nucleotide positions.
Figure 14:
The BB5 genomic region was a site of complex genomic aberrations. a. Shown are the total well counts, well fractions (allele fractions) and called copy number alterations in the region surrounding the BB5 location.
Figure 15:
Mutations at the BB5 enhancer region in HGSOCs are associated with high expression of SOX2 in the FTE.
Shown are the identified somatic mutations (a, b, d, e, f) or LOH (c) in the 40 Kb region containing BB5 in 6 HGSOCs (left panel). Also shown is SOX2 expression in the corresponding apparently normal FTE and, for c and e, the lack of SOX2 nuclear expression in the adjacent ovarian tumors (right panel). Note that for patient 11249, macrodissection of FTE from the fallopian tube was possible. Sequencing of DNA from FTE revealed that a heterozygous mutation that was found in the tumour was also present in the FTE and appeared to be homozygous.
Figure 16:
SOX2 expression in the FTE and tumor tissue samples. a. Bar plot of the percentage of nuclei showing strong (green), medium (magenta) or low SOX2 expression (blue) of the apparent normal FTE, tumor tissue prior to chemotherapy (Pre) in two omentum (OM1 and OM2) and 1 diaphragmatic peritoneum (DR) samples, and following chemotherapy (Post) in the left ovarian tumor (LOT). b. Expression of SOX2, p53 and WT1 (an epithelial cell marker) in the FTE at the p53 signature, a pre-chemotherapy tumor islet (OM1) and a post-chemotherapy tumor islet (LOT). Positive p53 and WT1 staining confirms that the SOX2-negative cells are cancer cells. Measure bars = 100 μm.
Figure 17:
Ectopic SOX2 expression induces significant extracellular matrix remodelling. a. Immunofluorescence images of primary cultured FTE cells that were transduced with either SOX2-expressing lentiviruses or empty vector controls and were then fixed and stained using the indicated antibodies. Cytokeratin (CK) staining was used to confirm the presence of epithelial cells. Arrow heads indicate the loss of MYC or PAX8 expression in SOX2-expressing cells. A typical example of three independent experiments is presented. b. Bar plot representation of 20 significantly enriched pathways following the ectopic expression of SOX2 in the SKOv3 ovarian cancer cell line. ECM-related pathways are highlighted using magenta bars. See Supplementary Table for the pathway names and the genes assigned to each pathway. c. The validation of RNA-seq results using real-time quantitative PCR in SKOv3 and FTE cells following the ectopic expression of SOX2 in the corresponding cell lines using lentiviral transduction.
EXAMPLES
The present invention will be further understood by reference to the following experimental examples.
Founder mutations in repressor elements induce SOX2 expression in the fallopian tube epithelium of high-grade serous ovarian cancers.
Ovarian cancer is one of the most fatal malignancies because of late presentation and chemotherapy resistance. While emerging evidence suggests extensive tumour heterogeneity as an underlying mechanism of resistance, the clinical significance of subclonal mutations in determining tumour behaviour has been difficult to assess. In this work we utilised a novel strategy for evaluating the genetic basis of ovarian cancer chemotherapy response and tumour initiation. Using digital analysis of extensive whole genome sequencing (WGS) we compared extreme chemotherapy-resistant and chemotherapy-sensitive sites from a single tumour and found that mutations were not required for extreme primary chemotherapy resistance. By tracking mutations to a pre-neoplastic lesion in this tumour, we discovered a novel SOX2 distal repressor 40 Kb region that is mutated in 23% of high-grade serous ovarian cancers (HGSOCs). We show that under physiological conditions EZH2 and MYC occupy elements in this region and are required for suppression of SOX2 expression in the fallopian tube epithelium (FTE), a common site of origin of HGS pelvic cancers. Mutating a repressor element in that region interfered with EZH2 and MYC occupancy and induced SOX2 expression. We show that consequent expansion of SOX2-expressing FTE cells is a ubiquitous feature of HGSOCs. Our work has important implications for early detection and for circumventing chemotherapy resistance in this fatal malignancy.
Introduction:
With the exception of ubiquitous mutations of TP53,1 driver coding mutations of individual genes in high grade serous ovarian cancers (HGSOCs) are relatively uncommon. Moreover, whether non-coding genomic variations or mutations contribute to the genesis of HGSOCs has remained unknown. Such non-coding alterations have been shown to drive tumour initiation and progression. For example, recurring mutations in the promoter of telomerase reverse transcriptase (TERT) have been previously described in many solid tumours2-4. A plausible approach to identify such driver non-coding mutations is to perform multidimensional analysis of large cohorts of samples from individual tumour types5. Alternatively, recent studies have suggested that focussed analyses of ultra-deep sequence data6, multiregion sampling7 or temporal sampling8 may reveal insights into the evolution of mutations and identify core mutations that are shared spatially and temporally. However, because of the challenges in obtaining clinical data that describe biological behaviour (e.g. response to chemotherapy) at individual sites within the same tumour, it has remained difficult to evaluate the relationship between genomic alteration and tumour biology in patients.
Comprehensive genomic monitoring of an ovarian cancer
We utilised intraoperative video recording and tumor tracking (IOVT) combined with whole genome sequencing (WGS) of tumor islets (containing tens of cells) and bulk tissue samples of a single high-grade serous ovarian cancer (HGSOC). In total, we generated thirty-three whole genome sequencing data sets from three spatially distinct locations, before and after chemotherapy, providing one of the most comprehensive genomic characterizations of any single tumor to date. This allowed us to investigate tumor heterogeneity in four dimensions: space, subspace (tumour islets within an individual biopsy site), time and biological response to chemotherapy (Fig. 1a-b).
The tumor locations were sampled at initial laparoscopy prior to chemotherapy (labeled A, B and C in Fig. 1a) and 10 weeks later following three cycles of combination chemotherapy. Complete macroscopic clearance at all sites as well as microscopic tumor clearance at sites A and B (absolute sensitive) were documented. Microscopic residual chemotherapy-resistant disease (MRCD) was detected at position C. This study design allowed the sampling of cancer cells that represented the extremes of the chemotherapy response spectrum within the same tumour: a) pre-chemotherapy cells that subsequently disappeared (complete pathological response) after chemotherapy (absolute sensitive) and b) MRCD that once existed in bulk tumor (absolute resistant). We performed laser capture micro-dissection of tumor islets from all positions (Figure 8 and Supplementary Table 1) followed by long fragment read (LFR) whole genome sequencing9 of 29 tumor islets (3 position A prechemotherapy, 7 position B prechemotherapy, 10 position C prechemotherapy and 9 position C postchemotherapy) and a single peripheral blood sample. LFR technology enabled the sequencing of very low quantities of DNA obtained from our micro-biopsies that typically consisted of tens of cancer cells per islet.
A novel digital analysis approach of long-fragment read WGS data
LFR technology uses stochastic separation of long parental tumor DNA fragments into 384 distinct wells resulting in approximately 10-20% of a haploid genome in each well. The genomic DNA is then amplified using multiple displacement amplification (MDA), fragmented and ligated to unique barcode adapters before DNA from all 384 wells is combined, purified and sequenced (Fig. 1b). We exploited the fact that the physical separation of the DNA into distinct pools results in a low probability of corresponding fragments from both parental chromosomes co-existing in the same well.
Using the unique barcode adapters, we identified the well of origin for each allele sequenced and then, by counting the number of wells supporting the reference or mutant allele, obtained digital allele counts that are free of the amplification-induced biases contained in the sequence reads. The allele fractions for individual variants were then computed by using the number of wells supporting a variant instead of the classic approach of using the number of reads. Utilizing well-level data avoided false inflation of allele numbers due to amplification bias (Fig. 9a) and thus enabled more accurate estimation of copy numbers. The use of well counts was validated by comparing the well-level allele fractions with read-level allele fractions obtained by deep targeted sequencing of a subset of 750 mutations (Fig. 9b).
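The difference between read-level and well-level allele fractions can be illustrated with a minimal sketch. The input format (one `(well_id, allele)` pair per read covering a position) and the function name `allele_fractions` are assumptions made for illustration only; they are not the pipeline used in this work.

```python
from collections import defaultdict

def allele_fractions(observations, variant_allele):
    """Return (read_fraction, well_fraction) for one genomic position.

    observations: iterable of (well_id, allele) pairs, one per read
    (hypothetical input format). A well is counted once per allele it
    supports, however many reads it contributes.
    """
    reads = defaultdict(int)   # allele -> number of reads
    wells = defaultdict(set)   # allele -> set of wells supporting it
    for well_id, allele in observations:
        reads[allele] += 1
        wells[allele].add(well_id)
    total_reads = sum(reads.values())
    total_wells = sum(len(w) for w in wells.values())
    read_fraction = reads[variant_allele] / total_reads if total_reads else 0.0
    well_fraction = len(wells[variant_allele]) / total_wells if total_wells else 0.0
    return read_fraction, well_fraction

# Two wells carry the reference allele (2 reads each); one well carries the
# mutant allele but was over-amplified (12 reads).
example = [(1, "A")] * 2 + [(2, "A")] * 2 + [(3, "T")] * 12
read_af, well_af = allele_fractions(example, "T")
```

In this toy input the read-based fraction is dominated by the over-amplified well, whereas the well-based fraction counts that well only once.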
Genetic mutations are not required for primary extreme chemotherapy resistance
Whether primary chemotherapy resistance is an inherent characteristic of tumours that is determined by core mutations or a feature of a subpopulation of cancer cells with unique mutations has remained unknown. To address this question, we constructed “supergenomes” that had accumulated data from individual islets from each of the positions mentioned above (Fig. 1b). This high resolution set of supergenomes enabled detailed comparison between chemotherapy-sensitive sites (positions A and B) and those that were extremely chemotherapy-resistant (position C post chemotherapy). Using stringent criteria (only including high quality somatic mutation calls that were not called in the blood sample, that were supported by reads from at least three wells, that were called in at least 3 genomes and that were not previously called in the dbSNP database) we found that most of the single nucleotide mutations (n = 7588, of which 750 were validated using targeted sequencing) were in fact shared by the majority of tumor sites (Fig. 1c). This suggests that the mutations occurred early and became fixed within tumors irrespective of space, time or chemotherapy response. However, a relatively smaller set was specific to individual groups. Of note, there were 535 mutations that were specific to prechemotherapy tumour islets, of which 13 were private to the extreme sensitive sites (sites A and B). Of the 535 mutations, only four were nonsynonymous mutations and the rest occurred in non-coding regions. We speculate that synthetic lethal interactions could have occurred in this tumor that rendered a subpopulation of cancer cells particularly sensitive to chemotherapy. In addition, there appeared to be relatively few private mutations for the post-treatment chemotherapy-resistant tumors (n = 6), none of which occurred in exons. Altogether, our data strongly suggest that in this tumor, the chemotherapy resistance signature may have pre-existed.
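The stringent criteria listed above can be expressed as a simple filter. The following is a minimal sketch; the `Call` record and its field names are hypothetical and do not correspond to any particular variant-caller output.

```python
from dataclasses import dataclass

@dataclass
class Call:
    # Hypothetical per-variant summary; field names are illustrative only.
    position: str
    high_quality: bool      # passed the caller's quality threshold
    in_blood: bool          # also called in the blood sample
    supporting_wells: int   # wells with reads supporting the variant
    genomes_called: int     # number of islet genomes in which it was called
    in_dbsnp: bool          # previously reported in dbSNP

def passes_filter(call, min_wells=3, min_genomes=3):
    """Apply the criteria stated in the text to one candidate call."""
    return (
        call.high_quality
        and not call.in_blood
        and call.supporting_wells >= min_wells
        and call.genomes_called >= min_genomes
        and not call.in_dbsnp
    )

calls = [
    Call("chr1:100", True, False, 3, 3, False),  # meets every criterion
    Call("chr1:200", True, True, 5, 5, False),   # present in blood
    Call("chr1:300", True, False, 2, 4, False),  # too few supporting wells
]
kept = [c.position for c in calls if passes_filter(c)]
```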
We next characterized the copy number alterations of the tumor. The high-resolution supergenomes permitted whole genome copy number analysis of the tumor that was not possible from bulk sequencing (Fig. 1d and Fig. 10). This revealed a near-triploid state and a complex myriad of copy number alterations and loss-of-heterozygosity events that is typical of HGSOCs5. Importantly, there were no significant differences between different supergenomes (A/B prechemotherapy, C prechemotherapy or C postchemotherapy). This added further confirmation that extreme primary chemotherapy resistance is independent of genetic mutations or chromosomal numerical aberrations.
Core ancestor mutations predominantly lie in non-coding regions.
We next sought to determine the core genetic mutations and structural aberrations in this tumour that may have contributed to its genesis. To do this we utilized data from individual tumour islets (as opposed to supergenomes). We hypothesized that core mutations would be present in all cancer cells irrespective of their gross spatial location, refined sublocation, time of presentation or biological behavior (e.g. chemotherapy response). Analysis of numerical aberrations using a novel statistical algorithm to account for the bias introduced by allele drop out (see Supplementary Methods) showed that LOH events occurred in a highly conserved pattern across 16 tumor islets with greater than 70% genome coverage.
We next identified a set of core mutations that were present in practically all tumor islet samples (>90% of tumour islets) irrespective of space, tumour site or biological behaviour. Analysis of 16 tumor genomes derived from individual tumour islets that had sequence coverage of more than 70% and two microdissected adjacent stroma samples identified a large number of novel core mutations (n = 750, hereafter designated as ancestor). These were shared by more than 90% of tumor islets (>14/16) examined, were found in neither blood nor adjacent stroma and were not previously described in dbSNP (Fig. 2b-c). Further extensive validation using different sequencing chemistry excluded the possibility of these being germline variants and confirmed the presence of all 750 mutations (Fig. 11 and Supplementary Table 1). The variants included 2 synonymous mutations, 8 non-synonymous mutations in protein-coding regions and 740 mutations in regions that were not protein coding. Importantly, the list of gene-coding non-synonymous mutations included a mutation in exon 8 of TP53 that was shared between all tumor islets but not detected in stroma samples. This result was consistent with the previous finding by Ahmed et al.,1 and others5 of ubiquitous TP53 mutations in HGSOCs.
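The selection of ancestor variants described above (shared by more than 90% of islets and absent from blood, stroma and dbSNP) can be sketched as follows. The data structures and the function name `core_mutations` are illustrative assumptions, not the implementation used here.

```python
def core_mutations(islet_calls, excluded, min_fraction=0.9):
    """Return variants shared by more than min_fraction of islets.

    islet_calls: dict mapping islet id -> set of variant keys (hypothetical).
    excluded: set of variant keys seen in blood, stroma or dbSNP.
    """
    n_islets = len(islet_calls)
    counts = {}
    for variants in islet_calls.values():
        for v in variants:
            counts[v] = counts.get(v, 0) + 1
    return {
        v for v, k in counts.items()
        if k / n_islets > min_fraction and v not in excluded
    }

# Toy data for 16 islets: v1 in 15 islets, v2 in 14, v3 in all 16 but germline.
islets = {i: set() for i in range(16)}
for i in range(15):
    islets[i].add("v1")
for i in range(14):
    islets[i].add("v2")
for i in range(16):
    islets[i].add("v3")
ancestors = core_mutations(islets, excluded={"v3"})
```

With 16 islets the strict 90% threshold admits variants present in 15 or 16 islets, which matches the ">14/16" condition stated in the text.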
Core mutations cluster at potential cis-regulatory elements of genetic drivers of stem cell differentiation
We next hypothesized that a fraction of the identified non-coding mutations may have occurred at distal cis-regulatory elements that could alter the expression of genes involved in tumor evolution. To test whether mutations targeted the regulatory regions of genes involved in specific biological processes, we first mapped the identified mutations to gene regulatory regions as previously reported10. Next, we compared the list of 750 ancestor variants to “progeny” variants (background comparator) that were present in at least two tumor islets (n = 330,811 variants). This analysis revealed five significantly enriched ontologies of biological processes, all of which were related to embryonic and stem cell differentiation (Fig. 2d). For example, the stem cell differentiation ontology (Supplementary Table 2) was supported by 23 mutations that mapped to 15 genes (e.g. SOX2, PAX7, WNT7A). This significant enrichment was robust to changing the background comparator to either the whole genome (p < 0.001, Fig. 12) or to germline variants confirmed by standard WGS of lymphocyte DNA (n = 4.28 million, p < 0.001), and to restricting the analysis to only experimentally verified genes linked to the stem cell differentiation ontology (p < 0.001). In addition, using a fluorescence reporter assay in chicken embryos (Fig. 13a), we found that the majority of non-coding regions tested (9/10), corresponding to variants that mapped to 6 genes including SOX2, PAX7, WNT7A, HMGA2, NOG and TFAP2C (Fig. 13b-g and data not shown), harbored enhancer activity and had a specific tissue distribution. These results strongly suggested that ancestor cells for this tumor were significantly enriched in somatic variants within cis-regulatory elements of genes involved in stem cell differentiation.
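The text does not spell out the statistical test behind the ontology enrichment, so the following is only a sketch of one common way to compare a foreground set against a background set: a fold enrichment together with a one-sided hypergeometric tail probability. The choice of test, the function names and the toy numbers are all assumptions.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """(k/n) / (K/N): foreground hit rate relative to background hit rate.

    k: foreground variants mapped to the ontology, n: foreground size,
    K: background variants mapped to the ontology, N: background size.
    """
    return (k / n) / (K / N)

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n variants are drawn without replacement from N,
    of which K map to the ontology. Uses exact integer arithmetic."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Toy numbers only: 2 of 10 foreground hits against 10 of 100 background hits.
fe = fold_enrichment(2, 10, 10, 100)
p = hypergeom_tail(2, 10, 10, 100)
```

The hypergeometric form treats the foreground as a subset of the background, which seems plausible here because every ancestor variant is also present in at least two islets; other backgrounds (whole genome, germline variants) would need a different null model.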
Of particular note was the finding that SOX2, a key driver of stem cell differentiation11 that was recently implicated in tumour initiation of skin cancer12, was supported by a total of 6 mutations that occurred at nucleotides which we termed BB1 to BB6. These mutations were located within a region that had complex chromosomal copy number variations including amplification to 4 and 6 copies and an area of possible LOH immediately downstream of this BB region (Fig. 14). Since the probability of acquiring a mutation in the same position on two alleles is very small, the finding that the allele fractions of the identified mutations centred on 0.5 in an area where 4 alleles existed, confirmed that the mutations occurred prior to duplication. Thus, all the discovered mutations appeared to have occurred early in this tumour’s evolution.
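The timing argument above can be made concrete with a small worked example. It assumes a pure tumour sample and a mutation that initially sits on a single copy; the function name is illustrative.

```python
from fractions import Fraction

def expected_allele_fraction(mutant_copies, total_copies):
    """Expected mutant allele fraction for a given copy-number state,
    assuming no contamination from normal cells."""
    return Fraction(mutant_copies, total_copies)

# Mutation acquired BEFORE a 2 -> 4 copy duplication: the mutant copy is
# duplicated along with everything else, giving 2 mutant copies out of 4.
before_duplication = expected_allele_fraction(2, 4)

# Mutation acquired AFTER duplication: it arises on one of the 4 copies.
after_duplication = expected_allele_fraction(1, 4)
```

Under these assumptions an allele fraction centred on 0.5 in a four-copy region is what the first scenario predicts, while the second predicts 0.25.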
The BB5 mutation occurred at the preneoplastic region of the tumour and marked a region that was frequently mutated in HGSOCs
We next asked whether any of the 6 mutations marked regions that were sites of frequent occurrence of private variants or mutations in HGSOCs. We performed deep-targeted sequencing of the 2-Mb region flanking SOX2 on 33 HGSOCs including our index case (Supplementary Table 3). A total of 861 single nucleotide substitutions (Supplementary Table 4) were identified that were not previously reported in the 1000 Genomes Project (median = 21, range = 11-97). Since functionally important genomic regions tend to be significantly less susceptible to genomic variation within a population, we asked whether the identified rare variants accumulated in specific regions within the 2 Mb that were less susceptible to genomic alterations on a population scale. To test this we constructed overlapping moving windows of 40 kb and compared the observed frequency of rare mutations (not previously described in the 1000 Genomes Project) in our group of patients with the expected frequency of SNPs in the same windows, based on 1000 sets of simulated cohorts of 33 individuals drawn from the previously reported 1000 Genomes Project data. This analysis showed that there was a peak observed/expected ratio (enrichment statistic) in a 40 kb region flanking the BB5 nucleotide (Fig. 3a, hereafter referred to as the BB5 region). Next we tested the assumption that the region was enriched in variants that occupied functionally important elements by mapping the variants to biochemically active sites as reported by ENCODE. We found that the BB5 region was also significantly enriched (p < 0.01) in variants that overlapped with genomic sites that were previously reported to harbor biochemical activity (DNase I hypersensitivity or transcription factor binding activity) (Fig. 3a and Supplementary Table 4). Finally, re-analysis of sequence conservation across vertebrate species confirmed the presence of a peak of highly conserved sequences in the BB5 region (not shown).
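The moving-window comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the step size, the function names and the toy coordinates are all assumptions, and the expected count is simply taken as the mean count across the simulated cohorts.

```python
from bisect import bisect_left


def overlapping_windows(region_start, region_end, size, step):
    """Yield half-open (start, end) windows of `size` bp every `step` bp."""
    start = region_start
    while start + size <= region_end:
        yield start, start + size
        start += step


def count_in_window(sorted_positions, start, end):
    """Number of positions p with start <= p < end (input must be sorted)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def enrichment_profile(observed, simulated_cohorts, region_start, region_end,
                       size=40_000, step=10_000):
    """Observed/expected ratio per window.

    observed:          variant positions found in the patient cohort
    simulated_cohorts: list of position lists, one per simulated cohort
    step:              window offset; hypothetical, not given in the source
    """
    observed = sorted(observed)
    simulated = [sorted(cohort) for cohort in simulated_cohorts]
    profile = []
    for start, end in overlapping_windows(region_start, region_end, size, step):
        obs = count_in_window(observed, start, end)
        exp = sum(count_in_window(c, start, end) for c in simulated) / len(simulated)
        if exp > 0:
            ratio = obs / exp
        elif obs > 0:
            ratio = float("inf")
        else:
            ratio = float("nan")
        profile.append((start, end, obs, exp, ratio))
    return profile


# Toy coordinates for illustration only (not data from the study).
toy_observed = [5, 12, 15, 18, 31]
toy_simulated = [[3, 14, 35], [16, 22]]
toy_profile = enrichment_profile(toy_observed, toy_simulated, 0, 40, size=20, step=10)
for row in toy_profile:
    print(row)
```

The window with the highest ratio in such a profile would correspond to the peak enrichment statistic reported for the BB5 region.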
We next sought to determine the number of true somatic mutations in the BB5 region by sequencing DNA from tumour and normal paraffin-embedded material. We confirmed that the BB5 region (chr3:182,189,714-182,229,714) included 21 single nucleotide substitutions in tumors from 16 patients (48.5% of all patients), of which 9 variants from 7 patients (21.2%) were somatic mutations (Supplementary Table 5). In 6 patients, de novo mutations were observed. Two of these patients had biopsies from more than one site and mutations were found in those additional sites, suggesting that they were core mutations (Supplementary Table 5). In addition, two loss-of-heterozygosity events occurred in one patient (Fig. 15) and these mutations were also present in 3 other tumor sites, suggesting that they were core mutations. Independent validation by deep sequencing of a focused 1.6 Mb region flanking SOX2 in 16 tumour-normal pairs of HGSOCs identified two further mutations in the BB5 region in two patients (total number of mutations 11 in 9 patients). Notably, in the entire cohort of 49 cases, 5 of the 9 cases with evidence of tubal carcinoma (55.6%) had mutations in the BB5 region, in contrast to 4 of the 40 cases (10%) with no evidence of tubal carcinoma. Importantly, based on the reported average mutation rate of 1.74e-6 per base, the BB5 region is significantly enriched for somatic mutations (p = 0.00045).
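The source does not state which test produced the reported p value, so the following is only a hedged sketch of one way such an enrichment could be checked. It assumes that the quoted rate applies per base per tumour, that all 49 cases contribute equally, and that mutation counts follow a Poisson distribution; none of these assumptions comes from the source, and the sketch is not expected to reproduce p = 0.00045.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


RATE_PER_BASE = 1.74e-6      # average mutation rate quoted in the text
REGION_START = 182_189_714   # chr3 coordinates of the BB5 region from the text
REGION_END = 182_229_714
N_TUMOURS = 49               # assumption: the rate applies per tumour across the cohort
N_MUTATIONS = 11             # somatic mutations reported in the BB5 region

region_length = REGION_END - REGION_START
expected = RATE_PER_BASE * region_length * N_TUMOURS
p_value = poisson_upper_tail(N_MUTATIONS, expected)
print(f"expected = {expected:.2f}, P(X >= {N_MUTATIONS}) = {p_value:.2g}")
```

Under this model a small upper-tail probability indicates that the observed number of mutations in the 40 kb window exceeds what the background rate alone would predict.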
Because of the strong evidence of the early occurrence of the BB5 mutation, we tested whether it was also present in the precursor cells of the tumour. We examined the fallopian tube epithelium (FTE) as a potential tissue of origin of HGSOCs13-16 and identified the pathognomonic p53 signature14 (p53 nuclear overexpression in benign-appearing FTE) in the apparently normal FTE in this patient (Fig. 3b-e). A subpopulation of the p53-overexpressing cells also expressed high levels of nuclear SOX2 (Fig. 3b-e). Using a combination of deep-targeted sequencing, droplet-digital PCR and Sanger sequencing we confirmed that both the BB5 nucleotide and TP53 were mutated in the p53 signature and the adjacent FTE (Fig. 3f-g). Importantly, in two additional patients where DNA extraction from the paraffin-embedded FTE was possible, we were able to show that the mutations were also present in the FTE (Fig. 15).
Expansion of FTE cells strongly expressing SOX2 is a novel ubiquitous feature of HGSOCs
We next sought to determine the relevance of this observation in other HGSOCs. Surprisingly, we found that the profound increase of the percentage of SOX2-expressing cells in the FTE of our index patient was in sharp contrast to the rare SOX2 expression in the FTE of patients with benign conditions. Paradoxically, SOX2 expression in tumor cells was almost absent (Fig. 16). We compared the expression of SOX2 in the nuclei of normal FTE of patients with benign conditions and patients with endometrial cancer, and in the normal FTE and corresponding ovarian tumors from patients with HGSOC (Supplementary Table 6). Automated image analysis of these samples showed that the median expression of SOX2 (intensity score of 3+) in the HGSOC FTE was 17-fold higher than in FTE from patients with benign conditions (p < 2e-16, one-way ANOVA followed by Tukey test, Fig. 4a-b). The median expression of SOX2 dropped significantly in the corresponding ovarian tumors (p < 2e-16, one-way ANOVA followed by Tukey test, Fig. 4b). To validate this observation, we performed IHC on an independent set of fallopian tubes from 88 women with either HGSOCs (n = 42) or benign gynaecological conditions (n = 46, Fig. 4c). This analysis confirmed that there was a significant expansion of cells that strongly expressed SOX2 in the FT of women with HGSOCs (p < 2e-16, one-way ANOVA followed by Tukey test). In addition, fitting a binomial generalized linear model on the entire data set revealed that strong SOX2 expression accurately predicted whether the fallopian tube was from an HGSOC patient or from a patient with benign pathology (p = 9.95e-8, logistic regression model). Fitting a Receiver Operating Characteristic (ROC) curve to the data from the logistic regression analysis confirmed the high predictive power of strong SOX2 expression (Fig. 4d, Area Under the Curve (AUC) = 0.87).
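To make the AUC statistic concrete, the sketch below computes it directly from a single predictor using the rank (Mann-Whitney) formulation. For a logistic regression with one predictor and a positive coefficient, the fitted probabilities rank samples in the same order as the predictor itself, so the AUC is unchanged. The function name and the example percentages are hypothetical and are not data from the study.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U formulation).
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical percentages of strongly SOX2-positive FTE cells (illustration only).
hgsoc_fte = [12.0, 8.5, 0.9, 15.2, 6.1]
benign_fte = [0.4, 1.1, 0.2, 0.9, 0.0]

auc = roc_auc(hgsoc_fte, benign_fte)
print(auc)  # 0.94
```

An AUC of 0.5 would indicate no discrimination between the two groups and 1.0 perfect separation, which places the reported value of 0.87 in context.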
Importantly, analysis of the FTs that were excised from 4 women at high risk of ovarian cancer (strong family history or BRCA1/2 mutations) to reduce the risk of acquiring ovarian cancer revealed a significant expansion of cells that strongly expressed SOX2 in two patients (Fig. 4a). This indicated that such lesions might occur prior to ovarian cancer development. To our knowledge, this ubiquitous feature of the FTE of HGSOCs has not been previously described.
The FTE expression of SOX2 and MYC/EZH2 is mutually exclusive
We next hypothesized that non-coding mutations may contribute to the observed SOX2 phenotype. The possible genetic link was further supported by the observation that there was an enrichment of the number of significant H3K27Ac ChIP-seq peaks in the BB5 region in fallopian tube epithelial cells (Fig. 5a). This observation is akin to the recently described enrichment of H3K27 acetylation at super-enhancers17. At a global level there was a striking correlation (Pearson correlation coefficient, R² = 0.8) between the density of H3K27Ac peaks and that of CpG dinucleotides in the region flanking the SOX2 gene, suggesting that this region is heavily susceptible to regulation by methylation (Fig. 5a). We therefore hypothesized that such regions are regulated by repressive complexes that cooperate to modulate histone tails and mediate CpG methylation locally and at the SOX2 promoter. We hypothesized that such repression may be alleviated by tumour mutations.
To test the hypothesis, we started by conducting further characterization of the phenotypic observation. As previously described, the FTE contained basal, secretory (PAX8-expressing) and ciliated cells. In addition, we observed a surprising mutually exclusive pattern of expression of SOX2 and MYC in the FTE. In the FTE of benign cases, MYC, a stem cell TF that binds to the transcriptional repressor KDM5B18, appeared to be expressed exclusively and ubiquitously in non-ciliated cells (Fig. 5c). In contrast, SOX2 was expressed at low levels in a minor population of ciliated cells (Fig. 5c). The FTE non-ciliated cells also contained a small population of cells expressing EZH2, the catalytic component of the Polycomb Repressive Complex 2 and a component of the DNA methyl transferase complex that controls DNA methylation at CpG islands19 (Fig. 5d). Irrespective of the cell type, the majority of the FTE cells expressed KDM5B (data not shown). In the FTE from HGSOC cases, MYC and EZH2 maintained a similar expression pattern in non-ciliated cells while SOX2 was strongly expressed in the majority of the ciliated cells (Fig. 5c). A plausible explanation for the relationship between MYC and SOX2 expression in benign and cancer conditions would be a model whereby, under physiological conditions, ciliated cells are derived from MYC-expressing cells and SOX2 is transiently expressed to promote lineage differentiation into ciliated epithelium. Such a model would suggest that MYC contributes to repression of SOX2 in the secretory cells and that this repression is transiently lost in the maturation to ciliated SOX2-expressing cells. This suggests that at the onset of cancer initiation, repression of SOX2 by MYC and EZH2 would be lost. Such a hypothesis would be supported if the great majority of SOX2-expressing cells were spatially located adjacent to a MYC-expressing cell in the fallopian tube and not exclusively surrounded by SOX2-negative ciliated cells.
Indeed, more than 97% of the SOX2-expressing cells had that pattern of expression (p < 0.0001, two-way t test, n= 407 cells, Fig. 5e) making the hypothesis plausible. In the tumour, both EZH2 and MYC over-expression was accompanied by loss of expression of SOX2 (Fig. 5d,f). Analysis of SOX2 and MYC expression in 209 HGSOCs revealed that MYC was significantly more expressed, 80% of tumours with 26% showing strong expression, compared to SOX2, 23% of tumours and only 10% strong expression (p<0.001, Fisher Exact test). Importantly, in tumours that strongly expressed either protein (n= 72), MYC expression was significantly lower when SOX2 was expressed (p=4e-07, two-sided t test, Fig. 5e). The above data strongly suggest that MYC and EZH2 may function as repressors of SOX2 in preneoplastic lesions and eventually in tumours.
To test whether MYC and EZH2 were required for suppression of SOX2 in the FTE, we utilised a method for culturing non-ciliated cells that were obtained from freshly excised fallopian tubes20 (Fig. 6a). Indeed, depletion of MYC alone, EZH2 alone or both in combination in non-ciliated fallopian tube epithelial cells resulted in a significant increase in SOX2 protein (Fig. 6b) and up to a 4.5-fold increase in mRNA expression of SOX2 following 48 h to 72 h of siRNA depletion (a total of 14 repeats on FTE of 14 different individuals, Fig. 6c). In contrast, depletion of PAX8 or factors essential for fallopian tube development such as LIM1 or HOXA9 did not increase the expression of SOX2 (data not shown). Thus, MYC and EZH2 are required for repression of SOX2 expression in fallopian tube epithelial cells.
MYC, EZH2 and KDM5B occupy sites in the BB5 region that are mutated in HGSOCs
We next sought to determine whether repressors of SOX2 expression occupy sites in the BB5 region in FTE that are mutated in HGSOC. ChIP sequencing and ChIP quantitative PCR (not shown) on chromatin obtained from FTE both confirmed that MYC, EZH2 and KDM5B occupied multiple sites in the BB5 region (Fig. 6d). Mapping the identified mutations in our patient cohort revealed that several of these mutations, including the BB5 mutation, occurred in regions that were occupied by at least one of these proteins (Fig. 6d). This provides strong evidence that mutations in this region may directly contribute to the expansion of cells that strongly express SOX2.
The BB5 mutation marked a novel SOX2 repressor element
We next sought to determine whether or not the identified mutations might interfere with occupancy of SOX2 repressors and, therefore, induce its expression. To provide proof of concept, we focused our mechanistic work on the BB5 mutation. The BB5 nucleotide was the first of a 12-nucleotide DNA sequence (Fig. 7a) that matched the previously reported motif model UW.0169 that was discovered by genomic DNase I footprinting21. At the BB5 nucleotide there is a strong preference (14-fold higher frequency) for the wild type C nucleotide over the mutant G nucleotide (observed frequencies C = 0.86 and G = 0.06). We therefore tested whether the mutation can alter the intensity or pattern of expression of the reporter citrine fluorescent protein in vivo in chicken embryos. The wild type enhancer controlled the expression of citrine predominantly within the neural tube, the endogenous site of Sox2 expression, both in the head region at early stages and in the trunk region, and in the delaminating neural crest cells (Fig. 7b-d). In contrast, the activity was dramatically decreased in the mutant version in the neural crest but retained in the cell population within the head region at early stages (Fig. 7e). This confirmed that this nucleotide has an important functional role in the regulatory activity of the element and that whether it functioned as a repressor or an enhancer was tissue specific. Importantly, citrine reporter expression overlapped with the expression of endogenous chicken Sox2 in the neural tube cells of 11-somite stage chicken embryos, but not in the neural crest cells, which do not express Sox2 (Fig. 7e-f), and this strongly suggested that the element specifically regulated Sox2.
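The 14-fold preference quoted above follows directly from the two observed frequencies. The short calculation below reproduces it and also expresses it on a log2 scale, which is a common way to present position-specific preferences; the variable names are illustrative only.

```python
import math

freq_wild_type_c = 0.86  # observed frequency of C at the first motif position (from the text)
freq_mutant_g = 0.06     # observed frequency of G at the same position (from the text)

fold_preference = freq_wild_type_c / freq_mutant_g
log2_preference = math.log2(fold_preference)

print(f"fold = {fold_preference:.1f}, log2 = {log2_preference:.2f}")  # fold = 14.3, log2 = 3.84
```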
Finally, to conclude that the BB5 regulatory element was required for MYC- and EZH2-mediated repression of SOX2 in mammalian cells we tested whether genetic deletion of the BB5-containing motif model UW.0169 altered the physical interaction of that region with MYC and EZH2 and whether this resulted in altered expression of SOX2 (Fig. 7g).
Similar to the FTE cells, the genome of human embryonic kidney 293 (HEK293) cells has an area of open chromatin that encompasses the BB5 nucleotide, as demonstrated by H3K27Ac chromatin immunoprecipitation (ChIP), and that also interacts with MYC, EZH2 and KDM5B (Fig. 7h). Clustered regularly interspaced short palindromic repeats (CRISPR) genome deletion of several of the essential nucleotides included in the motif model UW.0169 (Fig. 7g)22 induced a significant reduction in H3K27 acetylation and in the interaction of the region with MYC, EZH2 and KDM5B (Fig. 7h), indicating that this motif was required for maintaining such interaction. Importantly, this deletion resulted in a significant and sustained increase in SOX2 expression in 5 single cell colonies of HEK293 cells for up to 30 days (Fig. 7i and Supplementary Table 7). Thus, the BB5-containing motif UW.0169 is required for repression of SOX2 expression. Finally, we show that in the FTE cells MYC is required for KDM5B occupancy, as siRNA-mediated depletion of MYC significantly reduced KDM5B occupancy of the BB5 element (Fig. 7j).
We next tested the potential oncogenic role of SOX2 overexpression in FTE cells. We transduced MYC-expressing FTE secretory cells or ovarian cancer SKOv3 cells with SOX2-expression lentiviral constructs. Consistent with our proposed role of SOX2 in FTE cells, SOX2 expression significantly reduced MYC and PAX8 expression in FTE and SKOv3 cells to undetectable levels by immunostaining (Fig. 17a) and occasionally resulted in cilia formation in FTE cells (not shown). To the best of our knowledge, this is the first example demonstrating how SOX2 may repress MYC expression in a human model system. We next examined how SOX2 overexpression may facilitate malignant transformation by performing global RNA-seq analysis. Because of the low transduction efficiency of primary-cultured FTE cells, and because ectopic SOX2 expression in SKOv3 cells resulted in key expression changes (reduced MYC and PAX8 expression) that were similar to those observed in the FTE, we performed RNA-seq following SOX2 ectopic expression in SKOv3 cells and validated the expression changes of individual genes in FTE cells. As expected, SOX2 ectopic expression resulted in a substantial reduction of MYC and PAX8 expression. Global pathway analysis of differentially expressed genes (P < 0.003, n = 628 genes, Supplementary Table 10) identified a significant enrichment of genes involved in extracellular matrix (ECM) remodelling (13 of the 20 significant pathways were related to ECM remodelling, Fig. 17b and Supplementary Table 10). For example, a reduction of fibronectin (11-fold reduction, P < 0.001) was observed. In addition, a four-fold reduction of E-Cadherin (P < 0.001) was found. In contrast, a four-fold increase in MMP7 (P < 0.001), a matrix metalloproteinase that is known to degrade a broad spectrum of matrix proteins (Supplementary Table 10), was identified.
In addition, the re-expression of key soluble oncogenes such as IGF2 (> 4500-fold increase, P < 0.001) that are normally not expressed in either SKOv3 or FTE cells was found (Figure 17c). Thus, high SOX2 expression in FTE cells drives expression changes that are consistent with epithelial-mesenchymal transition (EMT) and therefore may play an important role in promoting the detachment of cancer cells from the FT early in carcinogenesis.
Discussion
Our approach in investigating chemotherapy resistance challenges the established paradigm of comparing groups of patients who are labeled as either chemotherapy resistant or sensitive based on clinical and pathological observations23. Such approach is confounded by heterogeneities in tumours, subjective clinical judgment of evaluating response and vast differences in genetic backgrounds of individuals tested. Instead, by close monitoring of tumour response from single tumours over time it is possible to make direct comparisons between extreme chemotherapy resistant and sensitive locations within the same tumour. This approach significantly reduces the complexity of genomic and clinical heterogeneity. We, therefore, believe that our unique study design has enabled the conclusion that in this tumour, the mutations that result in primary chemotherapy resistance are imprinted within the stem of the tumour rather than being acquired in a subpopulation of cells with private mutations. Such discovery, if common to other tumours, may start a novel paradigm for drug targeting. By learning how individual tumour sites or cells acquired chemotherapy sensitivity, it may be possible to apply novel therapeutic strategies that mimic these processes to drive tumours to become sensitive to chemotherapy.
This work discovered a distal 40 kb SOX2 regulatory region that is infrequently altered in the general population, frequently mutated in HGSOCs, highly conserved across vertebrates and harbors several CpG islands. In addition, the region has extensive H3K27 acetylation and MYC and EZH2 IP peaks. All these features resemble the recently described super-enhancers24. Such super-enhancers are known to regulate genes with key functions in determining cell identity and fate. Our work suggests that in the fallopian tube, SOX2 is repressed by default and only expressed transiently at the interface between non-ciliated and ciliated cells. We propose that in certain tissue types, such as the fallopian tube, super-repressors play a key role in suppression of the expression of stem cell differentiation genes such as SOX2. Similar to super-enhancers, repressor regions are disproportionately targeted for mutations. Further studies are needed to evaluate the significance of these repressor regions in other cancers.
Our work suggests a model where physiological bidirectional conversion exists between two states, SOX2 expression and no expression, that is akin to the previously reported stem cell bidirectional conversion25. This process is perturbed by mutations in repressor elements that result in conversion of MYC/EZH2-expressing cells to SOX2-expressing ciliated cells. This process is similar to oncogene-driven processes26. We propose that for a subset of ovarian cancers, this expansion is followed by gene-coding mutations and transformation, followed by differentiation and tumor expansion. These results highlight the importance of repressors in regulating tumor behavior27,28 and provide proof of founder status for cancer mutations in such repressors.
Recent work strongly suggested that the FTE is a common site of origin of HGS pelvic cancers29. Recent mouse work has shown that inducing mutations in TP53, BRCA1/BRCA2, and PTEN in secretory cells in the FTE induces HGSOCs within about 3 months. Secretory cells ubiquitously express PAX8, which is also expressed in HGSOCs, and this supports the notion that such cells are the probable cells of origin of HGSOCs. Our work shows that the clonally expanded SOX2-expressing cells are ciliated. We also show that SOX2-expressing cells take part in forming the p53 signature (Fig. 3c-e). It is possible, therefore, that a fraction of these cells can become TP53 mutated and, therefore, transformed. Acquisition of MYC and EZH2 amplification following transformation can result in repression of SOX2 in the fully developed cancer cells in spite of mutations in repressor elements, leading to a secretory-like phenotype in cancer cells. Oncogenic levels of MYC that follow malignant transformation result in wider genome occupancy and repression of subsets of genes. Our data from normal and cancer models support a model of de-repression at tumour initiation followed by transformation and eventually oncogenic repression30. Such a model is also supported by the recent work showing that SOX2 was required for development of skin cancer12.
We believe that this is the first report of expansion of SOX2-expressing cells in the FTE of HGSOCs. This finding has important implications as it provides a potential tool for screening for HGSOCs. We show examples in which such expansion occurred in the FTE of women at high risk of developing HGSOCs. Future work will test the feasibility of quantitative detection of such cells or the underlying repressor mutations as a potential screening tool.
References and notes:
1 Ahmed, A. A. et al. Driver mutations in TP53 are ubiquitous in high grade serous carcinoma of the ovary. J Pathol 221, 49-56, doi:10.1002/path.2696 (2010).
2 Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957-959, doi:10.1126/science.1229259 (2013).
3 Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959-961, doi:10.1126/science.1230062 (2013).
4 Vinagre, J. et al. Frequency of TERT promoter mutations in human cancers. Nat Commun 4, 2185, doi:10.1038/ncomms3185 (2013).
5 Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615, doi:10.1038/nature10166 (2011).
6 Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994-1007, doi:10.1016/j.cell.2012.04.023 (2012).
7 Gerlinger, M. et al. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing. Nat Genet 46, 225-233, doi:10.1038/ng.2891 (2014).
8 Schuh, A. et al. Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood 120, 4191-4196, doi:10.1182/blood-2012-05-433540 (2012).
9 Peters, B. A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487, 190-195, doi:10.1038/nature11236 (2012).
10 McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495-501, doi:10.1038/nbt.1630 (2010).
11 Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663-676, doi:10.1016/j.cell.2006.07.024 (2006).
12 Boumahdi, S. et al. SOX2 controls tumour initiation and cancer stem-cell functions in squamous-cell carcinoma. Nature 511, 246-250, doi:10.1038/nature13305 (2014).
13 Crum, C. P. Intercepting pelvic cancer in the distal fallopian tube: theories and realities. Mol Oncol 3, 165-170, doi:10.1016/j.molonc.2009.01.004 (2009).
14 Lee, Y. et al. A candidate precursor to serous carcinoma that originates in the distal fallopian tube. J Pathol 211, 26-35, doi:10.1002/path.2091 (2007).
15 Kurman, R. J. & Shih Ie, M. Molecular pathogenesis and extraovarian origin of epithelial ovarian cancer-shifting the paradigm. Hum Pathol 42, 918-931, doi:10.1016/j.humpath.2011.03.003 (2011).
16 Karst, A. M., Levanon, K. & Drapkin, R. Modeling high-grade serous ovarian carcinogenesis from the fallopian tube. Proc Natl Acad Sci USA 108, 7547-7552, doi:10.1073/pnas.1017300108 (2011).
17 Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934-947, doi:10.1016/j.cell.2013.09.053 (2013).
18 Wong, P. P. et al. Histone demethylase KDM5B collaborates with TFAP2C and Myc to repress the cell cycle inhibitor p21(cip) (CDKN1A). Mol Cell Biol 32, 1633-1644, doi:10.1128/MCB.06373-11 (2012).
19 Vire, E. et al. The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871-874, doi:10.1038/nature04431 (2006).
20 Karst, A. M. & Drapkin, R. Primary culture and immortalization of human fallopian tube secretory epithelial cells. Nat Protoc 7, 1755-1764, doi:10.1038/nprot.2012.097 (2012).
21 Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83-90, doi:10.1038/nature11212 (2012).
22 Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8, 2281-2308, doi:10.1038/nprot.2013.143 (2013).
23 Ahmed, A. A. et al. The extracellular matrix protein TGFBI induces microtubule stabilization and sensitizes ovarian cancers to paclitaxel. Cancer Cell 12, 514-527, doi:10.1016/j.ccr.2007.11.014 (2007).
24 Whyte, W. A. et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307-319, doi:10.1016/j.cell.2013.03.035 (2013).
25 Tata, P. R. et al. Dedifferentiation of committed epithelial cells into stem cells in vivo. Nature 503, 218-223, doi:10.1038/nature12777 (2013).
26 Schwitalla, S. et al. Intestinal tumorigenesis initiated by dedifferentiation and acquisition of stem-cell-like properties. Cell 152, 25-38, doi:10.1016/j.cell.2012.12.012 (2013).
27 Loven, J. et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell 153, 320-334, doi:10.1016/j.cell.2013.03.036 (2013).
28 Chapuy, B. et al. Discovery and characterization of super-enhancer-associated dependencies in diffuse large B cell lymphoma. Cancer Cell 24, 777-790, doi:10.1016/j.ccr.2013.11.003 (2013).
29 Perets, R. et al. Transformation of the fallopian tube secretory epithelium leads to high-grade serous ovarian cancer in Brca;Tp53;Pten models. Cancer Cell 24, 751-765, doi:10.1016/j.ccr.2013.10.013 (2013).
30 Walz, S. et al. Activation and repression by oncogenic MYC shape tumour-specific gene expression profiles. Nature 511, 483-487, doi:10.1038/nature13473 (2014).
31 Dean, F. B. et al. Comprehensive human genome amplification using multiple displacement amplification. Proc Natl Acad Sci USA 99, 5261-5266, doi:10.1073/pnas.082089499 (2002).
32 Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78-81, doi:10.1126/science.1181498 (2010).
33 Carnevali, P. et al. Computational techniques for human genome resequencing using mated gapped reads. J Comput Biol 19, 279-292, doi:10.1089/cmb.2011.0201 (2012).
34 Yau, C. OncoSNP-SEQ: a statistical approach for the identification of somatic copy number alterations from next-generation sequencing of cancer genomes. Bioinformatics 29, 2482-2484, doi:10.1093/bioinformatics/btt416 (2013).
35 Yau, C. et al. A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol 11, R92, doi:10.1186/gb-2010-11-9-r92 (2010).
36 Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21, 936-939, doi:10.1101/gr.111120.110 (2011).
37 Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65, doi:10.1038/nature11632 (2012).
38 Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74, doi:10.1038/nature11247 (2012).
39 Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75-82, doi:10.1038/nature11232 (2012).
40 Yip, K. Y. et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol 13, R48, doi:10.1186/gb-2012-13-9-r48 (2012).
41 Hamburger, V. & Hamilton, H. L. A series of normal stages in the development of the chick embryo. 1951. Dev Dyn 195, 231-272, doi:10.1002/aja.1001950404 (1992).
42 Sauka-Spengler, T. & Barembaum, M. Gain- and loss-of-function approaches in the chick embryo. Methods Cell Biol 87, 237-256, doi:10.1016/S0091-679X(08)00212-4 (2008).
43 Betancur, P., Bronner-Fraser, M. & Sauka-Spengler, T. Genomic code for Sox10 activation reveals a key regulatory enhancer for cranial neural crest. Proc Natl Acad Sci USA 107, 3570-3575, doi:10.1073/pnas.0906596107 (2010).
44 Simoes-Costa, M. S., McKeown, S. J., Tan-Cabugao, J., Sauka-Spengler, T. & Bronner, M. E. Dynamic and Differential Regulation of Stem Cell Factor FoxD3 in the Neural Crest Is Encrypted in the Genome. PLoS Genet 8, e1003142, doi:10.1371/journal.pgen.1003142 (2012).
45 Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823, doi:10.1126/science.1231143 (2013).
46 Ahmed, A. A. et al. SIK2 is a centrosome kinase required for bipolar mitotic spindle formation that provides a potential target for therapy in ovarian cancer. Cancer Cell 18, 109-121, doi:10.1016/j.ccr.2010.06.018 (2010).
Supplementary Materials:
Material and Methods:
Translational studies
Ethical approval
Tumor islets for whole genome sequencing were obtained from case 11152 who participated in the prospective biomarker validation study; Gynaecological Oncology Targeted Therapy Study 01 (GO Target 01) under research ethics approval number 11/SC/0014. Targeted sequencing was performed on samples from patients who participated in the same study and patients who participated in the prospective Oxford Ovarian Cancer Predict Chemotherapy Response Trial (OXO-PCR-01), under research ethics approval number 12/SC/0404. Immunohistochemistry (IHC) was performed on tissue samples obtained retrospectively from anonymous patients under the GO-Target-01 ethics approval.
Clinical history and sample collection of Patient 11152
Patient 11152 presented with radiological evidence of at least stage IIIC ovarian cancer with evidence of multiple omental and peritoneal nodules, subcapsular splenic nodules and left paracardiac lymph node enlargement and elevated CA125 tumor marker. The patient had a diagnostic laparoscopy and biopsy to confirm the diagnosis of HGSOC. On the day of the procedure, tissue samples were retrieved, split in halves and either saved in cryovials and immersed in dry ice within minutes of obtaining the biopsy in theatre or formalin fixed for standard histological diagnosis. Research samples were stored at -80 °C. In addition, a blood sample was obtained in theatre and saved in sterile EDTA-collection tubes. The patient received 3 cycles of paclitaxel and carboplatin and had a near complete response. Ten weeks after the first laparoscopy, the patient underwent a diagnostic laparoscopy to evaluate chemotherapy response, prior to proceeding, in the same session, to standard debulking surgery. Guided by the first videolaparoscopy, samples were obtained from the same sites from which the initial biopsies were taken.
Precautions for tissue handling to diminish the risk of cross contamination of DNA
Samples were collected from -80 °C to a box of dry ice and transported within the same building to the cryostat room. Individual cryovials were obtained from the dry-ice box and immediately placed within a pre-cooled cryostat stage at -25 °C. The cryostat (CryoStar 70X, Thermo Scientific) was not previously used for routine diagnostic or research work. The cryostat stage and dissection board were cleaned with ethanol prior to use for individual samples and gloves were changed before cutting each new tissue biopsy. For each sample, a single dissection blade (MB DynaSharp Microtome Blade, Thermo Scientific) and a single sterile forceps (that was cleaned, individually wrapped and autoclaved prior to use) were used. Samples were obtained from the cryovials and placed on a sterile dish and cut using disposable single-use blades to obtain a piece of tissue for processing. These tissue pieces were then placed on individual new dissection discs for snap freezing in optimal cutting temperature (OCT) compound (NEG-50, Richard-Allan Scientific) prior to microtome cutting.
Section processing for microdissection
The first tissue section was mounted onto regular glass slides (SuperFrost Plus, VWR International) for hematoxylin (Hematoxylin solution, Gill No. 3, Sigma) and eosin (Eosin Y solution, Sigma) staining (H&E), according to manufacturer’s instructions, followed by 6 to 10 sequential tissue sections at 6 μm thickness onto polyethylene naphthalate membrane (PEN) glass slides (MembraneSlide 1.0 PEN, Zeiss) that were pre-exposed to UV light for 30 min. Slides were immediately stored at -80 °C. Nuclease-free technique was used throughout the procedure and buffers and alcohol solutions were cooled to 4 °C and used fresh each time. Each H&E slide was reviewed by a Gynaecological Oncology pathologist (SD) to confirm the presence of cancer cells and delineate their location. PEN slides were dipped in 50% ethanol for fixation and rinsed in H2O to remove excess OCT compound. The slides were stained with cresyl violet (Sigma Aldrich) at a concentration of 0.1% (weight/volume) in 50% ethanol for 15 seconds, rinsed in 50% ethanol and immediately used for microdissection. For formalin-fixed, paraffin embedded (FFPE) material, 6 μm sections were cut on activated PEN slides and dried at 56 °C overnight, then dewaxed in Xylene (Sigma) and rehydrated through graded alcohols to water, then briefly dipped in 1% methyl green (Sigma) and washed in H2O. The slides were dried at 37 °C for 1 hour and then used for microdissection. Laser capture microdissection was performed on a PALM Laser Microdissection System (Zeiss) and the cut tissue was catapulted onto 200 μl membrane caps (AdhesiveCap 200 opaque, Zeiss, Jena, Germany) and immediately stored on dry ice. Images of empty caps as well as the target area at 5x, 10x and 20x magnification were obtained prior to microdissection for documentation. To maintain DNA integrity, slides were kept on dry ice until microdissection and microdissection was performed for no longer than 45 minutes per slide. Caps were stored at -80 °C until sequencing.
DNA extraction
For DNA extraction from frozen tissues, 10 to 20 scrolls of tissue were cut at 60 μm thickness using a cryostat as described above. For FFPE samples, 10 to 20 scrolls of 20 μm thick sections were dewaxed with Xylene and washed with 100% ethanol. DNA was extracted using the DNeasy blood and tissue kit (Qiagen) according to manufacturer’s instructions. DNA concentrations were quantified using PicoGreen (Quant-iT™ PicoGreen® dsDNA Assay Kit, Life technologies); the A260/280 ratio and absorption spectra were generated using a spectrophotometer (Nanodrop ND-1000, Thermo Scientific) and the broad range Qubit system (Invitrogen); and quality was checked using a 1% agarose E-gel (Invitrogen). For macrodissection of the p53 signature from FFPE immuno-stained sections, the slides were incubated overnight in Xylene at 37 °C. The coverslip was lifted and the slide soaked in Xylene for 30 min, then washed in gradient ethanol. The tissue was macrodissected using a scalpel blade tip. DNA was extracted with the Arcturus® PicoPure® DNA isolation kit (Life technologies) and amplified using the REPLI-g mini kit (Qiagen) according to manufacturer’s instructions. DNA was quantified as described above.
Sequencing
LFR sequencing and whole genome sequencing were performed as previously described31,32. A summary of the clinical characteristics of patients for whom targeted sequencing was performed is presented in Supplementary Table 3. For sequencing the 2 Mb flanking the SOX2 gene, the online NimbleDesign tool (http://www.nimblegen.com/products/nimbledesign/index.html) was used to design capture primers (Roche). The TruSeq® DNA HT Sample Preparation Kit (Illumina) was used to allow for multiplexing and captured using the SeqCap EZ Choice Library (Roche). Both kits were used according to manufacturer's instructions. Library quality control was carried out using the broad range Qubit system (Invitrogen) and the 2200 TapeStation (Agilent). Sequencing was carried out on a HiSeq2500 using TruSeq® Rapid SBS 100 bp paired end sequencing. To increase coverage, the libraries for the blood sample and the microdissected tubal epithelium of case 11152 were recaptured and sequenced using v2 MiSeq chemistry, 100 bp paired end.
For DNA sequencing using the dye-terminator method, TP53 exon 8 was amplified and sequenced using primers TP53-forward ‘GGGTGCAGTTATGCCTCAGATT’ and TP53-reverse ‘CGGCATTTTGAGTGTTAGACTGG’ as previously described1. SOX2 BB5 was amplified and sequenced using the BB5-forward ‘CACCCATGTGAATCATCTCG’ and BB5-reverse ‘ACCAGGTGTCCGAGAGTACG’ primers. PCR was performed using the high fidelity DNA Phusion polymerase (NEB) as per manufacturer’s instructions. Sequencing was performed for the rare variants identified in patients using the primers listed in Supplementary Table 8.
Digital PCR was performed on duplicate samples. Primers 5833217_F; 5'-ACCTACTAGACCCCAGGCAAG-3' and 5833217_R; 5'-GGCGCAGGAGGAGACC-3' were used to amplify a 60 bp amplicon containing the BB5 nucleotide, which was detected using either the 5833217_V; 5'-CCTGGGACCCAAACC-3' VIC-labeled probe for wild type or the 5833217_M; 5'-CTGGCACCCAAACC-3' FAM-labeled probe for mutant amplicons (TaqMan® SNP Genotyping Assays, custom design, Roche Molecular Systems). TP53 mutation was quantified using primers 22410689_F; 5'-CTGTGCGCCGGTCTCT-3' and 22410689_R; 5'-TGGGACGGAACAGCTTTGAG-3' to amplify a 64 bp amplicon and detected using the 22410689_V; 5'-TGCGTGTTTGTGCCTG-3' VIC-tagged probe for wild type and the 22410689_M; 5'-TGCGTGTTTTTGCCTG-3' FAM-tagged probe for mutant amplicons. Reactions were prepared using droplet digital PCR Super Mix (BioRad) and standard PCR performed according to manufacturer’s instructions. Amplification events were detected with a digital PCR plate reader (QX100 Droplet Reader, BioRad) and data were analyzed using the QuantaSoft Software (Version 1.3.2.0, BioRad). Average droplet count was 11,728 per sample. Samples with fewer than 7,000 droplets were excluded from the analysis.
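The droplet-count exclusion rule above can be sketched as follows. This is a minimal illustration, not the QuantaSoft implementation: the function names are hypothetical, and the Poisson correction shown is the generic droplet digital PCR calculation, which the text itself does not spell out.

```python
import math

MIN_DROPLETS = 7000  # exclusion threshold stated in the text


def passes_droplet_qc(total_droplets):
    """Keep a sample only if it has at least MIN_DROPLETS droplets."""
    return total_droplets >= MIN_DROPLETS


def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet: -ln(fraction of negative droplets)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log((total - positive) / total)


def mutant_fraction(fam_positive, vic_positive, total):
    """Share of mutant (FAM) copies among mutant plus wild-type (VIC) copies.

    Assumption: FAM-positive droplets report mutant and VIC-positive droplets
    report wild-type amplicons, as in the probe design described above.
    """
    mutant = copies_per_droplet(fam_positive, total)
    wild_type = copies_per_droplet(vic_positive, total)
    if mutant + wild_type == 0:
        return 0.0
    return mutant / (mutant + wild_type)
```

A sample would first be checked with `passes_droplet_qc` and only then passed to `mutant_fraction`.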
Statistical analysis
Whole genome sequencing
Whole genome sequencing was performed in 3 independent rounds and results from cancer genomes that had more than 70% call rate (more than 70% of the total number of genome bases called) were included in the core mutation and LOH analysis (Supplementary Table 1) whilst all genomes were used for super-genome construction and whole genome copy number analysis. Reads were mapped to the reference genome (GRCh37) and variants were called by local de novo assembly as previously described 33. A table of variants across the various tumor and normal genome assemblies was generated and filtered using the Complete Genomics cgatools program and custom scripts.
Super-genome construction from well count data
Super-genomes were constructed by pooling LFR data from all samples belonging to the same tumor location. This was accomplished by accumulating well counts corresponding to each variant at loci where somatic mutations were identified. Well counts were used instead of sequence read counts as these were less prone to amplification biases. Only loci where at least three tumor samples showed a variant call were considered.
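A minimal sketch of the pooling step described above, assuming each sample is represented as a dictionary with a location label and per-locus variant well counts. The record layout and function name are hypothetical, and applying the three-sample threshold across all samples rather than within one location is an assumed reading of the text.

```python
from collections import defaultdict

MIN_SAMPLES_WITH_VARIANT = 3  # threshold stated in the text


def build_super_genomes(samples):
    """Pool variant well counts into one super-genome per tumor location.

    `samples` is a list of dicts with hypothetical keys:
      "location":    tumor location label
      "well_counts": {locus: number of LFR wells supporting the variant}
    """
    # Count how many samples show a variant call at each locus.
    samples_with_variant = defaultdict(int)
    for sample in samples:
        for locus, wells in sample["well_counts"].items():
            if wells > 0:
                samples_with_variant[locus] += 1

    eligible = {locus for locus, n in samples_with_variant.items()
                if n >= MIN_SAMPLES_WITH_VARIANT}

    # Accumulate well counts per location, restricted to eligible loci.
    pooled = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        for locus, wells in sample["well_counts"].items():
            if locus in eligible:
                pooled[sample["location"]][locus] += wells
    return {location: dict(counts) for location, counts in pooled.items()}
```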
Site-specific variant identification
We defined a set of strict criteria to determine if a variant was specific to a tumor site in order to minimize false positives. In order for a variant locus to be considered, there must be a high quality genotype call on both alleles in at least 10/33 tumor genomes at that genomic location. In addition, at least 3 of the 10 tumor genotypes must contain a variant and each genotype call must be supported by at least 3 well counts. A variant is said to belong to any particular tumor site if a variant is identified in at least 3 tumor genomes belonging to that site and not called at any other tumor site.
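The criteria above can be expressed as a small filter. This is a sketch under stated assumptions: the per-genome record fields are hypothetical, and a variant call at another site is treated as disqualifying regardless of its quality, which is one possible reading of "not called at any other tumor site".

```python
MIN_CALLED_GENOMES = 10   # high-quality calls on both alleles required
MIN_VARIANT_GENOMES = 3   # genomes that must carry the variant
MIN_WELLS = 3             # well counts supporting each genotype call


def site_specific_site(calls):
    """Return the tumor site a variant is specific to, or None.

    `calls` holds one record per tumor genome at a single locus, with
    hypothetical fields:
      "site":         tumor site label
      "fully_called": True if both alleles have a high-quality genotype call
      "has_variant":  True if the genotype contains the variant
      "wells":        number of wells supporting the genotype call
    """
    usable = [c for c in calls if c["fully_called"] and c["wells"] >= MIN_WELLS]
    if len(usable) < MIN_CALLED_GENOMES:
        return None
    carriers = [c for c in usable if c["has_variant"]]
    if len(carriers) < MIN_VARIANT_GENOMES:
        return None
    # Any call of the variant at a second site rules out site specificity.
    sites_with_call = {c["site"] for c in calls if c["has_variant"]}
    if len(sites_with_call) != 1:
        return None
    return next(iter(sites_with_call))
```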
Copy number and Loss of heterozygosity (LOH) analysis
Whole genome copy number analysis of the super-genomes was performed using OncoSNP-SEQ34. Copy number analysis was restricted to germline SNP sites for robustness. For LOH analysis, we developed a statistical algorithm (ADO-LOH; see Supplementary Note) to account for allele dropouts that occur during DNA amplification. Copy number and LOH analysis of The Cancer Genome Atlas data was performed using OncoSNP v1.3 as previously described35 and analysed for effect on survival using Kaplan-Meier analysis with the Log Rank Significance Test, using the implementation provided by the survival package in the R Statistical Computing package.
Performing gene ontology enrichment analysis
The functional prediction of cis-regulatory regions was performed using the Genomic Region Enrichment of Annotation Tool (GREAT) as previously described10. In brief, GREAT assigns regulatory domains for each gene that consist of a basal domain (5 Kb upstream and 1 Kb downstream of the transcription start site [TSS]) plus an extension of up to 1 Mb, but not beyond 1 Mb, in both directions to the nearest gene basal domain. The enrichment for a particular biological process gene ontology (which consists of a number of genes) was computed by obtaining the ratio of the fraction of foreground (FG) variants (i.e. the ancestor variants) that mapped to genes of a gene ontology x to the fraction of background (BG) variants (e.g. progeny variants) that mapped to the same gene ontology. To be regarded as a gene ontology hit, we required that a gene ontology had a false discovery rate “Q value” of less than 0.01 and a number of genes supporting a particular ontology of more than 10. The BG was defined as either the progeny variants (variants present in 2 or more tumor sites), the whole genome or the number of germline variants as indicated in the text. Permutation analysis was based on the assumption that random samples from the background of equal size to the FG should not give higher enrichment than the one observed for the true FG. To test this, we obtained 10,000 samples of 750 variants from the BG, computed the enrichment for each, counted the number of times in which that enrichment was higher than the one observed for the true FG and presented the result (p-value) as the fraction of that count out of the total number of random samples. The number of genes that GREAT assigned to the gene ontology “stem cell differentiation” was 73 and this was the basis of the analysis described above.
We repeated the above analysis using the total number of human genes (283) assigned to the same gene ontology “GO:0048863” at the Gene Ontology database (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi) and by using a subset of genes (n = 81) that had an experimentally verified link to stem cell differentiation by selecting the human genes that satisfied any of the following terms: IDA; Inferred from direct assay, IEP; Inferred from expression pattern, IGI; Inferred from genetic interactions, IMP; Inferred from mutant phenotype.
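The enrichment ratio and the permutation test described above can be sketched as follows. This is an illustration only: the input layout (one boolean per background variant) and the function names are assumptions, and the mapping of variants to ontology genes performed by GREAT is not reproduced.

```python
import random


def enrichment(fg_hits, fg_total, bg_hits, bg_total):
    """Ratio of the foreground hit fraction to the background hit fraction."""
    if bg_hits == 0:
        raise ValueError("background has no variants mapped to the ontology")
    return (fg_hits / fg_total) / (bg_hits / bg_total)


def permutation_p_value(bg_flags, observed, sample_size=750,
                        n_samples=10_000, seed=0):
    """Fraction of random background samples whose enrichment exceeds `observed`.

    `bg_flags` is a list of booleans, one per background variant, True when
    the variant maps to a gene of the ontology (hypothetical input layout).
    Defaults follow the text: 10,000 samples of 750 variants each.
    """
    rng = random.Random(seed)
    bg_total = len(bg_flags)
    bg_hits = sum(bg_flags)
    exceed = 0
    for _ in range(n_samples):
        sample = rng.sample(bg_flags, sample_size)  # drawn without replacement
        if enrichment(sum(sample), sample_size, bg_hits, bg_total) > observed:
            exceed += 1
    return exceed / n_samples
```

Sampling without replacement is a design choice made here; the text does not say which scheme was used.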
Analysis of targeted sequencing data
Reads were mapped to the reference genome (GRCh37) using STAMPY software36 and variants were identified using in-house developed software, PLATYPUS (Rimmer A, Mathieson I, Lunter G, McVean G (2012) Platypus: An Integrated Variant Caller, www.well.ox.ac.uk/platypus/). Downstream analysis was performed using in-house developed scripts.
There were a total of 37,291 variants identified by sequencing. Further analysis focused on single nucleotide substitutions in cancer samples (n = 18,456 in 33 samples). To identify high quality rare variants, the analysis was restricted to variants that were present in less than 5 samples, were not called in the 1000 genomes project37, and had a high quality score (i.e. flagged as “pass” or “allele bias” by PLATYPUS). Only 861 variants met these criteria and these are shown in Supplementary Table 4. The 861 variants were annotated using data from the Encyclopedia of DNA Elements (ENCODE)38 to identify those that were within regulatory regions as reported by digital genomic footprint21 downloaded from (ftp://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/footprints//jan2011/), by DNaseI hypersensitivity39 downloaded from (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegDnaseClustered/) or by ChIP-Seq40 experiments downloaded from (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/).
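The three rare-variant criteria stated above (fewer than 5 samples, absent from the 1000 genomes calls, accepted quality flag) could be applied with a filter along these lines. The record fields are hypothetical, and the flag strings follow the wording of the text rather than the exact labels written by PLATYPUS.

```python
MAX_SAMPLES = 5  # variants must be present in fewer than this many samples
ACCEPTED_FLAGS = {"pass", "allele bias"}  # labels as worded in the text


def is_rare_high_quality(variant, known_1000g):
    """Apply the three stated filters to one variant record.

    `variant` is a dict with hypothetical keys "chrom", "pos", "ref", "alt",
    "samples" (list of sample identifiers carrying the variant) and "flag".
    `known_1000g` is a set of (chrom, pos, ref, alt) tuples called in the
    1000 genomes project.
    """
    key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
    return (
        len(variant["samples"]) < MAX_SAMPLES
        and key not in known_1000g
        and variant["flag"].lower() in ACCEPTED_FLAGS
    )
```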
In order to assess the degree of local enrichment of variants we first computed the global rate of variant occurrence over the 2 Mb region (113 variants/2,000,000 bp). We then considered a series of 40 kb overlapping windows spaced at 100 bp intervals spanning the 2 Mb. For each window, we counted the number of variants observed in the tumors and computed the probability of observing (at least) this many variants under the null hypothesis of uniform variant occurrence. We assumed that, under the null hypothesis, the number of observed variants follows a Poisson distribution with a rate parameter given by the global rate over the region.
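The sliding-window test above can be sketched in a few lines. The window size, step and region length are those given in the text; the function names and the use of 0-based offsets within the region are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def poisson_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_p_values(positions, region_length=2_000_000,
                    window=40_000, step=100):
    """Return (window start, variant count, p-value) for each window.

    `positions` are variant offsets within the region (0-based, assumed).
    The Poisson rate per window is the global rate times the window size,
    e.g. 113 / 2,000,000 * 40,000 = 2.26 expected variants.
    """
    positions = sorted(positions)
    lam = len(positions) / region_length * window
    results = []
    for start in range(0, region_length - window + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        count = hi - lo
        results.append((start, count, poisson_at_least(count, lam)))
    return results
```

Because the windows overlap heavily, neighbouring p-values are strongly correlated; the sketch reports them as computed and does not attempt any multiple-testing adjustment.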
Motif analysis for the BB5 variant
The motif UW.Motif.0169 overlapping the BB5 variant was discovered from digital footprints data. The logo plot in Fig. 2g was generated using the WebLogo3 software.
Fallopian tubes primary culture
Patients scheduled to undergo surgical procedures provided written consent, prior to surgery, agreeing to participate in the study. The infundibular region of the fallopian tube was isolated, dissected and opened to reveal the lumen. Fallopian tubes were incubated in 15 mL conical tubes containing 0.5% trypsin and 0.1% DNase II in MEM for 1 h at 37 °C, with shaking. The supernatant, containing the epithelial cells, was removed and mixed with 10% FBS in DMEM. Cells were centrifuged and plated in 10% DMEM. The purity of epithelial cells was checked by immunofluorescence. Optional purification using CD326 microbeads (Miltenyi Biotech) was performed if further purification was required.
Cloning, mutant generation and chicken embryo transfection
The 1 Kb regions flanking the identified SNPs were cloned from Human genomic blood DNA by polymerase chain reaction (PCR) amplification using Phusion high fidelity polymerase (NEB) according to the manufacturer’s instructions using the primer sets in Supplementary Table 9 and cloned into a ptk citrine - BsmBI vector. Restriction digestion of the PCR products and the vector was performed using BsmBI enzyme (NEB) and the digested product was inserted into the vector using a T4 ligase (NEB). Point mutants were generated using QuikChange® II Site-Directed Mutagenesis Kit (Stratagene) as per manufacturer’s instructions using the citrine vector containing each region and the primer sets SOX2_BB5mut_F; ‘ACCTGGGCCTGGCACCCAAACCCTT’ and SOX2_BB5mut_R ‘AAGGGTTTGGGTGCCAGGCCCAGGT’. DNA sequences of all cloned PCR products were verified with direct sequencing.
Fertilized chicken eggs (Henry Stewart & Co. Ltd, Louth, UK) were incubated at 37-38 °C for approximately 20-24 hours prior to electroporation. The entire epiblast of stage 4 chicken embryos (staged according to Hamburger and Hamilton41) was electroporated with enhancer constructs, cultured using modified New culture42 and analyzed as previously described43,44. Immunofluorescence for detection of endogenous Sox2 expression and co-localisation with citrine expression was carried out as previously described43,44 using anti-Sox2 antibody (Abcam) or anti-GFP antibody (Millipore) and detected using alexa-488 conjugated and alexa-568 conjugated secondary antibodies (Invitrogen) and observed using an inverted confocal microscope (LSM 510 META, Zeiss).
Chromatin immunoprecipitation (ChIP) assay
ChIP was performed on HEK293T cells using a commercially available ChIP-IT express enzymatic kit (Active Motif) according to manufacturer’s instructions and a rabbit anti-H3K27ac antibody (Abcam) or rabbit IgG (Millipore) for immunoprecipitation. Quantitative (real time) PCR was performed using the SYBR Green Mix (Applied Biosystems) according to the manufacturer's protocols. An ABI Prism 7000 Sequence detection system (Applied Biosystems) was used for amplification. All experiments contained: 1) no-template controls, 2) no-antibody controls and 3) input samples diluted 1:100 for every primer set used. ChIP was performed on FTE and/or HEK293T cells using H3K27ac (C15410174 lot: A.7071-001P; Diagenode), c-Myc (#9402; lot: 8; Cell Signaling), KDM5B/PLU1/Jarid1B (ab45301; lot: 275016; Abcam) and EZH2 (C15410039; lot: 003; Diagenode) antibodies. For each preparation of nuclei, 1x10^5 cells were cross-linked by adding formaldehyde to a final concentration of 1% and nutated for 8 min at room temperature. Glycine (final concentration of 125 mM) was added and the solution was incubated by nutation for 5 min at room temperature. The cross-linked cells were washed three times and cell pellets were snap frozen in liquid nitrogen. The pellets were resuspended in isotonic buffer and nuclei isolated using a Dounce homogenizer, washed, and lysed in SDS lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.0) for 10 minutes to 1 h. The lysate was then diluted 3-fold with ChIP dilution buffer (0.01% SDS, 1.2 mM EDTA, 16.7 mM Tris-HCl, pH 8.0, 167 mM NaCl, 1 mM DTT, 0.4 mM PMSF, and protease inhibitors) and the chromatin (420 μL) was sonicated using a Misonix 4000 sonicator at the following settings: Amp 16, 10 consecutive cycles of 10-s sonication each with a 30-s pause in between and subsequently Amp 8, 8 consecutive cycles of 30-s sonication each with a 1 min pause in between.
Triton X was added to the sonicated material to a final concentration of ~1%, chromatin was cleared by centrifugation, diluted three to four times with ChIP Dilution Buffer with 1.1% Triton X-100 and was distributed between two to three antibody/bead complexes (400 μL each) and incubated overnight at 4 °C. Fifty microliters of the chromatin preparation was conserved at -80 °C as the input fraction. Antibody/magnetic bead complexes were prepared as per the Young protocol (http://openwetware.org/wiki/ChIP). Post-immunoprecipitation washes were performed using RIPA wash buffer (50 mM Hepes-KOH, pH 7.6, 500 mM LiCl, 1 mM EDTA, 1% Nonidet P-40, 0.7% Na-Deoxycholate). The complexes were then washed with Tris-EDTA/NaCl (10 mM Tris-HCl, 1 mM EDTA pH 8.0, 50 mM NaCl) for 5 min and transferred to a new chilled tube before the last separation. The chromatin was eluted in elution buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.0) and cross-links were reversed overnight by incubation at 65 °C. The samples were consecutively treated with RNase A (0.2 mg/mL) and then Proteinase K (0.2 mg/mL), extracted with phenol/chloroform/isoamyl alcohol, precipitated, and resuspended in 40 μL of DNase/RNase-free distilled water. Sequencing on ChIP material was performed as described above. Real-time PCR reactions were performed in a 96-well plate ABI7000 qPCR machine.
Reactions were set up using SybrGreen (Applied Biosystems), according to the manufacturer's protocols and 1 μL of each ChIP reaction or 1:100 to 200 dilution of input fraction. The ΔΔCt method was used for quantification and calculations performed according to ChIP-qPCR Data Analysis instructions (SuperArray Bioscience Corporation). The primers presented in Fig. 7h corresponding to the BB5 region were: 5833217_F 5'-ACCTACTAGACCCCAGGCAAG-3' and 5833217_R 5'-GGCGCAGGAGGAGACC-3'.
CRISPR Vector construction
Vector px330 was used as previously described45. A pair of oligonucleotides (CRISPR-BB5-F; 5'-CACCGAGGGTTTGGGTCCCAGGCCC-3' and CRISPR-BB5-R; 5'-AAACGGGCCTGGGACCCAAACCCTC-3') encompassing the BB5 nucleotide (underlined) and extending up to a protospacer adjacent motif [AGG] (not shown) were annealed, phosphorylated, and ligated to the linearized vector.
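As a consistency check on the oligonucleotide pair above, the bottom-strand oligo can be derived from the top-strand one by reverse complementation. In this sketch the 20-nucleotide guide, the CACC and AAAC prefixes and the extra G are read off the two published sequences; interpreting the prefixes as cloning overhangs and the G as a 5' addition is an assumption, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_oligos(guide, top_prefix="CACC", bottom_prefix="AAAC", prepend_g=True):
    """Build the (top, bottom) oligo pair for a guide sequence.

    The bottom oligo is the bottom prefix followed by the reverse complement
    of the (optionally G-extended) guide, so that the annealed pair leaves
    the two prefixes as single-stranded ends.
    """
    insert = ("G" + guide) if prepend_g else guide
    return top_prefix + insert, bottom_prefix + reverse_complement(insert)


# Guide inferred from CRISPR-BB5-F after removing the CACC prefix and the G.
top, bottom = guide_oligos("AGGGTTTGGGTCCCAGGCCC")
print(top)     # compare with CRISPR-BB5-F
print(bottom)  # compare with CRISPR-BB5-R
```

The same `reverse_complement` helper can be used to confirm that a forward and reverse mutagenesis primer pair are exact reverse complements of each other.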
Cell culture and transfection
HEK293 cells (ATCC) were maintained in DMEM (Invitrogen) supplemented with 10% fetal bovine serum and 100 U/mL penicillin/streptomycin and incubated at 37 °C and 5% CO2. HEK293 cells were transfected with the construct px330-BB5 using FUGENE HD (Promega) according to the manufacturer’s instructions. After 2 weeks, the DNA for each clone was extracted using DNeasy blood and tissue kit (Qiagen) and the region flanking the BB5 nucleotide was amplified using the following pair of primers: SH-BB5-01-F; 5'-TCCAATATGAGAGATAAGAGCAC-3' and SH-BB5-01-R; 5'-GCTGAAAAGACCAAACTTAAAAC-3'. Deletion in the targeted region of each HEK293 clone was tested by standard sequencing as described above (see Supplementary Table 1).
Western blot
Western blots were performed as previously described46. Primary antibody against SOX2 (D6D9 XP, Cell Signaling Technology) and GAPDH (Proteintech Europe) and fluorescent anti-rabbit and anti-mouse secondary antibodies (LI-COR) were used. Bands were quantified and normalized to GAPDH using an Odyssey Licor system (LI-COR).
Real time PCR
Total RNA was extracted using RNeasy mini kit and reverse transcribed using Taqman Reverse transcription kit (Applied Biosystems). Quantitative real-time PCR was performed as previously described46. The following primers were used to measure SOX2 expression: SH-BB5-02-F; 5'-GCCGAGTGGAAACTTTTGTCG-3' and SH-BB5-02-R; 5'-GGCAGCGTGTACTTATCCTTCT-3'.
Immunohistochemistry
A summary of patients for whom SOX2 IHC was performed is presented in Supplementary Table 6. Tissue sections of 4 μm thickness were cut from FFPE tumor or control samples. Automated staining was carried out with the Leica Bond Max autostainer (Leica Microsystems). In short, antigen retrieval at 100 °C for 20 min was followed by primary antibody incubation with the rabbit anti-SOX2 (D6D9 XP, Cell Signaling Technology) or IgG control for 1 hour, then detection using the BOND™ Polymer Refine Detection System (DS9800, Leica Biosystems) as per manufacturer’s instructions. Stained slides were scanned at 20x and 40x magnification on the Aperio (Aperio). The ImageScope software (v11.2.0.780, Aperio) was used for quantification of nuclear staining. For scoring of SOX2 positivity in fallopian tubes, only the tubal epithelium was marked by using the “negative pen tool” to exclude stroma. The marked epithelium was analyzed with the program algorithm “nuclear v9”, which scored the staining of all nuclei within the marked area and assigned scores of 0 for not detectable signal, +1 for weak staining, +2 for moderate staining and +3 for strong staining. Nuclear positivity was confirmed on selected areas using the “deconvolution” algorithm to subtract SOX2 from underlying hematoxylin staining. Each patient group was age matched and the diagnosis and clinical details are provided in Supplementary Table 5. For scoring of SOX2 expression in HGSOCs, tumor foci were marked and benign tissue excluded and the procedure completed as described above. Immunohistochemistry for other proteins was conducted in a similar manner to the method described above using anti-p53 (Dako), anti-WT1 (Leica), anti-TUBB4 (Sigma), anti-MYC (Abcam) and anti-EZH2 (Leica) antibodies.
Supplementary Tables
Supplementary Table 1:
Description of genomes analysed in this work. TP53 status refers to the status of a single nucleotide at chromosome 17, position 7577114. Wild type nucleotide at that position is C and the mutant is A.
Supplementary Table 2:
The gene symbols that correspond to the identified 5 significantly enriched ontologies are presented along with the distance in bases from the mapped mutation to the gene’s transcription start site.
Supplementary Table 3:
The clinical details of patients who donated samples used for targeted sequencing.
Supplementary Table 4:
The 861 rare variants identified by targeted sequencing of samples from 33 patients are presented.
Supplementary Table 5:
The coordinates of the somatic mutations identified at the 40 Kb region flanking SOX2 in HGSOCs.
Supplementary Table 6:
The clinical details of patients who donated the samples used for SOX2 immunohistochemistry are presented.
Supplementary Table 7:
List of transfections and summary results for CRISPR experiments. HEK293T cells were transfected on the indicated dates. Two days later single cells were plated in 96-well plates and harvested for DNA collection on the indicated dates. The number of positive clones carrying bi-allelic mutations (nominated as homozygous), the number of clones carrying a mono-allelic mutation (nominated as heterozygous) and the number of clones with no deletions (nominated as negative) are reported in the table. Clones that had deletions within the motif model UW.0169 (see Fig. 2g) are displayed in bold. Also shown are the location of the beginning and end of the deletion on chromosome 3 and the number of nucleotides deleted in each clone. SOX2 expression was analysed by western blots and real-time PCR and the fold increase in SOX2 expression using the two methods is shown. ** Unable to analyse the exact number of nucleotides or the exact site of deletion because of complex sequencing traces.
Supplementary Table 8:
Primers used for sequencing individual variants in patients with HGSOCs.
Supplementary Table 9:
Primers used for cloning of potential enhancers into a ptk citrine - BsmBI vector.
Supplementary Table 10:
Global pathway analysis of differentially expressed genes.
The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and accompanying figures. Such modifications are intended to fall within the scope of the appended claims. Moreover, all embodiments described herein are considered to be broadly applicable and combinable with any and all other consistent embodiments, as appropriate.
Various publications are cited herein, the disclosures of which are incorporated by reference in their entireties.

Claims (70)

Claims:
1. A method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer in a subject comprising determining the expression level of one or more biomarkers from Table A wherein the determined expression level is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer.
2. A method for diagnosing ovarian cancer in a subject comprising: determining the expression level of one or more biomarkers from Table A wherein the determined expression level is used to diagnose ovarian cancer in the subject.
3. The method of claim 1 or 2 wherein the expression level is determined in fallopian tube tissue and/or fallopian tube cells and/or material derived therefrom.
4. The method of claim 3 wherein the fallopian tube cells comprise or are ciliated fallopian tube cells.
5. The method of any previous claim wherein the expression level is determined in a sample obtained from the subject.
6. The method of any previous claim further comprising obtaining a sample from the subject.
7. The method of any previous claim comprising comparing the expression level to a reference value or to a control.
8. The method of claim 7 wherein the control represents the level of one or more biomarkers from Table A in a comparable sample from a subject with no ovarian cancer or no increased risk of developing ovarian cancer.
9. The method of claim 7 wherein the reference value represents a threshold level of expression set by determining the level of expression at a first time point.
10. The method of any of claims 2 to 9 wherein an increased expression level of one or more of genes 1 to 338 from Table A indicates that the subject has ovarian cancer.
11. The method of any of claims 1 or 3 to 9 wherein an increased expression level of one or more of genes 1 to 338 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer.
12. The method of any of claims 2 to 9 wherein a decreased expression level of one or more of genes 339 to 628 from Table A indicates that the subject has ovarian cancer.
13. The method of any of claims 1 or 3 to 9 wherein a decreased expression level of one or more of genes 339 to 628 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer.
14. The method of any of claims 1 or 3 to 13 wherein the subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer has one or more pre-neoplastic lesions.
15. The method of any of claims 5 to 14 wherein the sample is obtained using a tool for acquiring fallopian tube cells in vivo by exfoliative cytology.
16. The method of claim 15 wherein the tool is a cytology brush.
17. The method of any of claims 5 to 14 wherein the sample is obtained by biopsy.
18. The method of any of claims 5 to 14 wherein the sample is obtained by washing out the lumen of the fallopian tube(s).
19. The method of any previous claim wherein the expression level is determined at the level of protein or RNA.
20. The method of any previous claim wherein the expression level is determined by immunohistochemistry, western blot, ELISA or flow cytometry.
21. The method of any previous claim wherein the expression level is determined by a nucleic acid amplification technique, optionally quantitative PCR.
22. The method of any of claims 1 to 4 or 7 to 14 wherein the expression level is determined using an (in vivo) imaging technique, optionally wherein the imaging technique comprises an implantable biosensor or photonic device, falloposcopy or Positron Emission Tomography (PET).
23. The method of any previous claim wherein the expression level is determined by testing the immune response of the subject to the one or more biomarkers.
24. The method of any previous claim wherein the subject has a mutation in either of the genes BRCA1 or BRCA2 that gives them an increased risk of developing ovarian cancer relative to a subject without such a mutation in BRCA1 or BRCA2.
25. The method of any previous claim wherein the subject does not have a mutation in either of the genes BRCA1 or BRCA2 that gives them an increased risk of developing ovarian cancer.
26. An antibody, aptamer or peptide that binds specifically to a biomarker from Table A for use in a method of predicting ovarian cancer, identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject.
27. The antibody, aptamer or peptide for use of claim 26 wherein the antibody, aptamer or peptide is conjugated to a label.
28. A method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject comprising: a. applying a specific binding agent that can specifically bind to a biomarker from Table A to a sample obtained from the subject b. applying a detection agent that detects the specific binding agent-biomarker complex c. using the detection agent to determine the number of cells that express the biomarker wherein the determined number of cells is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer or diagnose ovarian cancer in the subject.
29. The method of claim 28 wherein an increased number of cells that express one or more of genes 1 to 338 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer or indicates that the subject has ovarian cancer.
30. The method of claim 28 wherein a decreased number of cells that express one or more of genes 339 to 628 from Table A predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer or indicates that the subject has ovarian cancer.
31. The method of any of claims 28 to 30 wherein the cells are fallopian tube cells, optionally ciliated fallopian tube cells.
32. The method of any of claims 28 to 31 wherein the specific binding agent is an antibody or an aptamer.
33. A method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer comprising one or more of: a. removing one or more (affected) fallopian tubes from the subject b. treatment with a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor) wherein the subject is selected for treatment on the basis of a method as claimed in any previous claim.
34. A therapeutic agent (such as a biologic, optionally an antibody and/or vaccine, and/or a small molecule inhibitor) for use in a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject is selected for treatment on the basis of a method as claimed in any previous claim.
35. A method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject has an increased expression level of one or more of genes 1 to 338 from Table A and/or a decreased expression level of one or more of genes 339 to 628 from Table A comprising one or more of: a. removing one or more (affected) fallopian tubes from the subject b. treatment with a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor).
36. A therapeutic agent (such as a biologic, optionally an antibody and/or vaccine, and/or a small molecule inhibitor) for use in a method of treating a subject with a prediction of ovarian cancer or an increased risk of developing ovarian cancer or with a diagnosis of ovarian cancer wherein the subject has an increased expression level of one or more of genes 1 to 338 from Table A and/or a decreased expression level of one or more of genes 339 to 628 from Table A.
37. A method for selecting a treatment for a subject comprising a. determining the expression level of one or more biomarkers from Table A in a sample from the subject wherein the determined expression level is used to predict ovarian cancer or identify an increased risk of developing ovarian cancer or diagnose ovarian cancer b. selecting a treatment appropriate to the prediction, increased risk or diagnosis of ovarian cancer and c. treating the subject with the selected treatment.
38. The method of claim 37 wherein if the subject has a prediction of ovarian cancer, an increased risk of developing ovarian cancer or a diagnosis of ovarian cancer the treatment selected is one or more of: a. removing one or more (affected) fallopian tubes from the subject b. a therapeutic agent (such as a biologic, optionally an antibody and/or a vaccine, and/or a small molecule inhibitor).
39. The method of claim 37 or 38 wherein if the expression level of IGF2 is increased the treatment selected is a small molecule inhibitor of the interaction between IGF2 and the IGF1 receptor.
40. The method of any of claims 33, 35 or 38 or therapeutic agent for use of any of claims 34 or 36 wherein the antibody can specifically bind to one of the biomarkers from Table A.
41. A method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer comprising in a sample obtained from a subject identifying a mutation within the regulatory region of the SOX2 gene wherein identification of the mutation predicts ovarian cancer or indicates that the subject has an increased risk of developing ovarian cancer or is used to diagnose ovarian cancer.
42. The method of claim 41 wherein the mutation affects MYC, KDM5B or EZH2 occupancy of the regulatory region (resulting in a loss of repression of SOX2 by MYC or EZH2).
43. The method of claim 41 or 42 wherein the sample comprises, consists essentially of or consists of fallopian tube tissue and/or fallopian tube cells and/or material derived therefrom.
44. The method of claim 43 wherein the fallopian tube cells comprise ciliated fallopian tube cells.
45. The method of any of claims 41 to 44 wherein the regulatory region comprises, consists essentially of or consists of SEQ ID NO. 1.
46. The method of any of claims 41 to 44 wherein the mutation comprises, consists essentially of or consists of any one or more of a mutation from C to any other nucleotide at position 1788388 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1791114 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1779360 of SEQ ID NO. 1, a mutation from T to any other nucleotide at position 1768318 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1769826 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1793202 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1771041 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1764947 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1776714 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1793607 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1765207 of SEQ ID NO. 1 and a mutation from T to any other nucleotide at position 377489 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1270829 of SEQ ID NO. 1, a mutation from G to any other nucleotide at position 1492887 of SEQ ID NO. 1, a mutation from A to any other nucleotide at position 1583019 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1788388 of SEQ ID NO. 1, a mutation from C to any other nucleotide at position 1940505 of SEQ ID NO. 1.
47. The method of any of claims 41 to 46 wherein the mutation comprises, consists essentially of or consists of any one or more of M1 (a mutation from C to G at position 1788388 of SEQ ID NO. 1), M2 (a mutation from G to A at position 1791114 of SEQ ID NO. 1), M3 (a mutation from G to C at position 1779360 of SEQ ID NO. 1), M4 (a mutation from T to G at position 1768318 of SEQ ID NO. 1), M5 (a mutation from G to A at position 1769826 of SEQ ID NO. 1), M6 (a mutation from A to G at position 1793202 of SEQ ID NO. 1), M7 (a mutation from C to A at position 1771041 of SEQ ID NO. 1), M8 (a mutation from A to T at position 1764947 of SEQ ID NO. 1), M9 (a mutation from C to A at position 1776714 of SEQ ID NO. 1), M10 (a mutation from C to T at position 1793607 of SEQ ID NO. 1), and M11 (a mutation from G to C at position 1765207 of SEQ ID NO. 1) and BB1 (a mutation from T to G at position 377489 of SEQ ID NO. 1), BB2 (a mutation from C to G at position 1270829 of SEQ ID NO. 1), BB3 (a mutation from G to T at position 1492887 of SEQ ID NO. 1), BB4 (a mutation from A to T at position 1583019 of SEQ ID NO. 1), BB5 (a mutation from C to G at position 1788388 of SEQ ID NO. 1) and BB6 (a mutation from C to G at position 1940505 of SEQ ID NO. 1).
48. The method of any of claims 41 to 44 wherein the regulatory region comprises SEQ ID NO. 2.
49. The method of claim 48 wherein the mutation comprises, consists essentially of or consists of any one or more of M1 (a mutation from C to G at position 1788388 of SEQ ID NO. 1), M2 (a mutation from G to A at position 1791114 of SEQ ID NO. 1), M3 (a mutation from G to C at position 1779360 of SEQ ID NO. 1), M4 (a mutation from T to G at position 1768318 of SEQ ID NO. 1), M5 (a mutation from G to A at position 1769826 of SEQ ID NO. 1), M6 (a mutation from A to G at position 1793202 of SEQ ID NO. 1), M7 (a mutation from C to A at position 1771041 of SEQ ID NO. 1), M8 (a mutation from A to T at position 1764947 of SEQ ID NO. 1), M9 (a mutation from C to A at position 1776714 of SEQ ID NO. 1), M10 (a mutation from C to T at position 1793607 of SEQ ID NO. 1), and M11 (a mutation from G to C at position 1765207 of SEQ ID NO. 1).
50. The method of any of claims 41 to 49 wherein the mutation comprises, consists essentially of or consists of BB5 (a mutation from C to G at position 1788388 of SEQ ID NO. 1).
51. The method of any of claims 41 to 50 wherein the mutation is identified by a method selected from nucleic acid sequencing, probe hybridization, nucleic acid amplification and mass spectrometric detection.
52. The method of claim 51 wherein the mutation is identified using a high-throughput sequencing technique.
53. The method of claim 52 wherein the high-throughput sequencing technique is selected from pyrosequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, Illumina dye sequencing, single-molecule real-time sequencing or DNA nanoball sequencing.
54. The method of any of claims 41 to 53 wherein the sample is or was obtained using a tool for acquiring fallopian tube cells in vivo by exfoliative cytology.
55. The method of claim 54 wherein the tool is a cytology brush.
56. The method of any of claims 41 to 53 wherein the sample is obtained by biopsy.
57. The method of any of claims 41 to 53 wherein the sample is obtained by washing out the lumen of the fallopian tube(s).
58. A system or device for performing the method of any previous claim.
59. A system or test kit for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer or diagnosing ovarian cancer in a subject comprising: a. one or more testing devices for determining the expression level of one or more biomarkers from Table A in a sample from the subject b. a processor; and c. a storage medium comprising a computer application that, when executed by the processor, is configured to: i. access and/or calculate the determined expression level of one or more biomarkers from Table A in the sample on the one or more testing devices ii. calculate whether there is an increased or decreased level of the one or more biomarkers from Table A in the sample; and iii. output from the processor the prediction or risk of developing ovarian cancer or diagnosis of ovarian cancer.
60. The system or test kit of claim 59 further comprising a display for the output from the processor.
61. The system or test kit of claim 59 or 60 wherein the biomarker is IGF2.
62. A computer application or storage medium comprising a computer application as defined in claim 59.
63. A method for predicting ovarian cancer or identifying an increased risk of developing ovarian cancer in a subject substantially as herein described with reference to the drawings.
64. A method for diagnosing ovarian cancer substantially as herein described with reference to the drawings.
65. Use of an antibody or aptamer that binds specifically to a biomarker from Table A substantially as herein described with reference to the drawings.
66. A method of treating a subject substantially as herein described with reference to the drawings.
67. A method for selecting a treatment for a subject substantially as herein described with reference to the drawings.
68. A therapeutic agent (such as a biologic, optionally an antibody and/or vaccine, and/or a small molecule inhibitor) for use in a method of treating a subject substantially as herein described with reference to the drawings.
69. A system, device or test kit substantially as herein described with reference to the drawings.
70. A computer application or storage medium substantially as herein described with reference to the drawings.
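The processing steps recited in claim 59 (access the determined expression level, calculate whether it is increased or decreased, output a result) can be illustrated with a minimal Python sketch. The claims do not state how "increased" or "decreased" is decided, so the fold-change threshold of 2.0, the per-gene reference levels, and the names `classify` and `flag_sample` are all assumptions made here for illustration only. The only details taken from the claims are the Table A index ranges: genes 1 to 338 are associated with increased expression and genes 339 to 628 with decreased expression.

```python
from dataclasses import dataclass

# Index ranges from the claims: genes 1-338 of Table A (increased expression)
# and genes 339-628 of Table A (decreased expression).
UP_RANGE = range(1, 339)
DOWN_RANGE = range(339, 629)


@dataclass
class Call:
    gene_index: int
    direction: str    # "increased", "decreased" or "unchanged"
    consistent: bool  # True if the change matches the direction named in the claims


def classify(gene_index: int, measured: float, reference: float, fold: float = 2.0) -> Call:
    """Compare one measured level with a reference level.

    The fold-change rule and its default of 2.0 are hypothetical; the claims
    do not define a threshold.
    """
    if gene_index not in UP_RANGE and gene_index not in DOWN_RANGE:
        raise ValueError("gene_index must be between 1 and 628")
    if reference <= 0 or measured < 0:
        raise ValueError("reference must be positive and measured non-negative")

    ratio = measured / reference
    if ratio >= fold:
        direction = "increased"
    elif ratio <= 1.0 / fold:
        direction = "decreased"
    else:
        direction = "unchanged"

    consistent = (
        (direction == "increased" and gene_index in UP_RANGE)
        or (direction == "decreased" and gene_index in DOWN_RANGE)
    )
    return Call(gene_index, direction, consistent)


def flag_sample(measured: dict, reference: dict, fold: float = 2.0):
    """Return (flagged, calls) for a sample.

    A sample is flagged when at least one biomarker changes in the direction
    the claims associate with ovarian cancer ("one or more of genes ...").
    """
    calls = [classify(i, measured[i], reference[i], fold) for i in sorted(measured)]
    return any(c.consistent for c in calls), calls
```

For example, `flag_sample({1: 10.0, 400: 3.0}, {1: 2.0, 400: 2.0})` flags the sample because gene 1 is increased five-fold, while gene 400 at 1.5-fold is treated as unchanged under the assumed threshold.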
GB1607393.4A 2016-04-28 2016-04-28 Biomarkers for early diagnosis of ovarian cancer Withdrawn GB2549763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1607393.4A GB2549763A (en) 2016-04-28 2016-04-28 Biomarkers for early diagnosis of ovarian cancer

Publications (2)

Publication Number Publication Date
GB201607393D0 GB201607393D0 (en) 2016-06-15
GB2549763A true GB2549763A (en) 2017-11-01

Family

ID=56234031

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1607393.4A Withdrawn GB2549763A (en) 2016-04-28 2016-04-28 Biomarkers for early diagnosis of ovarian cancer

Country Status (1)

Country Link
GB (1) GB2549763A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020101432A1 (en) * 2018-11-16 2020-05-22 가톨릭대학교 산학협력단 Biomarker for predicting onset of hereditary ovarian cancer and use thereof
KR20200057652A (en) * 2018-11-16 2020-05-26 가톨릭대학교 산학협력단 Biomarker for predicting development of hereditary ovarian cancer and use thereof
CN112951325A (en) * 2021-02-18 2021-06-11 北京吉因加医学检验实验室有限公司 Design method and application of probe combination for cancer detection

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111735950B (en) * 2020-07-17 2023-07-21 北京信诺卫康科技有限公司 FGF18 and CA125 combined used as early ovarian cancer biomarker and kit
CN111735949B (en) * 2020-07-17 2023-07-21 北京信诺卫康科技有限公司 Wnt7a and CA125 combined as early ovarian cancer biomarker and kit
CN111912987B (en) * 2020-08-25 2023-07-21 北京信诺卫康科技有限公司 FGF18 and HE4 combined used as early ovarian cancer biomarker and kit
CN114015689A (en) * 2021-10-26 2022-02-08 江苏大学 shRNA sequence for specifically inhibiting GOS2 gene expression and application thereof
CN116716404B (en) * 2023-06-13 2024-01-30 中国医学科学院北京协和医院 Device for distinguishing ovarian clear cell carcinoma from high-grade serous carcinoma based on S100A2
CN116790759B (en) * 2023-08-08 2023-12-01 潍坊医学院 Application of PLEC in early diagnosis and treatment of epithelial ovarian cancer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Cancer Research, vol. 73, no. 17, 2013, Bareiss et al. "SOX2 Expression Associates with Stem Cell State in Human Ovarian Carcinoma", pages 5544-5555 PREV201300667971 *
FENG YE; YANLI LI; YING HU; CAIYUN ZHOU; YUTING HU; HUAIZENG CHEN: "Expression of Sox2 in human ovarian epithelial carcinoma", JOURNAL OF CANCER RESEARCH AND CLINICAL ONCOLOGY, SPRINGER, BERLIN, DE, vol. 137, no. 1, 27 March 2010 (2010-03-27), Berlin, DE, pages 131 - 137, XP019870037, ISSN: 1432-1335, DOI: 10.1007/s00432-010-0867-y *
PLoS One, vol. 9, no. 6, June 2014, Wang et al. "SOX2 Enhances the Migration and Invasion of Ovarian Cancer Cells via Src Kinase", article no.: e99594 PREV201400572359 *
ZHANG JING; CHANG DOO YOUNG; MERCADO-URIBE IMELDA; LIU JINSONG: "Sex-determining region Y-box 2 expression predicts poor prognosis in human ovarian carcinoma", HUMAN PATHOLOGY., SAUNDERS, PHILADELPHIA, PA, US, vol. 43, no. 9, 1 January 1900 (1900-01-01), US, pages 1405 - 1412, XP028931934, ISSN: 0046-8177, DOI: 10.1016/j.humpath.2011.10.016 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020101432A1 (en) * 2018-11-16 2020-05-22 가톨릭대학교 산학협력단 Biomarker for predicting onset of hereditary ovarian cancer and use thereof
KR20200057652A (en) * 2018-11-16 2020-05-26 가톨릭대학교 산학협력단 Biomarker for predicting development of hereditary ovarian cancer and use thereof
CN113286898A (en) * 2018-11-16 2021-08-20 加图立大学校产学协力团 Biomarkers for predicting genetic ovarian carcinogenesis and uses thereof
KR102368717B1 (en) * 2018-11-16 2022-02-28 가톨릭대학교 산학협력단 Biomarker for predicting development of hereditary ovarian cancer and use thereof
CN112951325A (en) * 2021-02-18 2021-06-11 北京吉因加医学检验实验室有限公司 Design method and application of probe combination for cancer detection

Also Published As

Publication number Publication date
GB201607393D0 (en) 2016-06-15

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)