WO2012167112A2 - Gastric cancer biomarkers - Google Patents

Gastric cancer biomarkers

Info

Publication number
WO2012167112A2
Authority
WO
WIPO (PCT)
Prior art keywords
sample
acc
source
hgnc
symbol
Prior art date
Application number
PCT/US2012/040501
Other languages
French (fr)
Other versions
WO2012167112A9 (en)
Inventor
Russell GROCOCK
Richie Chuan Teck SOONG
Jennifer BECQ
Jian-Bing Fan
Stewart MACARTHUR
Keira CHEETHAM
Ville SILVENTOINEN
Dirk Evers
Mengchu WU
Khay Guan YEOH
Bok Yan Jimmy SO
Boon Ooi Patrick TAN
Original Assignee
Illumina, Inc.
National University Of Singapore
Priority date
Filing date
Publication date
Application filed by Illumina, Inc. and National University Of Singapore
Publication of WO2012167112A2
Publication of WO2012167112A9

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407Specifically defined cancers
    • G01N33/57446Specifically defined cancers of stomach or intestine
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/156Polymorphic or mutational markers

Definitions

  • Gastric cancer was once the second most common cancer in the world. In most developed countries, however, rates of stomach cancer have declined over the past half century, and in the United States stomach malignancy is currently the 14th most common cancer. In the United States, it is estimated that around 21,000 cases of gastric cancer were diagnosed in 2009, with more in intervening years, and more than half of those patients diagnosed would die of the disease. Indeed, gastric cancer is the 7th leading cause of cancer deaths in the United States.
  • Gastric cancer is still the second most common cause of cancer-related death in the world and it remains difficult to cure, perhaps because most patients present with advanced disease. Even patients who present in the most favorable condition and who undergo surgical resection often die of recurrent disease. As such, what is needed are ways to detect gastric cancer earlier, leading to earlier treatment regimens and a more positive prognosis for longer-term survival.
  • the present disclosure identifies biological markers, or biomarkers, indicative of gastric cancer.
  • the disclosed biomarkers represent hot spots in the genome that can be used to identify the presence of gastric cancer, wherein the methods described herein utilize the biomarkers and provide alternatives to currently available gastric cancer determinative, diagnostic and prognostic methodologies.
  • the biomarkers and methods of their use as disclosed herein can be applied to the determination, diagnosis, and/or prognosis of gastric cancer either alone or in conjunction with other gastric disease diagnostic and prognostic assays.
  • the present disclosure provides methods for determining, diagnosing and/or prognosing gastric cancer comprising detecting in a nucleic acid sample from a subject the presence of a plurality of mutations in two or more genes selected from the group comprising MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A, evaluating the probability that the two or more genes are correlated with gastric cancer, and determining the presence of gastric cancer in a sample based on said evaluation.
  • the subject is a human subject and the nucleic acid samples are genomic DNA samples.
  • three or more genes are selected for determining, diagnosing and/or prognosing the presence of gastric cancer.
  • determining the presence of gastric cancer comprises comparing a tumor or test genomic DNA sample to a normal genomic DNA sample from the same individual.
  • evaluating the sample is performed by one or more of sequencing, such as sequencing-by-synthesis methodologies, microarray analysis, or polymerase chain reaction methodologies such as quantitative or real-time PCR.
  • the genomic DNA sample is isolated from a sample selected from the group consisting of a tissue sample, a biopsy sample, a cell sample, a circulating tumor cell sample, a fixed tissue sample or a frozen tissue sample and wherein the normal sample is isolated from normal tissue, for example adjacent or proximal to the proposed tumor sample.
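  • The gene-selection rule in the method summarized above (two or more panel genes mutated, at least one of them from the six-gene list) can be sketched as a simple set check. This is a minimal illustration only: the function name and the input format (a set of gene symbols found mutated in a sample) are assumptions, gene symbols are written in their standard HGNC spelling, and the step of evaluating the probability of correlation with gastric cancer is not modeled.

```python
# Gene lists follow the method summary; the function name and input format
# (a set of mutated gene symbols) are illustrative assumptions.
PANEL = {
    "MUC4", "MUC16", "TP53", "SACS", "ARID1A", "FLNA", "FAT4", "ASPM",
    "AHNAK2", "CEP290", "PCLO", "GPR98", "CR2", "AR", "PLEC", "MACF1", "MAP1A",
}
# Genes the disclosure describes as not previously associated with gastric cancer.
NOVEL = {"SACS", "FLNA", "ASPM", "PCLO", "CR2", "MAP1A"}


def meets_panel_criteria(mutated_genes, min_genes=2):
    """Return True if at least `min_genes` panel genes are mutated
    and at least one of them belongs to the NOVEL list."""
    hits = set(mutated_genes) & PANEL
    return len(hits) >= min_genes and bool(hits & NOVEL)


print(meets_panel_criteria({"TP53", "SACS"}))   # True
print(meets_panel_criteria({"TP53", "MUC16"}))  # False: no gene from NOVEL
print(meets_panel_criteria({"SACS", "BRCA1"}))  # False: only one panel gene
```

  • Setting `min_genes=3` corresponds to the embodiment in which three or more genes are selected.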
  • Figure 1 characterizes the pre-validation patient cohort from which the gastric cancer samples and matched normal pairs were derived.
  • test sample is intended to mean any biological fluid, cell, tissue, organ or portion thereof that contains genomic nucleic acids, for example genomic DNA or RNA, suitable for mutational detection via the disclosed methods.
  • a test sample can include or be suspected to include a cell, such as a cell from any location in the stomach that contains or is suspected to contain a cancerous cell such as the cardia, the fundus of the stomach, the body of the stomach, the gastric antrum, the pylorus, the lesser curvature of the stomach, the greater curvature of the stomach and/or the overlapping lesion of the stomach.
  • the term includes samples present from an individual as well as samples obtained or derived from an individual.
  • a sample can be a histologic section of a specimen obtained by biopsy, aspiration, etc. or cells that are placed in or adapted to tissue culture.
  • a sample further can be a sub-cellular fraction or extract, or a crude or isolated nucleic acid molecule.
  • a patient matched normal sample can be used to establish a mutational background for comparison to a patient test sample.
  • An exemplary patient matched sample is a tissue or cell sample from an adjacent normal tissue to a suspected cancerous tissue or a blood sample from the patient.
  • a sample may be obtained in a variety of ways known in the art. Samples may be obtained according to standard techniques from all types of biological sources that are usual sources of genomic DNA including, but not limited to cells or cellular components which contain DNA, cell lines, circulating tumor cells, biopsies, bodily fluids such as blood, tissue samples such as tissue that are formalin fixed and embedded in paraffin such as tissue from the fundus of the stomach, the body of the stomach, the gastric antrum, the pylorus, the lesser curvature of the stomach, the greater curvature of the stomach and/or the overlapping lesion of the stomach, and all possible combinations thereof. Further, tissues can be fresh, fresh frozen, etc.
  • a sample can be from an archived, stored or fresh source as suits a particular application of the methods set forth herein.
  • the methods described herein can be performed on one or more samples from gastric cancer patients such as samples obtained by tissue biopsy or needle aspiration.
  • Sample analysis can be applied, for example, to the presence or absence of gastric cancer, differentiation between early and/or late stage gastric cancer types, gastric cancer epithelial type differentiation, or to monitor cancer progression or response to treatment.
  • a suitable sample can be collected and acquired that is either known to comprise gastric cancer cells or is subsequent to the formulation of the diagnostic aim of a biomarker as disclosed herein.
  • a sample can be derived from a population of cells or from a tissue that is predicted to be afflicted with or phenotypic of gastric cancer.
  • the genomic DNA can be derived from a high-quality source such that the sample contains only the tissue type of interest, minimum contamination and minimum DNA
  • samples are contemplated to be representative of the tissue or cell type of interest that is to be handled by an assay.
  • a population or set of samples from an individual source can be analyzed to maximize confidence in the results for an individual.
  • a sample from an individual is matched and compared to a normal sample from that same individual to identify the mutational status of biomarkers for that individual.
  • the normal sample, or patient matched normal sample can be from the same or similar organ, tissue or fluid as the sample to which it is compared.
  • the normal sample will typically display a phenotype that is different from a phenotype of the sample to which it is compared.
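  • The tumor versus patient-matched-normal comparison described above amounts, in its simplest form, to keeping the variants observed in the test sample that are absent from the normal sample. The sketch below assumes variants have already been called for each sample; the `Variant` record, the function name and the coordinates are hypothetical, and a production somatic caller would also model read depth, base quality and tumor purity.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Minimal variant record (hypothetical layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor, normal):
    """Variants seen in the tumor sample but absent from the matched normal."""
    return sorted(set(tumor) - set(normal))


# Made-up coordinates for illustration only.
normal = [Variant("chr17", 100, "C", "T")]
tumor = [Variant("chr17", 100, "C", "T"), Variant("chr17", 250, "G", "A")]

for v in somatic_candidates(tumor, normal):
    print(v.chrom, v.pos, v.ref, v.alt)  # chr17 250 G A
```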
  • isolated or purified when used in relation to a nucleic acid refers to a nucleic acid sequence that is extracted and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. As such, an isolated or purified nucleic acid is present in a form or setting that is different from that in which it is found in nature.
  • the terms “marker” or “biomarker” can be DNA or RNA, proteins, polypeptides, variants, fragments or functional equivalents thereof.
  • a biomarker is generally associated with a genomic nucleic acid such as a gene or gene associated region or location unless specified otherwise.
  • Biomarkers disclosed herein that are associated with gastric cancer, a particular type of gastric cancer and/or a particular stage of gastric cancer comprise one or more single nucleotide variants and/or insertions/deletions (indels) located in a gene or gene associated region as compared to its equivalent in a normal sample.
  • a gene that contains one or more somatic mutations, such as variant mutations, identified in one or more patient samples is contemplated to be a biomarker that is useful in detecting, diagnosing or prognosing gastric cancer, a particular type of gastric cancer and/or a stage of gastric cancer.
  • Somatic mutation is an alteration in the genome that occurs after conception resulting in a genetic difference of the genome at that particular location. Somatic mutations can occur in any cell in the body except the germ cells and are passed to the cell progeny during cell division. Somatic mutations include, but are not limited to, point mutations such as single nucleotide variants (SNVs), gene amplification or duplication, genetic insertions and/or deletions (indels), chromosomal translocations, chromosomal inversions and single nucleotide polymorphisms (SNPs). Somatic mutations can result in phenotypic changes, disease formation, cancer, etc.
  • SNVs: single nucleotide variants
  • indels: genetic insertions and/or deletions
  • SNPs: single nucleotide polymorphisms
  • somatic mutations include SNVs, SNPs and/or indels present in genomic DNA, and are considered variant mutations, or mutations that may result in phenotypic changes, disease formation, cancer, etc. Identification of somatic variants described herein was performed by identifying a
  • a gene from one or more patient samples that has one or more variant mutations is considered a biomarker and useful in detecting, diagnosing and prognosing gastric cancer, a particular type of gastric cancer and/or a stage of gastric cancer.
  • gene refers to a nucleic acid sequence, such as DNA, that comprises coding sequences associated with the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). Typically, a gene also includes non-coding and intergenic sequences. The term can encompass the coding region of a gene and the sequences located adjacent to the coding region on both the 5' and 3' ends such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences.
  • Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences.
  • the term "gene" encompasses both cDNA and genomic forms of a gene.
  • a genomic form or clone of a gene contains the coding region interrupted with non-coding sequences such as introns, intervening regions, intervening sequences or intergenic regions.
  • Gastric carcinoma is the second leading cause of death of all malignancies worldwide and mortality of diagnosed gastric cancer remains high.
  • In the United States and Western Europe, even though there has been a dramatic decrease in the incidence of gastric cancer and despite advances in treatment regimens, once gastric cancer is diagnosed the prognosis is not favorable.
  • In Japan, Korea and other Eastern countries, gastric cancer incidence has decreased slowly but it is still a leading cause of cancer-related deaths in these regions.
  • the problem is that oftentimes gastric cancer is diagnosed at a late stage which leads to high mortality despite new treatment alternatives.
  • methods for diagnosing and determining the presence of gastric cancer at an early stage would allow for earlier treatment and provide for a more favorable prognosis for the afflicted patient.
  • the present disclosure solves this problem by providing biomarkers which represent mutational hot spots which can be used in determining the presence of gastric cancer in a subject.
  • Gastric cancer typically begins in the stomach.
  • the stomach is divided into three parts: the upper third, or proximal stomach, which abuts the esophagus and consists of the gastroesophageal junction, the cardia and the fundus; the middle third, or body, of the stomach; and the lower third, which abuts the small intestine and consists of the antrum and the pylorus (which empties into the duodenum).
  • the cancerous cells can infiltrate into surrounding organs (spleen, colon, liver, pancreas, etc.) and can further metastasize into the peritoneal cavity and other secondary metastatic locations (e.g., liver, lungs, lymph nodes, etc.).
  • Gastric cancer or gastric carcinoma is adenocarcinoma of the stomach that makes up around 90% of all stomach malignancies, the remainder being mainly gastric lymphomas.
  • Gastric carcinomas can be classified in a number of ways. Historically, the World Health Organization classification guidelines segregated gastric cancer into adenocarcinoma, signet ring-cell carcinoma or undifferentiated carcinoma. However, the Lauren classification, which is gaining favor and is utilized worldwide, divides gastric carcinomas into two subtypes: intestinal and diffuse. Further, anatomical and pathological staging and grading are performed on tissues to determine the extent of disease and the prognosis of a patient.
  • the Lauren classification is a system used to describe gastric cancers based on intestinal or diffuse type histology; intestinal histology being associated with a more favorable prognosis.
  • gastric adenocarcinoma is classified according to its degree of differentiation and histologic types according to the WHO classification.
  • the Lauren classification has value from an epidemiologic and prognostic standpoint.
  • the Lauren classification is used in Europe and the rest of the world, but has yet to gain universal acceptance in the United States.
  • the Lauren histologic classification of gastric adenocarcinoma into intestinal and diffuse has been emphasized in epidemiologic studies.
  • Intestinal type gastric carcinoma is characterized by: 1) a mean age at detection of 55 years, 2) common presentation as an exophytic intraluminal mass with an expansile growth pattern as it infiltrates the wall, 3) tubular, papillary and solid microscopic patterns, with mucin restricted to the gland lumina, 4) a 5-year survival rate of approximately 20% and 5) an almost 100% association with intestinal metaplasia and H. pylori infection.
  • Diffuse type gastric carcinoma is characterized by: 1) occurrence in younger patients (mean age at diagnosis is 48), 2) common presentation as an ulcerative, infiltrative tumor with a diffusely infiltrative pattern of growth in the gastric wall, 3) characterized
  • a pre-validation patient cohort was sequenced as described herein and a candidate list of approximately 266 gene markers (Table 1) comprising three or more variant mutations within the gene marker, or biomarker, was compiled from an original list identifying approximately 5200 genes of interest for their potential use as gastric cancer related biomarkers. Chromosomal locations identified in Table 1 are as found in the Archive EnsEMBL Human database, release 59-Aug 2010
  • GRC: Genome Reference Consortium
  • [Table 1: candidate gene markers, HGNC gene descriptions and chromosomal locations]
  • Table 1 exemplifies those genes wherein three or more mutations were identified from the pre-validation patient cohort test samples (e.g., as compared to the matched normal tissue samples), thereby identifying biomarkers that are correlated with gastric cancer.
  • a subset of Table 1 is found in Table 2, the subset comprising those genes wherein four or more mutations (No. of unique mutations) were identified in the pre-validation patient cohort test samples (No. of samples mutated), thereby identifying biomarkers representing mutational hot spots that are correlated with gastric cancer.
  • For the variant mutations present in a gene, it was further determined how many of the patient samples in the cohort had a variant mutation in that particular gene, for example, whether one or more, two or more, three or more, four or more, five or more, six or more or seven or more of the patient samples had a variant mutation in a particular gene.
  • Table 2 comprises a subset list of genes wherein two or more patient samples had variant mutations in a particular gene.
  • one or more patient samples were found to have two mutations, for example both a SNV and an indel, in the particular gene.
  • In the gene ABCA13, for example, six mutations were identified in three patient samples, each sample displaying both a SNV and an indel in ABCA13.
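  • The two per-gene counts reported in the tables (number of unique mutations and number of samples mutated) can be tallied from a flat list of somatic calls, as in the sketch below. It is illustrative only: the `(sample_id, gene, mutation_id)` input format, the sample labels, the mutation identifiers and `GENE_X` are hypothetical, and the default thresholds combine the two Table 2 criteria stated above (four or more mutations, two or more samples). The ABCA13 rows reproduce the six-mutations-in-three-samples example.

```python
from collections import defaultdict


def summarize_by_gene(calls):
    """Map gene -> (number of unique mutations, number of samples mutated).

    `calls` is an iterable of (sample_id, gene, mutation_id) tuples.
    """
    mutations = defaultdict(set)
    samples = defaultdict(set)
    for sample_id, gene, mutation_id in calls:
        mutations[gene].add(mutation_id)
        samples[gene].add(sample_id)
    return {gene: (len(mutations[gene]), len(samples[gene])) for gene in mutations}


def select_genes(summary, min_mutations=4, min_samples=2):
    """Genes meeting both the mutation-count and sample-count thresholds."""
    return sorted(
        gene
        for gene, (n_mut, n_samp) in summary.items()
        if n_mut >= min_mutations and n_samp >= min_samples
    )


# Hypothetical calls: three samples, each with one SNV and one indel in ABCA13.
calls = [
    ("A", "ABCA13", "snv_1"), ("A", "ABCA13", "indel_1"),
    ("B", "ABCA13", "snv_2"), ("B", "ABCA13", "indel_2"),
    ("C", "ABCA13", "snv_3"), ("C", "ABCA13", "indel_3"),
    ("A", "GENE_X", "snv_4"),
]
summary = summarize_by_gene(calls)
print(summary["ABCA13"])      # (6, 3)
print(select_genes(summary))  # ['ABCA13']
```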
  • the genes as found in Table 2 are recognized herein as biomarkers for determining, diagnosing or prognosing gastric cancer. Further investigation of the patient cohort sample set identified several samples that were potentially problematic. Those samples determined to be problematic (S9, S16, S18 and S24) were removed from the data set and the data set from the remaining 19 pre-validation cohort samples was reanalyzed. Table 3 reports the reanalysis of the original data set minus the problematic samples.
  • Table 3: Pre-validation data for 19 tumor/normal pairs
  • Reanalysis identified an additional subset of markers that are contemplated to be of interest as mutational hot spots for their ability to diagnose the presence of gastric cancer.
  • biomarkers could be correlated to gastric cancer at p < 0.1, with the majority correlated at p < 0.05 or p < 0.01.
  • Pre-validation cohort data reanalysis revealed that mutations in MUC16 and TP53 were highly correlated with the presence of gastric cancer in a sample, and further that 37% and 47% of the cohort samples had mutations in these genes, respectively.
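  • As a quick arithmetic check, the reported percentages are consistent with a 19-sample cohort if 7 samples carried MUC16 mutations and 9 carried TP53 mutations. Those counts are an inference from the rounded percentages and are not stated in the text; the helper name is likewise an assumption.

```python
def percent_mutated(n_mutated, n_samples):
    """Percentage of samples carrying a mutation, rounded to a whole number."""
    return round(100 * n_mutated / n_samples)


COHORT_SIZE = 19  # samples remaining after the problematic ones were removed

# 7 and 9 are inferred counts, not values given in the source.
print(percent_mutated(7, COHORT_SIZE))  # 37
print(percent_mutated(9, COHORT_SIZE))  # 47
```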
  • TP53 is well established as being mutated in a majority of cancer types, many of those mutations resulting in protein malfunction (Condel score of 0.9).
  • a Condel score (CONsensus DELeteriousness) integrates the output of computational tools aimed at assessing the impact of non-synonymous SNVs on protein function by computing a weighted average of the scores (WAS) of computational tools, such as SIFT, Polyphen2, MAPP, LogR Pfam e-value (2004, Clifford et al,
  • a high Condel score for example above 0.5, represents mutations more likely than not to be deleterious whereas a low Condel score represents the opposite (2011, Gonzalez-Perez and Lopez-Bigas, Am J Hum Gen 88:440-449; incorporated herein by reference in its entirety).
  • Condel scores were averaged using the Variant Effect Predictor (VEP) version of Condel that averages over SIFT and Polyphen2 (as further described in Example 2).
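  • The weighted-average idea behind the score, and the 0.5 cutoff mentioned above, can be sketched as follows. This is a simplified illustration rather than the Condel or VEP implementation: it assumes each tool's score has already been rescaled to the range 0 to 1 with higher meaning more damaging (raw SIFT scores run the other way), it uses equal weights by default where the published method derives its own weights, and the example scores and function names are hypothetical.

```python
def weighted_average_score(scores, weights=None):
    """Weighted average of per-tool deleteriousness scores.

    `scores` maps tool name -> score in [0, 1], higher = more damaging.
    `weights` maps tool name -> weight; equal weights are used if omitted.
    """
    if weights is None:
        weights = {tool: 1.0 for tool in scores}
    total_weight = sum(weights[tool] for tool in scores)
    return sum(weights[tool] * score for tool, score in scores.items()) / total_weight


def is_likely_deleterious(score, threshold=0.5):
    """Apply the 'above 0.5' cutoff described in the text."""
    return score > threshold


# Hypothetical, already-rescaled scores for a single missense variant.
scores = {"SIFT": 0.95, "PolyPhen2": 0.85}
was = weighted_average_score(scores)
print(round(was, 2), is_likely_deleterious(was))  # 0.9 True
```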
  • VEP Variant Effect Predictor
  • Pre-validation WGS data analysis identified biomarkers that had not been previously associated with gastric cancer (Wellcome Trust Sanger Institute Catalogue of Somatic Mutations in Cancer) including SACS, FLNA, ASPM, PCLO, CR2 and MAP1A.
  • An additional patient cohort of 39 tumor/normal sample pairs was obtained; the sample characteristics of which are listed in Figure 2.
  • the validation cohort was sequenced by whole exome sequencing (WES) as described herein and serves as validation of the initial data analysis from the pre-validation cohort. As seen in Table 4, the validation data further identifies those biomarkers which are highly correlated with gastric cancer, as originally identified in the pre-validation data analysis.
  • WES whole exome sequencing
  • Validation data supports the correlation of the pre-validation identified biomarkers with gastric cancer.
  • TP53, MUC4 and MUC16 were correlated with gastric cancer as expected.
  • validating with a larger sample cohort provided deeper insight into the originally identified subset of biomarkers.
  • analysis of the validation cohort data revealed a stronger correlation of ARID1A to gastric cancer, a marker which has been previously correlated with the presence of gastric cancer.
  • the strong correlation of TP53, MUC4, MUC16 and ARID1A can be used as positive controls of the present methods and systems in determining biomarkers that can be correlated with gastric cancer.
  • the validation data further supports the correlation of SACS, FLNA, FAT3, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A as being additional mutational hot spot genes, and thus biomarkers that can be correlated with gastric cancer.
  • biomarkers SACS, FLNA, ASPM, PCLO, CR2 and MAP1A were not previously correlated with gastric cancer in the COSMIC database.
  • biomarkers and their methods of use are described below.
  • the biomarkers and their methods of use are not limited to these embodiments.
  • Biomarkers as described herein find utility, either alone or in combination, in methods for determining, diagnosing, or prognosing gastric cancer. Biomarkers as described herein find utility, either alone or in combination, in methods for prognosing the outcome of patients diagnosed with gastric cancer, a type of gastric cancer and/or a stage of gastric cancer.
  • Biomarkers as described herein find utility, either alone or in combination, in methods for screening patients for the presence or absence of gastric cancer, a type of gastric cancer and/or a stage of gastric cancer, for example for patients that might be part of a high risk population predisposed to developing gastric cancer (e.g., family history, genetic predisposition, H. pylori infection status, etc.).
  • Additional tests and methods for identifying gastric cancer include, but are not limited to protein staining methods such as IHC or histopathological staining such as H&E, genetic probe assays such as in situ hybridization (ISH), infection status (H. pylori), TNM staging, clinical staging, pathological staging, etc., for example as recognized by the American Joint Committee on Cancer (AJCC) and/or the World Health Organization (AJCC Cancer Staging
  • H. pylori Helicobacter pylori
  • MALT gastric mucosa-associated lymphoid tissue
  • H. pylori is a spiral bacterium that grows in the mucus layer that coats the inside of the human stomach, the bacteria being resistant to the stomach's natural acidic and antimicrobial environment by way of urease secretion, which neutralizes stomach acidity. Further, the bacterium's spiral shape allows it to burrow into the stomach's mucus layer in addition to attaching to the cells that line the inner surface of the stomach. Immune cells that would typically recognize and attack the bacteria are unable to reach the stomach lining. That, in combination with H. pylori's ability to interfere with local immune responses, makes immune cells ineffective in eliminating gastric infection with H. pylori. Epidemiological studies indicate that individuals infected with H.
  • biomarkers used in methods and assays for determining the presence of gastric cancer as described herein are used in conjunction with assays that determine the presence of H. pylori.
  • breath tests (e.g., urea breath tests)
  • antibody tests (e.g., blood tests for antibodies to H. pylori)
  • antigen tests (e.g., stool tests for the presence of H. pylori antigens)
  • stomach biopsy
  • a biomarker is a gene or genetic location that was identified to comprise one or more, two or more, three or more, four or more or five or more variant mutations in patient test samples as compared to the gene or gene location in a patient matched normal sample.
  • a biomarker that was identified to have variant mutations in at least two patient samples is contemplated to represent a "hot spot", or gene that comprises variant mutations as compared to other genes in a gastric cancer test sample (e.g., tissue, cell, circulating tumor cells, etc.).
  • Tables 2, 3 and 4 are exemplary of genes that were identified in two or more patient samples to have variant mutations compared to the same gene in a patient matched normal sample, thereby identifying them as potential biomarkers for determining the presence or absence of gastric cancer.
  • biomarkers as described herein are located in a coding region of a gene. In some embodiments, biomarkers as described herein are located in non-coding regions of a gene. In other embodiments, biomarkers as described herein are located in intergenic regions. In some embodiments, biomarkers as described herein comprise single nucleotide variants (SNVs) or single nucleotide polymorphisms (SNPs). In other embodiments, biomarkers as described herein comprise insertions and/or deletions (indels) of one or more genomic sequences. In further embodiments, a biomarker may comprise both SNVs and indels.
  • biomarkers as disclosed herein are useful in some embodiments.
  • the biomarkers as disclosed herein are useful in diagnosing the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a subject. In some embodiments, the biomarkers as disclosed herein are useful in prognosing disease progression, treatment outcome and/or treatment regimen progress of a subject diagnosed with gastric cancer. In some embodiments, the biomarkers as disclosed herein are useful in screening a subject for the possibility of developing gastric cancer. In some embodiments, the biomarkers as described herein are useful in screening potential therapeutic options for treating a patient having gastric cancer.
  • MUC4, TTN, NEB, MUC16, TP53, CSMD3, SYNE1, COL11A1, ABCA13, HMCN1, AHNAK2, MLL3, ATR, CEP290, PCLO, ANK2, RELN, AR, CACNA2D1, FBN3, LVRN, OBSCN, SACS, RYR2, FAT3, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7
  • biomarkers comprising variant mutations which can be correlated with gastric cancer patient samples comprise two or more of MUC4, TTN.
  • OBSCN, SACS, RYR2, FAT3, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP 1-21018.1,
  • MAP7D3, FLNA, DNAH5, TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNAl l, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B, MCM10, EPB41L3, AMPD3, TESK1, DNAH7, MYO9A, CHD7, BIRC6, ERBB2, SMARCA4, STK31, FBLN2, SLC16A4, RAD50, CXorf59, C6orf167, MAP3K4, SCN7A, TRPM3, KIF1A, RGS12, PTPRJ, DMD, SEMA3F, SCN10A, DOCK7, TBC1D23, COL12A1, AFF3, MACF1, LAMA2, ZFP106, C6orf103, RBM33, DOCK8, ATP11A, CHD
  • biomarkers comprising variant mutations which can be correlated with gastric cancer comprise two or more of MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A.
  • biomarkers comprising variant mutations which can be correlated with gastric cancer comprise one or more of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A. In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise two or more of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A. In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise at least two of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A and at least one of MUC4, MUC16, TP53, ARID1A, FAT4, AHNAK2, CEP290, GPR98, AR, PLEC and MACF1. In some embodiments, methods using biomarkers for determining the presence of gastric cancer comprise a combination of biomarkers, or group of biomarkers, as described herein.
  • a sample from a subject used in methods for diagnosing gastric cancer as described herein is a tissue sample, for example a biopsy tissue sample, for example a gastric tissue biopsy sample.
  • a biopsy tissue sample used in diagnostic methods described herein is a fresh sample, or a sample that has been frozen or modified.
  • a modified sample is, for example, a sample that has been preserved or modified for storage and/or for use in histopathology, cytopathology,
  • a sample from a subject is a liquid sample, such as a blood sample containing white blood cells.
  • a liquid sample contains circulating tumor cells and/or bacterial cells, for example H. pylori bacteria.
  • nucleic acids are extracted and isolated from a sample, or portion thereof, for subsequent use in methods as described herein for determining the presence of gastric cancer.
  • normal, adjacent tissue specimens were utilized as patient matched normal samples for comparison to a cancerous tissue sample.
  • one or more normal, adjacent tissue samples were obtained from an individual and matched to that individual's tissue sample suspected of containing cancerous cells for evaluation of biomarker mutations as described herein (for example, see Examples).
  • the genomic DNA was isolated from the normal tissue cells and served as a patient baseline (normal) for comparing mutations present in the test sample.
  • tissue sample that is free from the cancerous phenotype can also be utilized as a source of comparative normal genomic DNA for an individual.
  • nucleic acids isolated from a sample, or a portion thereof are used in diagnostic and/or prognostic methods as described herein.
  • the two or more biomarkers as described herein can be used in methods for determining, diagnosing, or prognosing gastric cancer. Further, the biomarkers find utility in combination with other biomarkers and/or other diagnostic tests in providing a diagnostician additional tools to determine gastric cancer status of a subject. The methods as described herein find particular utility as diagnostic and prognostic tools. In some embodiments, methods comprising biomarkers as described herein can be useful in detecting gastric cancer at an early stage in the disease compared to later stage detection. In some embodiments, methods comprising biomarkers as described herein can be prognostic for patient survival based on early and/or late detection of the presence of gastric cancer.
  • the biomarkers comprise variant genetic sequences in a genomic DNA sample compared to a genomic DNA normal sample.
  • Variant genetic sequences can include, but are not limited to, single nucleotide variants, sequence insertions, sequence deletions, within genes that differ from a normal sample and indicate mutations that may indicate phenotypic changes, disease formation, cancer, etc.
  • the methods detect two or more altered genetic sequences in a biomarker as compared to a normal sample.
  • a comparison between gene sequences in a test sample (i.e., collected from a patient, subject, individual, etc.) and a normal or control sample (i.e., from the same patient, subject, or individual from which the test sample is collected) identifies the number of mutations associated with a particular gene, wherein the presence of a plurality of variant genes over a normal may associate that gene with gastric cancer.
  • methods disclosed herein detect the insertion of one or more genetic sequences into a gene, deletion of one or more genetic sequences from a gene, or both as compared to a normal, or control, sample.
  • the methods detect one or more of single nucleotide variant(s) and/or insertion(s) and/or deletion(s) altered genetic sequences in a sample compared to a normal, or control sample.
  • a test sample i.e., a sample to be assayed for presence of gastric cancer
  • a second sample a normal or control sample (e.g., blood sample, tissue sample known not to have a cancerous phenotype) is collected from the same individual.
  • Genomic DNA is isolated from the sample(s) by techniques known in the art (for example, as found in Molecular Cloning, a Laboratory Manual, Eds.
  • the isolated DNA from a sample is used in methods as described herein for detecting biomarkers that can be indicative of gastric cancer.
  • the isolated DNA from the test and control samples is subjected to sequencing, for example next generation sequencing methodologies.
  • Sequence data from the test and the control DNA samples are compared, for example by aligning the two sequences, variant sequences are identified in the test sequence over the control sequence and the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a sample is identified based on said comparison.
  • isolated genomic DNA from a sample is used to identify variant mutations in a genetic sequence, wherein genes comprising variant mutations relative to a normal sample are biomarkers associated with the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a sample.
  • a subset of biomarkers as found in Tables 1, 2, 3 or 4 can be used in determining, diagnosing, or prognosing gastric cancer in a sample from a subject.
  • the subset can represent two or more, three or more, four or more, five or more, or six or more biomarkers with variant mutations from the subset of which can be indicative of the presence of gastric cancer in a patient.
  • the subset of biomarkers comprising two or more of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A can be useful for indicating the presence of gastric cancer in a subject.
  • An additional subset encompassing biomarkers comprising MUC4, TTN, NEB, MUC16, TP53,
  • CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP1-21018.1, AC073995.2, MAP7D3, MACF1, MAP1A and DNAH5 can be useful for indicating the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer.
  • biomarkers comprising TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNAH11, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B,
  • biomarkers with variant mutations as identified in pre-validation and validation cohort samples can be correlated with the presence of gastric cancer in a subject.
  • Some of the biomarkers identified in pre-validation and validation sample cohorts had not been previously disclosed as associated with gastric cancer in the Catalogue of Somatic Mutations in Cancer (COSMIC) database and the Cancer Gene Census (CGC) database of the Cancer Gene Project; both databases are maintained by the Wellcome Trust Sanger Institute.
  • biomarkers not previously associated with gastric cancer include NEB, ABCA13, AHNAK2, CEP290, COL29A1, PCLO, LVRN, RYR2, FAT3, GPR98, AL592307.2, PTPRQ, PLEC, AC130364.1, OTOG, AC073995.2, SACS, FLNA, ASPM, CR2 and MAP1A.
  • the biomarkers SACS, FLNA, PCLO, and CR2 are correlated with gastric cancer at p < 0.05 (validation studies), with biomarkers PLEC and MAP1A having p-values of 0.058 and ASPM having a p-value of 0.059.
  • the genes SACS, FLNA, ASPM, CR2, PLEC and MAP1A are particularly considered mutational hot spots that can be correlated with the presence of gastric cancer.
  • Additional biomarkers that were not correlated with gastric cancer in COSMIC or CGC databases include DNAH11, DNHD1, CHD7, FBLN2, SCN7A, KIF1A, C6orf103, RBM33, SCN4A, PCDHA11, ZFHX4, AC007342.2, MUC19, KIF26A, SCN9A, AC021066.1, ADZ4, ZNF66, DNAH12, ASXL2, FTSJD1, GABRG3, DNA2, KDM5B, AL157769.3, RP11-766F14.2, MPRIP, HMCN2, RYR3, WDFY4, CSMD1, KIAA1875, SSPO, ASXL3, AGBL3, TTC28, DOCK10, C20orf12, SIK3, FREM1 and SCN5A.
  • methods comprising biomarkers for determining the presence of gastric cancer comprise two or more, three or more, four or more, five or more of NEB, ABCA13, AHNAK2, CEP290, COL29A1, PCLO, LVRN, RYR2, FAT3, GPR98,
  • Prognostic methods utilizing biomarkers as described herein are contemplated to be useful for determining a proper course of treatment for a patient having gastric cancer.
  • a course of treatment refers to the therapeutic measures taken for a patient after diagnosis or after treatment for gastric cancer.
  • Different treatment regimens are available for treating patients with gastric cancer. Standard treatments include surgery (e.g., subtotal gastrectomy or total gastrectomy), endoluminal stent placement or laser therapy, chemotherapy, radiation therapy and chemoradiation therapy.
  • Other therapeutic regimens comprise those identified during clinical trials, but not yet considered as a standard treatment option.
  • Clinical trials associated with gastric cancer can be found at, for example, www.clinicaltrials.gov.
  • a determination of the likelihood for cancer recurrence, spread, or patient survival can assist in determining whether a more conservative or more radical approach to therapy should be taken, or whether treatment modalities should be combined. For example, when gastric cancer recurrence is likely, it can be advantageous to precede or follow surgical treatment with chemotherapy, radiation, immunotherapy, biological modifier therapy, gene therapy, vaccines, and the like, or adjust the span of time during which the patient is treated.
  • a diagnosis or prognosis of a gastric cancer state is contemplated to be correlated with two or more, for example a particular combination, of biomarkers described herein.
  • methods utilizing biomarkers for use in prognosis of gastric cancer comprise two or more of MUC4, TTN.
  • C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1,
  • nucleic acids, for example DNA, are isolated from the sample by established means known in the art, and the isolated nucleic acids are assayed by methods disclosed herein, for example sequencing, microarray analysis or PCR.
  • a normal or control sample is typically obtained for comparison with the test sample.
  • Methods described herein are contemplated for use in, for example, characterizing the variant mutational status of one or more biomarkers, for example those as found in Tables 1, 2, 3 or 4, wherein the variant mutational status of one or more biomarkers is useful in determining gastric cancer status.
  • methods for characterization comprise sequencing technologies, for example next generation sequencing technologies.
  • microarray based technologies are utilized to characterize the mutational status of a biomarker as described herein for determining the status of gastric cancer in a sample.
  • polymerase chain reaction is utilized to characterize the mutational status of a biomarker.
  • a sample is assayed for methylation status, the data of which is used to characterize a sample for gastric cancer status.
  • isolated genomic DNA from samples is typically modified prior to characterization.
  • genomic DNA libraries are created which can be applied to downstream detection applications such as sequencing.
  • a library is produced, for example, by performing the methods as described in the Nextera™ DNA Sample Prep Kit (Epicentre® Biotechnologies, Madison WI), GS FLX Titanium Library Preparation Kit (454 Life Sciences, Branford CT), SOLiD™ Library Preparation Kits (Applied Biosystems™ Life Technologies, Carlsbad CA), and the like.
  • the sample as described herein may be further amplified for sequencing by, for example, multiple strand displacement amplification (MDA) techniques.
  • MDA multiple strand displacement amplification
  • an amplified sample library is, for example, prepared by creating a DNA library as described in Mate Pair Library Prep kit, Genomic DNA Sample Prep kits or TruSeq™ Sample Preparation and Exome Enrichment kits (Illumina®, Inc., San Diego CA).
  • Useful cluster is, for example, prepared by creating a DNA library as described in Mate Pair Library Prep kit, Genomic DNA Sample Prep kits or TruSeq™ Sample Preparation and Exome Enrichment kits (Illumina®, Inc., San Diego CA).
  • Genomic DNA libraries derived from a sample as described herein can be characterized for gastric cancer status by sequencing for the presence of gene mutations.
  • sequencing can be performed following manufacturer's protocols on a system such as those provided by Illumina®, Inc. (HiSeq 1000, HiSeq 2000, Genome Analyzers, MiSeq, HiScan, iScan, BeadExpress systems), 454 Life Sciences (FLX Genome Sequencer, GS Junior), Applied Biosystems™ Life Technologies (ABI).
  • Output from a sequencing instrument can be of any sort.
  • current technology typically utilizes a light generating readable output, such as fluorescence or luminescence; however, the present methods for detecting mutations in a biomarker for determining gastric cancer status in a sample are not necessarily limited to the type of readable output as long as differences in output signal for a particular sequence of interest can be determined.
  • a change in ion concentration, for example hydrogen ion concentration, is measured to determine a sequence of interest, whereas in other embodiments a change in current is utilized to determine a sequence of interest.
  • analysis software that may be used to characterize output derived from practicing methods as described herein include, but are not limited to, Pipeline,
  • the number of mutations in a biomarker can be detected using microarray methodologies.
  • a plurality of different probe molecules can be attached to a substrate or otherwise spatially distinguished in an array.
  • Exemplary arrays that can be used to detect the number of mutations in a biomarker include, but are not limited to, slide arrays, silicon wafer arrays, liquid arrays, bead-based arrays and others known in the art or set forth in further detail below.
  • the methods can be practiced with array technology that combines a miniaturized array platform, a high level of assay
  • Exemplary methods and systems for microarray analysis include, but are not limited to, those methods and systems commercialized by Roche NimbleGen, Inc., Illumina®, Inc., Affymetrix® and Agilent Technologies.
  • An array of beads can also be in a fluid format such as a fluid stream of a flow cytometer or similar device.
  • Commercially available fluid formats for distinguishing beads include, for example, those used in xMAP™ technologies from Luminex or MPSS™ methods from Lynx Therapeutics.
  • microarray methods and systems can be found in, for example, US patents 5,856,101, 5,981,733; 6,001,309; 6,023,540, 6,110,426, 6,200,737, 6,221,653; 6,232,072, 6,266,459, 6,327,410, 6,355,431, 6,379,895, 6,429,027, 6,458,583, 6,667,394, 6,770,441, 6,489,606 and 6,859,570, 7,106,513, 7,126,755, and 7,164,533, US patent applications 2005/0227252, 2006/0023310, 2006/006327, 2006/0071075, 2006/0119913 and PCT publications WO98/40726, WO99/18434, WO98/50782, WO00/63437, WO04/024328 and WO05/033681 (each of which is incorporated herein by reference in its entirety). Microarray based technologies for characterizing gastric cancer are contemplated to be useful either alone or
  • PCR polymerase chain reaction
  • PCR analysis includes, but is not limited to, those methods and systems
  • genes described herein were identified by whole genome sequencing (WGS) from a pre-validation cohort of 23 fresh frozen patient tumor samples known to contain gastric cancer. Normal samples obtained from adjacent normal tissues were also collected from each patient and matched to the corresponding gastric cancer test sample. Patient samples were available through protocols and procedures followed for human tissue usage as defined by the National University of Singapore. Several of the 23 cohort samples were determined to be problematic samples and were subsequently removed from analysis. Those samples that were removed from analysis were samples designated as S9, 16, 18 and 24. Upon removal, the WGS sequencing data was reanalyzed based on the remaining 19 tumor/normal paired samples.
  • WGS whole genome sequencing
  • the validation sample cohort initially included 50 tumor/normal paired samples. However, it was determined that several samples were problematic, and they were removed from the validation cohort. Finally, 39 tumor/normal paired samples were sequenced for the validation cohort. Sequencing was performed by whole exome sequencing (WES) on the 39 tumor/normal pairs and the data was analyzed for biomarkers correlating to gastric cancer. Small tissue aliquots (8-10 mm³) were dissected from frozen tissue in liquid nitrogen for DNA extraction. Genomic DNA was extracted from fresh frozen tissue samples by phenol/chloroform extraction as known in the art.
  • WES whole exome sequencing
  • tissue was ground to a fine powder under liquid nitrogen and digested with 1 ml extraction buffer (0.5% SDS, 10 mM Tris-HCl (pH 8), 100 mM EDTA (pH 8), 20 µg/ml pancreatic RNase) and 20 µl Proteinase K (20 mg/ml) at 55°C overnight.
  • the digested samples were mixed with the same volume of buffer saturated
  • Genomic DNA libraries from the pre-validation cohort samples were generated by adding 4 µg of sample DNA to methods as defined in the Paired End Sample prep kit PE-102-1001 (Illumina®, Inc.) following manufacturer's protocol. Briefly, DNA fragments are generated by random shearing and conjugated to a pair of oligonucleotides in a forked adaptor configuration. The ligated products are amplified using two oligonucleotide primers, resulting in double-stranded blunt-ended products having a different adaptor sequence on either end.
  • Genomic DNA libraries from the validation cohort samples were generated by adding isolated sample DNA to methods as defined in the TruSeq™ Exome Enrichment kit (Illumina®, Inc.) following manufacturer's protocol. Briefly, DNA fragments are generated and the fragments are adaptor ligated on both ends of the fragment.
  • Biotinylated probes are hybridized to the targeted exomic regions of the fragments and those hybridization complexes are captured using streptavidin magnetic coated beads. The beads are captured and the hybridized targets are eluted from the beads for downstream applications.
  • clusters of DNA library fragments were formed prior to sequencing using the V3 cluster kit (Illumina®, Inc.). Briefly, products from a DNA library preparation are denatured and single strands annealed to complementary oligonucleotides on the flow-cell surface. A new strand is copied from the original strand in an extension reaction and the original strand is removed by denaturation. The adaptor sequence of the copied strand is annealed to a surface-bound complementary oligonucleotide, forming a bridge and generating a new site for synthesis of a second strand. Multiple cycles of annealing, extension and denaturation in isothermal conditions resulted in growth of clusters, each approximately 1 µm in physical diameter.
  • the DNA in each cluster is linearized by cleavage within one adaptor sequence and denatured, generating single-stranded template for sequencing by synthesis (SBS) to obtain a sequence read.
  • SBS sequencing by synthesis
  • the products of read 1 are removed by denaturation, the template is used to generate a bridge, the second strand is re-synthesized and the opposite strand is then cleaved to provide the template for the second read.
  • WGS was performed using the Illumina®, Inc. V4 SBS kit with 100 bp paired end reads on the Genome Analyzer IIx. Briefly, DNA templates are sequenced by repeated cycles of polymerase-directed single base extension. To ensure base-by-base nucleotide incorporation in a stepwise manner, a set of four reversible terminators, A, C, G and T, each labelled with a different removable fluorophore, are used.
  • the use of modified nucleotides allows incorporation to be driven essentially to completion without risk of over-incorporation. It also enables addition of all four nucleotides simultaneously minimizing risk of misincorporation.
  • the Genome Analyzer IIx is designed to perform multiple cycles of sequencing chemistry and imaging to collect sequence data automatically from each cluster on the surface of each lane of an eight-lane flow cell.
  • Sequences were aligned using ELANDv2e from CASAVA version 1.8 (Illumina®, Inc.) with full repeat resolution and orphan rescue (sensitive mode), to the human hg19/GRCh37 reference sequence.
  • the aligned reads were aggregated and sorted into chromosomes based on alignment positions.
  • the sorted reads were used to call variants using Hyrax, a Bayesian SNV caller, and GROUPER.
  • the callers are part of the standard CASAVA 1.8 distribution and were run with default parameters. This process was carried out for the tumor and normal genomes.
  • Somatic single nucleotide variant subtraction for calling somatic SNVs was performed by taking the list of positions in the tumor genome with snp quality values greater than 15 (Q(snp)tumor >15) and high confidence of the assigned genotype given the polymorphic prior (Q(max gt)tumor >20). For each putative SNV the normal sample was investigated. If a call was present in the normal sample at the same position as a putative SNV, and if the call had a quality value greater than 0 (Q(snp) normal >0), the position was filtered out as background.
  • the putative SNVs were recalled (using Hyrax) in the tumor sample, however for recalling additional information from candidate indel contigs constructed from the normal sample was used. This process was utilized to avoid any indels that were initially missed in the tumor due to low supporting evidence. A candidate SNV was called when there was complete agreement between the initial SNV call and the recall.
  • variants were also recalled in the normal sample, using Hyrax with all read filtering turned off. If the posterior probability of the tumor genotype was higher than a non-reference genotype, then that SNV was considered to have low confidence evidence in the normal sample and was discarded.
  • somatic indel subtraction for calling somatic insertion/deletions: only those indels that were confidently called in the tumor sample and not present in the normal sample were considered. Indels in the tumor with a Q score less than 30 (Q(indel) < 30) and those positions that had less than 10 reads coverage in the normal sample (for a 30X build) were filtered out. To be considered as evidence, a read had to have a single read alignment score >10 and a paired read alignment score >90. Positions were excluded if they mapped within 1000 bases of a known centromere or telomere (as obtained from the reference genome hg19/GRCh37) as these locations typically contain highly repetitive regions and read alignments are problematic.
  • the indel position was matched with the region and calls present in the normal sample. If a putative somatic indel position overlapped with an indel call originating from the normal sample, the indel was considered to be present in normal germline and hence the position was filtered out. Given the repetitive nature of the human genome, the putative somatic indel region was characterized by finding the shortest sequence around the indel that extended outside any repeats and that region was matched with each intersecting read in the normal sample. If there was evidence in the normal sample having the same pattern in the intersecting normal reads the candidate somatic indel was discarded.
  • one read was allowed in the normal sample to have the same indel as found in the tumor.
  • each class of variant was annotated against the Ensembl database release e59.
  • Each somatic variant was queried for overlapping annotated features. For all gene features, it was considered whether a consequence of the somatic variant was synonymous, non-synonymous, or nonsense or if the variant could disrupt a canonical splice site at an intron/exon boundary. For variants that fell in a coding exon, the consequence of the change was analyzed and reported. Regulatory regions (e.g., 3' and 5' untranslated regions) of the gene feature were also reported. For pre-validation samples, coding regions were sequenced at approximately 10X coverage, with a majority read depth of 40X.
  • Table 5 exemplifies mutational locations from the original analysis of the 19 pre-validation sample cohort sequencing experiment (alignment to the hg19/GRCh37 human reference genome). Table 5 serves to demonstrate the types of mutations that were identified on the pre-validation sample cohort. Table 5 - Exemplary data from the pre-validation 23 patient cohort samples
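The first-pass quality cut-offs described in the list above for somatic SNV subtraction and somatic indel subtraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CASAVA, Hyrax or GROUPER implementation: the class, function and field names are hypothetical, the thresholds are the ones quoted in the text, treating "within 1000 bases" as inclusive is an assumption, and the later recall and repeat-aware matching steps are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text; all identifiers below are hypothetical.
Q_SNP_TUMOR_MIN = 15       # keep tumor positions with Q(snp) > 15
Q_MAX_GT_TUMOR_MIN = 20    # keep tumor positions with Q(max gt) > 20
Q_SNP_NORMAL_MAX = 0       # a normal call with Q(snp) > 0 marks background
Q_INDEL_MIN = 30           # tumor indels with Q(indel) < 30 are filtered out
NORMAL_MIN_COVERAGE = 10   # fewer than 10 normal reads is filtered out
EDGE_EXCLUSION_BP = 1000   # distance to a known centromere or telomere


@dataclass
class SnvCall:
    """Quality values for one tumor SNV call (hypothetical container)."""
    q_snp: float
    q_max_gt: float


def is_candidate_somatic_snv(tumor: SnvCall, normal_q_snp: Optional[float]) -> bool:
    """Apply the tumor cut-offs, then subtract positions also called in the normal.

    normal_q_snp is None when the normal sample has no call at this position.
    """
    if tumor.q_snp <= Q_SNP_TUMOR_MIN or tumor.q_max_gt <= Q_MAX_GT_TUMOR_MIN:
        return False
    if normal_q_snp is not None and normal_q_snp > Q_SNP_NORMAL_MAX:
        return False  # present in the normal sample: treated as background
    return True


def passes_indel_prefilter(q_indel: float, normal_coverage: int,
                           dist_to_centromere_or_telomere: int) -> bool:
    """Apply the indel quality, normal coverage and chromosome-edge filters."""
    if q_indel < Q_INDEL_MIN:
        return False
    if normal_coverage < NORMAL_MIN_COVERAGE:
        return False
    # Assumption: "within 1000 bases" is read as inclusive of 1000.
    if dist_to_centromere_or_telomere <= EDGE_EXCLUSION_BP:
        return False
    return True
```

A caller would run these checks per position before the recall steps; a position that fails either function is dropped from the candidate list.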

Abstract

The present disclosure provides biomarkers and methods of their use in determining the presence or absence of gastric cancer. In preferred embodiments, biomarkers identified comprise mutational hot spots which can be correlated with the presence of gastric cancer in a sample from an individual.

Description

GASTRIC CANCER BIOMARKERS
This patent application claims priority to United States provisional patent application serial no. 61/492,061 filed June 1, 2011, which is incorporated herein by reference in its entirety.
BACKGROUND
Gastric cancer was once the second most common cancer in the world. In most developed countries, however, rates of stomach cancer have declined over the past half century and in the United States stomach malignancy is currently the 14th most common cancer. In the United States, it is estimated that around 21,000 cases of gastric cancer were diagnosed in 2009, with more in intervening years, and more than half of those patients diagnosed would die of the disease. Indeed, gastric cancer is the 7th leading cause of cancer deaths in the United States.
Tremendous geographic variation exists in the incidence of the disease around the world. Rates of gastric cancer are highest in Asia and parts of South America, with the highest death rates recorded in Chile, Japan, South America and the former Soviet Union. In the US, gastric cancer is more prevalent in Asian and Pacific Islanders, followed by black, Hispanic, white, American Indian and Inuit populations. Most patients are older at diagnosis; the median age in the United States is 70 for males and 74 for females. The site of gastric cancer is classified on the basis of its relationship to the long axis of the stomach. Approximately 40% of cancers develop in the lower part, 40% in the middle part and 15% in the upper part. Five-year survival rates after surgical resection range from 30-50% for patients with stage II disease to 10-25% for patients with stage III disease. Nevertheless, gastric cancer is still the second most common cause of cancer related death in the world and it remains difficult to cure, perhaps because most patients present with advanced disease. Even patients who present in the most favorable condition and who undergo surgical resection often die of recurrent disease. As such, what are needed are ways to detect gastric cancer earlier rather than later, leading to earlier treatment regimens for a more positive prognosis of longer term survival.
SUMMARY
The present disclosure identifies biological markers, or biomarkers, indicative of gastric cancer. The disclosed biomarkers represent hot spots in the genome that can be used to identify the presence of gastric cancer, wherein the methods described herein utilize the biomarkers and provide alternatives to currently available gastric cancer determinative, diagnostic and prognostic methodologies. The biomarkers and methods of their use as disclosed herein can be applied to the determination, diagnosis, and/or prognosis of gastric cancer either alone or in conjunction with other gastric disease diagnostic and prognostic assays.
In some embodiments, the present disclosure provides methods for determining, diagnosing and/or prognosing gastric cancer comprising detecting in a nucleic acid sample from a subject the presence of a plurality of mutations in two or more genes selected from the group comprising MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A, evaluating the probability that the two or more genes are correlated with gastric cancer, and determining the presence of gastric cancer in a sample based on said evaluation. In some embodiments, the subject is a human subject and the nucleic acid samples are genomic DNA samples. In some embodiments, three or more genes are selected for determining, diagnosing and/or prognosing the presence of gastric cancer. In some embodiments, determining the presence of gastric cancer comprises comparing a tumor or test genomic DNA sample to a normal genomic DNA sample from the same individual. In some embodiments, evaluating the sample is performed by one or more of sequencing such as sequence by synthesis methodologies, microarray analysis or polymerase chain reaction methodologies such as quantitative or real time PCR. In some embodiments, the genomic DNA sample is isolated from a sample selected from the group consisting of a tissue sample, a biopsy sample, a cell sample, a circulating tumor cell sample, a fixed tissue sample or a frozen tissue sample and wherein the normal sample is isolated from normal tissue, for example adjacent or proximal to the proposed tumor sample.
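The gene-selection rule in the preceding paragraph (two or more genes from the 17-gene group, at least one of them from the six-gene list) can be illustrated with a short Python sketch. It is only a set-membership check under stated assumptions: the function and constant names are hypothetical, gene symbols are written in their standard HGNC form, the input is assumed to be the set of genes already found to carry variants relative to the matched normal, and the probability evaluation step is not modelled.

```python
# Six-gene list from the text (at least one of these must be present).
CORE_GENES = frozenset({"SACS", "FLNA", "ASPM", "PCLO", "CR2", "MAP1A"})

# Full 17-gene group from the text: the core list plus eleven further genes.
PANEL_GENES = CORE_GENES | frozenset({
    "MUC4", "MUC16", "TP53", "ARID1A", "FAT4", "AHNAK2",
    "CEP290", "GPR98", "AR", "PLEC", "MACF1",
})


def meets_panel_rule(mutated_genes, min_genes=2):
    """Return True when at least `min_genes` panel genes are mutated
    and at least one of them belongs to the core list.

    `mutated_genes` is any iterable of gene symbols (hypothetical input);
    genes outside the panel are ignored.
    """
    hits = PANEL_GENES & set(mutated_genes)
    return len(hits) >= min_genes and bool(hits & CORE_GENES)
```

Setting `min_genes=3` corresponds to the embodiments in which three or more genes are selected.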
FIGURES
Figure 1 characterizes the pre-validation patient cohort from which the gastric cancer samples and matched normal pairs were derived. %TC reports the percentage of tumor cells identified in the tissue sample. Histologic and clinical staging is performed as recognized in the art, for example as described in AJCC Cancer Staging Manual 7th Ed., p. 422. Lauren classification was performed as known in the art, for example as described in Roukos et al., 2002, Gastric Breast Cancer 1:1-3. Abbreviations: unk=unknown, ex=past smoker or drinker (alcohol), NB=no blood, NA=not available, Occ=occasionally and NR=not recorded.
Figure 2 characterizes the patient population from which the validation patient cohort of tumor/normal sample pairs was derived. NA=not available, ex=past smoker or drinker (alcohol).
DEFINITIONS
As used herein, the term "sample" is intended to mean any biological fluid, cell, tissue, organ or portion thereof that contains genomic nucleic acids, for example genomic DNA or RNA, suitable for mutational detection via the disclosed methods. A test sample can include or be suspected to include a cell, such as a cell from any location in the stomach that contains or is suspected to contain a cancerous cell such as the cardia, the fundus of the stomach, the body of the stomach, the gastric antrum, the pylorus, the lesser curvature of the stomach, the greater curvature of the stomach and/or the overlapping lesion of the stomach. The term includes samples present from an individual as well as samples obtained or derived from an individual. For example, a sample can be a histologic section of a specimen obtained by biopsy, aspiration, etc. or cells that are placed in or adapted to tissue culture. A sample further can be a sub-cellular fraction or extract, or a crude or isolated nucleic acid molecule. A patient matched normal sample can be used to establish a mutational background for comparison to a patient test sample. An exemplary patient matched sample is a tissue or cell sample from an adjacent normal tissue to a suspected cancerous tissue or a blood sample from the patient.
A sample may be obtained in a variety of ways known in the art. Samples may be obtained according to standard techniques from all types of biological sources that are usual sources of genomic DNA including, but not limited to, cells or cellular components which contain DNA, cell lines, circulating tumor cells, biopsies, bodily fluids such as blood, tissue samples such as formalin-fixed, paraffin-embedded tissue (for example, tissue from the fundus of the stomach, the body of the stomach, the gastric antrum, the pylorus, the lesser curvature of the stomach, the greater curvature of the stomach and/or the overlapping lesion of the stomach), and all possible combinations thereof. Further, tissues can be fresh, fresh frozen, etc. Accordingly, a sample can be from an archived, stored or fresh source as suits a particular application of the methods set forth herein. In particular embodiments, the methods described herein can be performed on one or more samples from gastric cancer patients such as samples obtained by tissue biopsy or needle aspiration. Sample analysis can be applied, for example, to determining the presence or absence of gastric cancer, differentiating between early and/or late stage gastric cancer types, differentiating gastric cancer epithelial types, or monitoring cancer progression or response to treatment.
A suitable sample can be collected and acquired that is either known to comprise gastric cancer cells or is collected subsequent to the formulation of the diagnostic aim of a biomarker as disclosed herein. A sample can be derived from a population of cells or from a tissue that is predicted to be afflicted with or phenotypic of gastric cancer. The genomic DNA can be derived from a high-quality source such that the sample contains only the tissue type of interest, minimum contamination and minimum DNA fragmentation. In particular, samples are contemplated to be representative of the tissue or cell type of interest that is to be handled by an assay. In addition, a population or set of samples from an individual source can be analyzed to maximize confidence in the results for an individual. In some embodiments, a sample from an individual is matched and compared to a normal sample from that same individual to identify the mutational status of biomarkers for that individual. The normal sample, or patient matched normal sample, can be from the same or similar organ, tissue or fluid as the sample to which it is compared. The normal sample will typically display a phenotype that is different from a phenotype of the sample to which it is compared.
As used herein, the term "isolated" or "purified" when used in relation to a nucleic acid refers to a nucleic acid sequence that is extracted and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. As such, an isolated or purified nucleic acid is present in a form or setting that is different from that in which it is found in nature.
As used herein, the terms "marker" and "biomarker" refer to DNA or RNA, proteins, polypeptides, variants, fragments or functional equivalents thereof. In the present disclosure, a biomarker is generally associated with a genomic nucleic acid such as a gene or gene associated region or location unless specified otherwise. Biomarkers disclosed herein that are associated with gastric cancer, a particular type of gastric cancer and/or a particular stage of gastric cancer comprise one or more single nucleotide variants and/or insertions/deletions (indels) located in a gene or gene associated region as compared to its equivalent in a normal sample. A gene that contains one or more somatic mutations, such as variant mutations, identified in one or more patient samples is contemplated to be a biomarker that is useful in detecting, diagnosing or prognosing gastric cancer, a particular type of gastric cancer and/or a stage of gastric cancer.
As used herein, the term "somatic mutation" refers to an alteration in the genome that occurs after conception, resulting in a genetic difference of the genome at that particular location. Somatic mutations can occur in any cell in the body except the germ cells and are passed to the cell progeny during cell division. Somatic mutations include, but are not limited to, point mutations such as single nucleotide variants (SNVs), gene amplification or duplication, genetic insertions and/or deletions (indels), chromosomal translocations, chromosomal inversions and single nucleotide polymorphisms (SNPs). Somatic mutations can result in phenotypic changes, disease formation, cancer, etc. "Somatic mutations" as used herein, unless otherwise stated, include SNVs, SNPs and/or indels present in genomic DNA, and are considered variant mutations, or mutations that may result in phenotypic changes, disease formation, cancer, etc. Identification of somatic variants described herein was performed by identifying an SNV or indel in the patient cancer sample which was not present in the patient matched normal sample. If a somatic mutation was found in the cancer sample which was not present in the normal sample, then that mutation was identified as an SNV or indel, as the case may be. If a somatic mutation was present in both the cancer sample and the normal sample, then that mutation was considered part of that patient's genetic background and was not considered a variant. A gene from one or more patient samples that has one or more variant mutations is considered a biomarker and useful in detecting, diagnosing and prognosing gastric cancer, a particular type of gastric cancer and/or a stage of gastric cancer.
The term "gene" refers to a nucleic acid sequence, such as DNA, that comprises coding sequences associated with the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). Typically, a gene also includes non-coding and intergenic sequences. The term can encompass the coding region of a gene and the sequences located adjacent to the coding region on both the 5' and 3' ends such that the gene corresponds to the length of the full-length mRNA. Sequences located 5' of the coding region and present on the mRNA are referred to as 5' non-translated sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences such as introns, intervening regions, intervening sequences or intergenic regions.
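The tumor/normal comparison set out in the definition of "somatic mutation" above can be illustrated with a minimal sketch. It assumes that variant calls for each sample are already available as simple records; the `Variant` type, the `split_calls` function and all coordinates and alleles shown are hypothetical and chosen only for illustration, not taken from the disclosed data.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A single SNV or indel call (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_calls(
    tumor_calls: Iterable[Variant],
    normal_calls: Iterable[Variant],
) -> Tuple[Set[Variant], Set[Variant]]:
    """Separate tumor calls into somatic variants and genetic background.

    A call seen only in the tumor sample is treated as a somatic variant;
    a call seen in both tumor and matched normal is treated as background.
    """
    tumor, normal = set(tumor_calls), set(normal_calls)
    somatic = tumor - normal      # present in tumor, absent from normal
    background = tumor & normal   # shared with the matched normal
    return somatic, background


# Made-up calls for one hypothetical patient.
tumor_calls = [
    Variant("chr17", 7570000, "C", "T"),     # SNV, tumor only
    Variant("chr3", 178900000, "G", "A"),    # SNV, tumor only
    Variant("chr1", 27050000, "A", "AT"),    # indel, also in normal
]
normal_calls = [Variant("chr1", 27050000, "A", "AT")]

somatic, background = split_calls(tumor_calls, normal_calls)
```

Exact matching on chromosome, position and alleles is a simplifying assumption; a production pipeline would typically also account for coverage and call quality in the normal sample before treating a variant as absent.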
DETAILED DESCRIPTION
Gastric carcinoma is the second leading cause of death of all malignancies worldwide and mortality of diagnosed gastric cancer remains high. In the United States and Western Europe, even though there has been a dramatic decrease in the incidence of gastric cancer and despite advances in treatment regimens, once gastric cancer is diagnosed the prognosis is not favorable. In China, Japan, Korea and other Eastern countries gastric cancer incidence has decreased slowly but is still a leading cause of cancer-related deaths in these regions. The problem is that gastric cancer is often diagnosed at a late stage, which leads to high mortality despite new treatment alternatives. As such, methods for diagnosing and determining the presence of gastric cancer at an early stage would allow for earlier treatment and provide for a more favorable prognosis for the afflicted patient. The present disclosure solves this problem by providing biomarkers which represent mutational hot spots which can be used in determining the presence of gastric cancer in a subject.
Gastric cancer typically begins in the stomach. The stomach is divided into three parts: the upper third, or proximal stomach, which abuts the esophagus and consists of the gastroesophageal junction, the cardia and the fundus; the middle third, or body, of the stomach; and the lower third, which abuts the small intestine and consists of the antrum and the pylorus (which empties into the duodenum). Several layers make up the lining of the stomach: the mucosa, submucosa, muscularis propria, subserosa and serosa. If the gastric cancer is detected early, typically the cancerous tissue is confined to the mucosa and submucosa. However, if the cancer progresses into the muscularis propria and the serosa (advanced gastric cancer), the cancerous cells can infiltrate into surrounding organs (spleen, colon, liver, pancreas, etc.) and can further metastasize into the peritoneal cavity and other secondary metastatic locations (e.g., liver, lungs, lymph nodes, etc.).
Gastric cancer, or gastric carcinoma, is adenocarcinoma of the stomach and makes up around 90% of all stomach malignancies, the remainder being mainly gastric lymphomas. Gastric carcinomas can be classified in a number of ways. Historically, the World Health Organization (WHO) classification guidelines segregated gastric cancer into adenocarcinoma, signet ring-cell carcinoma or undifferentiated carcinoma. However, the Lauren classification, which is gaining favor and is utilized worldwide, divides gastric carcinomas into two subtypes: intestinal and diffuse. Further, anatomical and pathological staging and grading are performed on tissues to determine the extent of disease and the prognosis of a patient. The Lauren classification is a system used to describe gastric cancers based on intestinal or diffuse type histology, intestinal histology being associated with a more favorable prognosis. In most centers, gastric adenocarcinoma is classified according to its degree of differentiation and histologic type according to the WHO classification. However, it has been reported that the Lauren classification has value from an epidemiologic and prognostic standpoint. The Lauren classification is used in Europe and the rest of the world, but has yet to gain universal acceptance in the United States. The Lauren histologic classification of gastric adenocarcinoma into intestinal and diffuse types has been emphasized in epidemiologic studies. Intestinal type gastric carcinoma is characterized by: 1) a mean age of detection of 55 years, 2) common presentation as an exophytic intraluminal mass with an expansile growth pattern as it infiltrates the wall, 3) tubular, papillary and solid microscopic patterns with mucin being restricted to the gland lumina, 4) a 5 year survival rate of approximately 20% and 5) an almost 100% association with intestinal metaplasia and H. pylori infection.
Diffuse type gastric carcinoma is characterized by: 1) a tendency to occur in younger patients (mean age at diagnosis is 48), 2) common presentation as an ulcerative, infiltrative tumor with a diffusely infiltrative pattern of growth in the gastric wall, 3) microscopically, poorly differentiated, decohesive cells, often of a signet ring cell type and often associated with intra- and extracellular mucin, 4) a poor prognosis with a 5 year survival rate of <10%, and 5) a lower association with intestinal metaplasia and H. pylori infection than for intestinal type gastric carcinoma. A more expanded description of the Lauren classification system can be found in, for example, Chandrasoma P, Gastrointestinal Pathology, 1999, Appleton and Lange; Roukos et al., 2002, Gastric Breast Cancer 1:1-3; and Lauren, 1965, Acta Pathol Microbiol Scand 64:31-49 (incorporated herein by reference in their entireties).
Methods and determinative procedures for WHO staging, grading and classification of gastric and other cancers are well known in the art; as such, an exhaustive description of each is not provided here. Additional information on the different methods of staging and classification can be found in, for example, AJCC Cancer Staging Manual 7th Ed., p.422, which is incorporated herein by reference in its entirety.
However, the staging and grading of cancers can be subjective and relies on a diagnostician to interpret morphology, histology, anatomy and other related indices. As such, there is a critical need for tools, methods and strategies that can be used for detecting, diagnosing, and prognosing gastric cancer in a patient. Experiments were conducted as disclosed herein to identify biomarkers useful for detecting, diagnosing, and prognosing gastric cancer in a patient. Exemplary biomarkers are described in the Figures and Tables herein.
A pre-validation patient cohort was sequenced as described herein, and a candidate list of approximately 266 gene markers (Table 1), each comprising three or more variant mutations within the gene marker, or biomarker, was compiled from an original list of approximately 5200 genes of interest for their potential use as gastric cancer related biomarkers. Chromosomal locations identified in Table 1 are as found in the Archive EnsEMBL Human database, release 59 (Aug 2010) (http://aug2010.archive.ensembl.org/Homo_sapiens/Info/Index), which provides human genomic data as assembled by the Genome Reference Consortium (GRC). The GRC consists of the Wellcome Trust Sanger Institute, the Genome Center at Washington University, the European Bioinformatics Institute and the National Center for Biotechnology Information.
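The candidate selection described above (genes carrying three or more variant mutations) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it assumes that variants are pooled across the cohort as (chromosome, position) pairs and that a variant counts toward a gene when it falls inside the gene's chromosomal location. The helper names `parse_location` and `candidate_genes` and the variant positions are hypothetical; the two gene locations are copied from Table 1.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Matches location strings of the form "chr17:7565257-7590856".
LOCATION_RE = re.compile(r"^(?P<chrom>chr[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_location(text: str) -> Tuple[str, int, int]:
    """Parse a 'chrN:start-end' string into (chromosome, start, end)."""
    match = LOCATION_RE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def candidate_genes(
    gene_locations: Dict[str, str],
    variants: Iterable[Tuple[str, int]],
    min_variants: int = 3,
) -> List[str]:
    """Return genes whose interval contains at least min_variants variants."""
    parsed = {gene: parse_location(loc) for gene, loc in gene_locations.items()}
    counts: Counter = Counter()
    for chrom, pos in variants:
        for gene, (gene_chrom, start, end) in parsed.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[gene] += 1
    return sorted(gene for gene, n in counts.items() if n >= min_variants)


# Locations as listed in Table 1; variant positions are made up.
gene_locations = {
    "TP53": "chr17:7565257-7590856",
    "PIK3CA": "chr3:178865902-178957881",
}
variants = [
    ("chr17", 7570000), ("chr17", 7577000), ("chr17", 7580000),
    ("chr3", 178900000), ("chr3", 178950000),
]

candidates = candidate_genes(gene_locations, variants)
```

With these made-up positions, only the first gene reaches the threshold of three variants. A linear scan over genes is adequate for a few thousand genes; an interval index would be a natural substitute for larger inputs.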
Table 1 - Exemplary list of candidate genes
MUC4 | mucin 4, cell surface associated [Source:HGNC Symbol;Acc:7514] | chr3:195473636-195539148
SLCO4C1 | solute carrier organic anion transporter family, member 4C1 [Source:HGNC Symbol;Acc:23612] | chr5:101569690-101632253
TTN | titin [Source:HGNC Symbol;Acc:12403] | chr2:179390716-179695529
SCN5A | sodium channel, voltage-gated, type V, alpha subunit [Source:HGNC Symbol;Acc:10593] | chr3:38589548-38691164
NEB | nebulin [Source:HGNC Symbol;Acc:7720] | chr2:152341850-152591001
ATP11A | ATPase, class VI, type 11A [Source:HGNC Symbol;Acc:13552] | chr13:113344643-113541482
MUC16 | mucin 16, cell surface associated [Source:HGNC Symbol;Acc:15582] | chr19:9038078-9091814
CHD3 | chromodomain helicase DNA binding protein 3 [Source:HGNC Symbol;Acc:1918] | chr17:7788124-7816078
TP53 | tumor protein p53 [Source:HGNC Symbol;Acc:11998] | chr17:7565257-7590856
CHD6 | chromodomain helicase DNA binding protein 6 [Source:HGNC Symbol;Acc:19057] | chr20:40030741-40247133
CSMD3 | CUB and Sushi multiple domains 3 [Source:HGNC Symbol;Acc:19291] | chr8:113235157-114449328
SCN4A | sodium channel, voltage-gated, type IV, alpha subunit [Source:HGNC Symbol;Acc:10591] | chr17:62015914-62050278
SYNE1 | spectrin repeat containing, nuclear envelope 1 [Source:HGNC Symbol;Acc:17089] | chr6:152442822-152958534
PCDHA11 | protocadherin alpha 11 [Source:HGNC Symbol;Acc:8665] | chr5:140247722-140391929
COL11A1 | collagen, type XI, alpha 1 [Source:HGNC Symbol;Acc:2186] | chr1:103342023-103574052
HUWE1 | HECT, UBA and WWE domain containing 1 [Source:HGNC Symbol;Acc:30892] | chrX:53559057-53713673
ABCA13 | ATP-binding cassette, sub-family A (ABC1), member 13 [Source:HGNC Symbol;Acc:14638] | chr7:48211055-48687092
ZFHX4 | zinc finger homeobox 4 [Source:HGNC Symbol;Acc:30939] | chr8:77593523-77779521
HMCN1 | hemicentin 1 [Source:HGNC Symbol;Acc:19194] | chr1:185703683-186160085
AC007342.2 | uncharacterized protein | chr16:53399801-53406969
AHNAK2 | AHNAK nucleoprotein 2 [Source:HGNC Symbol;Acc:20125] | chr14:105403591-105444694
AIM1 | absent in melanoma 1 [Source:HGNC Symbol;Acc:356] | chr6:106959730-107018335
MLL3 | B melanoma antigen family, member 3 [Source:HGNC Symbol;Acc:15728] | chr7:151832007-152133628
SAMD9L | sterile alpha motif domain containing 9-like [Source:HGNC Symbol;Acc:1349] | chr7:92759368-92777682
ATR | ataxia telangiectasia and Rad3 related [Source:HGNC Symbol;Acc:882] | chr3:142168077-142297668
KIF21A | kinesin family member 21A [Source:HGNC Symbol;Acc:19349] | chr12:39687031-39837192
CEP290 | centrosomal protein 290kDa [Source:HGNC Symbol;Acc:29021] | chr12:88442791-88535084
OR7C1 | olfactory receptor, family 7, subfamily C, member 1 [Source:HGNC Symbol;Acc:8373] | chr19:14909986-14910948
COL29A1 | collagen, type VI, alpha 5 [Source:HGNC Symbol;Acc:26674] | chr3:130064359-130203688
ADCY8 | adenylate cyclase 8 (brain) [Source:HGNC Symbol;Acc:239] | chr8:131792554-132052835
PCLO | piccolo (presynaptic cytomatrix protein) [Source:HGNC Symbol;Acc:13406] | chr7:82383321-82792200
WASF3 | WAS protein family, member 3 [Source:HGNC Symbol;Acc:12734] | chr13:27131840-27263085
ANK2 | ankyrin 2, neuronal [Source:HGNC Symbol;Acc:493] | chr4:113739265-114304896
POLQ | polymerase (DNA directed), theta [Source:HGNC Symbol;Acc:9186] | chr3:121150274-121265488
RELN | reelin [Source:HGNC Symbol;Acc:9957] | chr7:103112231-103629963
CDC45L | cell division cycle 45 homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:1739] | chr22:19466982-19508135
AR | aldo-keto reductase family 1, member B1 (aldose reductase) [Source:HGNC Symbol;Acc:381] | chrX:134127127-134144036
SEC31A | SEC31 homolog A (S. cerevisiae) [Source:HGNC Symbol;Acc:17052] | chr4:83739814-83822319
CACNA2D1 | calcium channel, voltage-dependent, alpha 2/delta subunit 1 [Source:HGNC Symbol;Acc:1399] | chr7:81575760-82073114
DHX9 | DEAH (Asp-Glu-Ala-His) box polypeptide 9 [Source:HGNC Symbol;Acc:2750] | chr1:182808504-182856886
FBN3 | fibrillin 3 [Source:HGNC Symbol;Acc:18794] | chr19:8130287-8212650
WDR47 | WD repeat domain 47 [Source:HGNC Symbol;Acc:29141] | chr1:109512836-109584850
LVRN | Aminopeptidase Q (AP-Q)(EC 3.4.11.-)(Laeverin)(CHL2 antigen) [Source:UniProtKB/Swiss-Prot;Acc:Q6Q4G3] | chr5:115298151-115363316
OR6F1 | olfactory receptor, family 6, subfamily F, member 1 [Source:HGNC Symbol;Acc:15027] | chr1:247875131-247876057
OBSCN | obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF [Source:HGNC Symbol;Acc:15719] | chr1:228395831-228566575
HNF1A | HNF1 homeobox A [Source:HGNC Symbol;Acc:11621] | chr12:121416346-121440899
SACS | spastic ataxia of Charlevoix-Saguenay (sacsin) [Source:HGNC Symbol;Acc:10519] | chr13:23902965-24007841
MUC19 | mucin 19, oligomeric [Source:HGNC Symbol;Acc:14362] | chr12:40787197-40964634
RYR2 | ryanodine receptor 2 (cardiac) [Source:HGNC Symbol;Acc:10484] | chr1:237205505-237997288
KIF26A | kinesin family member 26A [Source:HGNC Symbol;Acc:20226] | chr14:104605060-104647235
FAT3 | FAT tumor suppressor homolog 3 (Drosophila) [Source:HGNC Symbol;Acc:23112] | chr11:92085262-92629636
NAP1L3 | nucleosome assembly protein 1-like 3 [Source:HGNC Symbol;Acc:7639] | chrX:92925934-92928608
TLL1 | tolloid-like 1 [Source:HGNC Symbol;Acc:11843] | chr4:166794410-167025047
FRMD8 | FERM domain containing 8 [Source:HGNC Symbol;Acc:25462] | chr11:65154041-65180990
C5orf42 | Uncharacterized protein C5orf42 [Source:UniProtKB/Swiss-Prot;Acc:Q9H799] | chr5:37106330-37247894
DSCAM | Down syndrome cell adhesion molecule [Source:HGNC Symbol;Acc:3039] | chr21:41382926-42219065
KCNMA1 | potassium large conductance calcium-activated channel, subfamily M, alpha member 1 [Source:HGNC Symbol;Acc:6284] | chr10:78629360-79398353
RREB1 | ras responsive element binding protein 1 [Source:HGNC Symbol;Acc:10449] | chr6:7107830-7252213
GPR98 | G protein-coupled receptor 98 [Source:HGNC Symbol;Acc:17416] | chr5:89825161-90460038
DAZL | deleted in azoospermia-like [Source:HGNC Symbol;Acc:2685] | chr3:16628299-16711813
AL592307.2 | Putative uncharacterized protein ENSP00000345684 [Source:UniProtKB/TrEMBL;Acc:A6NDV3] | chr1:145209145-145370303
FGA | fibrinogen alpha chain [Source:HGNC Symbol;Acc:3661] | chr4:155504278-155511918
COL7A1 | collagen, type VII, alpha 1 [Source:HGNC Symbol;Acc:2214] | chr3:48601506-48632700
LRRC7 | leucine rich repeat containing 7 [Source:HGNC Symbol;Acc:18531] | chr1:70034081-70589167
ATP10A | ATPase, class V, type 10A [Source:HGNC Symbol;Acc:13542] | chr15:25923859-26110319
WRN | Werner syndrome, RecQ helicase-like [Source:HGNC Symbol;Acc:12791] | chr8:30890778-31031276
SPINK5 | serine peptidase inhibitor, Kazal type 5 [Source:HGNC Symbol;Acc:15464] | chr5:147443535-147516925
ZNF800 | zinc finger protein 800 [Source:HGNC Symbol;Acc:27267] | chr7:126986844-127071978
CELSR3 | cadherin, EGF LAG seven-pass G-type receptor 3 (flamingo homolog, Drosophila) [Source:HGNC Symbol;Acc:3230] | chr3:48673902-48700348
KPNB1 | karyopherin (importin) beta 1 [Source:HGNC Symbol;Acc:6400] | chr17:45727275-45760998
NOTCH2 | Notch homolog 2 (Drosophila) [Source:HGNC Symbol;Acc:7882] | chr1:120454176-120612274
TRIM59 | tripartite motif-containing 59 [Source:HGNC Symbol;Acc:30834] | chr3:160153291-160203561
RNF43 | ring finger protein 43 [Source:HGNC Symbol;Acc:18505] | chr17:56429861-56494480
CNTN6 | contactin 6 [Source:HGNC Symbol;Acc:2176] | chr3:1134260-1445901
STARD8 | StAR-related lipid transfer (START) domain containing 8 [Source:HGNC Symbol;Acc:19161] | chrX:67867508-67945684
SCN9A | sodium channel, voltage-gated, type IX, alpha subunit [Source:HGNC Symbol;Acc:10597] | chr2:167051695-167232503
PTPRQ | protein tyrosine phosphatase, receptor type, Q [Source:HGNC Symbol;Acc:9679] | chr12:80837534-81072802
AC021066.1 | Keratin-81-like protein [Source:UniProtKB/Swiss-Prot;Acc:A6NCN2] | chr12:52644250-52652337
CELSR2 | cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila) [Source:HGNC Symbol;Acc:3231] | chr1:109792641-109818377
MDM4 | Mdm4 p53 binding protein homolog (mouse) [Source:HGNC Symbol;Acc:6974] | chr1:204485511-204596130
ARID1A | AT rich interactive domain 1A (SWI-like) [Source:HGNC Symbol;Acc:11110] | chr1:27022522-27108601
ODZ4 | odz, odd Oz/ten-m homolog 4 (Drosophila) [Source:HGNC Symbol;Acc:29945] | chr11:78364329-79151695
VWA3B | von Willebrand factor A domain containing 3B [Source:HGNC Symbol;Acc:28385] | chr2:98703579-98929762
CCDC88A | coiled-coil domain containing 88A [Source:HGNC Symbol;Acc:25523] | chr2:55514978-55647057
UBR5 | ubiquitin protein ligase E3 component n-recognin 5 [Source:HGNC Symbol;Acc:16806] | chr8:103265572-103424495
RTKN2 | rhotekin 2 [Source:HGNC Symbol;Acc:19364] | chr10:63942794-64028466
MYH11 | myosin, heavy chain 11, smooth muscle [Source:HGNC Symbol;Acc:7569] | chr16:15796992-15950890
CUBN | cubilin (intrinsic factor-cobalamin receptor) [Source:HGNC Symbol;Acc:2548] | chr10:16865963-17171830
F8 | coagulation factor VIII, procoagulant component [Source:HGNC Symbol;Acc:3546] | chrX:154064063-154255215
ZNF66 | zinc finger protein 66 [Source:HGNC Symbol;Acc:13135] | chr19:20973438-20991922
IGSF10 | immunoglobulin superfamily, member 10 [Source:HGNC Symbol;Acc:26384] | chr3:151143172-151176497
NASP | nuclear autoantigenic sperm protein (histone-binding) [Source:HGNC Symbol;Acc:7644] | chr1:46049518-46084566
PIK3CA | phosphoinositide-3-kinase, catalytic, alpha polypeptide [Source:HGNC Symbol;Acc:8975] | chr3:178865902-178957881
CECR2 | cat eye syndrome chromosome region, candidate 2 [Source:HGNC Symbol;Acc:1840] | chr22:17840837-18033845
DHX57 | DEAH (Asp-Glu-Ala-Asp/His) box polypeptide 57 [Source:HGNC Symbol;Acc:20086] | chr2:39024871-39103075
HTR7 | 5-hydroxytryptamine (serotonin) receptor 7 (adenylate cyclase-coupled) [Source:HGNC Symbol;Acc:5302] | chr10:92500580-92617455
PLEC | plectin [Source:HGNC Symbol;Acc:9069] | chr8:144989321-145049543
KIAA2018 | KIAA2018 [Source:HGNC Symbol;Acc:30494] | chr3:113367232-113415493
C6orf10 | Putative uncharacterized protein ENSP00000411702 [Source:UniProtKB/TrEMBL;Acc:C9J558] | chrHSCHR6_MHC_DBB:32236325-32315209
DNAH12 | dynein, axonemal, heavy chain 12 [Source:HGNC Symbol;Acc:2943] | chr3:57310578-57530071
MAP2 | methionyl aminopeptidase 2 [Source:HGNC Symbol;Acc:16672] | chr2:95867822-95909615
ZZEF1 | zinc finger, ZZ-type with EF-hand domain 1 [Source:HGNC Symbol;Acc:29027] | chr17:3907739-4046314
MDN1 | MDN1, midasin homolog (yeast) [Source:HGNC Symbol;Acc:18302] | chr6:90352218-90529442
PLD1 | phospholipase D1, phosphatidylcholine-specific [Source:HGNC Symbol;Acc:9067] | chr3:171318195-171528740
MMRN1 | multimerin 1 [Source:HGNC Symbol;Acc:7178] | chr4:90800683-90875780
PLA2R1 | phospholipase A2 receptor 1, 180kDa [Source:HGNC Symbol;Acc:9042] | chr2:160788519-160919121
AP3B1 | adaptor-related protein complex 3, beta 1 subunit [Source:HGNC Symbol;Acc:566] | chr5:77296349-77590528
IL1RL2 | interleukin 1 receptor-like 2 [Source:HGNC Symbol;Acc:5999] | chr2:102803433-102856462
CTNNB1 | catenin (cadherin-associated protein), beta 1, 88kDa [Source:HGNC Symbol;Acc:2514] | chr3:41236328-41301587
IFI16 | interferon, gamma-inducible protein 16 [Source:HGNC Symbol;Acc:5395] | chr1:158969758-159024945
FSHR | follicle stimulating hormone receptor [Source:HGNC Symbol;Acc:3969] | chr2:49189296-49381676
ASXL2 | additional sex combs like 2 (Drosophila) [Source:HGNC Symbol;Acc:23805] | chr2:25960557-26101385
KIAA1109 | KIAA1109 [Source:HGNC Symbol;Acc:26953] | chr4:123073488-123283907
CDC2L5 | cyclin-dependent kinase 13 [Source:HGNC Symbol;Acc:1733] | chr7:39989636-40136733
TRPM7 | transient receptor potential cation channel, subfamily M, member 7 [Source:HGNC Symbol;Acc:17994] | chr15:50852493-50978995
NSD1 | nuclear receptor binding SET domain protein 1 [Source:HGNC Symbol;Acc:14234] | chr5:176560026-176727216
CNTLN | centlein, centrosomal protein [Source:HGNC Symbol;Acc:23432] | chr9:17134980-17503921
NCAPD3 | non-SMC condensin II complex, subunit D3 [Source:HGNC Symbol;Acc:28952] | chr11:134022337-134094426
KIAA0182 | KIAA0182 [Source:HGNC Symbol;Acc:28979] | chr16:85645023-85709812
BRSK1 | BR serine/threonine kinase 1 [Source:HGNC Symbol;Acc:18994] | chr19:55795327-55823901
AC130364.1 | Putative uncharacterized protein ENSP00000385623 [Source:UniProtKB/TrEMBL;Acc:B5MC43] | chr11:49854689-49860872
FTSJD1 | FtsJ methyltransferase domain containing 1 [Source:HGNC Symbol;Acc:25635] | chr16:71316203-71323512
RAB3GAP2 | RAB3 GTPase activating protein subunit 2 (non-catalytic) [Source:HGNC Symbol;Acc:17168] | chr1:220321635-220445796
GRIA1 | glutamate receptor, ionotropic, AMPA 1 [Source:HGNC Symbol;Acc:4571] | chr5:152870204-153193240
ASXL1 | additional sex combs like 1 (Drosophila) [Source:HGNC Symbol;Acc:18318] | chr20:30946153-31027122
MAP1A | microtubule-associated protein 1A [Source:HGNC Symbol;Acc:6835] | chr15:43803156-43823818
UBE3A | ubiquitin protein ligase E3A [Source:HGNC Symbol;Acc:12496] | chr15:25582396-25684128
GABRG3 | gamma-aminobutyric acid (GABA) A receptor, gamma 3 [Source:HGNC Symbol;Acc:4088] | chr15:27216683-27778373
OTOG | otogelin [Source:HGNC Symbol;Acc:8516] | chr11:17568920-17668697
DNA2 | DNA replication helicase 2 homolog (yeast) [Source:HGNC Symbol;Acc:2939] | chr10:70173821-70231879
FNIP1 | folliculin interacting protein 1 [Source:HGNC Symbol;Acc:29418] | chr5:130761584-131132756
RERE | arginine-glutamic acid dipeptide (RE) repeats [Source:HGNC Symbol;Acc:9965] | chr1:8412457-8877702
APOB | apolipoprotein B (including Ag(x) antigen) [Source:HGNC Symbol;Acc:603] | chr2:21224301-21266945
KDM5B | lysine (K)-specific demethylase 5B [Source:HGNC Symbol;Acc:18039] | chr1:202696526-202778598
RP1 | Serine/threonine-protein kinase 19 (EC 2.7.11.1)(Protein RP1)(Protein G11) [Source:UniProtKB/Swiss-Prot;Acc:P49842] | chr8:31938868-31949228
COL11A2 | collagen alpha-2(XI) chain isoform 3 preproprotein [Source:RefSeq peptide;Acc:NP_542410] | chrHSCHR6_MHC_QBL:33059284-33089101
REV3L | REV3-like, catalytic subunit of DNA polymerase zeta (yeast) [Source:HGNC Symbol;Acc:9968] | chr6:111620234-111804918
TFAM | transcription factor A, mitochondrial [Source:HGNC Symbol;Acc:11741] | chr10:60144782-60158981
PAPPA2 | pappalysin 2 [Source:HGNC Symbol;Acc:14615] | chr1:176432307-176814735
SASH1 | SAM and SH3 domain containing 1 [Source:HGNC Symbol;Acc:19182] | chr6:148593440-148873186
ABCB5 | ATP-binding cassette, sub-family B (MDR/TAP), member 5 [Source:HGNC Symbol;Acc:46] | chr7:20654830-20816658
FLCN | folliculin [Source:HGNC Symbol;Acc:27310] | chr17:17115526-17140502
LAMA5 | laminin, alpha 5 [Source:HGNC Symbol;Acc:6485] | chr20:60883011-60942368
TRPC5 | transient receptor potential cation channel, subfamily C, member 5 [Source:HGNC Symbol;Acc:12337] | chrX:111017543-111326004
ASPM | asp (abnormal spindle) homolog, microcephaly associated (Drosophila) [Source:HGNC Symbol;Acc:19048] | chr1:197053258-197115824
UBE3B | ubiquitin protein ligase E3B [Source:HGNC Symbol;Acc:13478] | chr12:109915439-109974504
LRCH2 | leucine-rich repeats and calponin homology (CH) domain containing 2 [Source:HGNC Symbol;Acc:29292] | chrX:114345185-114468635
MYLK | myosin light chain kinase, smooth muscle isoform 8 [Source:RefSeq peptide;Acc:NP_444260] | chr3:123332952-123339141
PCDH10 | protocadherin 10 [Source:HGNC Symbol;Acc:13404] | chr4:134070470-134129356
HELB | helicase (DNA) B [Source:HGNC Symbol;Acc:17196] | chr12:66696360-66732003
CR2 | complement component (3d/Epstein Barr virus) receptor 2 [Source:HGNC Symbol;Acc:2336] | chr1:207627575-207663240
AL157769.3 | Uncharacterized protein FLJ40176 [Source:UniProtKB/Swiss-Prot;Acc:Q8NDH2] | chr13:103381801-103389159
RP1-21O18.1 | Kazrin [Source:UniProtKB/Swiss-Prot;Acc:Q674X7] | chr1:14925200-15444539
ALS2CR8 | amyotrophic lateral sclerosis 2 (juvenile) chromosome region, candidate 8 [Source:HGNC Symbol;Acc:14435] | chr2:203776937-203851060
AC073995.2 | UPF0634 protein C (Protein immuno-reactive with anti-PTH polyclonal antibodies) [Source:UniProtKB/Swiss-Prot;Acc:Q5JPF3] | chr2:96504520-96657607
SETD1B | SET domain containing 1B [Source:HGNC Symbol;Acc:29187] | chr12:122242638-122270562
MAP7D3 | MAP7 domain containing 3 [Source:HGNC Symbol;Acc:25742] | chrX:135295381-135333738
EIF2C2 | eukaryotic translation initiation factor 2C, 2 [Source:HGNC Symbol;Acc:3263] | chr8:141541265-141645645
FLNA | filamin A, alpha [Source:HGNC Symbol;Acc:3754] | chrX:153576892-153603006
GRM7 | glutamate receptor, metabotropic 7 [Source:HGNC Symbol;Acc:4599] | chr3:6811688-7783215
DNAH5 | dynein, axonemal, heavy chain 5 [Source:HGNC Symbol;Acc:2950] | chr5:13690440-13944652
NAV3 | neuron navigator 3 [Source:HGNC Symbol;Acc:15998] | chr12:78225069-78606790
TUT1 | terminal uridylyl transferase 1, U6 snRNA-specific [Source:HGNC Symbol;Acc:26184] | chr11:62327075-62359649
NRXN3 | neurexin 3 [Source:HGNC Symbol;Acc:8010] | chr14:79081275-80330758
LMAN1 | lectin, mannose-binding, 1 [Source:HGNC Symbol;Acc:6631] | chr18:56995055-57026503
RP11-766F14.2 | uncharacterized protein | chr4:100557686-100575805
FAT4 | FAT tumor suppressor homolog 4 (Drosophila) [Source:HGNC Symbol;Acc:23109] | chr4:126237554-126414087
MPRIP | myosin phosphatase Rho interacting protein [Source:HGNC Symbol;Acc:30321] | chr17:16946074-17088874
KIAA1199 | KIAA1199 [Source:HGNC Symbol;Acc:29213] | chr15:81071684-81244117
HMCN2 | hemicentin 2 [Source:HGNC Symbol;Acc:21293] | chr9:133223932-133309510
TRPM6 | transient receptor potential cation channel, subfamily M, member 6 [Source:HGNC Symbol;Acc:17995] | chr9:77337411-77503010
APC | adenomatous polyposis coli [Source:HGNC Symbol;Acc:583] | chr5:112043195-112181936
ADAM32 | ADAM metallopeptidase domain 32 [Source:HGNC Symbol;Acc:15479] | chr8:38965050-39142435
SATB1 | SATB homeobox 1 [Source:HGNC Symbol;Acc:10541] | chr3:18386864-18487080
DNAH11 | dynein, axonemal, heavy chain 11 [Source:HGNC Symbol;Acc:2942] | chr7:21582833-21941457
COBLL1 | COBL-like 1 [Source:HGNC Symbol;Acc:23571] | chr2:165510134-165700189
ADAM23 | ADAM metallopeptidase domain 23 [Source:HGNC Symbol;Acc:202] | chr2:207308263-207485851
RYR3 | ryanodine receptor 3 [Source:HGNC Symbol;Acc:10485] | chr15:33603177-34158299
UPF3A | UPF3 regulator of nonsense transcripts homolog A (yeast) [Source:HGNC Symbol;Acc:20332] | chr13:115047059-115071283
NBN | artemin [Source:HGNC Symbol;Acc:727] | chr1:44398992-44402912
ZBTB20 | zinc finger and BTB domain containing 20 [Source:HGNC Symbol;Acc:13503] | chr3:114056941-114866118
DCHS2 | dachsous 2 (Drosophila) [Source:HGNC Symbol;Acc:23111] | chr4:155155527-155412930
DNHD1 | dynein heavy chain domain 1 [Source:HGNC Symbol;Acc:26532] | chr11:6518490-6593252
MAN2A2 | mannosidase, alpha, class 2A, member 2 [Source:HGNC Symbol;Acc:6825] | chr15:91447420-91465814
TENC1 | tensin like C1 domain containing phosphatase (tensin 2) [Source:HGNC Symbol;Acc:19737] | chr12:53440810-53458163
NPHP1 | nephronophthisis 1 (juvenile) [Source:HGNC Symbol;Acc:7905] | chr2:110879888-110962643
SCN11A | sodium channel, voltage-gated, type XI, alpha subunit [Source:HGNC Symbol;Acc:10583] | chr3:38887260-38992052
FBN2 | fibrillin 2 [Source:HGNC Symbol;Acc:3604] | chr5:127593601-127994878
UIMC1 | ubiquitin interaction motif containing 1 [Source:HGNC Symbol;Acc:30298] | chr5:176332006-176449634
ULK1 | unc-51-like kinase 1 (C. elegans) [Source:HGNC Symbol;Acc:12558] | chr12:132379279-132407696
IGSF9 | immunoglobulin superfamily, member 9 [Source:HGNC Symbol;Acc:18132] | chr1:159896829-159915394
ATP13A3 | ATPase type 13A3 [Source:HGNC Symbol;Acc:24113] | chr3:194123401-194219093
HPS1 | Hermansky-Pudlak syndrome 1 [Source:HGNC Symbol;Acc:5163] | chr10:100175955-100206709
C10orf92 | Uncharacterized protein C10orf92 [Source:UniProtKB/Swiss-Prot;Acc:Q8IYW2] | chr10:134621896-134756327
LRP1B | low density lipoprotein receptor-related protein 1B [Source:HGNC Symbol;Acc:6693] | chr2:140988992-142889270
COIL | coilin [Source:HGNC Symbol;Acc:2184] | chr17:55015563-55038411
MCM10 | minichromosome maintenance complex component 10 [Source:HGNC Symbol;Acc:18043] | chr10:13203554-13253104
PLCH2 | phospholipase C, eta 2 [Source:HGNC Symbol;Acc:29037] | chr1:2398898-2436969
EPB41L3 | erythrocyte membrane protein band 4.1-like 3 [Source:HGNC Symbol;Acc:3380] | chr18:5392383-5544241
USP9X | ubiquitin specific peptidase 9, X-linked [Source:HGNC Symbol;Acc:12632] | chrX:40944888-41092185
AMPD3 | adenosine monophosphate deaminase 3 [Source:HGNC Symbol;Acc:470] | chr11:10472224-10529126
ZNF236 | zinc finger protein 236 [Source:HGNC Symbol;Acc:13028] | chr18:74536116-74682680
TESK1 | testis-specific kinase 1 [Source:HGNC Symbol;Acc:11731] | chr9:35605367-35610038
SPEN | spen homolog, transcriptional regulator (Drosophila) [Source:HGNC Symbol;Acc:17575] | chr1:16174359-16266955
DNAH7 | dynein, axonemal, heavy chain 7 [Source:HGNC Symbol;Acc:18661] | chr2:196602427-196935730
NAP1L4 | nucleosome assembly protein 1-like 4 [Source:HGNC Symbol;Acc:7640] | chr11:2965661-3013607
MYO9A | myosin IXA [Source:HGNC Symbol;Acc:7608] | chr15:72114632-72410422
WDFY4 | WDFY family member 4 [Source:HGNC Symbol;Acc:29323] | chr10:49892921-50191001
chromodomain protein tyrosine
helicase DNA binding phosphatase, receptor
chr8:61591339- chr20:40701392-
CHD7 protein 7 PTPRT type, T
61779463 41818610 [Source:HGNC [Source:HGNC
Symbol;Acc:20626] Symbol; Acc:9682] baculoviral IAP repeat- CUB and Sushi multiple containing 6 chr2:32582096- domains 1 chr8:2792875-
BIRC6 CSMD1
[Source:HGNC 32843966 [Source:HGNC 3611566 Symbol;Acc:13516] Symbol; Acc: 14026]
v-erb-b2
erythroblastic
leukemia viral
oncogene homolog 2, KIAA1875
chrl7:37844393- chr8: 145162629-
ERBB2 neuro/glioblastoma KIAA1875 [Source:HGNC
37884915 145173218 derived oncogene Symbol;Acc:26959]
homolog (avian)
[Source:HGNC
Symbol;Acc:3430]
SWI/SNF related,
matrix associated,
vacuolar protein sorting
actin dependent
13 homolog A (S.
regulator of chrl9:11071598- chr9:79792361-
SMARCA4 VPS13A cerevisiae)
chromatin, subfamily 11172958 80036457
[Source:HGNC
a, member 4
Symbol; Acc: 1908]
[Source:HGNC
Symbol;Acc:11100]
vacuolar protein sorting
serine/threonine
13 homolog D (S.
kinase 31 chr7:23749786- chrl: 12290124-
STK31 VPS13D cerevisiae)
[Source:HGNC 23872132 12572099
[Source:HGNC
Symbol;Acc:11407]
Symbol;Acc:23595]
fibulin 2 zinc finger protein 521
chr3:13590631- chrl8:22641888-
FBLN2 [Source:HGNC ZNF521 [Source:HGNC
13679922 22932214
Symbol;Acc:3601] Symbol;Acc:24605]
solute carrier family
16, member 4 SCO-spondin homolog
(monocarboxylic acid chrl:110905482- (Bos taurus) chr7: 149473131-
SLC16A4 SSPO
transporter 5) 110933670 [Source:HGNC 149531068 [Source:HGNC Symbol;Acc:21998]
Symbol;Acc:10925]
RAD50 homolog (S. additional sex combs like
cerevisiae) chr5:131891711- 3 (Drosophila) chrl8:31158541-
RAD50 ASXL3
[Source:HGNC 131979752 [Source:HGNC 31327377 Symbol;Acc:9816] Symbol;Acc:29357]
chromosome X open ATP/GTP binding proteinreading frame 59 chrX:36053913- like 3 chr7: 134671259-
CXorf59 AGBL3
[Source:HGNC 36163187 [Source:HGNC 134832715 Symbol;Acc:26708] Symbol;Acc:27981]
Uncharacterized transducin-like enhancer
protein C6orfl67 of split 1 (E(spl)
chr6:97590037- chr9:84198598-
C6orfl67 TLE1 homolog, Drosophila)
97731093 84304220
[Source:UniProtKB/Swi [Source:HGNC
ss-Prot;Acc:Q6ZRQ5] Symbol;Acc:11837]
mitogen-activated protein tyrosine
protein kinase kinase phosphatase, receptor
chr6:161412822- chr9:8314246-
MAP3K4 kinase 4 PTPRD type, D
161551917 10612723 [Source:HGNC [Source:HGNC
Symbol;Acc:6856] Symbol; Acc:9668] sodium channel, ADAM metallopeptidase voltage-gated, type with thrombospondin
chr2:167260083- chr21:28208606-
SCN7A VII, alpha ADAMTS1 type 1 motif, 1
167350757 28217728 [Source:HGNC [Source:HGNC
Symbol; Acc: 10594] Symbol;Acc:217]
transient receptor
potential cation tetratricopeptide repeat
channel, subfamily M, chr9:73149949- domain 28 chr22:28374004-
TRPM3 TTC28
member 3 74061820 [Source:HGNC 29075853 [Source:HGNC Symbol;Acc:29179]
Symbol;Acc:17992]
kinesin family member
protocadherin 10
1A chr2:241653181- chr4: 134070470-
KIF1A PCDH19 [Source:HGNC
[Source:HGNC 241759637 134129356
Symbol; Acc: 13404]
Symbol; Acc:888]
regulator of G-protein
phospholipase C-like 1
signaling 12 chr4:3294755- chr2: 198669426-
RGS12 PLCL1 [Source:HGNC
[Source:HGNC 3441640 199437305
Symbol; Acc:9063]
Symbol; Acc:9994]
protein tyrosine
dedicator of cytokinesis
phosphatase, receptor
chrll:48002110- 10 chr2:225629807-
PTPRJ type, J DOCK10
48192393 [Source:HGNC 225907330
[Source:HGNC
Symbol;Acc:23479]
Symbol;Acc:9673]
collagen, type XIV, alpha
dystrophin
chrX:31132808- 1 chr8: 121072019-
DMD [Source:HGNC COL14A1
33357558 [Source:HGNC 121384275
Symbol;Acc:2928]
Symbol;Acc:2191]
sema domain,
immunoglobulin
domain (Ig), short ubiquitin specific
basic domain, chr3:50192478- peptidase 34 chr2:61414591-
SEMA3F USP34
secreted, 50226508 [Source:HGNC 61697904 (semaphorin) 3F Symbol;Acc:20066]
[Source:HGNC
Symbol; Acc: 10728]
sodium channel,
voltage-gated, type X, TNNI3 interacting kinase
chr3:38738293- chrl:74663926-
SCN10A alpha subunit TNNI3K [Source:HGNC
38835501 75010112 [Source:HGNC Symbol; Acc: 19661]
Symbol; Acc: 10582]
transformation/transcrip
dedicator of
tion domain-associated
cytokinesis 7 chrl:62920399- chr7:98475556-
DOCK7 TRRAP protein
[Source:HGNC 63153969 98610866
[Source:HGNC
Symbol;Acc:19190]
Symbol; Acc: 12347]
TBC1 domain family,
dystonin
member 23 chr3:99979844- chr6:56322785-
TBC1D23 DST [Source:HGNC
[Source:HGNC 100044095 56819413
Symbol; Acc: 1090]
Symbol;Acc:25622]
Ankyrin repeat- collagen, type XII,
containing protein
alpha 1 chr6:75794042- chr20:18364011-
COL12A1 C20orfl2 C20orfl2
[Source:HGNC 75915767 18447829
[Source: UniProtKB/Swiss- Symbol;Acc:2188]
Prot;Acc:Q9NVP4] AF4/FMR2 family,
protocadherin alpha 3
member 3 chr2:100163718- chr5: 140180613-
AFF3 PCDHA3 [Source:HGNC
[Source:HGNC 100759201 140183257
Symbol; Acc:8669]
Symbol;Acc:6473]
microtubule-actin AT rich interactive
crosslinking factor 1 chrl:39546988- domain 4A (RBPl-like) chrl4:58765103-
MACF1 ARID4A
[Source:HGNC 39952789 [Source:HGNC 58840451 Symbol; Acc: 13664] Symbol; Acc:9885]
roundabout, axon
laminin, alpha 2 guidance receptor,
chr6:129204286- chr3:75955826-
LAMA2 [Source:HGNC ROB02 homolog 2 (Drosophila)
129837714 77699115
Symbol;Acc:6482] [Source:HGNC
Symbol; Acc: 10250]
zinc finger protein 106
SIK family kinase 3
homolog (mouse) chrl5:42705022- chrll:116714118-
ZFP106 SIK3 [Source:HGNC
[Source:HGNC 42749730 116969137
Symbol;Acc:29165]
Symbol;Acc:23240]
FRAS1 related
Calpain-7-like protein
chr6:146920136- extracellular matrix 1 chr9: 14737150-
C6orfl03 [Source:UniProtKB/Swi FREM1
147136598 [Source:HGNC 14910993 ss-Prot;Acc:Q8N7X0]
Symbol;Acc:23399]
eukaryotic translation
RNA binding motif
initiation factor 4
protein 33 chr7:155437145- chrll:10818593-
RBM33 EIF4G2 gamma, 2
[Source:HGNC 155574179 10830582
[Source:HGNC
Symbol;Acc:27223]
Symbol;Acc:3297]
dedicator of heparan sulfate
cytokinesis 8 chr9:214854- proteoglycan 2 chrl:22148738-
DOCK8 HSPG2
[Source:HGNC 465255 [Source:HGNC 22263790 Symbol;Acc:19191] Symbol;Acc:5273]
Table 1 exemplifies those genes wherein three or more mutations were identified from the pre-validation patient cohort test samples (e.g., as compared to the matched normal tissue samples), thereby identifying biomarkers that are correlated with gastric cancer. A subset of Table 1 is found in Table 2, the subset comprising those genes wherein four or more mutations (No. of unique mutations) were identified in the pre-validation patient cohort test samples (No. of samples mutated), thereby identifying biomarkers representing mutational hot spots that are correlated with gastric cancer.
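Table 2-List of biomarker subset
Figure imgf000023_0001
(Each row below lists three entries; each entry gives the gene, the No. of unique mutations, and the No. of samples mutated.)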
MUC16 7 7 MYH11 4 3 CELSR3 4 3
TP53 7 6 F8 4 4 NOTCH2 4 4
CSMD3 6 5 IGSF10 4 4 RNF43 4 3
SYNE1 6 4 PIK3CA 4 4 STARD8 4 4
COL11A1 6 6 DHX57 4 4 PTPRQ 4 4
ABCA13 6 3 PLEC 4 3 CELSR2 4 3
HMCN1 6 4 C6orf10 4 4 DNAH5 4 3
AHNAK2 6 6 MAP2 4 3 LAMA5 4 4
MLL3 6 6 MDN1 4 3 ASPM 4 3
ATR 5 5 MMRN1 4 3 LRCH2 4 3
CEP290 5 5 AP3B1 4 4 PCDH10 4 4
COL29A1 5 5 CTNNB1 4 4 CR2 4 3
PCLO 5 4 FSHR 4 4 RP1-21018.1 4 4
ANK2 5 4 KIAA1109 4 3 AC073995.2 4 3
RELN 5 5 TRPM7 4 2 MAP7D3 4 4
AR 5 4 CNTLN 4 3 FLNA 4 3
CACNA2D1 5 3 KIAA0182 4 4 C5orf42 4 3
FBN3 5 4 AC130364.1 4 4 KCNMA1 4 3
LVRN 5 5 RAB3GAP2 4 3 GPR98 4 4
OBSCN 5 4 ASXL1 4 3 AL592307.2 4 4
SACS 5 4 UBE3A 4 4 REV3L 4 4
RYR2 5 5 OTOG 4 4 PAPPA2 4 3
FAT3 5 4 FNIP1 4 3 ABCB5 4 3
TLL1 5 5 APOB 4 4 RP1 4 3
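The two counts reported per gene in Table 2 can be illustrated with a minimal sketch. The call records, gene names and mutation identifiers below are hypothetical placeholders, not data from the study, and the sketch assumes that a mutation is identified by its own label regardless of the sample in which it is observed.

```python
from collections import defaultdict

# Hypothetical somatic calls: (sample_id, gene, mutation_id).
# These records are illustrative only and are not taken from the study.
calls = [
    ("S1", "GENE_A", "chr1:100A>T"),
    ("S1", "GENE_A", "chr1:250delC"),
    ("S2", "GENE_A", "chr1:100A>T"),
    ("S2", "GENE_B", "chr2:500G>C"),
]

def summarize(calls):
    """Return {gene: (n_unique_mutations, n_samples_mutated)}."""
    mutations = defaultdict(set)  # gene -> distinct mutation identifiers
    samples = defaultdict(set)    # gene -> distinct samples carrying a mutation
    for sample_id, gene, mutation_id in calls:
        mutations[gene].add(mutation_id)
        samples[gene].add(sample_id)
    return {gene: (len(mutations[gene]), len(samples[gene])) for gene in mutations}

summary = summarize(calls)
```

Under these assumptions a gene could then be shortlisted by applying a cutoff to either count, for example four or more unique mutations.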
Whole genome sequencing (WGS) experiments were performed on a sample patient cohort to identify which genes, if any, had variant mutations. The Table 2 list of biomarkers was collated based on data from a retrospective study comprising gastric adenocarcinoma samples and matching normal adjacent tissue samples from an original cohort of 23 patients (Figure 1, and Example 1). To supply a patient matched normal DNA sample for each patient genomic DNA tumor tissue sample, a sample of normal adjacent tissue was obtained from each patient. Differences in sequence between the patient matched normal sample and the tumor sample from a patient were determined under stringent criteria as set forth in Pleasance et al., 2010, Nature 463:191-196 (incorporated herein by reference in its entirety), wherein a baseline set of somatic sequence alterations obtained from each normal adjacent tissue sample genome was subtracted from its corresponding tumor genome sequence, thereby allowing identification of the sequence variants in each tissue sample above background mutations. Preliminary analyses identified approximately 6,803 somatic mutations in approximately 5,263 genes among the patient tumor samples.
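The subtraction of a matched normal baseline from a tumor call set, as described above, can be sketched as a set difference. This is a simplified illustration only: the variant tuples and coordinates are hypothetical, and the stringent filtering criteria of the cited reference are not reproduced here.

```python
def somatic_variants(tumor_calls, normal_calls):
    """Return variants called in the tumor but absent from the matched normal."""
    return set(tumor_calls) - set(normal_calls)

# Hypothetical variant calls as (chromosome, position, ref_allele, alt_allele).
tumor = {("chr17", 7577120, "C", "T"), ("chr1", 12345, "G", "A")}
normal = {("chr1", 12345, "G", "A")}  # also present in normal tissue

somatic = somatic_variants(tumor, normal)
```

In this toy example only the chr17 variant remains, because the chr1 variant is also present in the matched normal sample.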
Of variant mutations present in a gene, it was further determined how many of the patient samples in the cohort had a variant mutation in that particular gene, for example, whether one or more, two or more, three or more, four or more, five or more, six or more or seven or more of the patient samples had a variant mutation in a particular gene. Table 2 comprises a subset list of genes wherein two or more patient samples had variant mutations in a particular gene. In some cases, one or more patient samples were found to have two mutations, for example both an SNV and an indel, in the particular gene. For example, for the gene ABCA13, there were six mutations identified in three patient samples, each sample displaying both an SNV and an indel in ABCA13. The genes as found in Table 2 are recognized herein as biomarkers for determining, diagnosing or prognosing gastric cancer.
Further investigation of the patient cohort sample set identified several samples that were potentially problematic. Those samples determined to be problematic (S9, S16, S18 and S24) were removed from the data set and the data set from the remaining 19 pre-validation cohort samples was reanalyzed. Table 3 reports the reanalysis of the original data set minus the problematic samples.
Table 3-Pre-validation data for 19 tumor/normal pairs
Figure imgf000025_0001
ASPM 0.0499 5.26% 1 1 0.00000
AHNAK2 0.0818 5.26% 1 1 0.34700
CEP290 0.0003 10.53% 3 2 0.48363
PCLO 0.0585 21.05% 4 4 0.00000
GPR98 0.0090 26.32% 5 5 0.46520
CR2 0.0973 5.26% 1 1 0.02300
AR 0.0712 10.53% 2 2 0.25350
PLEC 0.0169 5.26% 1 1 0.00000
MACF1 0.0810 5.26% 2 1 0.73307
MAP1A 0.0335 5.26% 1 1 0.85900
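The percentage in each row above appears to be the number of samples mutated divided by the cohort size of 19; this reading is an assumption based on the column layout of Table 4 and can be checked with a short sketch. The function name is illustrative.

```python
def percent_mutated(n_samples_mutated, cohort_size):
    """Percentage of cohort samples carrying a mutation, to two decimals."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return round(100.0 * n_samples_mutated / cohort_size, 2)

# Sample counts taken from the rows above (cohort of 19 tumor/normal pairs).
cep290 = percent_mutated(2, 19)
pclo = percent_mutated(4, 19)
gpr98 = percent_mutated(5, 19)
```

The results agree with the 10.53%, 21.05% and 26.32% entries listed for CEP290, PCLO and GPR98.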
Reanalysis identified an additional subset of markers that are contemplated to be of interest as mutational hot spots for their ability to diagnose the presence of gastric cancer. As identified in Table 2, biomarkers could be correlated to gastric cancer at p<0.1 with the majority correlated at p<0.05 or p<0.01. Pre-validation cohort data reanalysis revealed that mutations in MUC16 and TP53 were highly correlated with the presence of gastric cancer in a sample, and further that 37% and 47% of the cohort samples had mutations in these genes, respectively. TP53 is well established as being mutated in a majority of cancer types, many of those mutations resulting in protein malfunction (Condel score of 0.9).
A Condel score (CONsensus DELeteriousness) integrates the output of computational tools aimed at assessing the impact of non-synonymous SNVs on protein function by computing a weighted average of the scores (WAS) of computational tools, such as SIFT, Polyphen2, MAPP, LogR Pfam e-value (2004, Clifford et al., Bioinformatics 20:1006-1014; incorporated herein by reference in its entirety) and MutationAssessor. Briefly, the scores of different methods are weighted using the complementary cumulative distributions produced by the five methods on a dataset of approximately 20000 missense mutations, both deleterious and neutral. The probability that a predicted deleterious mutation is not a false positive of the method and the probability that a predicted neutral mutation is not a false negative are employed as weights. A high Condel score, for example above 0.5, represents mutations more likely than not to be deleterious whereas a low Condel score represents the opposite (2011, Gonzalez-Perez and Lopez-Bigas, Am J Hum Gen 88:440-449; incorporated herein by reference in its entirety). In the present application, Condel scores were averaged using the Variant Effect Predictor (VEP) version of Condel that averages over SIFT and Polyphen2 (as further described in Example 2).
Mutations in mucin gene family members, of which MUC4 and MUC16 are members, have also been associated in the literature with different cancer types, including gastric cancer. Pre-validation WGS data analysis identified biomarkers that had not been previously associated with gastric cancer (Wellcome Trust Sanger Institute Catalogue of Somatic Mutations in Cancer) including SACS, FLNA, ASPM, PCLO, CR2 and MAP1A.
An additional patient cohort of 39 tumor/normal sample pairs was obtained, the sample characteristics of which are listed in Figure 2. The validation cohort was sequenced by whole exome sequencing (WES) as described herein and serves as validation of the initial data analysis from the pre-validation cohort. As seen in Table 4, the validation data further identifies those biomarkers which are highly correlated with gastric cancer, as originally identified in the pre-validation data analysis.
Table 4-Validation data for 39 tumor/normal pairs
Gene | P-value (validation) | Samples mutated (validation) | Total number of unique mutations in gene (validation) | Total number of samples mutated (validation) | Condel score (validation)
MUC4 | 0.0090 | 64.10% | 18 | 25 | 0.00000
MUC16 | 0.0242 | 23.08% | 14 | 9 | 0.00000
TP53 | 7.28E-12 | 56.41% | 17 | 22 | 0.86864
SACS | 0.0055 | 10.26% | 6 | 4 | 0.72696
ARID1A | 5.23E-06 | 15.38% | 9 | 6 | 0.67217
FLNA | 0.0181 | 5.13% | 3 | 2 | 0.41596
FAT4 | 0.0039 | 12.82% | 8 | 5 | 0.72102
ASPM | 0.0594 | 10.26% | 5 | 4 | 0.84037
AHNAK2 | 7.41E-08 | 48.72% | 27 | 19 | 0.29256
CEP290 | 0.0690 | 5.13% | 2 | 2 | 0.32850
PCLO | 0.0407 | 20.51% | 10 | 8 | 0.12500
GPR98 | 0.0736 | 17.95% | 8 | 7 | 0.59789
CR2 | 0.0048 | 10.26% | 4 | 4 | 0.49675
AR | 0.0447 | 10.26% | 4 | 4 | 0.60650
PLEC | 0.0582 | 7.69% | 2 | 3 | 0.00000
MACF1 | 0.0129 | 15.38% | 6 | 6 | 0.26900
MAP1A | 0.0582 | 5.13% | 3 | 2 | 0.90050
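The Condel score in the last column of the table above is described in the text as a weighted average of per-tool scores, with values above 0.5 read as more likely deleterious. A minimal sketch of such a weighted average follows. The weights and scores are hypothetical, the scores are assumed to be already normalized so that higher means more deleterious, and the sketch does not reproduce the actual Condel weighting scheme or the VEP implementation.

```python
def weighted_average_score(scores, weights):
    """Weighted average of per-tool scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[tool] for tool in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[tool] * weights[tool] for tool in scores) / total_weight

def label(score, threshold=0.5):
    """Apply the 0.5 cutoff mentioned in the text."""
    return "deleterious" if score > threshold else "neutral"

# Hypothetical normalized scores and equal weights for two tools.
scores = {"SIFT": 0.9, "Polyphen2": 0.7}
weights = {"SIFT": 1.0, "Polyphen2": 1.0}

was = weighted_average_score(scores, weights)
call = label(was)
```

With equal weights the result reduces to a plain mean of the two tool scores; unequal weights would shift the result toward the more heavily weighted tool.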
Validation data support the correlation of the pre-validation identified biomarkers with gastric cancer. TP53, MUC4 and MUC16 were correlated with gastric cancer as expected. As seen when comparing Tables 2, 3 and 4, validating with a larger sample cohort provided deeper insight into the originally identified subset of biomarkers. For example, analysis of the validation cohort data revealed a stronger correlation of ARID1A to gastric cancer, a marker which has been previously correlated with the presence of gastric cancer. As such, the strong correlation of TP53, MUC4, MUC16 and ARID1A can be used as positive controls of the present methods and systems in determining biomarkers that can be correlated with gastric cancer. The validation data further support the correlation of SACS, FLNA, FAT3, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A as being additional mutational hot spot genes, and thus biomarkers that can be correlated with gastric cancer. Of those identified in the validation data, the biomarkers SACS, FLNA, ASPM, PCLO, CR2 and MAP1A were not previously correlated with gastric cancer in the COSMIC database.
Certain illustrative embodiments of biomarkers and their methods of use are described below. The biomarkers and their methods of use are not limited to these embodiments.
As disclosed herein, the identified biomarkers and methods of their use have numerous diagnostic and prognostic applications. Biomarkers as described herein find utility, either alone or in combination, in methods for determining, diagnosing, or prognosing gastric cancer. Biomarkers as described herein find utility, either alone or in combination, in methods for prognosing the outcome of a patient diagnosed with gastric cancer, a type of gastric cancer and/or a stage of gastric cancer. Biomarkers as described herein find utility, either alone or in combination, in methods for screening patients for the presence or absence of gastric cancer, a type of gastric cancer and/or a stage of gastric cancer, for example for patients that might be part of a high risk population predisposed to developing gastric cancer (e.g., family history, genetic predisposition, H. pylori infection status, etc.). The biomarkers as described herein, either alone or in combination, find utility as diagnostic, prognostic, or screening tools in conjunction with additional tests and methods for identifying gastric cancer. Additional tests and methods for identifying gastric cancer include, but are not limited to, protein staining methods such as IHC or histopathological staining such as H&E, genetic probe assays such as in situ hybridization (ISH), infection status (H. pylori), TNM staging, clinical staging, pathological staging, etc., for example as recognized by the American Joint Committee on Cancer (AJCC) and/or the World Health Organization (AJCC Cancer Staging Manual). Additional tests and methods for identifying gastric cancer may include experimental or discovery related tests and methods that are not yet recognized as mainstream, for example as found in clinical trials, but that nonetheless find utility in providing support for a diagnosis or prognosis of gastric cancer.
Infection of the stomach with the bacterium Helicobacter pylori (H. pylori) is a major risk factor for peptic ulcer disease and is responsible for the majority of ulcers of the stomach and upper small intestine. H. pylori infection is highly correlated with gastric cancer incidence and is further associated with increased risk of gastric mucosa-associated lymphoid tissue (MALT) lymphoma. H. pylori is a spiral bacterium that grows in the mucus layer that coats the inside of the human stomach, the bacteria being resistant to the stomach's natural acidic and antimicrobial environment by way of urease secretion, which neutralizes stomach acidity. Further, the bacterium's spiral shape allows it to burrow into the stomach's mucus layer in addition to attaching to the cells that line the inner surface of the stomach. Immune cells that would typically recognize and attack the bacteria are unable to reach the stomach lining. That, in combination with H. pylori's ability to interfere with local immune responses, makes immune cells ineffective in eliminating gastric infection with H. pylori. Epidemiological studies indicate that individuals infected with H. pylori have an increased risk of gastric adenocarcinoma. Indeed, in 1994 H. pylori was classified as a carcinogen, and colonization of the stomach by the bacterium has been accepted as an important risk factor for gastric cancer. In some embodiments, it is contemplated that biomarkers used in methods and assays for determining the presence of gastric cancer as described herein are used in conjunction with assays that determine the presence of H. pylori. Such methods include, but are not limited to, breath tests (e.g., urea breath tests), antibody tests (e.g., blood antibody tests directed against antibodies to H. pylori), antigen tests (e.g., stool antigen tests for the presence of H. pylori antigens) and stomach biopsy.
In one embodiment, the present disclosure provides genes or gene associated locations useful as biomarkers for gastric cancer. In some embodiments, a biomarker is a gene or genetic location that was identified to comprise one or more, two or more, three or more, four or more or five or more variant mutations in patient test samples as compared to the gene or gene location in a patient matched normal sample. In some embodiments, a biomarker that was identified to have variant mutations in at least two patient samples is contemplated to represent a "hot spot", or gene that comprises variant mutations as compared to other genes in a gastric cancer test sample (e.g., tissue, cell, circulating tumor cells, etc.). For example, Tables 2, 3 and 4 are exemplary of genes that were identified in two or more patient samples to have variant mutations compared to the same gene in a patient matched normal sample, thereby identifying them as potential biomarkers for determining the presence or absence of gastric cancer.
In embodiments of the present disclosure, variant mutations in biomarkers as described herein are located in a coding region of a gene. In some embodiments, biomarkers as described herein are located in non-coding regions of a gene. In other embodiments, biomarkers as described herein are located in intergenic regions. In some embodiments, biomarkers as described herein comprise single nucleotide variants (SNVs) or single nucleotide polymorphisms (SNPs). In other embodiments, biomarkers as described herein comprise insertions and/or deletions (indels) of one or more genomic sequences. In further embodiments, a biomarker may comprise both SNVs and indels.
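The variant types named above (SNVs, insertions and deletions) can be distinguished from reference and alternate alleles with a short sketch. The allele representation, in which an indel is written with the longer or shorter allele string, is an assumption for illustration and is not a format specified in the disclosure.

```python
def classify_variant(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"        # single base substituted
    if len(alt) > len(ref):
        return "insertion"  # alternate allele adds bases
    if len(alt) < len(ref):
        return "deletion"   # alternate allele removes bases
    return "other"          # identical alleles or multi-base substitution

# Hypothetical alleles.
kinds = [classify_variant("A", "G"),
         classify_variant("A", "ATT"),
         classify_variant("ACG", "A")]
```

A gene whose variant list contains both an "SNV" entry and an "insertion" or "deletion" entry would correspond to the embodiment in which a biomarker comprises both SNVs and indels.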
In some embodiments, the biomarkers as disclosed herein are useful in characterization, classification, differentiation, grading, staging, diagnosis, or prognosis of gastric cancer, gastric cancer type and/or gastric cancer stage. In some embodiments, the biomarkers as disclosed herein are useful in diagnosing the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a subject. In some embodiments, the biomarkers as disclosed herein are useful in prognosing disease progression, treatment outcome and/or treatment regimen progress of a subject diagnosed with gastric cancer. In some embodiments, the biomarkers as disclosed herein are useful in screening a subject for the possibility of developing gastric cancer. In some embodiments, the biomarkers as described herein are useful in screening potential therapeutic options for treating a patient having gastric cancer.
The following genes are disclosed herein as biomarkers which may be associated with the presence of gastric cancer: MUC4, TTN, NEB, MUC16, TP53, CSMD3, SYNE1, COL11A1, ABCA13, HMCN1, AHNAK2, MLL3, ATR, CEP290, PCLO, ANK2, RELN, AR, CACNA2D1, FBN3, LVRN, OBSCN, SACS, RYR2, FAT3, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP1-21018.1, AC073995.2, MAP7D3, FLNA, DNAH5, TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNAH11, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B, MCM10, EPB41L3, AMPD3, TESK1, DNAH7, MYO9A, CHD7, BIRC6, ERBB2, SMARCA4, STK31, FBLN2, SLC16A4, RAD50, CXorf59, C6orf167, MAP3K4, SCN7A, TRPM3, KIF1A, RGS12, PTPRJ, DMD, SEMA3F, SCN10A, DOCK7, TBC1D23, COL12A1, AFF3, MACF1, LAMA2, ZFP106, C6orf103, RBM33, DOCK8, ATP11A, CHD3, CHD6, SCN4A, PCDHA11, HUWE1, ZFHX4, AC007342.2, AIM1, SAMD9L, KIF21A, OR7C1, ADCY8, WASF3, POLQ, CDC45L, SEC31A, DHX9, WDR47, OR6F1, HNF1A, MUC19, KIF26A, NAP173, FRMD8, DSCAM, RREB1, DAZL, FGA, LRRC7, WRN, ZNF800, KPNB1, TRIM59, CNTN6, SCN9A, AC021066.1, MDM4, ODZ4, CCDC88A, RTKN2, CUBN, ZNF66, NASP, CECR2, HTR7, KIAA2018, DNAH12, ZZEF1, PLD1, PLA2R1, IL1RL2, IFI16, ASXL2, CDC2L5, NSD1, NCAPD3, BRSK1, FTSJD1, GRIA1, MAP1A, GABRG3, DNA2, RERE, KDM5B, COL11A2, TFAM, SASH1, FLCN, TRPC5, UBE3B, MYLK, HELB, AL157769.3, AL2SCR8, SETD1B, EIF2C2, GRM7, NAV4, NRXN3, RP11-766F14.2, MPRIP, HMCN2, APC, SATB1, COBLL1, RYR3, NBN, DCHS2, MAN2A2, NPHP1, FBN2, ULK1, ATP13A3, C10orf92, COIL, PLCH2, USP9X, ZNF236, SPEN, NAP1L4, WDFY4, PTPRT, CSMD1, KIAA1875, VPS13A, VPS13D, ZNF521, SSPO, ASXL3, AGBL3, TLE1, PTPRD, ADAMTS1, TTC28, PCDH19, PLCL1, DOCK10, COL14A1, USP34, TNNI3K, TRRAP, DST, C20orf12, PCDHA3, ARID4A, ROBO2, SIK3, FREM1, EIF4G2, HSPG2, SLCO4C1 and SCN5AX.
In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer patient samples comprise two or more of MUC4, TTN, NEB, MUC16, TP53, CSMD3, SYNE1, COL11A1, ABCA13, HMCN1, AHNAK2, MLL3, ATR, CEP290, PCLO, ANK2, RELN, AR, CACNA2D1, FBN3, LVRN, OBSCN, SACS, RYR2, FAT3, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP1-21018.1, AC073995.2, MAP7D3, FLNA, DNAH5, TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNAH11, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B, MCM10, EPB41L3, AMPD3, TESK1, DNAH7, MYO9A, CHD7, BIRC6, ERBB2, SMARCA4, STK31, FBLN2, SLC16A4, RAD50, CXorf59, C6orf167, MAP3K4, SCN7A, TRPM3, KIF1A, RGS12, PTPRJ, DMD, SEMA3F, SCN10A, DOCK7, TBC1D23, COL12A1, AFF3, MACF1, LAMA2, ZFP106, C6orf103, RBM33, DOCK8, ATP11A, CHD3, CHD6, SCN4A, PCDHA11, HUWE1, ZFHX4, AC007342.2, AIM1, SAMD9L, KIF21A, OR7C1, ADCY8, WASF3, POLQ, CDC45L, SEC31A, DHX9, WDR47, OR6F1, HNF1A, MUC19, KIF26A, NAP173, FRMD8, DSCAM, RREB1, DAZL, FGA, LRRC7, WRN, ZNF800, KPNB1, TRIM59, CNTN6, SCN9A, AC021066.1, MDM4, ODZ4, CCDC88A, RTKN2, CUBN, ZNF66, NASP, CECR2, HTR7, KIAA2018, DNAH12, ZZEF1, PLD1, PLA2R1, IL1RL2, IFI16, ASXL2, CDC2L5, NSD1, NCAPD3, BRSK1, FTSJD1, GRIA1, MAP1A, GABRG3, DNA2, RERE, KDM5B, COL11A2, TFAM, SASH1, FLCN, TRPC5, UBE3B, MYLK, HELB, AL157769.3, AL2SCR8, SETD1B, EIF2C2, GRM7, NAV4, NRXN3, RP11-766F14.2, MPRIP, HMCN2, APC, SATB1, COBLL1, RYR3, NBN, DCHS2, MAN2A2, NPHP1, FBN2, ULK1, ATP13A3, C10orf92, COIL, PLCH2, USP9X, ZNF236, SPEN, NAP1L4, WDFY4, PTPRT, CSMD1, KIAA1875, VPS13A, VPS13D, ZNF521, SSPO, ASXL3, AGBL3, TLE1, PTPRD, ADAMTS1, TTC28, PCDH19, PLCL1, DOCK10, COL14A1, USP34, TNNI3K, TRRAP, DST, C20orf12, PCDHA3, ARID4A, ROBO2, SIK3, FREM1, EIF4G2, HSPG2, SLCO4C1 and SCN5AX.
In preferred embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise two or more of MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A.
In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise one or more of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A. In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise two or more of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A. In some embodiments, biomarkers comprising variant mutations which can be correlated with gastric cancer comprise at least two or more of SACS, FLNA, ASPM, PCLO, CR2 and MAP1A and at least one or more of MUC4, MUC16, TP53, ARID1A, FAT4, AHNAK2, CEP290, GPR98, AR, PLEC and MACF1. In some embodiments, methods of using biomarkers for determining the presence of gastric cancer comprise a combination of biomarkers, or group of biomarkers as described herein.
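The combination described above, at least two genes from one group together with at least one gene from a second group, can be expressed as a simple membership check. The sketch below is illustrative only: the names GROUP_A, GROUP_B and meets_panel_rule are hypothetical, gene symbols follow the spelling used in Tables 3 and 4, and satisfying the check is not in itself a diagnosis.

```python
# Gene groups as named in the paragraph above.
GROUP_A = {"SACS", "FLNA", "ASPM", "PCLO", "CR2", "MAP1A"}
GROUP_B = {"MUC4", "MUC16", "TP53", "ARID1A", "FAT4", "AHNAK2",
           "CEP290", "GPR98", "AR", "PLEC", "MACF1"}

def meets_panel_rule(mutated_genes, min_a=2, min_b=1):
    """True if enough mutated genes fall in each group."""
    mutated = set(mutated_genes)
    return (len(mutated & GROUP_A) >= min_a
            and len(mutated & GROUP_B) >= min_b)

# Hypothetical sets of mutated genes observed in two test samples.
sample_1 = {"SACS", "PCLO", "TP53", "TTN"}
sample_2 = {"SACS", "TP53"}

result_1 = meets_panel_rule(sample_1)
result_2 = meets_panel_rule(sample_2)
```

The thresholds are parameters so that the other combinations recited above, such as one or more or two or more of the first group alone, can be checked by setting min_b to zero.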
In one embodiment, a sample from a subject used in methods for diagnosing gastric cancer as described herein is a tissue sample, for example a biopsy tissue sample, for example a gastric tissue biopsy sample. In some embodiments, a biopsy tissue sample used in diagnostic methods described herein is a fresh sample, or a sample that has been frozen or modified. A modified sample is, for example, a sample that has been preserved or modified for storage and/or for use in histopathology, cytopathology, immunocytochemistry, immunohistochemistry, in situ hybridization, for example formalin fixed, paraffin embedded tissues or other methods used for tissue preservation. In one embodiment, a sample from a subject is a liquid sample, such as a blood sample containing white blood cells. In one embodiment, a liquid sample contains circulating tumor cells and/or bacterial cells, for example H. pylori bacteria.
In embodiments of the present disclosure, nucleic acids are extracted and isolated from a sample, or portion thereof, for subsequent use in methods as described herein for determining the presence of gastric cancer. In embodiments of the present disclosure, normal, adjacent tissue specimens were utilized as patient matched normal samples for comparison to a cancerous tissue sample. For example, one or more normal, adjacent tissue samples were obtained from an individual and matched to that individual's tissue sample suspected of containing cancerous cells for evaluation of biomarker mutations as described herein (for example, see Examples). The genomic DNA was isolated from the normal tissue cells and served as a patient baseline (normal) for comparing mutations present in the test sample. However, it is contemplated that any tissue sample that is free from the cancerous phenotype can also be utilized as a source of comparative normal genomic DNA for an individual. In some embodiments, nucleic acids isolated from a sample, or a portion thereof, are used in diagnostic and/or prognostic methods as described herein.
It is contemplated that the two or more biomarkers as described herein can be used in methods for determining, diagnosing, or prognosing gastric cancer. Further, the biomarkers find utility in combination with other biomarkers and/or other diagnostic tests in providing a diagnostician additional tools to determine gastric cancer status of a subject. The methods as described herein find particular utility as diagnostic and prognostic tools. In some embodiments, methods comprising biomarkers as described herein can be useful in detecting gastric cancer at an early stage in the disease compared to later stage detection. In some embodiments, methods comprising biomarkers as described herein can be prognostic for patient survival based on early and/or late detection of the presence of gastric cancer.
In the methods provided herein, the biomarkers comprise variant genetic sequences in a genomic DNA sample compared to a genomic DNA normal sample. Variant genetic sequences can include, but are not limited to, single nucleotide variants, sequence insertions and sequence deletions within genes that differ from a normal sample and indicate mutations that may indicate phenotypic changes, disease formation, cancer, etc. In some embodiments, the methods detect two or more altered genetic sequences in a biomarker as compared to a normal sample. In some embodiments, a comparison between gene sequences in a test sample (i.e., collected from a patient, subject, individual, etc.) and a normal or control sample (i.e., from the same patient, subject, individual from which the test sample is collected) identifies the number of mutations associated with a particular gene, wherein the presence of a plurality of variant genes over a normal may associate that gene with gastric cancer. In some embodiments, methods disclosed herein detect the insertion of one or more genetic sequences into a gene, deletion of one or more genetic sequences from a gene, or both as compared to a normal, or control, sample. In some embodiments, the methods detect one or more of single nucleotide variant(s) and/or insertion(s) and/or deletion(s) altered genetic sequences in a sample compared to a normal, or control sample.
In some embodiments of methods for determining gastric cancer, a test sample (i.e., a sample to be assayed for presence of gastric cancer) is collected from an individual. In some embodiments, a second sample, a normal or control sample (e.g., blood sample, tissue sample known not to have a cancerous phenotype) is collected from the same individual. Genomic DNA is isolated from the sample(s) by techniques known in the art (for example, as found in Molecular Cloning, a Laboratory Manual, Eds. Sambrook et al., Cold Spring Harbor Press). The isolated DNA from a sample is used in methods as described herein for detecting biomarkers that can be indicative of gastric cancer.
In some embodiments, the isolated DNA from the test and control samples is subjected to sequencing, for example by next generation sequencing methodologies. Sequence data from the test and the control DNA samples are compared, for example by aligning the two sequences; variant sequences are identified in the test sequence over the control sequence; and the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a sample is identified based on said comparison. As such, isolated genomic DNA from a sample is used to identify variant mutations in a genetic sequence, wherein genes comprising variant mutations relative to a normal sample are biomarkers associated with the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer in a sample.
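The comparison described above can be pictured as a subtraction of control-sample variants from test-sample variants. The following is a minimal sketch under stated assumptions: the `Variant` record, its field names, and both function names are hypothetical and chosen only for illustration; the disclosure does not prescribe any particular data structure.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (illustrative field names)."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


def variants_unique_to_test(test_calls: Iterable[Variant],
                            control_calls: Iterable[Variant]) -> List[Variant]:
    """Return variants seen in the test sample but absent from the matched control."""
    control = set(control_calls)
    return [v for v in test_calls if v not in control]


def genes_with_variants(variants: Iterable[Variant]) -> List[str]:
    """Collect, in sorted order, the genes carrying at least one listed variant."""
    return sorted({v.gene for v in variants})
```

Genes returned by `genes_with_variants` would then be the candidates to compare against a biomarker list; any real pipeline would also need alignment and quality filtering before this step.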
In some embodiments, a subset of biomarkers as found in Tables 1, 2, 3 or 4 can be used in determining, diagnosing, or prognosing gastric cancer in a sample from a subject. The subset can represent two or more, three or more, four or more, five or more, or six or more biomarkers from the subset with variant mutations, which can be indicative of the presence of gastric cancer in a patient. For example, the subset of biomarkers comprising two or more of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A can be useful for indicating the presence of gastric cancer in a subject. An additional subset encompassing biomarkers comprising MUC4, TTN, NEB, MUC16, TP53, CSMD3, SYNE1, COL11A1, ABCA13, HMCN1, AHNAK2, MLL3, ATR, CEP290, ANK2, RELN, AR, CACNA2D1, FBN3, LVRN, OBSCN, RYR2, FAT3, FAT4, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP1-21018.1, AC073995.2, MAP7D3, MACF1, MAP1A and DNAH5 can be useful for indicating the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer. An additional subset encompassing biomarkers comprising TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNA11, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B, MCM10, EPB41L3, AMPD3, TESK1, DNAH7, MYO9A, CHD7, BIRC6, ERBB2, SMARCA4, STK31, FBLN2, SLC16A4, RAD50, CXorf59, C6orf167, MAP3K4, SCN7A, TRPM3, KIF1A, RGS12, PTPRJ, DMD, SEMA3F, SCN10A, DOCK7, TBC1D23, COL12A1, AFF3, MACF1, LAMA2, ZFP106, C6orf103, RBM33, DOCK8, ATP11A, CHD3, CHD6, SCN4A, PCDHA11, HUWE1, ZFHX4, AC007342.2, AIM1, SAMD9L, KIF21A, OR7C1, ADCY8, WASF3, POLQ, CDC45L, SEC31A, DHX9, WDR47, OR6F1, HNF1A, MUC19, KIF26A, NAP173, FRMD8, DSCAM, RREB1, DAZL, FGA, LRRC7, WRN, ZNF800, KPNB1, TRIM59, CNTN6, SCN9A, AC021066.1, MDM4, ODZ4, CCDC88A, RTKN2, CUBN, ZNF66, NASP, CECR2, HTR7, KIAA2018, DNAH12, ZZEF1, PLD1, PLA2R1, IL1RL2, IFI16, ASXL2, CDC2L5, NSD1, NCAPD3, BRSK1, FTSJD1, GRIA1, MAP1A, GABRG3, DNA2, RERE, KDM5B, COL11A2, TFAM, SASH1, FLCN, TRPC5, UBE3B, MYLK, HELB, AL157769.3, AL2SCR8, SETD1B, EIF2C2, GRM7, NAV4, NRXN3, RP11-766F14.2, MPRIP, HMCN2, APC, SATB1, COBLL1, RYR3, NBN, DCHS2, MAN2A2, NPHP1, FBN2, ULK1, ATP13A3, C10orf92, COIL, PLCH2, USP9X, ZNF236, SPEN, NAP1L4, WDFY4, PTPRT, CSMD1, KIAA1875, VPS13A, VPS13D, ZNF521, SSPO, ASXL3, AGBL3, TLE1, PTPRD, ADAMTS1, TTC28, PCDH19, PLCL1, DOCK10, COL14A1, USP34, TNNI3K, TRRAP, DST, C20orf12, PCDHA3, ARID4A, ROBO2, SIK3, FREM1, EIF4G2, HSPG2, SLCO4C1 and SCN5A can be useful for indicating the presence of gastric cancer, a type of gastric cancer, and/or stage of gastric cancer.
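The "two or more biomarkers from a subset" language above amounts to counting how many panel genes carry a variant in a given sample. A minimal sketch follows, assuming the six-gene subset named in the paragraph; the names `NOVEL_PANEL`, `panel_hits`, and `meets_threshold` are hypothetical, and the default threshold of two simply mirrors the "two or more" wording rather than any validated classifier.

```python
from typing import Iterable, List

# Six-gene subset named in the text; the constant name is illustrative only.
NOVEL_PANEL = frozenset({"SACS", "FLNA", "ASPM", "PCLO", "CR2", "MAP1A"})


def panel_hits(mutated_genes: Iterable[str],
               panel: frozenset = NOVEL_PANEL) -> List[str]:
    """Return the panel genes, sorted, that appear among the sample's mutated genes."""
    return sorted(set(mutated_genes) & panel)


def meets_threshold(mutated_genes: Iterable[str],
                    panel: frozenset = NOVEL_PANEL,
                    minimum: int = 2) -> bool:
    """True when at least `minimum` panel genes carry a variant in the sample."""
    return len(panel_hits(mutated_genes, panel)) >= minimum
```

Passing a different `panel` or `minimum` would model the other subsets and the "three or more", "four or more" variants of the same language.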
Therefore, as described herein with reference to gastric cancer, biomarkers with variant mutations as identified in pre-validation and validation cohort samples can be correlated with the presence of gastric cancer in a subject. Some of the biomarkers identified in pre-validation and validation sample cohorts had not been previously disclosed as associated with gastric cancer in the Catalogue of Somatic Mutations in Cancer (COSMIC) database and the Cancer Gene Census (CGC) database of the Cancer Gene Project, both databases being maintained by the Wellcome Trust Sanger Institute. Those biomarkers not previously associated with gastric cancer include NEB, ABCA13, AHNAK2, CEP290, COL29A1, PCLO, LVRN, RYR2, FAT3, GPR98, AL592307.2, PTPRQ, PLEC, AC130364.1, OTOG, AC073995.2, SACS, FLNA, ASPM, CR2 and MAP1A. Of this subset of biomarkers, the biomarkers SACS, FLNA, PCLO, and CR2 are correlated with gastric cancer at p<0.05 (validation studies), with biomarkers PLEC and MAP1A having p-values of 0.058 and ASPM having a p-value of 0.059. As such, the genes SACS, FLNA, ASPM, CR2, PLEC and MAP1A are particularly considered mutational hot spots that can be correlated with the presence of gastric cancer.
Additional biomarkers that were not correlated with gastric cancer in COSMIC or CGC databases include DNAH11, DNHD1, CHD7, FBLN2, SCN7A, KIF1A, C6orf103, RBM33, SCN4A, PCDHA11, ZFHX4, AC007342.2, MUC19, KIF26A, SCN9A, AC021066.1, ODZ4, ZNF66, DNAH12, ASXL2, FTSJD1, GABRG3, DNA2, KDM5B, AL157769.3, RP11-766F14.2, MPRIP, HMCN2, RYR3, WDFY4, CSMD1, KIAA1875, SSPO, ASXL3, AGBL3, TTC28, DOCK10, C20orf12, SIK3, FREM1 and SCN5A. In some embodiments, methods comprising biomarkers for determining the presence of gastric cancer comprise two or more, three or more, four or more, or five or more of NEB, ABCA13, AHNAK2, CEP290, COL29A1, PCLO, LVRN, RYR2, FAT3, GPR98, AL592307.2, PTPRQ, PLEC, AC130364.1, OTOG, AC073995.2, DNAH11, DNHD1, CHD7, FBLN2, SCN7A, KIF1A, C6orf103, RBM33, SCN4A, PCDHA11, ZFHX4, AC007342.2, MUC19, KIF26A, SCN9A, AC021066.1, ODZ4, ZNF66, DNAH12, ASXL2, FTSJD1, GABRG3, DNA2, KDM5B, AL157769.3, RP11-766F14.2, MPRIP, HMCN2, RYR3, WDFY4, CSMD1, KIAA1875, SSPO, ASXL3, AGBL3, TTC28, DOCK10, C20orf12, SIK3, FREM1, SACS, ASPM, FLNA, CR2, MAP1A and SCN5A.
Prognostic methods utilizing biomarkers as described herein are contemplated to be useful for determining a proper course of treatment for a patient having gastric cancer. A course of treatment refers to the therapeutic measures taken for a patient after diagnosis or after treatment for gastric cancer. Different treatment regimens are available for treating patients with gastric cancer. Standard treatments include surgery (e.g., subtotal gastrectomy or total gastrectomy), endoluminal stent placement or laser therapy, chemotherapy, radiation therapy and chemoradiation therapy. Other therapeutic regimens comprise those identified during clinical trials, but not yet considered as a standard treatment option. Clinical trials associated with gastric cancer can be found at, for example, www.clinicaltrials.gov. A determination of the likelihood for cancer recurrence, spread, or patient survival, can assist in determining whether a more conservative or more radical approach to therapy should be taken, or whether treatment modalities should be combined. For example, when gastric cancer recurrence is likely, it can be advantageous to precede or follow surgical treatment with chemotherapy, radiation, immunotherapy, biological modifier therapy, gene therapy, vaccines, and the like, or adjust the span of time during which the patient is treated. A diagnosis or prognosis of a gastric cancer state is contemplated to be correlated with two or more, for example a particular combination, of biomarkers described herein. In some
embodiments, methods utilizing biomarkers for use in prognosis of gastric cancer comprise two or more of MUC4, TTN, NEB, MUC16, TP53, CSMD3, SYNE1, COL11A1, ABCA13, HMCN1, AHNAK2, MLL3, ATR, CEP290, PCLO, ANK2, RELN, AR, CACNA2D1, FBN3, LVRN, OBSCN, SACS, RYR2, FAT3, TLL1, C5orf42, KCNMA1, GPR98, AL592307.2, COL7A1, ATP10A, SPINK5, CELSR3, NOTCH2, RNF43, STARD8, PTPRQ, CELSR2, ARID1A, VWA3B, UBR5, MYH11, F8, IGSF10, PIK3CA, DHX57, PLEC, C6orf10, MAP2, MDN1, MMRN1, AP3B1, CTNNB1, FSHR, KIAA1109, TRPM7, CNTLN, KIAA0182, AC130364.1, RAB3GAP2, ASXL1, UBE3A, OTOG, FNIP1, APOB, RP1, REV3L, PAPPA2, ABCB5, LAMA5, LRCH2, PCDH10, CR2, RP1-21018.1, AC073995.2, MAP7D3, FLNA, MACF1, MAP1A, FAT4, DNAH5, TUT1, LMAN1, FAT4, KIAA1199, TRPM6, ADAM32, DNA11, ADAM23, UPF3A, ZBTB20, DNHD1, TENC1, SCN11A, UIMC1, IGSF9, HPS1, LRP1B, MCM10, EPB41L3, AMPD3, TESK1, DNAH7, MYO9A, CHD7, BIRC6, ERBB2, SMARCA4, STK31, FBLN2, SLC16A4, RAD50, CXorf59, C6orf167, MAP3K4, SCN7A, TRPM3, KIF1A, RGS12, PTPRJ, DMD, SEMA3F, SCN10A, DOCK7, TBC1D23, COL12A1, AFF3, MACF1, LAMA2, ZFP106, C6orf103, RBM33, DOCK8, ATP11A, CHD3, CHD6, SCN4A, PCDHA11, HUWE1, ZFHX4, AC007342.2, AIM1, SAMD9L, KIF21A, OR7C1, ADCY8, WASF3, POLQ, CDC45L, SEC31A, DHX9, WDR47, OR6F1, HNF1A, MUC19, KIF26A, NAP173, FRMD8, DSCAM, RREB1, DAZL, FGA, LRRC7, WRN, ZNF800, KPNB1, TRIM59, CNTN6, SCN9A, AC021066.1, MDM4, ODZ4, CCDC88A, RTKN2, CUBN, ZNF66, NASP, CECR2, HTR7, KIAA2018, DNAH12, ZZEF1, PLD1, PLA2R1, IL1RL2, IFI16, ASXL2, CDC2L5, NSD1, NCAPD3, BRSK1, FTSJD1, GRIA1, MAP1A, GABRG3, DNA2, RERE, KDM5B, COL11A2, TFAM, SASH1, FLCN, TRPC5, UBE3B, MYLK, HELB, AL157769.3, AL2SCR8, SETD1B, EIF2C2, GRM7, NAV4, NRXN3, RP11-766F14.2, MPRIP, HMCN2, APC, SATB1, COBLL1, RYR3, NBN, DCHS2, MAN2A2, NPHP1, FBN2, ULK1, ATP13A3, C10orf92, COIL, PLCH2, USP9X, ZNF236, SPEN, NAP1L4, WDFY4, PTPRT, CSMD1, KIAA1875, VPS13A, VPS13D, ZNF521, SSPO, ASXL3, AGBL3, TLE1, PTPRD, ADAMTS1, TTC28, PCDH19, PLCL1, DOCK10, COL14A1, USP34, TNNI3K, TRRAP, DST, C20orf12, PCDHA3, ARID4A, ROBO2, SIK3, FREM1, EIF4G2, HSPG2, SLCO4C1 and SCN5A. For determination of gastric cancer, a sample is obtained, nucleic acids, for example DNA, are isolated from the sample by established means known in the art, and the isolated nucleic acids are assayed by methods disclosed herein, for example sequencing, microarray analysis or PCR. A normal or control sample is typically obtained for comparison with the test sample.
Methods described herein are contemplated for use in, for example, characterizing the variant mutational status of one or more biomarkers, for example those as found in Tables 1, 2, 3 or 4, wherein the variant mutational status of one or more biomarkers is useful in determining gastric cancer status. In some embodiments, methods for characterization comprise sequencing technologies, for example next generation sequencing technologies. In some embodiments, microarray based technologies are utilized to characterize the mutational status of a biomarker as described herein for determining the status of gastric cancer in a sample. In some embodiments, polymerase chain reaction is utilized to characterize the mutational status of a biomarker. In some embodiments, a sample is assayed for methylation status, the data of which is used to characterize a sample for gastric cancer status.
In some embodiments, isolated genomic DNA from samples is typically modified prior to characterization. For example, genomic DNA libraries are created which can be applied to downstream detection applications such as sequencing. A library is produced, for example, by performing the methods as described in the Nextera™ DNA Sample Prep Kit (Epicentre® Biotechnologies, Madison WI), GS FLX Titanium Library Preparation Kit (454 Life Sciences, Branford CT), SOLiD™ Library Preparation Kits (Applied Biosystems™ Life Technologies, Carlsbad CA), and the like. The sample as described herein may be further amplified for sequencing by, for example, multiple strand displacement amplification (MDA) techniques. For sequencing after MDA, an amplified sample library is, for example, prepared by creating a DNA library as described in Mate Pair Library Prep kit, Genomic DNA Sample Prep kits or TruSeq™ Sample Preparation and Exome Enrichment kits (Illumina®, Inc., San Diego CA). Useful cluster amplification methods are described, for example, in U.S. Patent No. 5,641,658; U.S. Patent Publ. No. 2002/0055100; U.S. Patent No. 7,115,400; U.S. Patent Publ. No. 2004/0096853; U.S. Patent Publ. No. 2004/0002090; U.S. Patent Publ. No. 2007/0128624; and U.S. Patent Publ. No. 2008/0009420, each of which is incorporated herein by reference in its entirety. Another useful method for amplifying nucleic acids on a surface is rolling circle amplification (RCA), for example, as described in Lizardi et al., Nat. Genet. 19:225-232 (1998) and US 2007/0099208 A1, each of which is incorporated herein by reference in its entirety. Emulsion PCR methods are also useful, exemplary methods for which are described, for example, in Dressman et al., Proc. Natl. Acad. Sci. USA 100:8817-8822 (2003), WO 05/010145, or U.S. Patent Publ. Nos. 2005/0130173 or 2005/0064460, each of which is incorporated herein by reference in its entirety. Sample preparation methods of the present disclosure are not necessarily limited by any particular library preparation or amplification method as a sample as described herein is
contemplated to be amenable to any of a variety of methods known in the art and/or commercially available for such purposes.
Genomic DNA libraries derived from a sample as described herein can be characterized for gastric cancer status by sequencing for the presence of gene mutations. In one embodiment, sequencing can be performed following manufacturer's protocols on a system such as those provided by Illumina®, Inc. (HiSeq 1000, HiSeq 2000, Genome Analyzers, MiSeq, HiScan, iScan, BeadExpress systems), 454 Life Sciences (FLX Genome Sequencer, GS Junior), Applied Biosystems™ Life Technologies (ABI PRISM® Sequence detection systems, SOLiD™ System), Ion Torrent® Life Technologies (Personal Genome Machine sequencer), and further those described in, for example, United States patents and patent applications 5,888,737, 6,175,002, 5,695,934, 6,140,489, 5,863,722, 2007/007991, 2009/0247414, 2010/0111768 and PCT application WO2007/123744, each of which is incorporated herein by reference in its entirety. Methods disclosed herein are not necessarily limited by any particular sequencing system as the particular sample preparation required for a particular instrument is contemplated to be amenable for use with any sample as described herein. Sequencing methodologies for characterizing gastric cancer are contemplated to be useful either alone or in combination with other assays for gastric cancer determination.
Output from a sequencing instrument can be of any sort. For example, current technology typically utilizes a light generating readable output, such as fluorescence or luminescence; however, the present methods for detecting mutations in a biomarker for determining gastric cancer status in a sample are not necessarily limited to the type of readable output as long as differences in output signal for a particular sequence of interest can be determined. In some embodiments, a change in ion concentration, for example hydrogen ion concentration, is measured to determine a sequence of interest, whereas in other embodiments a change in current is utilized to determine a sequence of interest. Examples of analysis software that may be used to characterize output derived from practicing methods as described herein include, but are not limited to, Pipeline, CASAVA and GenomeStudio data analysis software (Illumina®, Inc.), SignalMap and NimbleScan data analysis software (Roche NimbleGen), GS Analyzer analysis software (454 Life Sciences), SOLiD™, DNASTAR® SeqMan® NGen® and Partek® Genomics Suite™ data analysis software (Life Technologies), Feature Extraction and Agilent Genomics Workbench data analysis software (Agilent Technologies), and Genotyping Console™ and Chromosome Analysis Suite data analysis software (Affymetrix®).
A skilled artisan will know of numerous additional commercially and academically available software alternatives for analyzing sequencing-generated output. Embodiments described herein are not limited to any data analysis method.
In one embodiment, once the sample is appropriately processed, the number of mutations in a biomarker can be detected using microarray methodologies. For example, a plurality of different probe molecules can be attached to a substrate or otherwise spatially distinguished in an array. Exemplary arrays that can be used to detect the number of mutations in a biomarker include, but are not limited to, slide arrays, silicon wafer arrays, liquid arrays, bead-based arrays and others known in the art or set forth in further detail below. In some embodiments, the methods can be practiced with array technology that combines a miniaturized array platform, a high level of assay multiplexing, and scalable automation for sample handling and data processing.
Exemplary methods and systems for microarray analysis include, but are not limited to, those methods and systems commercialized by Roche NimbleGen, Inc., Illumina®, Inc., Affymetrix® and Agilent Technologies. An array of beads can also be in a fluid format such as a fluid stream of a flow cytometer or similar device. Commercially available fluid formats for distinguishing beads include, for example, those used in xMAP™ technologies from Luminex or MPSS™ methods from Lynx Therapeutics. Exemplary microarray methods and systems can be found in, for example, US patents 5,856,101, 5,981,733, 6,001,309, 6,023,540, 6,110,426, 6,200,737, 6,221,653, 6,232,072, 6,266,459, 6,327,410, 6,355,431, 6,379,895, 6,429,027, 6,458,583, 6,667,394, 6,770,441, 6,489,606, 6,859,570, 7,106,513, 7,126,755, and 7,164,533, US patent applications 2005/0227252, 2006/0023310, 2006/006327, 2006/0071075, 2006/0119913 and PCT publications WO98/40726, WO99/18434, WO98/50782, WO00/63437, WO04/024328 and WO05/033681 (each of which is incorporated herein by reference in its entirety). Microarray based technologies for characterizing gastric cancer are contemplated to be useful either alone or in combination with other diagnostic and/or prognostic assays.
In one embodiment, once the sample is appropriately processed, the number of mutations in a biomarker can be detected using polymerase chain reaction (PCR) methodologies (including, but not limited to, US Patents 4,683,195, 4,683,202, 4,965,188, 5,075,216, 5,210,015, 5,333,675, 5,475,610, 5,487,972, 5,538,848, 5,602,756, 5,656,493, 5,804,375, 5,994,056, 6,030,787, 6,703,236, 6,814,934; incorporated herein by reference in their entireties), for example qualitative PCR, real-time PCR, quantitative PCR, and the like. For example, a plurality of probes may be utilized to amplify genomic regions contemplated to comprise one or more sequence variants wherein amplification data are used to determine the presence or absence of variants thereby associating a sample with gastric cancer, or not, as the case may be. Exemplary methods and systems for PCR analysis include, but are not limited to, those methods and systems commercialized by Illumina®, Inc., Roche Applied Science and Applied Biosystems™. Polymerase chain reaction based technologies for characterizing gastric cancer are contemplated to be useful either alone or in combination with other diagnostic and/or prognostic assays.
The following examples are provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present disclosure and are not to be construed as limiting the scope thereof.
EXAMPLES
EXAMPLE 1 - Study population, sample collection and processing
Initially, genes described herein were identified by whole genome sequencing (WGS) from a pre-validation cohort of 23 fresh frozen patient tumor samples known to contain gastric cancer. Normal samples obtained from adjacent normal tissues were also collected from each patient and matched to the corresponding gastric cancer test sample. Patient samples were available through protocols and procedures followed for human tissue usage as defined by the National University of Singapore. Several of the 23 cohort samples were determined to be problematic samples and were subsequently removed from analysis. Those samples that were removed from analysis were samples designated as S9, 16, 18 and 24. Upon removal, the WGS sequencing data was reanalyzed based on the remaining 19 tumor/normal paired samples.
An additional and separate sample cohort, described herein as a validation sample cohort, of fresh frozen tumor/normal pairs was obtained from patients. The validation sample cohort initially included 50 tumor/normal paired samples. However, several samples were determined to be problematic and were removed from the validation cohort. Finally, 39 tumor/normal paired samples were sequenced for the validation cohort. Sequencing was performed by whole exome sequencing (WES) on the 39 tumor/normal pairs and the data were analyzed for biomarkers correlating to gastric cancer.
Small tissue aliquots (8-10 mm³) were dissected from frozen tissue in liquid nitrogen for DNA extraction. Genomic DNA was extracted from fresh frozen tissue samples by phenol/chloroform extraction as known in the art. Briefly, 20-50 mg tissue was ground to a fine powder under liquid nitrogen and digested with 1 ml extraction buffer (0.5% SDS, 10 mM Tris-HCl (pH 8), 100 mM EDTA (pH 8), 20 μg/ml pancreatic RNase) and 20 μl Proteinase K (20 mg/ml) at 55°C overnight. The digested samples were mixed with the same volume of buffer-saturated phenol/chloroform/isoamyl alcohol (Sigma cat# P2069) for 10 minutes before centrifuging at 5000 rpm for 30 minutes at room temperature. The aqueous phase was transferred to a new tube. Five hundred of 7.5 M ammonium acetate and 2.5 ml of ice cold 100% ethanol were added to precipitate DNA. After incubating at -20°C for 15 minutes, samples were centrifuged at 5000 rpm for 15 minutes. The supernatant was decanted and the precipitated DNA was washed once with ice cold 70% ethanol. After 10 minutes of centrifugation at 5000 rpm, the ethanol was removed and the pellet was air dried at room temperature for 30 minutes. A final volume of 200 μL TE was added to dissolve the pellet at room temperature overnight. The DNA concentration was measured by the NanoDrop method (NanoDrop Technologies, Inc. Wilmington, DE).
Genomic DNA libraries from the pre-validation cohort samples were generated by adding 4 μg of sample DNA to methods as defined in the Paired End Sample prep kit PE-102-1001 (Illumina®, Inc.) following manufacturer's protocol. Briefly, DNA fragments are generated by random shearing and conjugated to a pair of oligonucleotides in a forked adaptor configuration. The ligated products are amplified using two oligonucleotide primers, resulting in double-stranded blunt-ended products having a different adaptor sequence on either end.
Genomic DNA libraries from the validation cohort samples were generated by adding isolated sample DNA to methods as defined in the TruSeq™ Exome Enrichment kit (Illumina®, Inc.) following manufacturer's protocol. Briefly, DNA fragments are generated and adaptors are ligated to both ends of each fragment. Biotinylated probes are hybridized to the targeted exomic regions of the fragments and those hybridization complexes are captured using streptavidin-coated magnetic beads. The beads are captured and the hybridized targets are eluted from the beads for downstream applications.
For both pre-validation and validation samples, clusters of DNA library fragments were formed prior to sequencing using the V3 cluster kit (Illumina®, Inc.). Briefly, products from a DNA library preparation are denatured and single strands annealed to complementary oligonucleotides on the flow-cell surface. A new strand is copied from the original strand in an extension reaction and the original strand is removed by denaturation. The adaptor sequence of the copied strand is annealed to a surface-bound complementary oligonucleotide, forming a bridge and generating a new site for synthesis of a second strand. Multiple cycles of annealing, extension and denaturation in isothermal conditions resulted in growth of clusters, each approximately 1 μm in physical diameter. The DNA in each cluster is linearized by cleavage within one adaptor sequence and denatured, generating single-stranded template for sequencing by synthesis (SBS) to obtain a sequence read. To perform paired-read sequencing, the products of read 1 are removed by denaturation, the template is used to generate a bridge, the second strand is re-synthesized and the opposite strand is then cleaved to provide the template for the second read.
EXAMPLE 2 - Sequencing and data analysis
For the pre-validation samples, WGS was performed using the Illumina®, Inc. V4 SBS kit with 100 bp paired end reads on the Genome Analyzer IIx. Briefly, DNA templates are sequenced by repeated cycles of polymerase-directed single base extension. To ensure base-by-base nucleotide incorporation in a stepwise manner, a set of four reversible terminators, A, C, G and T, each labelled with a different removable fluorophore, is used. The use of modified nucleotides allows incorporation to be driven essentially to completion without risk of over-incorporation. It also enables addition of all four nucleotides simultaneously, minimizing risk of misincorporation. After each cycle of incorporation, the identity of the inserted base is determined by laser-induced excitation of the fluorophores and fluorescence imaging is recorded. The fluorescent dye and linker are removed to regenerate an available group ready for the next cycle of nucleotide addition. The Genome Analyzer IIx is designed to perform multiple cycles of sequencing chemistry and imaging to collect sequence data automatically from each cluster on the surface of each lane of an eight-lane flow cell.
For single genome read aggregation and variant calling, image analysis, base calling and Phred quality scoring were carried out using the Illumina analysis pipeline (RTA v1.4-1.8). Sequence reads from clusters whose proximity to others resulted in mixed sequence data were ignored (purity filtering).
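For readers unfamiliar with Phred quality scoring, the standard Phred scale relates a quality score Q to an estimated base-calling error probability P by Q = -10 log10(P). The short sketch below only illustrates that relationship; the function names are hypothetical and it is not part of the Illumina analysis pipeline.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) into a Phred score Q = -10*log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)
```

Under this scale, a score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 30 to 1 in 1000.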
Sequences were aligned using ELANDv2e from CASAVA version 1.8 (Illumina®, Inc.) with full repeat resolution and orphan rescue (sensitive mode), to the human hg19/GRCh37 reference sequence. The aligned reads were aggregated and sorted into chromosomes based on alignment positions. The sorted reads were used to call variants using Hyrax, a Bayesian SNV caller, and GROUPER. The callers are part of the standard CASAVA 1.8 distribution and were run with default parameters. This process was carried out for the tumor and normal genomes. Somatic single nucleotide variant subtraction for calling somatic SNVs was performed by taking the list of positions in the tumor genome with SNP quality values greater than 15 (Q(snp) tumor > 15) and high confidence of the assigned genotype given the polymorphic prior (Q(max_gt) tumor > 20). For each putative SNV the normal sample was investigated. If a call was present in the normal sample at the same position as a putative SNV, and if the call had a quality value greater than 0 (Q(snp) normal > 0), the position was filtered out as background.
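The somatic SNV subtraction rule above can be sketched as a two-stage filter: keep tumor positions that pass both quality thresholds, then drop any position that also has a call with Q(snp) > 0 in the matched normal. This is an illustrative sketch only, not the CASAVA implementation; the `SnvCall` record, the dictionary layout keyed by (chromosome, position), and the function name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, coordinate); layout is an assumption


@dataclass(frozen=True)
class SnvCall:
    """Hypothetical per-position call carrying the two quality values named in the text."""
    q_snp: float
    q_max_gt: float = 0.0


def somatic_snv_candidates(tumor: Dict[Position, SnvCall],
                           normal: Dict[Position, SnvCall],
                           min_q_snp: float = 15,
                           min_q_max_gt: float = 20) -> List[Position]:
    """Return tumor positions passing the quality cutoffs with no background call in normal."""
    kept = []
    for pos, call in tumor.items():
        # Tumor call must exceed both thresholds: Q(snp) > 15 and Q(max_gt) > 20.
        if not (call.q_snp > min_q_snp and call.q_max_gt > min_q_max_gt):
            continue
        # Any normal call at the same position with Q(snp) > 0 marks it as background.
        normal_call = normal.get(pos)
        if normal_call is not None and normal_call.q_snp > 0:
            continue
        kept.append(pos)
    return sorted(kept)
```

The thresholds are exposed as parameters so the same sketch can be reused with other cutoffs; the defaults mirror the values stated in the text.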
To minimize false positive somatic SNV calls, the putative SNVs were recalled (using Hyrax) in the tumor sample; for recalling, however, additional information from candidate indel contigs constructed from the normal sample was used. This process was utilized to avoid any indels that were initially missed in the tumor due to low supporting evidence. A candidate SNV was called when there was complete agreement between the initial SNV call and the recall. To further minimize false positive somatic SNV calls, variants were also recalled in the normal sample, using Hyrax with all read filtering turned off. If the posterior probability of the tumor genotype was higher than a non-reference genotype, then that SNV was considered to have low confidence evidence in the normal sample and was discarded.
When considering somatic indel subtraction for calling somatic insertion/deletions (indels), only those indels that were confidently called in the tumor sample and not present in the normal sample were considered. Indels in the tumor with a Q score less than 30 (Q(indel) < 30) and those positions that had less than 10 reads coverage in the normal sample (for a 30X build) were filtered out. To be considered as evidence, a read had to have a single read alignment score >10 and a paired read alignment score >90. Positions were excluded if they mapped within 1000 bases of a known centromere or telomere (as obtained from the reference genome hg19/GRCh37), as these locations typically contain highly repetitive regions and read alignments are problematic. For each putative somatic indel position in a tumor sample the indel position was matched with the region and calls present in the normal sample. If a putative somatic indel position overlapped with an indel call originating from the normal sample, the indel was considered to be present in normal germline and hence the position was filtered out. Given the repetitive nature of the human genome, the putative somatic indel region was characterized by finding the shortest sequence around the indel that extended outside any repeats and that region was matched with each intersecting read in the normal sample. If there was evidence in the normal sample having the same pattern in the intersecting normal reads, the candidate somatic indel was discarded. To account for homopolymer slippage (i.e., the sequencing polymerase misincorporates, or misses, one or more bases in a sequence of identical bases) during a run, one read was allowed in the normal sample to have the same indel as found in the tumor.
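The indel subtraction criteria above can be read as a chain of independent filters. The sketch below illustrates that chain under stated assumptions: the `IndelCall` record, the dictionaries keyed by (chromosome, start), the excluded-region tuples, and all function names are hypothetical, and read-level evidence (alignment scores, repeat-aware region matching) is reduced to a precomputed count of supporting normal reads.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end); illustrative layout


@dataclass(frozen=True)
class IndelCall:
    """Hypothetical indel record with the Q(indel) value named in the text."""
    chrom: str
    start: int
    end: int
    q_indel: float


def near_excluded_region(indel: IndelCall, regions: Iterable[Region],
                         margin: int = 1000) -> bool:
    """True if the indel lies within `margin` bases of a centromere/telomere region."""
    return any(chrom == indel.chrom
               and indel.start <= end + margin
               and indel.end >= start - margin
               for chrom, start, end in regions)


def overlaps(a: IndelCall, b: IndelCall) -> bool:
    """True if two indel intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def somatic_indel_candidates(tumor: Iterable[IndelCall],
                             normal: List[IndelCall],
                             normal_depth: Dict[Tuple[str, int], int],
                             normal_support: Dict[Tuple[str, int], int],
                             excluded: List[Region],
                             min_q: float = 30,
                             min_depth: int = 10,
                             max_normal_reads: int = 1) -> List[IndelCall]:
    """Apply the filters in the order described in the text and return surviving indels."""
    kept = []
    for indel in tumor:
        key = (indel.chrom, indel.start)
        if indel.q_indel < min_q:                      # Q(indel) < 30 is filtered out
            continue
        if normal_depth.get(key, 0) < min_depth:       # fewer than 10 normal reads
            continue
        if near_excluded_region(indel, excluded):      # near centromere or telomere
            continue
        if any(overlaps(indel, n) for n in normal):    # germline indel call in normal
            continue
        if normal_support.get(key, 0) > max_normal_reads:  # one matching read tolerated
            continue
        kept.append(indel)
    return kept
```

Allowing a single supporting read in the normal sample (`max_normal_reads=1`) reflects the homopolymer-slippage allowance described in the text; the other defaults are the stated cutoffs.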
After sequence calling, each class of variant was annotated against the Ensembl database release e59. Each somatic variant was queried for overlapping annotated features. For all gene features, it was considered whether a consequence of the somatic variant was synonymous, non-synonymous, or nonsense, or if the variant could disrupt a canonical splice site at an intron/exon boundary. For variants that fell in a coding exon, the consequence of the change was analyzed and reported. Regulatory regions (e.g., 3' and 5' untranslated regions) of the gene feature were also reported. For pre-validation samples, coding regions were sequenced at approximately 10X coverage, with the majority depth of read of 40X depth. Table 5 exemplifies mutational locations from the original analysis of the 19 pre-validation sample cohort sequencing experiment (alignment to the hg19/GRCh37 human reference genome). Table 5 serves to demonstrate the types of mutations that were identified on the pre-validation sample cohort.
Table 5 - Exemplary data from the pre-validation 23 patient cohort samples
UBR5 snv:25924903
MYH11
F8
IGSF10
PIK3CA snv:56435669
DHX57 snv:21258577
PLEC
C6orf10
MAP2
MDN1
MMRN1
AP3B1
CTNNB1
FSHR
KIAA1109
TRPM7
CNTLN snv:48677486
KIAA0182 snv:123111207
AC130364.1
RAB3GAP2
ASXL1
UBE3A snv:13792290
OTOG
FNIP1
APOB
RP1 indel:89981719
REV3L
PAPPA2
ABCB5 snv:90494798
LAMA5
ASPM
LRCH2
PCDH10
CR2
RP1-21018.1
AC073995.2
MAP7D3 snv:153590898
FLNA
DNAH5
S9 S10 Sll S12 S13 S14
MUC4 snv:195505790 snv:195498625 snv: 195508682 snv:195507874 indel:179477976 |
TTN snv:179447828 snv:179588252
NEB
MUC16 snv:7574018
indel:9064310 |
TP53 snv:9046233
CSMD3 indel:103544375 indel:185963989 |
SYNE1 snv:185992195 snv:186158661
COL11A1 snv:105418481 snv:105415519 indel:152779898|
ABCA13 snv:152462341
indel:151836879 |
HMCN1 snv:151845190
AHNAK2 snv:48411885 snv:48318454
MLL3 snv:114186074 snv:113303867
ATR snv:228475761 snv:228524822 snv:228495134
CEP290 snv:237755104
COL29A1 snv:92615962
indel:88524079|
PCLO snv:88504981
indel:23912864|
ANK2 snv:23911692 snv:23910973
RELN indel:8193949
AR snv:130110427
indel:142274740|
CACNA2D1 snv:142281624
FBN3 snv:114186132
LVRN snv:166999151 snv:166981235
indel:115361754|
OBSCN snv:115361772
indel:103301977 |
SACS snv:103123316 snv:103197510
RYR2 snv:66765695 snv:66863151
indel:81598292|
FAT3 snv:81667490
TLL1 snv:82784425 snv:82474791
indel:109811737 |
C5orf42 snv:109808520 snv:109801095
indel:120468376|
KCNMA1 snv:120479981
GPR98 snv:145302745 snv:145302704
AL592307.2 snv:15430580
COL7A1 snv:176679206
indel:197104336|
ATP10A snv:197072928
indel:207648294|
SPINK5 snv:207646351
indel:220355682 |
CELSR3 snv:220369701 snv:220387197
NOTCH2 indel:27089706 indel:27089760
RNF43 indel:78729786
STARD8 snv:17577341 snv:17635178 snv:17655407
PTPRQ indel:49860159 snv:49854766
indel:81063246|
CELSR2 snv:80943318
indel:210561267
ARID1A snv:210574871
VWA3B indel:25616652
UBR5
indel:50925140|
MYH11 snv:50897257
F8 indel:15802687
IGSF10 snv:85690028
PIK3CA snv:56437567
DHX57 snv:21230765
PLEC snv:39074213
C6orf10 snv:49195973
MAP2 snv:96521687
MDN1 snv:98928449
MMRN1 snv:31022479
AP3B1 snv:60900398
CTNNB1 snv:151165200 snv:151166675
FSHR snv:178936047
indel:41274869|
KIAA1109 snv:41275678
indel:48631052|
TRPM7 snv:48625802
indel:48685382|
CNTLN snv:48679799
KIAA0182 snv:123170717
AC130364.1
indel:90872905|
RAB3GAP2 snv:90856951
indel:130815369 |
ASXL1 snv:130841055
UBE3A snv:13708392
OTOG snv:147505145 snv:147493958
indel:37139475|
FNIP1 snv:37169266
APOB snv:77409690 snv:77458646
RP1
REV3L indel:111665249 snv:111688743
indel:55541377|
PAPPA2 snv:55647227
indel:90491072|
ABCB5 snv:90434949
LAMA5 snv:20683145 snv:20685418 snv:20689706
indel:103289349 |
ASPM snv:103300416
indel:144992771 |
LRCH2 snv:144998233 snv:144997253
PCDH10 snv:17340948
CR2 indel:32303237
RP1- 21018.1 indel:114358568 snv:114414209 indel:135328310|
AC073995.2 snv:135301793
MAP7D3 snv:153593195 indel:154158790 |
FLNA snv:154159581
DNAH5 indel:67943482
S15 S16 S17 S18 S19 S20
MUC4 snv:195515056 snv:195510190 snv:195509756
indel:179459325 |
TTN snv:179495862 snv:179408994 snv:179465658 snv:179640839 indel:152550922 |
NEB snv:152499731 snv:152471039 snv:152374910 snv:152363455
MUC16 indel:7578398 snv:7577538 snv:7578406
TP53 snv:9075760 snv:9091573
CSMD3
SYNE1 snv:185985245
COL11A1 snv:105410978
indel:152552705 |
ABCA13 snv:152510512
HMCN1 snv:151884920 indel:151874148
AHNAK2 snv:48375101
MLL3 snv:113657343 snv:113933896
ATR indel:228474704
CEP290 snv:237604778 snv:237756908
COL29A1 snv:92533528 snv:92087143
PCLO indel:88471631 snv:88465575
ANK2 snv:23904832
RELN snv:8188821
AR snv:130107920
CACNA2D1 snv:142286941
FBN3 snv:114277524 snv:114279867
LVRN snv:166795183
OBSCN snv:115358141
SACS indel:103301977 snv:103205971
RYR2 snv:66766072 snv:66931291
FAT3 indel:81598292
TLL1 snv:82584570
C5orf42 indel:109801413
KCNMA1 snv:120491184
GPR98 snv:145366857 snv:145313390
AL592307.2 snv:15361331
COL7A1 snv:176668289
ATP10A indel:197091176
SPINK5 snv:207653332
CELSR3 indel:220379286
NOTCH2
RNF43 indel:78729786
STARD8
PTPRQ snv:49855070 snv:49854953
CELSR2
ARID1A snv:210561275
VWA3B snv:25605614
UBR5 indel:25953420
MYH11 snv:50929658
F8 indel:15802687 snv:15857699
IGSF10 indel:85682290 snv:85698638
PIK3CA indel:56435161
DHX57 indel:21247751
PLEC indel:39074185
C6orf10 indel:49195930 indel:49295359 snv:49190976 indel:96557436 |
MAP2 snv:96548073
MDN1 indel:98928507
MMRN1
AP3B1 snv:60912881
CTNNB1 snv:151166400
FSHR snv:178936091
KIAA1109 snv:41268787
indel:48624533 |
TRPM7 snv:48615774
CNTLN snv:48699107
KIAA0182 snv:123247036
AC130364.1 indel:134072551 snv:134071678
RAB3GAP2 snv:90849035
ASXL1 indel:131039814
UBE3A indel:13830195
OTOG snv:147510826
FNIP1 snv:37169390
APOB indel:77334907
RP1 indel:89947490
REV3L indel:111689017
PAPPA2 indel:55638972 snv:55671623
ABCB5 indel:90491072
LAMA5 snv:20668362
ASPM indel:103289349
LRCH2 snv:144991958
PCDH10 indel:17466109
CR2 snv:32299928
RP1- 21018.1 indel:114358568
AC073995.2 indel:135328310
MAP7D3 indel:153594939
FLNA snv:154176033
DNAH5 snv:67937971
S21 S22 S23 S24 S25
MUC4 snv:195507011 snv:195508586 snv:195510217 snv:195505828 snv:195505790
TTN snv:179486270 snv:179484826 snv:179431219
indel:152550922
NEB snv:152410371 | snv:152349874
MUC16 snv:7574014
Figure imgf000055_0001
PIK3CA snv:56492757
DHX57 snv:21238327
indel:39074185 |
PLEC snv:39029976
C6orf10
MAP2 snv:96604606
indel:98928507 |
MDN1 snv:98779407
indel:31022442 |
MMRN1 snv:31022247 snv:31022280
AP3B1 snv:60913153 snv:60886353
CTNNB1 snv:151176316
FSHR snv:178936091 snv:178921549
KIAA1109 snv:41275669
TRPM7
CNTLN
KIAA0182 snv:123210241
AC130364.1 snv:134111306 indel:134071968
RAB3GAP2 indel:90872905
ASXL1 snv:131039914
UBE3A snv:13735963
OTOG indel:147499875
FNIP1 indel:37167150
APOB indel:77521435
indel:90124781 |
RP1 snv:89924472
REV3L indel:111695064
PAPPA2
ABCB5
LAMA5
ASPM indel:103299795
LRCH2
PCDH10 snv:17409445 indel:17135271
indel:32303237 |
CR2 snv:32260804
RP1- 21018.1 indel:114358568
AC073995.2 indel:135328310
MAP7D3 snv:153588017
FLNA snv:154159916
indel:67943520|
DNAH5 snv:67938104
The cohort samples S9, S16, S18 and S24 were subsequently excluded, and a reanalysis of the original data, minus those excluded samples, was performed. Following data alignment to the reference genome (hg19/GRCh37) using CASAVA v.1.8, tumor-specific mutations were detected in each patient by subtracting matching normal tissue variants from the tumor variants using the Strelka algorithm (Saunders et al., Bioinformatics, in press, incorporated herein by reference in its entirety). The somatic variants were annotated using Variant Effect Predictor (VEP) from Ensembl API (Flicek, 2012, Nucl Acid Res 40:D84-D90, incorporated herein by reference in its entirety). The genes were ranked according to four metrics: the number of samples with functional mutations as defined by VEP analysis; the p-values defined using Fisher's exact test using the number of functional versus all mutations annotated to the gene (excluding intronic mutations); the VEP-produced Condel score; and the length of the cDNA of the gene. The final ranking of each gene reflects a weighted average rank of those four metric values, with the first three rankings having a weight of 10 and the cDNA metric value having a weight of one.
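By way of illustration only, the weighted average rank described above may be sketched in Python as follows. The sketch assumes that each of the four metrics has already been converted to a per-gene rank (with 1 as the best rank); how each metric was ordered and how ties were handled are not stated in the text and are not modeled. The metric keys, function names, gene labels and rank values are hypothetical.

```python
from typing import Dict, List, Tuple

# Weights stated in the description: 10 for each of the first three
# metrics and 1 for the cDNA length metric. The key names are hypothetical.
WEIGHTS: Dict[str, int] = {
    "samples_with_functional_mutations": 10,
    "fisher_p_value": 10,
    "condel_score": 10,
    "cdna_length": 1,
}


def weighted_average_rank(ranks: Dict[str, float]) -> float:
    """Combine per-metric ranks of one gene into a weighted average rank."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[metric] * ranks[metric] for metric in WEIGHTS)
    return weighted_sum / total_weight


def rank_genes(per_gene_ranks: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (gene, weighted average rank) pairs, best (lowest) first."""
    scored = [(gene, weighted_average_rank(ranks))
              for gene, ranks in per_gene_ranks.items()]
    return sorted(scored, key=lambda item: item[1])


# Invented ranks for two placeholder genes, used only to exercise the sketch.
EXAMPLE_RANKS = {
    "GENE_A": {"samples_with_functional_mutations": 1, "fisher_p_value": 2,
               "condel_score": 1, "cdna_length": 3},
    "GENE_B": {"samples_with_functional_mutations": 2, "fisher_p_value": 1,
               "condel_score": 2, "cdna_length": 1},
}
```

With these invented values, GENE_A has a weighted average rank of 43/31 and GENE_B of 51/31, so GENE_A is placed first even though GENE_B ranks better on cDNA length, which reflects the low weight given to that metric.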
Reanalysis of the pre-validation cohort without the identified problematic samples resulted in a ranked list of candidate biomarker genes. Table 3 reports the list of biomarkers that represent mutational hot spots that can be correlated in pre-validation sample data to the presence of gastric cancer.
Validation cohort experimentation was performed to support and validate the biomarkers identified in the pre-validation sample cohort. Whole exome sequencing was performed on a validation cohort of 39 tumor/normal pairs drawn from an original validation cohort of 50 tumor/normal pairs (problematic samples were not analyzed, leaving 39 tumor/normal pairs). Data analysis identified biomarkers that could be significantly correlated with the presence of gastric cancer as shown in Table 4. Table 6 is an example of the unique mutations and their locations (duplicates not shown; sequences as when aligned to the hg19 NCBI reference genome) as identified in the 39 validation sample cohort, thereby demonstrating and supporting the identified markers as mutational hot spots that can be correlated to the presence of gastric cancer as detailed in Table 4.
Table 6 - Validation sample unique mutations in biomarkers
Gene Position-variant Sample Gene Position-variant Sample
AHNAK2 chr14:105412293A>G V101 MUC16 chr19:9087700T>C V116 chr14:105419179A>G
chr14:105415305G>A chr19:8993029C>T
AHNAK2 chr14:105415665G>A V103 MUC16 chr19:9088016T>G V120 chr19:9076778T>C chr14:105408997T>C chr19:9086224A>G
AHNAK2 chr14:105408985C>T V104 MUC16 chr19:9072257T>G V136 chr19:8993449A>C
AHNAK2 chr14:105418917C>A V109 MUC16 chr19:8993426T>C V156
AHNAK2 chr14:105410269G>A V116 MUC16 chr19:9061352A>C V74 chr14:105408997T>C
chr14:105419125C>T
chr14:105409392T>C
chr14:105408985C>T chr19:9090228C>A
AHNAK2 chr14:105411590G>C V124 MUC16 chr19:8993375T>C V75 chr14:105417807T>C
chr14:105419125C>T
chr14:105415846T>C
AHNAK2 chr14:105411604;C>G V126 MUC16 chr19:9087127T>A V94 chr3:195510129G>T
AHNAK2 chr14:105419125C>T V129 MUC4 chr3:195509877G>A V104 chr14:105409593T>G chr3:195508854G>C
AHNAK2 chr14:105416452C>T V131 MUC4 chr3:195508035G>C V109
AHNAK2 chr14:105415947A>C V135 MUC4 chr3:195508035G>C V126 chr14:105409183G>T
AHNAK2 chr14:105409182G>T V139 MUC4 chr3:195507547C>G V129
AHNAK2 chr14:105412293A>G V140 MUC4 chr3:195510189G>A V131
AHNAK2 chr14:105416620T>C V141 MUC4 chr3:195506481G>T V135
AHNAK2 chr14:105411764G>C V154 MUC4 chr3:195510655A>T V135 chr3:195505875G>C
AHNAK2 chr14:105411590G>C V156 MUC4 chr3:195506404G>A V136 chr3:195505809G>T chr3:195508050G>T
AHNAK2 chr14:105407709A>G V74 MUC4 chr3:195510666A>C V140 chr14:105414031G>T chr3:195506020G>A chr14:105408393-A>C chr3:195508834C>T
AHNAK2 chr14:105409123G>T V81 MUC4 chr3:195507538G>A V140
AHNAK2 chr14:105416506T>C V92 MUC4 chr3:195508077G>T V154
AHNAK2 chr14:105418797T>C V98 MUC4 chr3:195509475A>G V156
AR chrX:66863123T>A V136 MUC4 chr3:195508468G>A V45
AR chrX:66765455T>A V139 MUC4 chr3:195505993C>A V70
AR chrX:66765075G>A V62 MUC4 chr3:195508854G>C V74 chr3:195509217G>T chr3:195508003T>A
AR chrX:66765228G>C V86 MUC4 chr3:195508677G>T V75
ARID1A chr1:27056153A>G V116 MUC4 chr3:195507268G>A V79
ARID1A chr1:27106648G>T V135 MUC4 chr3:195506411G>C V80
ARID1A chr1:27089624G>A V156 MUC4 chr3:195508378C>T V81 chr1:27056237C>T chr3:195510145G>C
ARID1A chr1:27105744G>T V55 MUC4 chr3:195509545C>T V86 chr1:27106362T>C
ARID1A chr1:27089729G>T V79 MUC4 chr3:195508110G>T V92 chr1:27094439G>A chr3:195508378C>T
ARID1A chr1:27059259C>T V87 MUC4 chr3:195509179C>G V94
ASPM chr1:197098434G>T V108 PCLO chr7:82764156C>A V102
ASPM chr1:197112198G>T V116 PCLO chr7:82531984T>C V120
ASPM chr1:197086944G>A V117 PCLO chr7:82545923T>C V136 chr1:197070696T>C
ASPM chr1:197073405C>A V74 PCLO chr7:82545285>CT V45
CEP290 chr12:88457795G>C V139 PCLO chr7:82791816C>T V45
CEP290 chr12:88508257T>C V74 PCLO chr7:82595426T>C V74 chr7:82784571T>C
CR2 chr1:207644141G>T V109 PCLO chr7:82784572G>T V80
CR2 chr1:207649707C>A V140 PCLO chr7:82387918A>C V88
CR2 chr1:207646177A>C V88 PCLO chr7:82585123C>A V92
CR2 chr1:207648299T>A V98 PLEC chr8:144995496C>T V102
FAT4 chr4:126372219A>T V101 PLEC chr8:144995604C>T V94 chr4:126240894A>T
chr4:126241298A>G chr13:23914706A>C
FAT4 chr4:126240893G>T V120 SACS chr13:23913655C>G V117 chr13:23939299C>G
FAT4 chr4:126370821T>G V138 SACS chr13:23909465C>A V139
FAT4 chr4:126371849G>T V140 SACS chr13:23910557G>T V75 chr4:126372741G>A
FAT4 chr4:126372531C>G V74 SACS chr13:23905607A>T V88 chrX:153581819C>T
FLNA chrX:153588453C>T V120 TP53 chr17:7577057C>A V102
FLNA chrX:153580704C>T V75 TP53 chr17:7577093G>A V104
GPR98 chr5:89940517C>A V101 TP53 chr17:7578405-C>T V108
GPR98 chr5:90445990G>A V116 TP53 chr17:7577113C>T V116
GPR98 chr5:89975394G>T V139 TP53 chr17:7578264A>G V117
GPR98 chr5:90368390G>T V75 TP53 chr17:7579698C>A V124
GPR98 chr5:89953757G>A V81 TP53 chr17:7577117C>A V136 chr5:90055289A>C
GPR98 chr5:90041564A>C V88 TP53 chr17:7579587G>T V138
GPR98 chr5:90459599T>G V98 TP53 chr17:7578454C>G V139
MACF1 chr1:39851238A>G V103 TP53 chr17:7578473->G V140
MACF1 chr1:39824857C>T V135 TP53 chr17:7578477G>C V141
MACF1 chr1:39896529A>G V136 TP53 chr17:7577547C>T V42
MACF1 chr1:39815142T>A V74 TP53 chr17:7577119C>T V45
MACF1 chr1:39894920A>G V88 TP53 chr17:7576857G>- V62
MACF1 chr1:39763307G>A V92 TP53 chr17:7578207-T>C V74
chr15:43817736C>T
MAP1A chr15:43819832G>T V75 TP53 chr17:7578274G>A V79
chr17:7578364G>T
MAP1A chr15:43819962C>T V76 TP53 chr17:7577554G>C V80
MUC16 chr19:9085591A>G V101 TP53 chr17:7578405C>T V81
MUC16 chr19:9007840A>G V108 TP53 chr17:7577093G>C V88
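By way of illustration only, position-variant entries of the regular single-base form (chromosome, colon, position, reference base, ">", alternate base) may be parsed in Python as follows. The names `Variant` and `parse_position_variant` are hypothetical. Entries that depart from the regular form, such as those containing "-", ";" or a missing reference base, are returned as `None` in this sketch rather than interpreted, since their intended meaning is not stated in the table.

```python
import re
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """One single-base substitution parsed from a position-variant entry."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Regular form only, e.g. "chr17:7577057C>A" or "chrX:66863123T>A".
_PATTERN = re.compile(r"^(chr(?:\d{1,2}|X|Y)):(\d+)([ACGT])>([ACGT])$")


def parse_position_variant(entry: str) -> Optional[Variant]:
    """Parse a regular entry; return None for any irregular entry."""
    match = _PATTERN.match(entry.strip())
    if match is None:
        return None
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)
```

For example, `parse_position_variant("chr17:7577057C>A")` yields a `Variant` with chromosome `chr17`, position 7577057, reference `C` and alternate `A`, whereas an entry such as `chr17:7578473->G` is left unparsed.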
EXAMPLE 3- H. pylori histological and serological determination
Serological testing for H. pylori as reported in Figure 1 was performed using the Helico Blot 2.1 kit (MP Biomedicals, Singapore) according to manufacturer's instructions.
All publications and patents mentioned in the present application are herein incorporated by reference. Various modifications and variations of the described methods and compositions of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the relevant fields are intended to be within the scope of the following claims.

Claims

1. A method for determining the presence of gastric cancer comprising:
a) detecting in a nucleic acid sample from a subject the presence of a plurality of mutations in two or more genes selected from the group comprising MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A,
b) evaluating the probability that the two or more genes are correlated with gastric cancer, and
c) determining the presence of gastric cancer in a sample based on said evaluation.
2. The method of claim 1, wherein said evaluating comprises comparing the nucleic acid sample to a matched normal nucleic acid sample.
3. The method of claim 1 and 2, wherein said nucleic acid sample is from a human subject.
4. The method of any preceding claim, wherein said nucleic acid sample is a genomic DNA sample.
5. The method of claim 4, wherein said genomic DNA sample is isolated from a sample selected from the group consisting of a tissue sample, a biopsy sample, a cell sample, a circulating tumor cell sample, a fixed tissue sample or a frozen tissue sample.
6. The method of claim 2, wherein the matched normal nucleic acid sample is a genomic DNA sample.
7. The method of any preceding claim, wherein said detecting a plurality of mutations is in three more genes selected from the group consisting of MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A.
8. The method of any preceding claim, wherein said determining further comprises determining the location of the gastric cancer from a sample.
9. The method of any preceding claim, wherein said evaluating comprises one or more of sequencing the nucleic acids, applying the nucleic acids to microarray analysis or performing polymerase chain reaction on the nucleic acids.
10. The method of claim 9, wherein said sequencing comprises sequence by synthesis or wherein said polymerase chain reaction comprises quantitative or real time polymerase chain reaction.
11. A method for diagnosing or prognosing the presence of gastric cancer comprising:
a) providing a nucleic acid sample from a subject,
b) detecting a plurality of mutations in two or more genes selected from the group comprising MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A,
c) evaluating the probability that the two or more genes are correlated with gastric cancer, and
d) determining the presence of gastric cancer in a sample based on said evaluation.
12. The method of claim 11, wherein said evaluating comprises comparing the nucleic acid sample to a matched normal nucleic acid sample and wherein said samples are genomic DNA samples from a human subject.
13. The method of claim 12, wherein said genomic DNA sample is isolated from a sample selected from the group consisting of a tissue sample, a biopsy sample, a cell sample, a circulating tumor cell sample, a fixed tissue sample or a frozen tissue sample.
14. The method of any of claims 11 through 13, wherein said detecting a plurality of mutations is in three more genes selected from the group consisting of MUC4, MUC16, TP53, SACS, ARID1A, FLNA, FAT4, ASPM, AHNAK2, CEP290, PCLO, GPR98, CR2, AR, PLEC, MACF1 and MAP1A, wherein at least one of said genes is from the list consisting of SACS, FLNA, ASPM, PCLO, CR2, and MAP1A.
15. The method of any of claims 11 through 14, wherein said determining further comprises determining the location of the gastric cancer from a sample.
16. The method of any of claims 11-15, wherein said evaluating comprises one or more of sequencing the nucleic acids, applying the nucleic acids to microarray analysis or performing polymerase chain reaction on the nucleic acids.
17. The method of claim 16, wherein said sequencing comprises sequence by synthesis or wherein said polymerase chain reaction comprises quantitative or real time polymerase chain reaction.
PCT/US2012/040501 2011-06-01 2012-06-01 Gastric cancer biomarkers WO2012167112A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161492061P 2011-06-01 2011-06-01
US61/492,061 2011-06-01

Publications (2)

Publication Number Publication Date
WO2012167112A2 true WO2012167112A2 (en) 2012-12-06
WO2012167112A9 WO2012167112A9 (en) 2013-03-07

Family

ID=47260389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/040501 WO2012167112A2 (en) 2011-06-01 2012-06-01 Gastric cancer biomarkers

Country Status (1)

Country Link
WO (1) WO2012167112A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103954767A (en) * 2014-04-02 2014-07-30 上海交通大学医学院附属仁济医院 Novel application of TRIM59 protein for diagnosis and treatment for stomach cancer
US20190362808A1 (en) * 2017-02-01 2019-11-28 The Translational Genomics Research Institute Methods of detecting somatic and germline variants in impure tumors
US11978535B2 (en) * 2017-02-01 2024-05-07 The Translational Genomics Research Institute Methods of detecting somatic and germline variants in impure tumors
WO2021062041A1 (en) * 2019-09-24 2021-04-01 Joshua Labaer Novel antibodies for detecting gastric cancer
CN111172271A (en) * 2020-01-17 2020-05-19 中国辐射防护研究院 Application of UIMC1 gene as molecular marker for judging susceptibility of radiation damage
KR102281058B1 (en) * 2020-07-10 2021-07-23 서울대학교병원 Composition for predicting or diagnosing gastric cancer comprising detection agent of mutation of MUC4 gene
WO2022010312A1 (en) * 2020-07-10 2022-01-13 서울대학교병원 Composition comprising muc4 gene mutation detecting agent for prediction or diagnosis of gastric cancer

Also Published As

Publication number Publication date
WO2012167112A9 (en) 2013-03-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12792570

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12792570

Country of ref document: EP

Kind code of ref document: A2