WO2019174004A1 - System and method for determining lung cancer - Google Patents


Info

Publication number
WO2019174004A1
Authority
WO
WIPO (PCT)
Prior art keywords
znf649
lhx5
rasgrf2
cacna1e
methylation
Application number
PCT/CN2018/079155
Other languages
French (fr)
Inventor
Jian-Bing Fan
Weihong XU
Jinsheng TAO
Meng Yang
Original Assignee
Anchordx Medical Co., Ltd.
Application filed by Anchordx Medical Co., Ltd.
Priority to PCT/CN2018/079155
Publication of WO2019174004A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Definitions

  • the present invention generally relates to systems and methods for determining lung cancer.
  • Lung cancer now ranks as the leading cause of death among malignant tumors. Although the combined use of surgery, chemotherapy, radiotherapy, and targeted therapy has significantly improved the survival of patients with lung cancer, the prognosis remains poor. One important reason is the difficulty of detecting early-stage lung cancer: most patients have already progressed to an advanced stage at diagnosis. High-sensitivity detection methods such as low-dose CT (LDCT) screening have been developed for early diagnosis of lung cancer and have been shown to significantly reduce lung cancer mortality (e.g., by 20%). However, high sensitivity sometimes comes at the price of low accuracy.
  • LDCT: low-dose CT
  • the present disclosure provides a method for in vitro diagnosis of lung cancer in a subject.
  • the method comprises: measuring the methylation level of the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B in a sample from the subject; comparing the measured methylation level of the CpG island to a corresponding predetermined reference level; and determining a likelihood of the subject having lung cancer, wherein a measured methylation level of the CpG island in the sample that is comparable to the reference level is indicative of the subject having lung cancer.
  • the at least one gene is selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E. In one embodiment, the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B. In some embodiments, the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  • the measuring step comprises measuring the methylation levels of the CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E. In some embodiments, the measuring step comprises measuring the methylation levels of the CpG islands of ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B. In some embodiments, the measuring step comprises measuring the methylation levels of the CpG islands of ZNF649, LHX5-AS1, RASGRF2, and CACNA1E.
  • the sample is a blood sample.
  • the blood sample comprises cell-free DNA.
  • the blood sample is processed prior to the measuring step, and the step of processing comprises enrichment or purification of the cell-free DNA.
  • the methylation levels are measured by an amplification assay, a hybridization assay, a sequencing assay or an array.
  • the sample is amplified to construct a sequencing library before the measuring step.
  • the sequencing library is prepared with the AnchorIRIS™ method.
  • the comparing step is performed by a processor of a computing device. In some embodiments, the determining step is performed by a processor of a computing device.
  • the determining step comprises using a machine learning model.
  • the machine learning model is a Least Absolute Shrinkage and Selection Operator (LASSO) model.
  • a non-transitory computer-readable medium having instructions stored thereon.
  • the instructions, when executed by a processor, cause the processor to: retrieve the methylation level of the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B in a sample from the subject; compare the retrieved methylation level of the CpG island to a corresponding predetermined reference level; and determine a likelihood of the subject having lung cancer, wherein a retrieved methylation level of the CpG island in the sample that is comparable to the reference level is indicative of the subject having lung cancer.
  • the present disclosure provides a kit comprising methylation detection probes for detecting CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B.
  • the kit of the present disclosure comprises methylation detection probes for detecting CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  • the kit of the present disclosure comprises methylation detection probes for detecting CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B. In some embodiments, the kit of the present disclosure comprises methylation detection probes for detecting CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E. In some embodiments, the kit is a microarray. In some embodiments, the kit can be used in a sequencing device.
  • the present disclosure provides a system for in vitro diagnosis of lung cancer in a test subject, the system comprising a kit of the present disclosure; and a non-transitory computer readable medium of the present disclosure.
  • FIG. 1 shows the AnchorIRIS™ assay and its performance assessment.
  • A: Workflow of the ultra-sensitive AnchorIRIS™ library preparation method.
  • B-D: A bake-off experiment comparing assay performance between the AnchorIRIS™ assay and the Accel-NGS Methyl-Seq™ assay. The IRIS assay shows superior molecule conversion efficiency (C and D) and much higher average unique coverage at each input amount tested (B).
  • E and F: The sensitivity of the AnchorIRIS™ assay was evaluated by diluting tumor gDNA into WBC gDNA, showing that significantly more informative co-methylated CpG regions above the WBC background can be detected at dilutions ≥0.033% by Z-test (E). Dilutions higher than 10% (gray box) preserve a linear response of the average co-methylation signal to the tumor fraction of input DNA (F).
  • FIG. 2 shows assay performance and sequencing quality metrics comparing the AnchorIRIS™ (IRM) assay to the Accel-NGS Methyl-Seq™ (SWT) assay.
  • IRM: AnchorIRIS™
  • SWT: Accel-NGS Methyl-Seq™
  • FIG. 3 shows assay performance metrics of serial dilution samples.
  • FIG. 4 shows characterization of tissue level hypermethylation signatures of lung cancer.
  • A: Differentially methylated MCBs were identified using the Wilcoxon rank-sum test, comparing 33 invasive adenocarcinoma (IA) samples to 78 benign lesions. Hypermethylated MCBs were defined as those with FDR < 0.05 in the volcano plot.
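A minimal sketch of the panel A screen, assuming one methylation value per sample per MCB: a two-sided Wilcoxon rank-sum test per region (normal approximation, without tie correction of the variance), followed by Benjamini-Hochberg adjustment to control the FDR. The text states only "Wilcoxon rank sum test" and "FDR < 0.05"; the choice of Benjamini-Hochberg, the median-based direction check, and all function names are assumptions.

```python
import math
from statistics import median
from typing import Dict, List


def rank_sum_pvalue(a: List[float], b: List[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value using the normal approximation.

    Tied values receive average ranks; the variance is not tie-corrected (a simplification).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def hypermethylated_regions(cancer: Dict[str, List[float]],
                            benign: Dict[str, List[float]],
                            fdr: float = 0.05) -> List[str]:
    """Regions with adjusted p < fdr and a higher median in the cancer group."""
    names = sorted(cancer)
    qvals = benjamini_hochberg([rank_sum_pvalue(cancer[n], benign[n]) for n in names])
    return [n for n, q in zip(names, qvals)
            if q < fdr and median(cancer[n]) > median(benign[n])]
```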
  • B: Heatmap showing hypermethylation signature MCB regions for representative lung cancer and benign tissue samples. Methylation level of each MCB was calculated as co-methylated read counts per million total mapped reads (CPM). Samples are ordered from left to right by malignant/benign status (top color bar) and corresponding subtypes (second color bar).
  • CPM: co-methylated read counts per million total mapped reads
  • Signal is shown in linear scale of color, with red indicating high methylation signal and green indicating low methylation signal.
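The two normalizations used for tissue (CPM, FIG. 4) and plasma (percentage of co-methylated reads, FIG. 6) reduce to the arithmetic sketched below. How a read is classified as co-methylated is outside this sketch, the denominator for the percentage (reads covering the MCB) is an assumption, and the function names are hypothetical.

```python
def cpm(co_methylated_reads: int, total_mapped_reads: int) -> float:
    """Co-methylated reads per million total mapped reads (tissue-level signal)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return co_methylated_reads * 1_000_000 / total_mapped_reads


def percent_co_methylated(co_methylated_reads: int, reads_in_region: int) -> float:
    """Co-methylated reads as a percentage of reads covering the region (plasma-level signal)."""
    if reads_in_region <= 0:
        raise ValueError("reads_in_region must be positive")
    return 100.0 * co_methylated_reads / reads_in_region
```

CPM scales by sequencing depth across the whole library, so samples sequenced to different depths stay comparable; the percentage instead scales by local coverage of each region.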
  • ROC: receiver operating characteristic curve
  • CI: 95% confidence interval
  • FIG. 5 shows that lung cancer tissue co-methylation patterns can be captured in the cfDNA pool. Concordance of co-methylation between paired tissue (row) and plasma (column) samples is calculated as the percentage of reads sharing pre-defined co-methylation patterns and displayed in the heatmap. The highest similarity of each tissue sample to its matched plasma appears on the diagonal of the heatmap, with the rank and Wilcoxon test p-value of each self-pair, compared with the remaining tissue-plasma pairs, shown on the right. The smaller the rank (and p-value), the better the match of the self-pair.
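The self-pair ranking described above can be sketched as follows, assuming a square matrix in which entry [i][j] is the concordance between tissue i and plasma j, and plasma i is the match for tissue i. The rank counts how many non-matched plasma samples in the same row score strictly higher than the matched one (rank 1 = best match). The Wilcoxon p-values are not reproduced, and the function name is hypothetical.

```python
from typing import List


def self_pair_ranks(concordance: List[List[float]]) -> List[int]:
    """Rank (1 = best) of each tissue's matched plasma within its row."""
    n = len(concordance)
    ranks = []
    for i, row in enumerate(concordance):
        if len(row) != n:
            raise ValueError("concordance matrix must be square")
        # Non-matched plasma samples that look more similar than the matched one.
        better = sum(1 for j, value in enumerate(row) if j != i and value > row[i])
        ranks.append(better + 1)
    return ranks
```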
  • FIG. 6 shows cancer classification using plasma DNA.
  • A: Workflow chart of building a plasma-level diagnostic prediction model.
  • B: Heatmap of the 9 methylation markers used for the diagnostic prediction model in the training and independent test data sets. Methylation level of each MCB was calculated as the percentage of co-methylated reads.
  • C and D: ROC curves showing the performance of plasma-level classification, with the 95% confidence interval (CI) of sensitivity, in the training (C) and test (D) data sets.
  • E: Univariate (left) and multivariate (right) analyses were performed using logistic regression to determine significant clinical covariates of malignancy for early-stage lung cancers.
  • P: partial solid nodule
  • S: solid nodule
  • G: ground-glass nodule
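The ROC analysis of panels C and D can be sketched as below: sweep the decision threshold down through each distinct model score, record (false-positive rate, true-positive rate) pairs, and integrate with the trapezoidal rule. The confidence-interval method used in the figure is not specified and is not reproduced; function names are hypothetical.

```python
from typing import List, Tuple


def roc_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR), lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cancer (1) and non-cancer (0) samples")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The area equals the probability that a randomly chosen cancer sample scores above a randomly chosen non-cancer sample (ties counted as one half).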
  • a method for in vitro diagnosis of lung cancer in a subject.
  • the method comprises the steps of measuring, in a sample from the subject, the methylation level of one or more CpG island(s) of interest; comparing the measured methylation level of the CpG island(s) to a corresponding predetermined reference level; and determining a likelihood of the subject having lung cancer, wherein a measured methylation level of the CpG island in the sample that is comparable to the reference level is indicative of the subject having lung cancer.
  • the terms “subject” and “individual” are used interchangeably and refer to a human or any non-human animal (e.g., mouse, rat, rabbit, dog, cat, cattle, swine, sheep, horse or primate).
  • a human includes pre- and post-natal forms.
  • a subject is a human being.
  • a subject can be one suspected to be afflicted with a disease or disorder, who may or may not display symptoms of the disease or disorder.
  • lung cancer refers to any type of lung cancer, including but not limited to adenocarcinoma, squamous carcinoma (SC), large cell lung cancer (LCLC), small cell lung cancer (SCLC), and non-small cell lung cancer (NSCLC).
  • SC: squamous carcinoma
  • LCLC: large cell lung cancer
  • SCLC: small cell lung cancer
  • NSCLC: non-small cell lung cancer
  • the “lung cancer” of the present disclosure can be at any stage.
  • the diagnostic method provided herein is useful in determining lung cancer at an early stage, for example, at or before stage Ib (3 cm < tumor size ≤ 5 cm, no metastasis), at or before stage Ia (tumor size ≤ 3 cm, no metastasis), or even before any symptoms develop.
  • sample refers to any DNA-containing biological sample, including but not limited to cells (e.g., bacteria, yeast, virus, plant cells, animal cells and the like), tissues (e.g., biopsy tissue, paraffin embedded tissue and the like), and body fluids (e.g., blood, plasma, serum, saliva, pleural effusion, amniocentesis fluid, seroperitoneum and the like).
  • the “sample” is a primary biological sample directly derived from a subject suspected to be afflicted with a disease or disorder.
  • examples of such primary biological samples include, but are not limited to, blood, serum, plasma, urine, pleural effusion, or biopsied tissue (e.g., tumor tissue).
  • the DNA-containing sample is a biopsied lung tissue.
  • Exemplary techniques that may be used to obtain biopsied lung tissue include, but are not limited to, needle biopsy, bronchoscopic biopsy, and surgery.
  • the sample is blood, serum or plasma.
  • the DNA contained in the sample can be genomic DNA or cell-free DNA (cfDNA).
  • cfDNA: cell-free DNA
  • the term “cell-free DNA” or “cfDNA” as used herein refers to DNA found outside cells in the circulatory system (e.g., blood), which is generally believed to originate from genomic DNA released during cell apoptosis. Studies have shown that most cell-free DNA fragments in the human body are about 160 bp long (see Fan et al. (2010), Analysis of the Size Distributions of Fetal and Maternal Cell-Free DNA by Paired-End Sequencing, Clin Chem 56(8):1279-86).
  • cell-free DNA originating from tumor cells is referred to as “circulating tumor DNA” or “ctDNA”.
  • a tumor cell may release its genomic DNA into the blood due to causes such as apoptosis and immune responses. Since normal cells also release their genomic DNA into the blood, circulating tumor DNA usually constitutes only a very small part of the cell-free DNA in the blood (e.g., 0.1% to 50% of total cfDNA, or down to 0.01% or lower at early stages of tumorigenesis).
  • the DNA contained in the sample is circulating tumor DNA (ctDNA).
  • a sample shall contain at least about 0.2 ng, 0.5 ng, 1 ng, 2 ng, 3 ng, 4 ng, 5 ng, 6 ng, 7 ng, 8 ng, 9 ng, 10 ng, 15 ng, 20 ng, 25 ng, 30 ng, 45 ng, 50 ng, 55 ng, 60 ng, 65 ng, 70 ng, 75 ng, 80 ng, 85 ng, 90 ng, 95 ng, or 100 ng of DNA.
  • the samples contain 20 ⁇ 5 ng DNA.
  • the samples contain 1-100 ng or over 100 ng DNA.
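A rough worked estimate of why low input amounts and low ctDNA fractions are demanding, assuming about 3.3 pg of DNA per haploid human genome (a commonly used approximation, not stated in the text) and no losses during processing: around 20 ng of cfDNA corresponds to roughly 6,000 haploid genome equivalents, so at a ctDNA fraction of 0.1% only about 6 tumor-derived copies of any given locus are expected, and at 0.01% fewer than one. The constant and function names are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # approx. mass (pg) of one haploid human genome; an assumption


def genome_equivalents(dna_ng: float) -> float:
    """Approximate number of haploid genome copies in dna_ng nanograms of DNA."""
    return dna_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_tumor_copies(dna_ng: float, tumor_fraction: float) -> float:
    """Expected tumor-derived copies of a single locus at a given ctDNA fraction."""
    return genome_equivalents(dna_ng) * tumor_fraction
```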
  • the sample is processed to release/enrich/purify the DNA prior to the measuring steps.
  • the sample is tissue or a number of cells, and the sample is processed prior to the measuring step, wherein the processing comprises releasing the DNA from within the cells.
  • DNA may be extracted from the sample using techniques well known to those of skill in the art, including chemical extraction techniques utilizing phenol-chloroform (Sambrook et al., 1989), guanidine-containing solutions, or CTAB-containing buffers.
  • commercial DNA extraction kits are also widely available from laboratory reagent supply companies, including for example, the QIAamp DNA Mini prep kit available from QIAGEN (Chatsworth, Calif.
  • the sample is a blood sample that contains cell-free DNA, and the blood sample is processed prior to the measuring step, wherein the processing comprises enrichment or purification of the cell-free DNA.
  • the methylation level of CpG island (s) of interest in the sample is determined.
  • the method provided herein at least partially involves detecting the methylation levels of certain CpG islands of interest in samples so as to provide an earlier and/or more sensitive/accurate diagnostic result for lung cancer detection in such samples.
  • CpG is shorthand for a linear DNA sequence structure of 5'-cytosine-phosphate-guanine-3'.
  • CpG islands are DNA regions having a length of about 100-3000 bp (e.g., 100-1000 bp, 200-2000 bp, 300-3000 bp, 100-200 bp, 100-300 bp, 200-300 bp, etc.), a GC percentage greater than 50%, and a high frequency of CpG sites.
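The definition above can be checked for a candidate sequence window as sketched below. The text only says "a high frequency of CpG sites"; quantifying that as an observed/expected CpG ratio above 0.6 is a common convention adopted here as an assumption, and the function names are hypothetical.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 100, max_len: int = 3000,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Length 100-3000 bp, GC > 50%, and (assumed) CpG observed/expected > 0.6."""
    if not (min_len <= len(seq) <= max_len):
        return False
    gc, obs_exp = cpg_stats(seq)
    return gc > min_gc and obs_exp > min_obs_exp
```

The GC check alone is not enough: a GC-rich window can still be depleted of CpG dinucleotides, which the observed/expected ratio catches.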
  • DNA methylation is a process by which methyl groups are added to the DNA molecule, which can change the activity of a DNA segment without changing its sequence.
  • DNA methylation occurs at the 5’ position of the pyrimidine ring of cytosine residues within CpG sites to form 5-methylcytosine (5-mC).
  • 5-mC has been widely recognized as a major epigenetic modification affecting phenotype and gene expression, and hypermethylation at CpG islands has been shown to contribute to tumorigenesis.
  • methylation level is used herein to refer to the extent to which there is methylation at CpG sites, or the methylation status, of a target region.
  • the extent may be expressed in absolute terms, i.e., the total number of methylated CpG sites within the target region, or in relative terms, i.e., the percentage of methylated CpG sites among the total number of CpG sites within the target region. Methylation of any number of CpG sites within the target region may be determined for comparison to a control.
  • the CpG sites within the target region may or may not be contiguous.
  • the CpG island of interest contains 2, 3, 4, 5 or more CpG sites, and more preferably the 2, 3, 4, 5 or more CpG sites are contiguous.
  • methylation of at least about 2, at least 3, at least 4, at least 5 or more CpG sites, or up to all or a majority of detectable CpG sites (the number of detectable CpG sites may vary depending on the technique used to identify them) within the target region is determined to obtain a more accurate and/or sensitive diagnostic result.
  • the methylation levels of more than one CpG island are also determined to obtain a more accurate and/or sensitive diagnostic result.
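The absolute and relative measures described above can be sketched as follows, assuming per-read methylation calls for the CpG sites of one target region, encoded as one character per site ('M' methylated, 'U' unmethylated). This encoding and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def methylation_level(read_calls: Iterable[str]) -> Tuple[int, float]:
    """Return (absolute, relative %) methylation for one target region.

    read_calls: one string per read, one character per CpG site the read covers,
    'M' = methylated, 'U' = unmethylated (hypothetical encoding).
    """
    methylated = total = 0
    for read in read_calls:
        for call in read:
            if call not in ("M", "U"):
                raise ValueError(f"unexpected call {call!r}")
            total += 1
            if call == "M":
                methylated += 1
    # Relative level: methylated calls as a percentage of all CpG calls in the region.
    relative = 100.0 * methylated / total if total else 0.0
    return methylated, relative
```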
  • the CpG island of interest is CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B.
  • The detailed regions of the CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B are each defined in Table 1 below:
  • the CpG island of interest is CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  • the CpG islands of interest are CpG islands of more than one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  • the CpG island of interest is CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B.
  • the CpG islands of interest are CpG islands of more than one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B.
  • the CpG island of interest is CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  • the CpG islands of interest are CpG islands of more than one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  • the methylation level at CpG sites within the target region can be determined by any technique known in the art. For example, the DNA contained in the sample can first be bisulfite converted using standard kits for this purpose, the bisulfite-converted product amplified using well-established methods including PCR, and the methylation level at CpG sites in the region then determined using known techniques such as pyrosequencing or Sequenom analysis.
  • the methylation level at CpG sites within the target region can be determined by employing a pre-established methylation detection system, for example, a DNA methylation library construction kit, such as Accel-NGS Methyl-Seq™ (hereafter “SWIFT”) from Swift Biosciences, MI, USA, or the AnchorIRIS™ (hereafter “IRIS”) kit from AnchorDx, Guangdong, CN, in combination with NGS sequencing.
  • SWIFT: Accel-NGS Methyl-Seq™
  • IRIS: AnchorIRIS™
  • the IRIS assay is a quantitative technique to determine DNA methylation levels at specific gene loci in small amounts of genomic DNA or in trace amounts of cell-free circulating DNA.
  • methylation-dependent sequence differences are introduced into the genomic DNA by sodium bisulfite treatment; adaptors are ligated directly to the 3’ end of single-stranded DNA molecules after bisulfite conversion, and the bisulfite-treated DNA is subsequently PCR amplified (see Figure 1A for details).
  • This combination of bisulfite treatment and PCR amplification converts unmethylated cytosine residues to thymine, while methylated cytosine residues remain cytosine, and amplifies the signal.
  • the amplification products are then sequenced, and the sequencing results are compared to the reference gene sequence of the target region to determine the DNA methylation level within the target region.
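The conversion rule above (unmethylated C read as T, methylated C read as C) and the comparison against the reference can be sketched as follows. This is a simplification, assuming forward-strand reads already aligned to the reference without gaps, no sequence variants, and complete conversion; the function names are hypothetical.

```python
from typing import List, Set


def bisulfite_convert(reference: str, methylated_positions: Set[int]) -> str:
    """Expected forward-strand read after bisulfite treatment and PCR.

    Every C becomes T unless its 0-based position is in methylated_positions.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(reference.upper())
    )


def call_cpg_methylation(reference: str, read: str) -> List[str]:
    """Per-CpG calls for an aligned, gap-free read: 'M', 'U', or 'N' (neither C nor T)."""
    ref, obs = reference.upper(), read.upper()
    if len(ref) != len(obs):
        raise ValueError("read and reference must be the same length")
    calls = []
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":  # a CpG site on the forward strand
            calls.append("M" if obs[i] == "C" else "U" if obs[i] == "T" else "N")
    return calls
```

For example, converting `"ACGTTCGACG"` with the CpGs at positions 1 and 8 methylated yields `"ACGTTTGACG"`, which is called back as methylated, unmethylated, methylated.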
  • the IRIS assay is easy to use and provides high sensitivity and quantitative accuracy.
  • the methylation level in the target region within a selected biological sample is then compared to a predetermined reference methylation level, and the measured methylation level of the CpG island in the sample comparable to the reference level is indicative of the subject having lung cancer.
  • a predetermined reference methylation level is the methylation level of CpG sites within the same target region pre-determined using an appropriate technique in a corresponding control or normal biological sample.
  • a control or normal biological sample is a non-cancerous sample of a corresponding biological sample from the same subject or from a subject determined to be comparable; e.g., in the case of a test sample from a human, a control sample may be obtained from another human, who may be of the same age or sex. Generally, a median value of the methylation level determined within the target region from multiple control samples aids in providing an accurate control value.
  • normal biological samples exhibit a small degree or baseline amount of methylation that may vary from sample type to sample type.
  • the determination of the degree of methylation in a target region of a control sample assists in providing an accurate analysis of the actual methylation degree in a test sample, i.e. whether or not hypermethylation actually exists in the test sample being analyzed.
  • the methylation level of the target region in the sample is within 0.8-1.5 times (e.g., 0.8-1.2 times, 0.9-1.1 times, etc.) the predetermined level.
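The comparison step can be sketched as below, taking the predetermined reference level as the median over multiple reference samples (as described above) and applying the 0.8-1.5 fold window; the bounds are parameters so that the narrower ranges can be substituted. The function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def reference_level(reference_samples: Sequence[float]) -> float:
    """Predetermined reference level: the median over multiple reference samples."""
    if not reference_samples:
        raise ValueError("at least one reference sample is required")
    return median(reference_samples)


def is_comparable_to_reference(measured: float, reference: float,
                               low: float = 0.8, high: float = 1.5) -> bool:
    """True if measured / reference lies within [low, high] (default 0.8-1.5 fold)."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return low <= measured / reference <= high
```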
  • the diagnostic method of lung cancer through determining methylation level of one or more CpG island (s) of interest can provide a diagnostic sensitivity of at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100%.
  • the method of diagnosing lung cancer through determining methylation level of one or more CpG island (s) of interest provided herein can provide a diagnostic specificity of at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100%.
  • the diagnostic method of lung cancer provided herein has a sensitivity of at least 70%, and a specificity of at least 80%.
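Sensitivity and specificity follow directly from confusion-matrix counts, as sketched below. The text does not say how confidence intervals are computed; the Wilson score interval used here is an assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer and one non-cancer sample")
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```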
  • the present application is at least partially based on the discovery of a set of CpG regions whose methylation level/status is correlated with lung cancer, and thus can serve as biomarkers for diagnosis of lung cancer.
  • the set of CpG regions includes CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B.
  • the set of CpG regions includes CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  • the set of CpG regions includes CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B. In some embodiments, the set of CpG regions includes CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E. In some embodiments, the set of CpG regions is selected based on sensitivity only. In some embodiments, the set of CpG regions is selected based on specificity only. In some embodiments, the set of CpG regions is selected by balancing sensitivity and specificity.
  • any of the methods described herein may be performed totally or partially with a computer system including one or more processors, which can be configured to perform these steps.
  • embodiments are directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps.
  • steps of methods herein can be performed at the same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Any of the steps of any of the methods can be performed with modules, circuits, or other means for performing these steps.
  • a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus.
  • a computer system can include multiple computer apparatuses, each being a subsystem, with internal components.
  • the subsystems can be interconnected via a system bus. Additional subsystems include, for example, a printer, keyboard, storage device(s), and monitor, which is coupled to a display adapter, among others.
  • Peripherals and input/output (I/O) devices, which couple to an I/O controller, can be connected to the computer system by any number of means known in the art, such as a serial port. For example, serial port or external interface (e.g. Ethernet, Wi-Fi, etc.
  • system memory e.g., a fixed disk, such as a hard drive or optical disk
  • system memory and/or the storage device (s) may embody a computer readable medium. Any of the data mentioned herein can be output from one component to another component and can be output to the user.
  • a computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface or by an internal interface.
  • computer systems, subsystems, or apparatuses can communicate over a network.
  • one computer can be considered a client and another computer a server, where each can be part of a same computer system.
  • a client and a server can each include multiple systems, subsystems, or components.
  • any of the embodiments of the present disclosure can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner.
  • a processor includes a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked.
  • any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, R, Java, C/C++, Python, or Perl using, for example, conventional or object-oriented techniques.
  • the software code may be stored as a series of instructions or commands on a computer-readable medium for storage and/or transmission. Suitable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like.
  • the computer readable medium may be any combination of such storage or transmission devices.
  • Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet.
  • a computer readable medium may be created using a data signal encoded with such programs.
  • Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download) .
  • Any such computer readable medium may reside on or within a single computer product (e.g., a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network.
  • a computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
  • kits for use in the methods described above.
  • the kits may comprise any or all of the reagents to perform the methods described herein.
  • the kits may include any or all of the following: assay reagents, buffers, probes and/or primers that bind to the CpG island of interest described herein, etc.
  • the term “kit”, as used herein in the context of detection reagents, is intended to refer to such things as combinations of multiple gene expression product detection reagents, or one or more gene expression product detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages such as packaging intended for commercial sale, substrates to which gene expression product detection reagents are attached, electronic hardware components, etc.).
  • the present disclosure provides oligonucleotide probes attached to a solid support, such as an array slide or chip, e.g., as described in Bowtell and Sambrook (Eds.), DNA Microarrays: A Molecular Cloning Manual (2003), Cold Spring Harbor Laboratory Press. Construction of such devices is well known in the art, for example as described in US patents and patent publications U.S. Patent No. 5,837,832; PCT application WO95/11995; U.S. Patent No. 5,807,522; US Patent Nos.
  • a microarray can be composed of a large number of unique, single-stranded polynucleotides, usually either synthetic antisense polynucleotides or fragments of cDNAs, fixed to a solid support.
  • Typical polynucleotides are preferably about 6-60 nucleotides in length, more preferably about 15-30 nucleotides in length, and most preferably about 18-25 nucleotides in length.
  • preferred probe lengths can be, for example, about 15-80 nucleotides in length, preferably about 50-70 nucleotides in length, more preferably about 55-65 nucleotides in length, and most preferably about 60 nucleotides in length.
  • kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods provided herein. While the instructional materials typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses of internet sites that provide such instructional materials.
  • electronic storage media e.g., magnetic discs, tapes, cartridges, chips
  • optical media e.g., CD ROM
  • This example illustrates the development and evaluation of an AnchorIRIS™ targeted methylation sequencing pipeline in comparison with the commercially available Accel-NGS Methyl-Seq™ (SWIFT) system.
  • The AnchorIRIS™ (IRIS) assay employs a technology that directly ligates adaptors to the 3′ end of single-stranded DNA molecules after bisulfite conversion (Figure 1A). Because adaptors are added after conversion, this significantly reduces the DNA loss caused by bisulfite conversion of already-constructed libraries.
  • Another improvement in the IRIS assay is linear amplification after the first adaptor ligation. This increases the number of molecules available for the second adaptor ligation, giving each original molecule a much higher chance of being incorporated into a sequencing library. These two improvements are particularly important for ctDNA recovery, given the limited amount of ctDNA available for sequencing.
  • A final target-enrichment step is also introduced, so sequencing cost can be significantly reduced by pre-selecting a set of target regions of interest.
  • Bisulfite conversion was performed using the Zymo Lightning Conversion Reagent (Cat#D5031, Zymo Research) according to the manufacturer’s protocol. Briefly, 130 µl of Lightning Conversion Reagent was added to 20 µl DNA sample, which was incubated in a thermocycler with the following program: 98 °C for 8 mins, 54 °C for 60 mins, and 4 °C for up to 20 hrs. The bisulfite-converted DNA was then mixed with M-Binding buffer, run through a Zymo-Spin™ IC Column, desulphonated, washed, and eluted in 17 µl M-Elution buffer.
  • The bisulfite-converted cfDNA was then aliquoted in duplicate into input titrations of 10 ng, 5 ng, 3 ng, and 1 ng. Aliquoting after conversion avoided introducing sample-to-sample assay variation at the bisulfite conversion step.
  • AnchorIRIS™ pre-library construction was performed using the AnchorDx EpiVisio™ Methylation Library Prep Kit (AnchorDx, Cat#A0UX00019) and the AnchorDx EpiVisio™ Indexing PCR Kit (AnchorDx, Cat#A2DX00025). End repair of bisulfite-converted DNA was performed using the MEE1 enzyme at 37 °C for 30 mins. DNA was then denatured at 95 °C for 5 mins and snap cooled on ice. The 3′-end adaptor was ligated using the MLE1 and MLE5 enzymes at 37 °C for 30 mins.
  • A first amplification was immediately performed to generate reverse-complemented DNA molecules using the MAE3 enzyme with the following PCR program: 1 cycle of 95 °C for 3 mins; 4 cycles of 95 °C for 30 secs + 60 °C for 30 secs + 68 °C for 1 min; and 1 cycle of 68 °C for 5 mins.
  • Amplified DNA was purified using the AMB1 Magnetic Beads and eluted in a 20 µl volume.
  • Next, 3′-end adaptor ligation of the reverse-complemented DNA was performed using the MSE1 and MSE5 enzymes at 37 °C for 30 mins.
  • Indexing PCR (i5 and i7) was immediately performed using the MIB1 PCR master mix with the following PCR program: 1 cycle of 98 °C for 45 secs, 14 cycles of 98 °C for 15 secs + 60 °C for 30 secs + 72 °C for secs, and 1 cycle of 72 °C for 5 mins.
  • The amplified pre-libraries were subsequently purified using the IPB1 Magnetic Beads, and their concentration was determined using the Qubit™ dsDNA HS Assay Kit. Pre-libraries containing more than 400 ng DNA were considered qualified for target enrichment.
  • Target enrichment was performed using the AnchorDx EpiVisio™ Target Enrichment Kit (AnchorDx, Cat#A0UX00031).
  • A total of 1000 ng DNA, pooled from up to 4 pre-libraries, was used for target enrichment with our custom-made 10K methylation panel, which includes 9,921 pre-selected regions enriched for cancer-specific methylation.
  • HE, HBA and HBB blocking reagents were added to the 1000 ng of pooled pre-libraries, and the mixture was completely dried using a heated vacuum and then reconstituted in 7.5 µl MHB1 hybridization buffer plus 3 µl MHE1 hybridization enhancer.
  • Reconstituted pre-library pools were next denatured at 95 °C for 10 mins and immediately transferred to a 47 °C hybridization oven. Then probes were added to each pre-library pool, which was quickly transferred to a thermocycler for hybridization incubation following the manufacturer’s protocol.
  • DNA pre-libraries bound with biotinylated probes were pulled down using Dynabeads M270 streptavidin beads (Thermo Fisher Scientific, Cat#65306). Briefly, 30 µl streptavidin beads were used for each pre-library pool, washed twice with 1X Binding Wash Buffer, and resuspended in 60 µl Binding Wash Buffer. Pre-library pools were added and mixed well with the beads by repeated pipetting, and the mixture was incubated on a rotator at 47 °C for 45 mins. After bead binding, 100 µl pre-warmed 1X Transfer Buffer was added to the mixture.
  • Enriched libraries were further amplified with P5 and P7 primers using the KAPA HiFi HotStart Ready Mix (KAPA Biosystems, Cat#KK2602) with the following PCR program: 1 cycle of 98 °C for 45 secs; 12 cycles of 98 °C for 15 secs + 60 °C for 30 secs + 72 °C for 30 secs; and 1 cycle of 72 °C for 1 min.
  • The PCR product was then purified with Agencourt AMPure XP Magnetic Beads (Beckman Coulter, Cat#A63882) and eluted in 40 µl EB buffer. The concentration of this final library was determined using the Qubit dsDNA HS Assay.
  • IRIS libraries were constructed according to the methods described above, while SWIFT libraries were constructed according to the manufacturer’s protocol (Cat#DL-ILMMS-12/48). Briefly, bisulfite-converted DNA was denatured at 95 °C for 2 mins and snap cooled on ice. 3′-end adaptor-1 ligation was immediately performed using the Adaptase Reaction Mix with the incubation program: 37 °C for 1 min, 62 °C for 2 mins, and 65 °C for 5 mins.
  • Ligation products were purified with SPRIselect beads and carried to subsequent Indexing PCR for amplification with the PCR program: 1 cycle of 98 °C for 30 secs and repeated cycles of 98 °C for 10 secs + 60 °C for 30 secs + 68 °C for 60 secs.
  • PCR cycle numbers were adjusted according to input DNA amount for both IRIS and SWIFT assays.
  • PCR products were bead purified, eluted in 40 µl EB buffer, and quantified using Qubit.
  • Target enrichment was performed using the same 10K panel for both assays according to the methods described above.
  • The library conversion efficiencies of IRIS and SWIFT were estimated and compared (Figures 1C, 1D and 2).
  • The IRIS assay achieved a conversion rate of at least 20%, with at least 5-fold greater efficiency than SWIFT.
  • An unexpectedly high library conversion efficiency was also observed for IRIS with 1 ng cfDNA input, likely because the enzymes and reagents at each step are in much greater excess relative to the limited starting material, making library construction more efficient.
  • This example relates to the evaluation of sensitivity and detection limit of the AnchorIRIS TM targeted methylation sequencing.
  • DNA methylation alteration has been shown to be an early event in tumorigenesis, and multiple genomic regions are affected simultaneously. Whether it plays a causal role remains to be determined, but this property gives DNA methylation a great advantage as a biomarker for early cancer detection: many more genomic markers can be acquired in parallel from a tiny amount of starting material, as is the case with ctDNA. Because of this feature, two factors need to be considered when evaluating the LoD: (1) whether a set of regions with informative co-methylation signals can be detected above background at a given dilution; and (2) the linear quantitative range across input dilutions.
  • Lung cancer tumor tissue genomic DNA (gDNA) was diluted into white blood cell (WBC) gDNA.
  • Undiluted tumor gDNA and WBC gDNA samples were also included.
  • A total of 900 ng gDNA in a 50 µl volume for each dilution was sheared to 200 bp, and successful shearing was confirmed on a 1% agarose gel. Concentrations of sheared DNA were measured using Qubit, and bisulfite conversion was performed using 250 ng DNA.
  • 100 ng of bisulfite-converted DNA was aliquoted from each dilution in duplicate for library construction and target enrichment according to the methods described above.
  • This example illustrates the identification and validation of methylation related markers for detection of lung cancer in tissue samples and plasma samples.
  • This study used samples from cancer-free individuals, as well as formalin-fixed paraffin-embedded (FFPE) tissue samples and plasma samples from patients who screened positive for pulmonary nodules (PNs) by CT/LDCT scan and subsequently underwent surgical resection. Because the study aimed at non-invasive diagnosis of early-stage lung cancer, enrolled patients were required to have no previous cancer history and to be diagnosed with only 1 or 2 PNs. Both sexes were included, and smoking history was recorded. Pathological information for all samples was determined from surgically resected tissue sections according to the 2015 WHO Histological Classification of Lung Cancer. The collection of all samples was approved by the Ethical Committees at each site, and all participants provided written informed consent.
  • Adenocarcinoma is the major subtype of lung cancer in this cohort. Among adenocarcinomas, IA is considered a later stage of a developmental sequence that begins with AIS and MIA, and IA tumors are therefore expected to have accumulated more methylation markers.
  • DML, differentially methylated CpG loci; DMR, differentially methylated regions.
  • AIS samples also showed a lack of hypermethylation signals. This gradual gain of hypermethylation (from right to left in the heatmap in Figure 4A) is consistent with the sequential development of adenocarcinoma from AIS to MIA to IA. These hypermethylated CpG sites were further confirmed using TCGA methylation microarray data generated from lung adenocarcinoma and normal lung tissues.
  • FFPE tissue samples were enrolled for training, comprising 129 malignant tumor samples of invasive adenocarcinoma (IA), minimally invasive adenocarcinoma (MIA), adenocarcinoma in situ (AIS), squamous cell (SC), large cell (LC), small cell (SCLC), and other rare lung cancers, and 101 benign lesion samples of hamartoma (HAM), tuberculosis (TB), inflammatory granuloma (GRAN), fungal infection (FUN), inflammation (INF), sclerosing hemangioma (SH), and other rare cases. Detailed information for each tissue sample is listed below in Table 2.
  • A total of 260 plasma samples were also collected, including 132 samples from individuals diagnosed with positive PNs and 128 samples from asymptomatic normal individuals. Detailed information for each plasma sample is listed below in Table 3.
  • The enrolled samples included 33 paired tissue-plasma samples, which were used to evaluate methylation concordance between tissue and plasma within the same individual.
  • 8 ml of blood was drawn 1-3 days prior to surgery and stored in Cell-Free DNA blood collection tubes (Streck, Inc. Cat#218962) at room temperature. Plasma was separated from blood (no apparent hemolysis) within 48 hours after blood draw, and stored at -80 °C until DNA isolation.
  • 8 ml of blood was drawn using BD EDTA Tubes (Becton, Dickinson and Company, Cat#367525) and plasma was immediately separated within 2 hours after blood draw and stored at -80 °C.
  • Tissue genomic DNA was isolated from FFPE tissue samples using the Qiagen QIAamp DNA FFPE Tissue Kit (Qiagen, Cat#56404) according to the manufacturer’s protocol. gDNA was fragmented to 200 bp using the M220 Focused-ultrasonicator™ (Covaris, Inc.) following the manufacturer’s protocol, and 100 ng of fragmented DNA was used for library construction.
  • cfDNA was isolated using the Qiagen QIAamp Circulating Nucleic Acid Kit (Qiagen, Cat#55114) according to the manufacturer’s protocol, except for plasma collected in EDTA-K2 tubes, from which cfDNA was isolated using the Bioo NextPrep-Mag™ cfDNA Isolation Kit (Bioo Scientific, Cat#NOVA-3825-01/3). Repeated freezing and thawing of plasma was avoided to prevent cfDNA degradation and gDNA contamination from white blood cells (WBC).
  • cfDNA concentration was measured using the Qubit™ dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat#Q32854), and quality was examined using the Agilent High Sensitivity DNA Kit (Cat#5067-4626).
  • Samples with a cfDNA yield greater than 3 ng and without excessive genomic DNA contamination proceeded to library construction and sequencing according to the IRIS targeted methylation sequencing method described in Example 1.
  • Sequencing adapters and low-quality 3′ bases were trimmed from raw sequencing reads using Trim Galore version 0.4.1 (https://github.com/FelixKrueger/TrimGalore), and the trimmed reads were then aligned to the C->T converted and G->A converted hg19 genomes using Bismark version 0.15.0 (Bowtie2 is the default aligner behind Bismark) [34].
  • Reads having at least 2 methylated CpGs within a sliding window of 2 to 5 CpGs were designated as co-methylated reads and used for subsequent analysis of methylation patterns and predictive modeling of the malignant/benign state of patient samples.
  • Aligned reads were evaluated with Picard version 2.5.0 for metrics that measure the performance of target-capture-based bisulfite sequencing assays (http://broadinstitute.github.io/picard).
  • The library conversion efficiency is calculated as the estimated number of molecules incorporated into a library divided by the theoretical number of molecules equivalent to the input DNA amount.
  • The estimated molecule number is derived, based on the Poisson distribution, from the sequencing depth (pre-deduplication mean bait coverage) and the observed sequencing diversity (observed molecule number, post-deduplication mean bait coverage).
  • Differential methylation (DM) analysis was performed on the training cohort of lung cancer patients using R package DSS version 2.14.0.
  • Differentially methylated CpGs were identified by comparing invasive adenocarcinoma (IA) with benign samples (p < 0.05), and were further assembled into differentially methylated regions (DMRs).
  • Targeted regions of the capture panel covered by DMRs were selected as candidate features for building classification models of malignant/benign state.
  • The differential signal was visually confirmed by heatmap using Gitools version 2.3.0.
  • A random forest model was built for tissue samples in the training cohort of lung cancer patients. Two-fold cross-validation was repeated 10 times, and the top 1000 markers were selected by their importance scores in the random forest model. The performance of this model was evaluated on an independent test set using receiver operating characteristic (ROC) analysis. For a chosen threshold, sensitivity and specificity were calculated as follows: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives, respectively.
  • The signal distribution of these candidate markers was further examined in plasma samples, and a total of 92 markers that preferentially discriminated malignant samples from benign and asymptomatic normal samples in the training set were identified.
  • The Least Absolute Shrinkage and Selection Operator (Lasso) method (Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, Vol. 58, No. 1, pages 267-288) was applied to select the top 9 markers (see Table 1), i.e., those that appeared most frequently across 500 subsamples of the original dataset drawn at a 75% sampling rate without replacement.
  • The Lasso model was selected according to the expected generalization error estimated by 10-fold cross-validation.
  • A logistic regression model was trained with these 9 markers to discriminate the same malignant samples from benign samples in the training set. The performance of this classification model was evaluated in an independent test set using ROC analysis.
  • Tissue-level classification was tested by 10 bootstraps of 2-fold cross-validation, each time randomly assigning all IA and benign samples to training and test groups, and the classifier was modeled on these hypermethylated CpG markers using regularized logistic regression (Figure 4B, upper panel).
  • This example relates to validation of tissue derived methylation related markers for lung cancer detection in plasma samples.
  • ctDNA detection via methylation profiling can achieve higher sensitivity and specificity than somatic mutation profiling in early-stage patients because 1) a much larger number of markers can be assessed simultaneously to increase sensitivity, and 2) multiple CpG loci within each selected region can be interrogated together to derive “cancer-specific methylation patterns” for increased specificity.
  • Methylation profiling can also be used to differentiate tissue of origin and cancer subtypes. The release of gDNA from apoptotic/necrotic tumor cells into the blood provides an opportunity to use ctDNA for the detection of cancer.
  • Table 5 Performance of the malignancy classifier of plasma samples among various lung cancer subtypes and stages against benign and normal conditions in the independent test group
  • The abundance of ctDNA in total cfDNA is largely associated with tumor volume.
  • A tumor with a volume of 1 cm³ is predicted to have a ctDNA fraction between 0.001% and 0.03%; therefore, the limit of detection of a diagnostic assay is critical for detecting early-stage lung cancer.
  • The IRIS assay demonstrates a limit of detection of 0.0033% by combining several hundred pre-selected markers, which allows sensitive detection of malignancy in patients with tumors as small as 0.5 cm in diameter.
  • Using the IRIS assay, we achieved an overall sensitivity of 82.1% in detecting malignancy from the plasma DNA of lung cancer patients. In particular, the sensitivity for stage Ia and Ib patients was 85.0% and 100%, respectively, superior to other ctDNA-based liquid biopsy approaches using somatic mutation or DNA methylation profiling (Table 5).
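The co-methylated read criterion described above (at least 2 methylated CpGs within a sliding window of 2 to 5 CpGs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-read CpG calls are taken from Bismark's XM methylation string, where `Z` marks a methylated and `z` an unmethylated CpG, and the function names `cpg_calls_from_xm` and `is_comethylated` are hypothetical.

```python
from typing import List, Sequence


def cpg_calls_from_xm(xm: str) -> List[bool]:
    """CpG calls in read order from a Bismark XM string.

    'Z' = methylated CpG, 'z' = unmethylated CpG; CHG/CHH/unknown
    context characters are ignored.
    """
    return [ch == "Z" for ch in xm if ch in "Zz"]


def is_comethylated(calls: Sequence[bool], window: int = 5,
                    min_methylated: int = 2) -> bool:
    """True if any `window` consecutive CpGs on the read contain at least
    `min_methylated` methylated CpGs (the whole read if it is shorter)."""
    if not 2 <= window <= 5:
        raise ValueError("window is expected to span 2 to 5 CpGs")
    span = min(window, len(calls))
    count = sum(calls[:span])              # methylated CpGs in the first window
    if count >= min_methylated:
        return True
    for i in range(span, len(calls)):      # slide the window one CpG at a time
        count += calls[i] - calls[i - span]
        if count >= min_methylated:
            return True
    return False


# Hypothetical read: three CpGs (methylated, unmethylated, methylated).
print(is_comethylated(cpg_calls_from_xm("..Z..z.x..Z..h")))
```

Keeping a running count makes the scan linear in the number of CpGs on the read; the window size is left as a parameter because the text gives a range (2 to 5) rather than a single value.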
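The library conversion efficiency described above can be sketched with a common Poisson (Lander-Waterman-style) model: if a library holds N distinct molecules and sequencing yields a raw depth D (pre-deduplication mean bait coverage), the expected number of distinct molecules observed is U = N(1 - exp(-D/N)). Solving for N from the observed U (post-deduplication mean bait coverage) gives the estimated molecule number. This is a sketch under stated assumptions, not the authors' code: the equation form, the bisection solver, the 3.3 pg mass per haploid human genome, and the convention of one molecule per locus per haploid genome copy are all assumptions.

```python
import math

# Approximate mass of one haploid human genome in picograms (assumption).
PG_PER_HAPLOID_GENOME = 3.3


def unique_given_molecules(n_molecules: float, raw_coverage: float) -> float:
    """Expected distinct molecules observed when raw_coverage reads are drawn
    from n_molecules equally likely molecules (Poisson approximation)."""
    return n_molecules * (1.0 - math.exp(-raw_coverage / n_molecules))


def estimate_molecules(raw_coverage: float, unique_coverage: float,
                       rel_tol: float = 1e-10) -> float:
    """Solve unique = N * (1 - exp(-raw / N)) for N by bisection.

    The left-hand side increases with N, is below `unique` at N = unique,
    and approaches `raw` as N grows, so a root exists when unique < raw.
    """
    if not 0.0 < unique_coverage < raw_coverage:
        raise ValueError("need 0 < unique_coverage < raw_coverage")
    lo, hi = unique_coverage, 2.0 * unique_coverage
    while unique_given_molecules(hi, raw_coverage) < unique_coverage:
        hi *= 2.0                          # expand until the root is bracketed
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if unique_given_molecules(mid, raw_coverage) < unique_coverage:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conversion_efficiency(raw_coverage: float, unique_coverage: float,
                          input_ng: float) -> float:
    """Estimated library molecules divided by haploid genome copies in the input."""
    theoretical = input_ng * 1000.0 / PG_PER_HAPLOID_GENOME
    return estimate_molecules(raw_coverage, unique_coverage) / theoretical


# Hypothetical 1 ng input with raw depth 200x and unique depth 55x.
print(round(conversion_efficiency(200.0, 55.0, input_ng=1.0), 3))
```

Bisection is used instead of a closed form because the equation has no elementary inverse. It is robust here because the expected unique count is monotone in N.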
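The DMR step above was performed with the R package DSS. The sketch below is not DSS's algorithm. It is a simplified illustration of the two operations described: grouping significant CpGs into regions, and selecting the capture-panel targets that those regions cover. The merge rule and its thresholds (`max_gap`, `min_cpgs`) are hypothetical, as are the function names.

```python
from typing import List, Tuple

Site = Tuple[str, int]              # (chromosome, CpG position)
Interval = Tuple[str, int, int]     # (chromosome, start, end), closed interval


def merge_into_regions(sites: List[Site], max_gap: int = 50,
                       min_cpgs: int = 3) -> List[Interval]:
    """Group significant CpGs on the same chromosome whose neighbours lie
    within `max_gap` bp; keep groups with at least `min_cpgs` CpGs."""
    regions: List[Interval] = []
    current = None                  # [chrom, start, end, n_cpgs]
    for chrom, pos in sorted(sites):
        if current and chrom == current[0] and pos - current[2] <= max_gap:
            current[2] = pos        # extend the open region
            current[3] += 1
        else:
            if current and current[3] >= min_cpgs:
                regions.append((current[0], current[1], current[2]))
            current = [chrom, pos, pos, 1]
    if current and current[3] >= min_cpgs:
        regions.append((current[0], current[1], current[2]))
    return regions


def targets_covered(targets: List[Interval],
                    regions: List[Interval]) -> List[Interval]:
    """Panel targets that overlap at least one region."""
    return [t for t in targets
            if any(t[0] == r[0] and t[1] <= r[2] and r[1] <= t[2]
                   for r in regions)]


# Hypothetical significant CpGs and panel targets.
sig = [("chr1", 100), ("chr1", 120), ("chr1", 150), ("chr1", 400),
       ("chr2", 10), ("chr2", 30), ("chr2", 45)]
dmrs = merge_into_regions(sig)
print(targets_covered([("chr1", 90, 110), ("chr1", 300, 500),
                       ("chr2", 40, 60)], dmrs))
```

The overlap test is a plain nested scan, which is adequate for illustration. A panel of roughly 10K targets would more typically use sorted intervals or an interval tree.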
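The threshold-based sensitivity and specificity and the ROC evaluation described above can be illustrated as follows. This is a minimal sketch: it assumes label 1 denotes a malignant sample and that a sample is called malignant when its classifier score is at or above the threshold. The area under the ROC curve is computed through its rank interpretation, as the probability that a randomly chosen malignant sample outscores a randomly chosen non-malignant one, with ties counting one half. The function names are illustrative only.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        called_malignant = score >= threshold
        if label == 1:
            if called_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if called_malignant:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (malignant, non-malignant) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five samples (1 = malignant).
scores, labels = [0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]
print(sensitivity_specificity(scores, labels, 0.5), roc_auc(scores, labels))
```

The pairwise AUC is quadratic in cohort size, which is negligible for cohorts of a few hundred samples and avoids having to construct the ROC curve explicitly.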
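The marker selection described above (Lasso applied across 500 subsamples drawn at a 75% sampling rate without replacement, keeping the most frequently selected markers) can be sketched as below. Several assumptions apply. The Lasso here is a plain squared-error Lasso fitted by cyclic coordinate descent on standardized features. The regularization strength `lam` is passed in directly, whereas the text chooses the model by 10-fold cross-validation, which is not shown. The helper names are hypothetical. The sketch is not the authors' pipeline, which may have used a logistic Lasso.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def _standardize(col: Sequence[float]) -> List[float]:
    """Center a column and scale it to unit population variance."""
    n = len(col)
    mu = sum(col) / n
    sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             max_sweeps: int = 200, tol: float = 1e-8) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(y), len(X[0])
    cols = [_standardize([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]        # residual with all coefficients at 0
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual excluding feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            # Soft-threshold; non-constant columns have unit variance,
            # and constant columns give rho = 0 so they stay at zero.
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho >= 0 else -1.0)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for c, r in zip(col, resid)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def selection_frequency(X: List[List[float]], y: List[float], lam: float,
                        n_subsamples: int = 500, rate: float = 0.75,
                        seed: int = 0) -> Dict[int, float]:
    """Fraction of subsamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n = len(y)
    m = int(round(rate * n))
    counts: Counter = Counter()
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)      # sampling without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        counts.update(j for j, b in enumerate(beta) if b != 0.0)
    return {j: counts[j] / n_subsamples for j in range(len(X[0]))}


def top_k(freq: Dict[int, float], k: int = 9) -> List[int]:
    """Feature indices ordered by selection frequency (ties broken by index)."""
    return sorted(freq, key=lambda j: (-freq[j], j))[:k]
```

Counting selections across subsamples, rather than fitting once, favors markers whose selection is stable under perturbation of the training set. This stability is the property the subsampling procedure in the text is meant to capture.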
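The reasoning above, that a limit of detection of 0.0033% relies on combining several hundred markers, can be made concrete with a back-of-the-envelope calculation. The sketch rests on several hypothetical inputs: 10 ng of cfDNA, about 3.3 pg per haploid human genome, one molecule per marker locus per genome copy, a 20% library conversion rate (the lower bound reported for IRIS above), 300 markers standing in for "several hundred", and Poisson sampling. It estimates only whether tumor-derived marker molecules are present in the library at all. It does not model whether their signal rises above the WBC background, so it is an optimistic bound.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (pg)


def expected_tumor_molecules(input_ng: float, tumor_fraction: float,
                             n_markers: int,
                             conversion_efficiency: float) -> float:
    """Expected tumor-derived marker molecules in a library, assuming one
    molecule per marker locus per haploid genome copy."""
    genome_copies = input_ng * 1000.0 / PG_PER_HAPLOID_GENOME
    return genome_copies * tumor_fraction * n_markers * conversion_efficiency


def prob_at_least_one(expected: float) -> float:
    """Poisson probability that at least one such molecule is present."""
    return 1.0 - math.exp(-expected)


# Hypothetical inputs: 10 ng cfDNA, 0.0033% tumor fraction, 20% conversion.
single = expected_tumor_molecules(10.0, 0.0033 / 100, 1, 0.2)    # about 0.02
panel = expected_tumor_molecules(10.0, 0.0033 / 100, 300, 0.2)   # about 6
print(prob_at_least_one(single), prob_at_least_one(panel))
```

Under these assumptions, a single marker captures a tumor-derived molecule only about 2% of the time. Pooling 300 independent markers raises the expected count to about 6 molecules, which is why combining many markers matters at this dilution.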


Abstract

Provided are lung cancer diagnostic markers, methods and compositions, e.g., kits, for evaluating methylation levels of the markers and methods of using such methylation levels to predict. Such information can be used in diagnosing lung cancer.

Description

SYSTEM AND METHOD FOR DETERMINING LUNG CANCER
FIELD OF THE INVENTION
The present invention generally relates to systems and methods for determining lung cancer.
BACKGROUND
Lung cancer is now the leading cause of death among malignant tumors. Although the combined use of surgery, chemotherapy, radiotherapy, and targeted therapy has significantly improved the survival of patients with lung cancer, the prognosis remains poor. One important reason is the difficulty of detecting early-stage lung cancer: most patients have already progressed to an advanced stage at diagnosis. High-sensitivity detection methods such as low-dose CT (LDCT) screening have been developed for early diagnosis of lung cancer and have been shown to significantly reduce lung cancer mortality (e.g., by 20%). However, high sensitivity sometimes comes at the price of low specificity. In one study based on the US National Lung Screening Trial (NLST), the false positive rate in the LDCT screening group was as high as 96.4% (de Koning, H.J., et al., Ann Intern Med, 2014. 160(5): p. 311-20). Over the past decades, investigators have tried to combine LDCT with additional tests such as PET-CT, aspiration biopsy, or surgery to reduce the false positive rate, but these approaches cause excessive medical care and unnecessary psychological burden for those screened, and none of these techniques succeeds in balancing sensitivity and specificity. Therefore, there is a need for novel methods for the early detection of lung cancer.
SUMMARY OF INVENTION
In one aspect, the present disclosure provides a method for in vitro diagnosis of lung cancer in a subject. In some embodiments, the method comprises: measuring the methylation level of a CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B in a sample from the subject; comparing the measured methylation level of the CpG island to a corresponding predetermined reference level; and determining a likelihood of the subject having lung cancer, wherein a measured methylation level of the CpG island in the sample comparable to the reference level is indicative of the subject having lung cancer.
In some embodiments, the at least one gene is selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E. In one embodiment, the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B. In some embodiments, the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
In some embodiments, the measuring step comprises measuring the methylation levels of CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E. In some embodiments, the measuring step comprises measuring the methylation levels of CpG islands of ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B. In some embodiments, the measuring step comprises measuring the methylation levels of CpG islands of ZNF649, LHX5-AS1, RASGRF2, and CACNA1E.
In some embodiments, the sample is a blood sample. In some embodiments, the blood sample comprises cell-free DNA. In some embodiments, the blood sample is processed prior to the measuring step, and the processing step comprises enrichment or purification of the cell-free DNA.
In some embodiments, the methylation levels are measured by an amplification assay, a hybridization assay, a sequencing assay or an array. In some specific embodiments, when the methylation levels are measured by a sequencing assay, the sample is amplified to construct a sequencing library before the measuring step. In some specific embodiments, the sequencing library is prepared with the AnchorIRIS™ method.
In some embodiments, the comparing step is performed by a processor of a computing device. In some embodiments, the determining step is performed by a processor of a computing device. In some embodiments, the determining step comprises using a machine learning model. In some embodiments, the machine learning model is a Least Absolute Shrinkage and Selection Operator (Lasso) model.
Also provided herein is a non-transitory computer readable medium having instructions stored thereon. In some embodiments, the instructions, when executed by a processor, cause the processor to: retrieve the methylation level of a CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B in a sample from the subject; compare the retrieved methylation level of the CpG island to a corresponding predetermined reference level; and determine a likelihood of the subject having lung cancer, wherein a retrieved methylation level of the CpG island in the sample comparable to the reference level is indicative of the subject having lung cancer.
In another aspect, the present disclosure provides a kit comprising methylation detection probes for detecting a CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, and ANKRD18B. In some embodiments, the kit of the present disclosure comprises methylation detection probes for detecting a CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E. In one embodiment, the kit of the present disclosure comprises methylation detection probes for detecting a CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B. In some embodiments, the kit of the present disclosure comprises methylation detection probes for detecting a CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, and CACNA1E. In some embodiments, the kit is a microarray. In some embodiments, the kit can be used in a sequencing device.
In yet another aspect, the present disclosure provides a system for in vitro diagnosis of lung cancer in a test subject, the system comprising a kit of the present disclosure; and a non-transitory computer readable medium of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The aforesaid and other features of the present application are better described through a combination of the drawings with the following specification and attached claims. It can be understood that these drawings display only several embodiments of the present application and thus should not be regarded as limiting the scope of the present application. The drawings are provided to illustrate the present application more explicitly and in more detail.
FIG. 1 shows the AnchorIRIS™ assay and its performance assessment. (A) Workflow of the ultra-sensitive AnchorIRIS™ library preparation method. (B-D) A bake-off experiment comparing assay performance between the AnchorIRIS™ assay and the Accel-NGS Methyl-Seq™ assay. The IRIS assay shows superior molecule conversion efficiency (C and D) with much higher average unique coverage for each input amount tested (B). (E and F) The sensitivity of the AnchorIRIS™ assay was evaluated by diluting tumor gDNA into WBC gDNA. Significantly more informative co-methylated CpG regions above the WBC background can be detected at dilutions ≥ 0.033% by Z-test (E). Dilutions higher than 10% (gray box) preserve a linear response of the average co-methylation signal to the tumor fraction of the input DNA (F).
FIG. 2 shows assay performance and sequencing quality metrics comparing the AnchorIRIS™ (IRM) assay with the Accel-NGS Methyl-Seq™ (SWT) assay.
FIG. 3 shows assay performance metrics of serial dilution samples.
FIG. 4 shows characterization of tissue-level hypermethylation signatures of lung cancer. (A) Differentially methylated MCBs were identified using the Wilcoxon rank-sum test comparing 33 invasive adenocarcinoma (IA) samples with 78 benign lesions. Hypermethylated MCBs were determined at FDR < 0.05 in the volcano plot. (B) Heatmap showing hypermethylation signature MCB regions for representative lung cancer and benign tissue samples. The methylation level of each MCB was calculated as co-methylated read counts per million total mapped reads (CPM). Samples are ordered from left to right by malignant/benign status (top color bar) and corresponding subtypes (second color bar). Subtypes from left to right are IA (n=33), MIA (n=19), AIS (n=8), FUN (n=11), INF (n=9), GRAN (n=4), TB (n=25), and HAM (n=21). Signal is shown on a linear color scale, with red indicating high and green indicating low methylation signal. (C) A representative receiver operating characteristic (ROC) curve displays the tissue classification performance for distinguishing IA samples (n=65) from benign lesions (n=101) based on 10 bootstraps of 2-fold cross-validation of a regularized logistic regression. The 95% confidence interval (CI) is shown in blue shade (top panel). Overall sensitivity, specificity, and area under the curve (AUC) of the 10 bootstraps are summarized in the lower panel.
FIG. 5 shows that lung cancer tissue co-methylation patterns can be captured in the cfDNA pool. Concordance of co-methylation between paired tissue (rows) and plasma (columns) samples is calculated as the percentage of reads sharing pre-defined co-methylation patterns and displayed in the heatmap. The highest similarity of a tissue sample to its matched plasma is shown on the diagonal of the heatmap, with the ranking and Wilcoxon test p-value of each self-pair compared with the remaining tissue-plasma pairs shown on the right. The smaller the rank (and p-value), the better the match of the self-pair.
FIG. 6 shows cancer classification using plasma DNA. (A) Workflow for building a plasma-level diagnostic prediction model. (B) Heatmap of the 9 methylation markers used for the diagnostic prediction model in the training and independent test data sets. The methylation level of each MCB was calculated as the percentage of co-methylated reads. (C and D) ROC curves plot the performance of plasma-level classification with the 95% confidence interval (CI) of sensitivity in the training (C) and test (D) data sets. (E) Univariate (left) and multivariate (right) analyses were performed using logistic regression to determine significant clinical covariates of malignancy for early-stage lung cancers. P, partial solid nodule; S, solid nodule; G, ground-glass nodule.
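The two per-MCB methylation levels used in the figure descriptions above can be computed from read counts. FIG. 4 uses co-methylated reads per million total mapped reads (CPM), and FIG. 6 uses the percentage of co-methylated reads. The sketch below is minimal and assumes the counts come from an upstream co-methylation caller. It also assumes that the percentage's denominator is all reads covering the MCB region, which the description does not state explicitly. The function names are illustrative.

```python
def cpm(comethylated_reads: int, total_mapped_reads: int) -> float:
    """Co-methylated reads per million total mapped reads (FIG. 4 metric)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return comethylated_reads / total_mapped_reads * 1e6


def percent_comethylated(comethylated_reads: int, region_reads: int) -> float:
    """Percentage of reads in an MCB region that are co-methylated (FIG. 6
    metric); the denominator is assumed to be all reads covering the region."""
    if region_reads <= 0:
        raise ValueError("region_reads must be positive")
    return 100.0 * comethylated_reads / region_reads


# Hypothetical counts for one MCB in one sample.
print(cpm(50, 25_000_000), percent_comethylated(30, 120))
```

The first metric normalizes by library-wide sequencing depth, whereas the second normalizes by local coverage of the MCB region.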
DETAILED DESCRIPTION OF THE INVENTION
Before the present disclosure is described in greater detail, it is to be understood that this disclosure is not limited to particular embodiments described, and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present disclosure is not entitled to antedate such publication by virtue of prior disclosure. Further, the dates of publication provided could be different from the actual publication dates that may need to be independently confirmed.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure. Any recited method can be carried out in the order of events recited or in any other order that is logically possible.
The definitions are provided to assist the reader. Unless otherwise defined, all terms of art, notations and other scientific or medical terms or terminology used herein are intended to have the meanings commonly understood by those of skill in the art to which this disclosure belongs. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over the definition of the term as generally understood in the art.
As used herein, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.
As used herein, terms such as “comprises”, “comprised”, “comprising”, “contains”, “containing” and the like have the meaning attributed in United States Patent law; they are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. Terms such as “consisting essentially of” and “consists essentially of” have the meaning attributed in United States Patent law; they allow for the inclusion of additional ingredients or steps that do not materially affect the basic and novel characteristics of the claimed invention. The terms “consists of” and “consisting of” have the meaning ascribed to them in United States Patent law; namely, these terms are closed-ended.
Diagnostic Method for Lung Cancer
In one aspect of the present disclosure, a method is provided for in vitro diagnosis of lung cancer in a subject. In one embodiment, the method comprises the steps of: measuring, in a sample from the subject, the methylation level of one or more CpG islands of interest; comparing the measured methylation level of the CpG island(s) to a corresponding predetermined reference level; and determining a likelihood of the subject having lung cancer, wherein a measured methylation level of the CpG island in the sample that is comparable to the reference level is indicative of the subject having lung cancer.
As used herein, the terms “subject” and “individual” are used interchangeably and refer to a human or any non-human animal (e.g., mouse, rat, rabbit, dog, cat, cattle, swine, sheep, horse or primate). A human includes prenatal and postnatal forms. In many embodiments, the subject is a human being. A subject can be one suspected of being afflicted with a disease or disorder, who may or may not display symptoms of the disease or disorder.
As used herein, the term “lung cancer” refers to any type of lung cancer, including but not limited to adenocarcinoma, squamous carcinoma (SC), large cell lung cancer (LCLC), small cell lung cancer (SCLC), and non-small cell lung cancer (NSCLC). The “lung cancer” of the present disclosure can be at any stage. In some embodiments, the diagnostic method provided herein is useful for detecting lung cancer at a relatively early stage, for example, at or before stage Ib (3 cm < tumor size ≤ 5 cm, no metastasis), at or before stage Ia (tumor size ≤ 3 cm, no metastasis), or even before any symptoms develop.
As used herein, the term “sample” refers to any DNA-containing biological sample, including but not limited to cells (e.g., bacteria, yeast, virus, plant cells, animal cells and the like), tissues (e.g., biopsy tissue, paraffin-embedded tissue and the like), and body fluids (e.g., blood, plasma, serum, saliva, pleural effusion, amniocentesis fluid, seroperitoneum and the like). The sample may be obtained using techniques well established and known to those of skill in the art, which may vary depending on the sample type, as one of skill in the art will appreciate. In certain embodiments, the “sample” is a primary biological sample derived directly from a subject suspected of being afflicted with a disease or disorder; examples of such primary biological samples include, but are not limited to, blood, serum, plasma, urine, pleural effusion, and biopsied tissue (e.g., tumor tissue).
Examples of techniques that may be used to obtain a biopsied sample include surgery, needle biopsy, endoscopic biopsy, stereotactic biopsy, liquid biopsy, and combined techniques that employ both biopsy and imaging. In some specific embodiments, the DNA-containing sample is biopsied lung tissue. Exemplary techniques that may be used to obtain biopsied lung tissue include, but are not limited to, needle biopsy, bronchoscopic biopsy and surgery.
In some specific embodiments, the sample is blood, serum or plasma, and the DNA contained in the sample can be genomic DNA or cell-free DNA (cfDNA). The term “cell-free DNA” or “cfDNA” as used herein refers to DNA found outside of cells in the circulatory system (e.g., blood), the source of which is generally believed to be genomic DNA released during cell apoptosis. Studies have shown that most cell-free DNA in the human body is about 160 bp in size (see Fan et al. (2010) Analysis of the Size Distributions of Fetal and Maternal Cell-Free DNA by Paired-End Sequencing, Clin Chem 56(8): 1279-86). Cell-free DNA originating from tumor cells is referred to as “circulating tumor DNA” or “ctDNA”. In the human body, a tumor cell may release its genomic DNA into the blood through processes such as apoptosis and immune responses. Since normal cells also release genomic DNA into the blood, circulating tumor DNA usually constitutes only a very small part of the cell-free DNA in the blood (e.g., 0.1%-50% of total cfDNA, or down to 0.01% or lower at early stages of tumorigenesis). In some specific embodiments, the DNA contained in the sample is circulating tumor DNA (ctDNA).
Generally, to be suitable for use in measuring methylation levels of CpG islands of interest, a sample shall contain at least about 0.2 ng, 0.5 ng, 1 ng, 2 ng, 3 ng, 4 ng, 5 ng, 6 ng, 7 ng, 8 ng, 9 ng, 10 ng, 15 ng, 20 ng, 25 ng, 30 ng, 45 ng, 50 ng, 55 ng, 60 ng, 65 ng, 70 ng, 75 ng, 80 ng, 85 ng, 90 ng, 95 ng, or 100 ng of DNA. In some embodiments, the samples contain 20±5 ng DNA. In some other embodiments, the samples contain 1-100 ng or over 100 ng DNA.
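To illustrate why the DNA input amount matters at low ctDNA fractions, the expected number of tumor-derived copies of a single-copy locus can be estimated from the input mass and the ctDNA fraction. The following is a minimal sketch only; it assumes roughly 3.3 pg of DNA per haploid human genome (a common approximation that is not stated in this disclosure), and the function names are hypothetical.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
PG_PER_HAPLOID_GENOME = 3.3


def haploid_genome_equivalents(input_ng: float) -> float:
    """Approximate number of haploid genome copies in `input_ng` nanograms of human DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_tumor_copies(input_ng: float, tumor_fraction: float) -> float:
    """Expected tumor-derived copies of a single-copy locus.

    `tumor_fraction` is the ctDNA share of total cfDNA (e.g., 0.001 for 0.1%).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return haploid_genome_equivalents(input_ng) * tumor_fraction


if __name__ == "__main__":
    for ng, frac in [(20.0, 0.001), (20.0, 0.0001), (1.0, 0.001)]:
        copies = expected_tumor_copies(ng, frac)
        print(f"{ng} ng at {frac:.2%} ctDNA -> ~{copies:.2f} tumor-derived copies")
```

Under these assumptions, 20 ng of cfDNA at a 0.01% ctDNA fraction is expected to carry fewer than one tumor-derived copy of a given locus, which is one reason the recovery of input molecules matters for early-stage detection.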
In some embodiments, the sample is processed to release, enrich and/or purify the DNA prior to the measuring step. In certain specific examples, the sample is a tissue or a population of cells, and the processing performed prior to the measuring step comprises releasing the DNA from within the cells. DNA may be extracted from the sample using techniques well known to those of skill in the art, including chemical extraction techniques utilizing phenol-chloroform (Sambrook et al., 1989), guanidine-containing solutions, or CTAB-containing buffers. As a matter of convenience, commercial DNA extraction kits are also widely available from laboratory reagent suppliers, including, for example, the QIAamp DNA Mini prep kit available from QIAGEN (Chatsworth, Calif.) or the Extract-N-Amp blood kit available from Sigma-Aldrich (St. Louis, Mo.). In certain specific embodiments, the sample is a blood sample that contains cell-free DNA, and the processing performed prior to the measuring step comprises enrichment or purification of the cell-free DNA.
Once an appropriate sample is obtained and optionally pre-processed, the methylation level of the CpG island(s) of interest in the sample is determined. In some embodiments, the method provided herein involves, at least in part, detecting the methylation levels of certain CpG islands of interest in samples so as to provide an earlier and/or more sensitive or accurate diagnostic result for lung cancer.
The terms “determine”, “assess”, “analysis” and “measure” are used interchangeably and refer to both quantitative and semi-quantitative determinations. Where either a quantitative or a semi-quantitative determination is intended, the phrase “determining/measuring a methylation level” of a CpG island of interest can be used.
CpG is shorthand for the linear DNA sequence 5'-cytosine-phosphate-guanine-3'. CpG islands are DNA regions about 100-3000 bp in length (e.g., 100-1000 bp, 200-2000 bp, 300-3000 bp, 100-200 bp, 100-300 bp, 200-300 bp, etc.) that have a GC content greater than 50% and a high frequency of CpG sites.
DNA methylation is a process by which methyl groups are added to the DNA molecule, which can change the activity of a DNA segment without changing its sequence. In humans, DNA methylation occurs at the 5’ position of the pyrimidine ring of cytosine residues within CpG sites, forming 5-methylcytosine (5-mC). The biological importance of 5-mC as a major epigenetic modification affecting phenotype and gene expression is widely recognized, and hypermethylation of CpG islands has been shown to contribute to tumorigenesis.
The term "methylation level" is used herein to refer to the extent of methylation at CpG sites, or the methylation status, of a target region. The extent may be expressed in absolute terms, i.e., the total number of methylated CpG sites within the target region, or in relative terms, i.e., the percentage of methylated CpG sites among all CpG sites within the target region. Methylation of any number of CpG sites within the target region may be determined for comparison to a control. The CpG sites within the target region may or may not be contiguous. It is generally believed that DNA methylation is more biologically meaningful when multiple neighboring CpGs are co-methylated; therefore, the CpG island of interest preferably contains 2, 3, 4, 5 or more CpG sites, and more preferably these 2, 3, 4, 5 or more CpG sites are contiguous. In some embodiments, methylation of at least about 2, at least 3, at least 4, at least 5 or more CpG sites, or up to all or a majority of the detectable CpG sites within the target region (the number of detectable CpG sites may vary depending on the technique used), is determined to obtain a more accurate and/or sensitive diagnostic result. In some embodiments, the methylation levels of more than one CpG island are determined, likewise to obtain a more accurate and/or sensitive diagnostic result.
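The length and GC-content criteria above can be checked directly on a candidate sequence. The sketch below is illustrative only: its defaults mirror the 100-3000 bp length and >50% GC figures given here, while the observed/expected CpG ratio threshold of 0.6 is an assumption, borrowed from the commonly used Gardiner-Garden and Frommer criterion, standing in for "a high frequency of CpG sites". The function names are hypothetical.

```python
def cpg_island_stats(seq: str) -> dict:
    """Compute length, GC fraction, CpG count and observed/expected CpG ratio."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    # "CG" cannot overlap itself, so str.count returns the number of CpG dinucleotides.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio = CpG count * length / (C count * G count).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_count": cpg, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 100, max_len: int = 3000,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if `seq` meets the length, GC-content and (assumed) CpG-density thresholds."""
    st = cpg_island_stats(seq)
    return (min_len <= st["length"] <= max_len
            and st["gc_fraction"] > min_gc
            and st["obs_exp"] > min_obs_exp)


print(looks_like_cpg_island("CG" * 100))  # True: 200 bp, 100% GC
print(looks_like_cpg_island("AT" * 100))  # False: no GC content
```

Real CpG island annotation typically scans sliding windows along a genome; this sketch only scores one pre-defined region.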
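The absolute and relative expressions of methylation level defined above can be computed from per-site methylation calls. In this illustrative sketch, each CpG site of the target region is represented as a boolean (True if methylated), listed in genomic order; that representation, the function names, and the helper that reports the longest run of consecutive methylated sites (a rough proxy for co-methylation of contiguous CpGs) are assumptions rather than part of the disclosure.

```python
from typing import Sequence, Tuple


def methylation_level(site_calls: Sequence[bool]) -> Tuple[int, float]:
    """Return (absolute, relative) methylation level of a target region.

    absolute: number of methylated CpG sites
    relative: percentage of methylated CpG sites among all CpG sites assessed
    """
    if not site_calls:
        raise ValueError("at least one CpG site is required")
    methylated = sum(1 for call in site_calls if call)
    return methylated, 100.0 * methylated / len(site_calls)


def longest_comethylated_run(site_calls: Sequence[bool]) -> int:
    """Length of the longest stretch of consecutive methylated CpG sites."""
    best = current = 0
    for call in site_calls:
        current = current + 1 if call else 0
        best = max(best, current)
    return best


calls = [True, True, False, True]
print(methylation_level(calls))         # (3, 75.0)
print(longest_comethylated_run(calls))  # 2
```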
In one embodiment, the CpG island of interest is the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR and ANKRD18B. The region of the CpG island of each of these genes is defined in Table 1 below:
Table 1: CpG islands of interest
In some embodiments, the CpG island of interest is the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2 and CACNA1E. In some embodiments, the CpG islands of interest are the CpG islands of more than one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2 and CACNA1E. In one embodiment, the CpG island of interest is the CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR and ANKRD18B. In some embodiments, the CpG islands of interest are the CpG islands of more than one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR and ANKRD18B. In some embodiments, the CpG island of interest is the CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2 and CACNA1E. In some embodiments, the CpG islands of interest are the CpG islands of more than one gene selected from ZNF649, LHX5-AS1, RASGRF2 and CACNA1E.
The methylation level at CpG sites within the target region can be determined by any technique known in the art. For example, the DNA contained in the sample can first be bisulfite converted using standard kits for this purpose, the bisulfite-converted product amplified using well-established methods including PCR, and the methylation level at CpG sites in the region then determined using known techniques such as pyrosequencing or Sequenom analysis. In some embodiments, the methylation level at CpG sites within the target region is determined by employing a pre-established methylation detection system, for example, a DNA methylation library construction kit, such as Accel-NGS Methyl-Seq™ (hereafter “SWIFT”) from Swift Biosciences, MI, USA or the AnchorIRIS™ kit (hereafter “IRIS”) from AnchorDX, Guangdong, CN, in combination with NGS sequencing.
The IRIS assay is a quantitative technique for determining DNA methylation levels at specific gene loci in small amounts of genomic DNA or in trace amounts of cell-free circulating DNA. In the IRIS assay, methylation-dependent sequence differences are introduced into the genomic DNA by sodium bisulfite treatment; adaptors are then ligated directly to the 3’ end of the single-stranded DNA molecules after bisulfite conversion, and the bisulfite-treated DNA is subsequently amplified by PCR (see Figure 1A for more details). This combination of bisulfite treatment and PCR amplification converts unmethylated cytosine residues to thymine, while methylated cytosine residues remain as cytosine, and amplifies the resulting signal. The amplification products are then sequenced, and the sequencing results are compared with the reference sequence of the target region to determine the DNA methylation level within the target region. The IRIS assay is easy to use and provides high sensitivity and quantitative accuracy.
To render a diagnosis of lung cancer, the methylation level of the target region in the selected biological sample is then compared to a predetermined reference methylation level, and a measured methylation level of the CpG island in the sample that is comparable to the reference level is indicative of the subject having lung cancer.
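The principle of reading methylation from bisulfite sequencing can be illustrated with a toy simulation. The sketch below is not the IRIS pipeline. It assumes complete bisulfite conversion, considers only the top strand, and assumes each read is already aligned to the reference at the same coordinates with no indels. The function names are hypothetical.

```python
from typing import Dict, Set


def bisulfite_convert(ref: str, methylated_positions: Set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on the top strand.

    Unmethylated C reads as T after conversion and amplification;
    a C at a position in `methylated_positions` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(ref.upper())
    )


def call_cpg_methylation(ref: str, read: str) -> Dict[int, bool]:
    """Call methylation at each CpG cytosine of `ref` covered by an aligned `read`.

    C in the read -> methylated (True); T -> unmethylated (False);
    any other base at that position is treated as uninformative and skipped.
    """
    ref, read = ref.upper(), read.upper()
    calls: Dict[int, bool] = {}
    for i in range(min(len(ref) - 1, len(read))):
        if ref[i] == "C" and ref[i + 1] == "G":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
    return calls


reference = "ACGTTCGAC"                         # CpG cytosines at positions 1 and 5
read = bisulfite_convert(reference, {1})        # only position 1 is methylated
print(read)                                     # ACGTTTGAT
calls = call_cpg_methylation(reference, read)
print(calls)                                    # {1: True, 5: False}
print(100.0 * sum(calls.values()) / len(calls)) # 50.0
```

Note that the non-CpG cytosine at position 8 is also converted to T, which is why only CpG positions of the reference are scored.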
A predetermined reference methylation level is the methylation level of CpG sites within the same target region, determined in advance using an appropriate technique in a corresponding control or normal biological sample. A control or normal biological sample is a non-cancerous sample of the corresponding biological sample type obtained from the same subject or from a comparable subject; e.g., for a test sample from a human, a control sample may be obtained from another human, who may be of the same age or sex. Generally, using the median methylation level of the target region across multiple control samples helps to provide an accurate control value. In this regard, it is noted that normal biological samples exhibit a small degree, or baseline amount, of methylation that may vary from sample type to sample type. Thus, determining the degree of methylation in the target region of a control sample helps in accurately assessing the actual degree of methylation in a test sample, i.e., whether or not hypermethylation actually exists in the test sample being analyzed.
The term “comparable” as used herein means that the methylation level of the target region in the sample is within the range of 0.8-1.5 times (e.g., 0.8-1.2 times, 0.9-1.1 times, etc.) the predetermined reference level.
In some embodiments, the diagnostic method for lung cancer provided herein, which determines the methylation level of one or more CpG islands of interest, can provide a diagnostic sensitivity of at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100%. In some embodiments, the method can provide a diagnostic specificity of at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100%. In some specific embodiments, the diagnostic method for lung cancer provided herein has a sensitivity of at least 70% and a specificity of at least 80%.
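Deriving the reference level as a median over multiple control samples, as described above, can be sketched as follows. The region names and per-sample values are hypothetical placeholders, not data from this disclosure, and the function name is an assumption.

```python
from statistics import median
from typing import Dict, List


def reference_levels(control_levels: Dict[str, List[float]]) -> Dict[str, float]:
    """Median methylation level per target region across control samples."""
    refs: Dict[str, float] = {}
    for region, values in control_levels.items():
        if not values:
            raise ValueError(f"no control values for {region}")
        refs[region] = median(values)
    return refs


# Hypothetical relative methylation levels (%) in non-cancerous control samples.
controls = {
    "region_1": [2.0, 3.5, 2.5],
    "region_2": [10.0, 12.0, 11.0, 9.0],
}
print(reference_levels(controls))  # {'region_1': 2.5, 'region_2': 10.5}
```

The median is less affected than the mean by an occasional outlying control sample, which fits the aim of an accurate baseline value.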
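The definition of “comparable” amounts to a ratio test against the predetermined reference level. This sketch uses the 0.8-1.5 range as its default and accepts the narrower ranges mentioned as parameters; the function name and the choice to include both endpoints are assumptions.

```python
def is_comparable(measured: float, reference: float,
                  low: float = 0.8, high: float = 1.5) -> bool:
    """True if `measured` lies within [low, high] times `reference`."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    ratio = measured / reference
    return low <= ratio <= high


print(is_comparable(12.0, 10.0))                      # True  (ratio 1.2)
print(is_comparable(12.0, 10.0, low=0.9, high=1.1))   # False (outside 0.9-1.1)
print(is_comparable(20.0, 10.0))                      # False (ratio 2.0)
```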
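Sensitivity is the fraction of subjects with lung cancer who test positive, and specificity is the fraction of subjects without lung cancer who test negative. A minimal sketch of computing both from per-subject calls follows; the counts used are hypothetical and the function name is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity).

    truth: True for subjects who actually have lung cancer
    predicted: True for subjects called positive by the test
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = [(bool(t), bool(p)) for t, p in zip(truth, predicted)]
    tp = sum(1 for t, p in pairs if t and p)          # cancer, called positive
    fn = sum(1 for t, p in pairs if t and not p)      # cancer, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # no cancer, called negative
    fp = sum(1 for t, p in pairs if not t and p)      # no cancer, false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative subject")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 4 lung cancer cases followed by 5 controls.
truth = [True] * 4 + [False] * 5
predicted = [True, True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)                     # 0.75 0.8
print(sens >= 0.70 and spec >= 0.80)  # True
```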
Biomarkers for Diagnosing Lung Cancer
The present application is based, at least in part, on the discovery of a set of CpG regions whose methylation level or status is correlated with lung cancer and which thus can serve as biomarkers for the diagnosis of lung cancer. In one aspect, the set of CpG regions includes the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR and ANKRD18B. In some embodiments, the set of CpG regions includes the CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2 and CACNA1E. In some embodiments, the set of CpG regions includes the CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR and ANKRD18B. In some embodiments, the set of CpG regions includes the CpG island of at least one gene selected from ZNF649, LHX5-AS1, RASGRF2 and CACNA1E. In some embodiments, the set of CpG regions is selected based on sensitivity only. In some embodiments, the set of CpG regions is selected based on specificity only. In some embodiments, the set of CpG regions is selected by balancing sensitivity and specificity.
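The disclosure does not state how sensitivity and specificity are balanced. One common approach, assumed here, is Youden's J statistic (sensitivity + specificity - 1): for each candidate region, pick the cutoff that maximizes J, then rank regions by their best J. This sketch also assumes a simple threshold rule in which a sample is called positive when its methylation level is at or above the cutoff (consistent with hypermethylation in tumors), which differs from the “comparable” criterion used elsewhere in this disclosure. Region names and values are hypothetical, and in practice cutoffs chosen this way would need validation on independent samples.

```python
from typing import Dict, List, Sequence, Tuple


def best_cutoff(case_levels: Sequence[float],
                control_levels: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J.

    Positive call: level >= cutoff. Both lists must be non-empty.
    On ties, the lowest cutoff is kept.
    """
    best = None
    for cutoff in sorted(set(case_levels) | set(control_levels)):
        sens = sum(x >= cutoff for x in case_levels) / len(case_levels)
        spec = sum(x < cutoff for x in control_levels) / len(control_levels)
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


def rank_regions(data: Dict[str, Tuple[List[float], List[float]]]) -> List[Tuple[str, float]]:
    """Rank candidate regions by their best Youden's J, highest first."""
    scored = [(region, best_cutoff(cases, controls)[3])
              for region, (cases, controls) in data.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical methylation fractions: (cases, controls) per candidate region.
data = {
    "region_A": ([0.6, 0.7, 0.8, 0.5], [0.1, 0.2, 0.3, 0.65]),
    "region_B": ([0.2, 0.3], [0.25, 0.35]),
}
print(best_cutoff(*data["region_A"]))  # (0.5, 1.0, 0.75, 0.75)
print(rank_regions(data))              # [('region_A', 0.75), ('region_B', 0.0)]
```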
Computer-implemented Methods, Systems and Devices
Any of the methods described herein may be performed, in whole or in part, with a computer system including one or more processors, which can be configured to perform these steps. Thus, embodiments are directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, the steps of the methods herein can be performed at the same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Any of the steps of any of the methods can be performed with modules, circuits, or other means for performing these steps.
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem with internal components. The subsystems can be interconnected via a system bus. Additional subsystems include, for example, a printer, a keyboard, storage device(s), and a monitor coupled to a display adapter, among others. Peripherals and input/output (I/O) devices, which couple to an I/O controller, can be connected to the computer system by any number of means known in the art, such as a serial port. For example, a serial port or an external interface (e.g., Ethernet, Wi-Fi, etc.) can be used to connect the computer system to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via the system bus allows the central processor to communicate with each subsystem and to control the execution of instructions from system memory or the storage device(s) (e.g., a fixed disk, such as a hard drive or optical disk), as well as the exchange of information between subsystems. The system memory and/or the storage device(s) may embody a computer readable medium. Any of the data mentioned herein can be output from one component to another component and can be output to the user.
A computer system can include a plurality of the same components or subsystems, e.g., connected together by an external interface or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of the same computer system. A client and a server can each include multiple systems, subsystems, or components.
It should be understood that any of the embodiments of the present disclosure can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present disclosure using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language, such as R, Java, C/C++, Python or Perl, using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. Suitable media include random access memory (RAM), read-only memory (ROM), magnetic media such as a hard drive or a floppy disk, optical media such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g., a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
Kits and Microarrays
In another aspect, the present disclosure provides kits for use in the methods described above. The kits may comprise any or all of the reagents to perform the methods described herein. In such applications the kits may include any or all of the following: assay reagents, buffers, probes and/or primers that bind to the CpG island of interest described herein, etc.
The term “kit” as used herein in the context of detection reagents is intended to refer to such things as combinations of multiple gene expression product detection reagents, or one or more gene expression product detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages such as packaging intended for commercial sale, substrates to which gene expression product detection reagents are attached, electronic hardware components, etc.).
In some embodiments, the present disclosure provides oligonucleotide probes attached to a solid support, such as an array slide or chip, e.g., as described in Bowtell and Sambrook (Eds.), DNA Microarrays: A Molecular Cloning Manual (2003), Cold Spring Harbor Laboratory Press. The construction of such devices is well known in the art, as described, for example, in U.S. Patent No. 5,837,832; PCT application WO95/11995; U.S. Patent No. 5,807,522; U.S. Patent Nos. 7,157,229, 7,083,975, 6,444,175, 6,375,903, 6,315,958, 6,295,153 and 5,143,854; and U.S. Patent Publication Nos. 2007/0037274, 2007/0140906, 2004/0126757, 2004/0110212, 2004/0110211, 2003/0143550, 2003/0003032 and 2002/0041420. Nucleic acid arrays are also reviewed in the following references: Biotechnol Annu Rev (2002) 8: 85-101; Sosnowski et al., Psychiatr Genet (2002) 12(4): 181-92; Heller, Annu Rev Biomed Eng (2002) 4: 129-53; Kolchinsky et al., Hum Mutat (2002) 19(4): 343-60; and McGail et al., Adv Biochem Eng Biotechnol (2002) 77: 21-42.
A microarray can be composed of a large number of unique, single-stranded polynucleotides, usually either synthetic antisense polynucleotides or fragments of cDNAs, fixed to a solid support. Typical polynucleotides are preferably about 6-60 nucleotides in length, more preferably about 15-30 nucleotides in length, and most preferably about 18-25 nucleotides in length. For certain types of arrays or other detection kits/systems, it may be preferable to use oligonucleotides that are only about 7-20 nucleotides in length. In other types of arrays, such as arrays used in conjunction with chemiluminescent detection technology, preferred probe lengths can be, for example, about 15-80 nucleotides in length, preferably about 50-70 nucleotides in length, more preferably about 55-65 nucleotides in length, and most preferably about 60 nucleotides in length.
In addition, the kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods provided herein. While the instructional materials typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD-ROM), and the like. Such media may include addresses of internet sites that provide such instructional materials.
The following examples are provided to better illustrate the claimed invention and are not to be interpreted as limiting the scope of the invention. All specific compositions, materials, and methods described below, in whole or in part, fall within the scope of the present invention. These specific compositions, materials, and methods are not intended to limit the invention, but merely to illustrate specific embodiments falling within the scope of the invention. One skilled in the art may develop equivalent compositions, materials, and methods without the exercise of inventive capacity and without departing from the scope of the invention. It will be understood that many variations can be made in the procedures herein described while still remaining within the bounds of the present invention. It is the intention of the inventors that such variations are included within the scope of the invention.
EXAMPLE 1
This example illustrates the development and evaluation of an AnchorIRIS™ targeted methylation sequencing pipeline in comparison with the commercially available Accel-NGS Methyl-Seq™ (SWIFT) system.
The AnchorIRIS™ (IRIS) assay employs a technology that ligates adaptors directly to the 3’ end of single-stranded DNA molecules after bisulfite conversion (Figure 1A). Because adaptors are added only after bisulfite conversion, this significantly reduces the DNA loss caused by bisulfite conversion of already constructed libraries. Another improvement in the IRIS assay is a linear amplification step after the first adaptor ligation. This increases the number of molecules available for the second adaptor ligation, giving each original molecule a much higher chance of being incorporated into the sequencing library. These two improvements are particularly important for recovering ctDNA for subsequent sequencing, given the limited amount of ctDNA available. A final target enrichment step is also included, so that sequencing cost can be significantly reduced by pre-selecting a set of targets of interest.
Given the limited amount of ctDNA recovered from one typical blood draw, the detection limit and data quality of the IRIS assay depend heavily on multiple factors, including bisulfite conversion efficiency, the efficiency of converting cfDNA molecules into sequencing libraries, sequencing coverage, and sequencing uniformity. To address these fundamental challenges, a bake-off study comparing the IRIS assay with the commercially available DNA methylation library construction kit Accel-NGS Methyl-Seq™ was performed using variable amounts of input cfDNA (1 ng to 10 ng), representing the typical range of cfDNA yield that we could retrieve from a typical blood draw. cfDNA was isolated from plasma collected from 3 patients with ovarian cancer. DNA concentration was measured using Qubit, and the cfDNA was pooled. The concentration of the pooled cfDNA was calculated from the total DNA amount and the total volume.
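The pooled concentration described above is simply the total DNA mass divided by the total volume. A minimal sketch follows; the concentrations and volumes are hypothetical values, not measurements from this example, and the function name is an assumption.

```python
from typing import Sequence, Tuple


def pool_cfdna(concs_ng_per_ul: Sequence[float], volumes_ul: Sequence[float]) -> Tuple[float, float]:
    """Return (total mass in ng, pooled concentration in ng/ul)."""
    if len(concs_ng_per_ul) != len(volumes_ul) or not volumes_ul:
        raise ValueError("need matching, non-empty concentration and volume lists")
    total_mass = sum(c * v for c, v in zip(concs_ng_per_ul, volumes_ul))
    total_volume = sum(volumes_ul)
    return total_mass, total_mass / total_volume


# Hypothetical Qubit readings (ng/ul) and eluate volumes (ul) for three patients.
concs = [0.5, 0.8, 0.3]
vols = [40.0, 40.0, 50.0]
mass, conc = pool_cfdna(concs, vols)
print(f"{mass:.1f} ng in {sum(vols):.0f} ul -> {conc:.3f} ng/ul")  # 67.0 ng in 130 ul -> 0.515 ng/ul
```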
Bisulfite conversion
Bisulfite conversion was performed using the Zymo Lightning Conversion Reagent (Cat#D5031, Zymo Research) according to the manufacturer’s protocol. Briefly, 130 μl of Lightning Conversion Reagent was added to 20 μl of DNA sample, and the mixture was incubated in a thermocycler with the following program: 98 ℃ for 8 mins, 54 ℃ for 60 mins, and 4 ℃ for up to 20 hrs. The bisulfite-converted DNA was then mixed with M-Binding Buffer, run through a Zymo-Spin™ IC Column, desulphonated, washed, and eluted in 17 μl M-Elution Buffer. The bisulfite-converted cfDNA was then aliquoted in duplicate for the different input titrations: 10 ng, 5 ng, 3 ng, and 1 ng. Converting the pooled DNA in a single reaction before aliquoting avoided introducing assay variation between samples at the bisulfite conversion step.
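The titration design fixes how much converted DNA is needed: duplicates of 10, 5, 3 and 1 ng require 38 ng in total, and the corresponding volumes must fit within the 17 μl eluate. The sketch below computes the per-aliquot volume for a given eluate concentration; the 2.5 ng/μl concentration is hypothetical, and the function name is an assumption.

```python
from typing import Dict, Sequence


def aliquot_volumes(conc_ng_per_ul: float,
                    inputs_ng: Sequence[float] = (10.0, 5.0, 3.0, 1.0),
                    replicates: int = 2,
                    eluate_ul: float = 17.0) -> Dict[float, float]:
    """Volume (ul) to draw per aliquot for each input amount, checking the eluate suffices."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    volumes = {ng: ng / conc_ng_per_ul for ng in inputs_ng}
    needed_ul = replicates * sum(volumes.values())
    if needed_ul > eluate_ul:
        raise ValueError(f"need {needed_ul:.1f} ul but only {eluate_ul} ul of eluate is available")
    return volumes


total_ng = 2 * sum((10.0, 5.0, 3.0, 1.0))
print(total_ng)               # 38.0
print(aliquot_volumes(2.5))   # hypothetical 2.5 ng/ul eluate; 15.2 ul used in total
```

At a lower hypothetical concentration such as 2.0 ng/μl, the duplicates would need 19 μl, more than the 17 μl eluate, and the function raises an error.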
IRIS Library Preparation
AnchorIRIS™ pre-library construction was performed using the AnchorDx EpiVisio™ Methylation Library Prep Kit (AnchorDx, Cat#A0UX00019) and the AnchorDx EpiVisio™ Indexing PCR Kit (AnchorDx, Cat#A2DX00025). End repair of bisulfite-converted DNA was performed using the MEE1 enzyme at 37 ℃ for 30 mins. The DNA was then denatured at 95 ℃ for 5 mins and snap-cooled on ice. The 3’-end adaptor was ligated using the MLE1 and MLE5 enzymes at 37 ℃ for 30 mins. A first amplification was immediately performed to generate reverse-complemented DNA molecules using the MAE3 enzyme with the following PCR program: 1 cycle of 95 ℃ for 3 mins; 4 cycles of 95 ℃ for 30 secs + 60 ℃ for 30 secs + 68 ℃ for 1 min; and 1 cycle of 68 ℃ for 5 mins. The amplified DNA was purified using AMB1 Magnetic Beads and eluted in a 20 μl volume. 3’-end adaptor ligation of the reverse-complemented DNA was next performed using the MSE1 and MSE5 enzymes at 37 ℃ for 30 mins. Indexing PCR (i5 and i7) was immediately performed using the MIB1 PCR master mix with the following PCR program: 1 cycle of 98 ℃ for 45 secs; 14 cycles of 98 ℃ for 15 secs + 60 ℃ for 30 secs + 72 ℃ for secs; and 1 cycle of 72 ℃ for 5 mins. The amplified pre-libraries were subsequently purified using IPB1 Magnetic Beads, and their concentration was determined using the Qubit™ dsDNA HS Assay Kit. Pre-libraries containing more than 400 ng of DNA were considered qualified for target enrichment.
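The 400 ng qualification criterion is a mass check: Qubit concentration multiplied by the pre-library volume. The eluate volume of the purified pre-library is not stated here, so the 20 μl used below is hypothetical, as is the function name.

```python
def prelibrary_qualified(conc_ng_per_ul: float, volume_ul: float,
                         min_total_ng: float = 400.0) -> bool:
    """True if the pre-library holds more than `min_total_ng` of DNA (strictly greater)."""
    return conc_ng_per_ul * volume_ul > min_total_ng


for conc in (25.0, 20.0, 15.0):
    total = conc * 20.0  # hypothetical 20 ul eluate
    print(f"{conc} ng/ul x 20 ul = {total:.0f} ng -> qualified: {prelibrary_qualified(conc, 20.0)}")
```

Because the text says "more than 400 ng", a pre-library at exactly 400 ng is treated as not qualified in this sketch.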
Target Enrichment was performed using AnchorDx EpiVisio TM Target Enrichment Kit (AnchorDx, Cat#A0UX00031) . A total of 1000 ng DNA containing up to 4 pre-libraries was pooled for target enrichment using our custom made 10K methylation panel, which includes 9921 pre-selected regions enriched for cancer-specific methylation. Briefly, HE, HBA and HBB blocking reagents were added to the 1000 ng pooled pre-libraries and completely dried using a heated vacuum, which was subsequently reconstituted in 7.5 
Figure PCTCN2018079155-appb-000007
λ MHB1 hybridization buffer plus 3 μl MHE1 hybridization enhancer. Reconstituted pre-library pools were next denatured at 95 ℃ for 10 mins and immediately transferred to a 47 ℃ hybridization oven. Then probes were added to each pre-library pool, which was quickly transferred to a thermocycler for hybridization incubation following the manufacturer’s protocol.
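Pooling 1000 ng across up to four pre-libraries can be planned from each pre-library’s concentration. The disclosure does not state how the 1000 ng is split among pre-libraries; this sketch assumes an equal mass per pre-library, and the library names, concentrations and function name are hypothetical.

```python
from typing import Dict


def pooling_volumes(concs_ng_per_ul: Dict[str, float], total_ng: float = 1000.0,
                    max_libraries: int = 4) -> Dict[str, float]:
    """Volume (ul) of each pre-library so the pool totals `total_ng` with equal mass per library."""
    if not 1 <= len(concs_ng_per_ul) <= max_libraries:
        raise ValueError(f"pool between 1 and {max_libraries} pre-libraries")
    if any(c <= 0 for c in concs_ng_per_ul.values()):
        raise ValueError("concentrations must be positive")
    per_library_ng = total_ng / len(concs_ng_per_ul)
    return {name: per_library_ng / conc for name, conc in concs_ng_per_ul.items()}


print(pooling_volumes({"lib1": 25.0, "lib2": 50.0, "lib3": 20.0, "lib4": 40.0}))
# {'lib1': 10.0, 'lib2': 5.0, 'lib3': 12.5, 'lib4': 6.25}
```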
After hybridization, DNA pre-libraries bound with biotinylated probes were pulled down using the Dynabeads M270 streptavidin beads (Thermo Fisher Scientific, Cat#65306) . Briefly, 30 μl streptavidin beads were used for each pre-library pool, washed twice with 1X Binding Wash Buffer, and resuspended in 60 μl Binding Wash Buffer. Pre-library pools were added and mixed well with beads by repeated pipetting, and the mixture was incubated on a rotator at 47 ℃ for 45 mins. After beads binding, 100 μl pre-warmed 1X Transfer Buffer was added to the mixture. The supernatant was quickly removed as soon as it turned clear and beads were washed twice using pre-warmed 1X Stringent Wash Buffer. Next, beads were resuspended with 200 μl room temperature 1X Wash Buffer I and mixed thoroughly. Supernatant was then removed and beads were subsequently washed with 1X Wash Buffer II and 1X Wash Buffer III following the same steps, and finally eluted in 23 μl H 2O.
These enriched libraries were further amplified with P5 and P7 primers using the KAPA HiFi HotStart Ready Mix (KAPA Biosystems, Cat#KK2602) with the following PCR program: 1 cycle of 98 ℃ for 45 secs; 12 cycles of 98 ℃ for 15 secs + 60 ℃ for 30 secs + 72 ℃ for 30 secs; and 1 cycle of 72 ℃ for 1 min. The PCR product was then purified with Agencourt AMPure XP Magnetic Beads (Beckman Coulter, Cat#A63882) and eluted in 40 μl EB buffer. The concentration of this final library was determined using the Qubit dsDNA HS Assay.
SWIFT Library Preparation
IRIS libraries were constructed according to the methods described above, while SWIFT libraries were constructed according to the manufacturer’s protocol (Cat#DL-ILMMS-12/48). Briefly, bisulfite-converted DNA was denatured at 95 ℃ for 2 mins and snap-cooled on ice. 3’-end adaptor-1 ligation was immediately performed using the Adaptase Reaction Mix with the incubation program: 37 ℃ for 1 min, 62 ℃ for 2 mins, and 65 ℃ for 5 mins. Next, the reverse-complemented sequence of each ssDNA was synthesized in the Extension Reaction using enzyme Y2 with the incubation program: 98 ℃ for 1 min, 62 ℃ for 2 mins, and 65 ℃ for 5 mins, resulting in dsDNA molecules. The dsDNA was then purified using SPRIselect beads and eluted in 15 μl low-EDTA TE. Adaptor-2 ligation was performed using Enzyme B3 at 25 ℃ for 15 mins. Ligation products were purified with SPRIselect beads and carried into indexing PCR for amplification with the PCR program: 1 cycle of 98 ℃ for 30 secs, followed by repeated cycles of 98 ℃ for 10 secs + 60 ℃ for 30 secs + 68 ℃ for 60 secs. PCR cycle numbers were adjusted according to the input DNA amount for both the IRIS and SWIFT assays. PCR products were bead-purified, eluted in 40 μl EB buffer, and quantified using Qubit.
Target enrichment was performed using the same 10K panel for both assays according to the methods described above.
Results and discussion
All prepared libraries were sequenced, and all achieved similar numbers of uniquely mapped total reads (>10 million) with a mapping rate of about 65% to 70%. Overall performance improved with higher cfDNA input for both techniques, as shown by proportionally increased post-deduplication mean bait coverage (Figure 1B). However, at each cfDNA input level, the IRIS libraries produced 4- to 8-fold higher post-deduplication mean target coverage than the SWIFT libraries, with uniformity (percentage of target bases with greater than 0.2x mean target coverage) greater than 90% under all conditions (Figure 2). Even the IRIS libraries built from 1 ng cfDNA input showed better performance metrics than the SWIFT libraries built from 10 ng cfDNA input.
The library conversion efficiencies of IRIS and SWIFT were estimated and compared (Figure 1C and 1D, Figure 2). The IRIS assay conferred a conversion rate of at least 20%, at least 5-fold greater than that of SWIFT. An unexpectedly high library conversion efficiency was also observed for IRIS with 1 ng cfDNA input, likely because the enzymes and reagents at each step of library construction were present in much greater excess relative to the limited starting material.
EXAMPLE 2
This example relates to the evaluation of sensitivity and detection limit of the AnchorIRIS TM targeted methylation sequencing.
Alterations in DNA methylation have been shown to be an early event during tumorigenesis, and multiple genomic regions are affected simultaneously. Although whether they play a causal role remains to be determined, this feature gives DNA methylation a great advantage as a biomarker for early cancer detection, because many more genomic markers can be acquired in parallel from a tiny amount of starting material, which matters especially for ctDNA. Because of this feature, two factors need to be considered when evaluating the LoD: (1) whether a set of regions with informative co-methylation signals can be detected above background at a given dilution; and (2) the linear quantitative range across input dilutions.
To assess the sensitivity and limit of detection (LoD) of the IRIS assay, different amounts of lung cancer tumor tissue genomic DNA (gDNA) were spiked into WBC gDNA to create serial dilutions of tumor fraction, including 1:10, 1:30, 1:100, 1:300, and 1:1000. Undiluted tumor gDNA and WBC gDNA samples were also included. A total of 900 ng gDNA in 50 μl for each dilution was sheared to 200 bp, and successful shearing was confirmed on a 1% agarose gel. Concentrations of sheared DNA were measured using Qubit, and bisulfite conversion was performed using 250 ng DNA. Next, 100 ng of bisulfite-converted DNA from each dilution was aliquoted in duplicate for library construction and target enrichment according to the methods described above.
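The mass of each gDNA source needed per dilution follows directly from the 900 ng total. The sketch below assumes that a dilution written as 1:N means a tumor-DNA fraction of 1/N (so 1:10 is 10% tumor DNA), which is consistent with the percentages quoted in the results; the function name is hypothetical.

```python
TOTAL_NG = 900.0                      # total gDNA per dilution (from the text)
DILUTIONS = [10, 30, 100, 300, 1000]  # 1:N dilutions listed in the text


def mix_masses(total_ng, dilution):
    """Return (tumor_ng, wbc_ng) for a 1:dilution mix, assuming tumor fraction = 1/dilution."""
    tumor_ng = total_ng / dilution
    return tumor_ng, total_ng - tumor_ng


for n in DILUTIONS:
    tumor_ng, wbc_ng = mix_masses(TOTAL_NG, n)
    print(f"1:{n:<5} tumor fraction {100 / n:.3g}%  tumor {tumor_ng:.3g} ng  WBC {wbc_ng:.4g} ng")
```

At 1:1000 only 0.9 ng of tumor gDNA is required, an amount that is usually easier to reach by stepwise dilution from a more concentrated mixture than by pipetting directly.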
Results and discussion
As shown in Figure 1E and 1F, at tumor fractions above 10%, where tumor DNA was still adequate, almost all pre-selected informative CpG regions could be detected (Figure 1E), and only within this range did the percentage of co-methylation show a linear relationship with the dilution factor (gray box, Figure 1F). This is because the percentage of co-methylation varies among genomic regions, so the average percentage shifts as the set of detected regions shrinks with further dilution of tumor DNA. Moreover, a number of informative CpG regions could still be detected even at a dilution of 0.033%, significantly more than the number of regions detected at background using WBC gDNA. Considering that the cancer cell content of the starting material was estimated to be approximately 30% of the respective FFPE tissue block, our assay can achieve a detection limit of 0.0033%. All dilution samples displayed very similar library construction and sequencing performance (Figure 3). The overall sequencing performance of tissue gDNA is generally better than that of cfDNA, with higher diversity and uniformity at the same sequencing depth, owing to the substantially higher DNA input amount (Figure 2 and 3). Technical replicates of each dilution were highly correlated across all target regions. Together, these results suggest that the IRIS assay is highly stable and reproducible.
EXAMPLE 3
This example illustrates the identification and validation of methylation related markers for detection of lung cancer in tissue samples and plasma samples.
To characterize methylation signatures specific to early-stage lung cancer, this study used samples from cancer-free individuals, as well as formalin-fixed paraffin-embedded (FFPE) tissue samples and plasma samples from patients who screened positive for pulmonary nodules (PNs) on CT/LDCT scan and subsequently underwent surgical resection. Since the study is aimed at non-invasive diagnosis of early-stage lung cancer, enrolled patients were required to have no previous cancer history and to be diagnosed with only 1 or 2 PNs. Both genders were included, and smoking history was recorded. Pathological information for all samples was determined from surgically resected tissue sections according to the 2015 WHO Histological Classification of Lung Cancer. The collection of all samples was approved by the Ethical Committees at each site, and all participants provided written informed consent.
Because adenocarcinoma is the major subtype of lung cancer in this cohort, and IA represents a later stage of adenocarcinoma development (which begins with AIS and progresses through MIA) that should have accumulated more methylation markers, we started by identifying hypermethylated CpG sites through a comparison of 33 IA samples with 78 benign samples. Differentially methylated CpG loci (DML) were first identified, and neighboring associated CpG loci were then grouped into differentially methylated regions (DMRs). 1494 DMRs were selected by this approach. While these regions were hypermethylated in almost all IA samples, this hypermethylation pattern was detected in only half of the MIA samples, and the other half presented no difference compared with the benign samples (Figure 4A). Similarly, AIS samples also lacked hypermethylation signals. This gradual gain of hypermethylation (from right to left in the heatmap in Figure 4A) is consistent with the sequential development of adenocarcinoma from AIS through MIA to IA. These hypermethylated CpG sites were further confirmed using TCGA methylation microarray data generated from lung adenocarcinoma and normal lung tissues.
Patients and sample collection
309 malignant and benign lung tissue samples were collected in FFPE format, and 79 samples were excluded due to insufficient extracted DNA, low library yield, or poor sequencing quality. In total, therefore, 230 FFPE tissue samples were enrolled for training: 129 malignant tumor samples of invasive adenocarcinoma (IA), minimally invasive adenocarcinoma (MIA), adenocarcinoma in situ (AIS), squamous cell (SC), large cell (LC), small cell (SCLC), and other rare lung cancers; and 101 benign lesion samples of hamartoma (HAM), tuberculosis (TB), inflammatory granuloma (GRAN), fungal infection (FUN), inflammation (INF), sclerosing hemangioma (SH), and other rare cases. Detailed information for each tissue sample is listed in Table 2.
Table 2. Patient characteristics for the tissue samples
260 plasma samples were also collected, including 132 samples from individuals diagnosed with positive PNs and 128 samples from asymptomatic normal individuals. Detailed information for each plasma sample is listed in Table 3.
Table 3. Patient characteristics for the plasma samples
The enrolled samples include 33 paired tissue-plasma samples that were used to evaluate methylation concordance between tissue and plasma within the same individual. 8 ml of blood was drawn 1-3 days prior to surgery and stored in Cell-Free DNA BCT® blood collection tubes (Streck, Inc. Cat#218962) at room temperature. Plasma was separated from blood (no apparent hemolysis) within 48 hours after blood draw and stored at -80 ℃ until DNA isolation. For asymptomatic normal participants, 8 ml of blood was drawn using BD EDTA tubes (Becton, Dickinson and Company, Cat#367525), and plasma was separated within 2 hours after blood draw and stored at -80 ℃.
Isolation of tissue genomic DNA and plasma cell-free DNA (cfDNA)
Tissue genomic DNA (gDNA) was isolated from FFPE tissue samples using the Qiagen QIAamp DNA FFPE Tissue Kit (Qiagen, Cat#56404) according to the manufacturer’s protocol. gDNA was fragmented to 200 bp using the M220 Focused-ultrasonicator™ (Covaris, Inc.) following the manufacturer’s protocol, and 100 ng of fragmented DNA was used for library construction.
For plasma collected in Streck BCTs, cfDNA was isolated using the Qiagen QIAamp Circulating Nucleic Acid Kit (Qiagen, Cat#55114) according to the manufacturer’s protocol, while for plasma collected in EDTA-K2 tubes, cfDNA was isolated using the Bioo NextPrep-Mag™ cfDNA Isolation Kit (Bioo Scientific, Cat#NOVA-3825-01/3). Repeated freezing and thawing of plasma was avoided to prevent cfDNA degradation and gDNA contamination from white blood cells (WBC). The concentration of cfDNA was measured using the Qubit™ dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat#Q32854), and quality was examined using the Agilent High Sensitivity DNA Kit (Cat#5067-4626). cfDNA samples with a yield greater than 3 ng and without excessive genomic DNA contamination proceeded to library construction and sequencing according to the IRIS targeted methylation sequencing method described in Example 1.
Read mapping
Sequencing adapters and low-quality 3’ bases were trimmed from raw sequencing reads using Trim Galore version 0.4.1 (https://github.com/FelixKrueger/TrimGalore), and the reads were then aligned to both the C->T-converted and the G->A-converted hg19 genome using Bismark version 0.15.0 (Bowtie2 is the default aligner behind Bismark) [34]. Reads with at least 2 methylated CpGs within a sliding window of 2~5 CpGs were designated as co-methylated reads and used for subsequent analysis of methylation patterns and predictive modeling of the malignant/benign state of patient samples.
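One way to apply the co-methylation rule to a single aligned read is sketched below. It reads the CpG calls from Bismark's per-read methylation call string (the XM tag, in which 'Z' marks a methylated CpG cytosine and 'z' an unmethylated one) and interprets "at least 2 methylated CpGs within a sliding window of 2~5 CpGs" as: some run of `window` consecutive CpGs on the read contains at least `min_methylated` methylated calls. Because any pair found in a shorter window also lies in a window of 5, the default of 5 covers the whole 2~5 range. This interpretation and the function names are assumptions, not the authors' code.

```python
def cpg_calls(xm_tag):
    """CpG methylation calls (True = methylated) in the order they appear in a Bismark XM string.

    'Z' = methylated C in CpG context, 'z' = unmethylated; other contexts and '.' are skipped.
    """
    return [ch == "Z" for ch in xm_tag if ch in "Zz"]


def is_comethylated(xm_tag, window=5, min_methylated=2):
    """True if some run of `window` consecutive CpGs holds >= min_methylated methylated calls.

    Reads with fewer CpGs than `window` are evaluated over all of their CpGs.
    """
    calls = cpg_calls(xm_tag)
    if len(calls) < min_methylated:
        return False
    w = min(window, len(calls))
    return any(sum(calls[i:i + w]) >= min_methylated for i in range(len(calls) - w + 1))


print(is_comethylated("..Z..z..Z"))            # True
print(is_comethylated("..Z..z..Z", window=2))  # False
```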
Assay performance evaluation
Aligned reads were evaluated by Picard version 2.5.0 for metrics that measure the performance of target-capture-based bisulfite sequencing assays (http://broadinstitute.github.io/picard). Specifically, the library conversion efficiency is calculated as the estimated number of molecules incorporated into a library divided by the theoretical number of molecules equivalent to the input DNA amount. The estimated molecule number is derived from the sequencing depth (pre-deduplication mean bait coverage) and the observed sequencing diversity (observed molecule number, i.e., post-deduplication mean bait coverage) based on the Poisson distribution.
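The Poisson-based estimate described above can be sketched as follows. Assuming that M distinct library molecules cover a target base and that sequencing samples them uniformly at random, a pre-deduplication depth D is expected to reveal U = M(1 - e^(-D/M)) distinct molecules (the Lander-Waterman form that Picard also uses for its ESTIMATED_LIBRARY_SIZE duplication metric). Solving for M and dividing by the number of genome copies in the input gives a conversion efficiency. The figure of 3.3 pg per haploid human genome, the bisection solver and all names are assumptions of this sketch, not details taken from the patent.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (assumption)


def expected_unique(molecules, depth):
    """Expected distinct molecules observed when `depth` reads sample `molecules` uniformly."""
    return molecules * (1.0 - math.exp(-depth / molecules))


def estimate_molecules(pre_dedup_cov, post_dedup_cov, tol=1e-9):
    """Solve expected_unique(M, pre) == post for M by bisection (requires 0 < post < pre)."""
    if not 0 < post_dedup_cov < pre_dedup_cov:
        raise ValueError("need 0 < post-dedup coverage < pre-dedup coverage")
    lo, hi = post_dedup_cov, 2.0 * post_dedup_cov
    while expected_unique(hi, pre_dedup_cov) < post_dedup_cov:
        hi *= 2.0  # expected_unique rises toward pre_dedup_cov as M grows
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_unique(mid, pre_dedup_cov) < post_dedup_cov:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conversion_efficiency(pre_dedup_cov, post_dedup_cov, input_ng):
    """Estimated library molecules per locus divided by genome copies in the input DNA."""
    input_copies = input_ng * 1000.0 / PG_PER_HAPLOID_GENOME
    return estimate_molecules(pre_dedup_cov, post_dedup_cov) / input_copies


post = expected_unique(300.0, 500.0)  # simulate a library of 300 molecules/base sequenced to 500X
print(round(estimate_molecules(500.0, post), 3))          # 300.0
print(round(conversion_efficiency(500.0, post, 1.0), 3))  # 0.99
```

With 1 ng of input (about 303 haploid genome copies under the stated assumption), recovering 300 molecules per base corresponds to an efficiency of about 0.99.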
Differential methylation signature identification
Differential methylation (DM) analysis was performed on the training cohort of lung cancer patients using the R package DSS version 2.14.0. Differentially methylated CpGs were identified by comparing invasive adenocarcinoma (IA) with benign samples (p < 0.05) and further assembled into differentially methylated regions (DMRs). Targeted regions of the capture panel covered by DMRs (with >50% of the bases of a target region required to be covered) were selected as candidate features for building classification models of the malignant/benign state. The differential signal was visually confirmed by heatmap using Gitools version 2.3.0.
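The >50% coverage rule for turning DMRs into candidate target regions amounts to an interval-overlap calculation. The sketch below assumes half-open, BED-style (chrom, start, end) tuples and merges overlapping DMRs before measuring coverage so that no base is counted twice; the function names and example coordinates are hypothetical.

```python
def covered_fraction(target, dmrs):
    """Fraction of a target's bases covered by the union of DMRs (half-open coordinates)."""
    chrom, t_start, t_end = target
    # Clip every overlapping DMR to the target and sort by start position.
    pieces = sorted(
        (max(s, t_start), min(e, t_end))
        for c, s, e in dmrs
        if c == chrom and s < t_end and e > t_start
    )
    covered, cur_s, cur_e = 0, None, None
    for s, e in pieces:
        if cur_e is None or s > cur_e:  # disjoint from the current run: close it, start a new one
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                           # overlapping or adjacent: extend the current run
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (t_end - t_start)


def select_targets(targets, dmrs, min_fraction=0.5):
    """Keep targets with strictly more than `min_fraction` of their bases covered by DMRs."""
    return [t for t in targets if covered_fraction(t, dmrs) > min_fraction]


targets = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
dmrs = [("chr1", 90, 130), ("chr1", 120, 160), ("chr1", 350, 420)]
print(select_targets(targets, dmrs))  # [('chr1', 100, 200)]
```

The second target is exactly 50% covered and is therefore excluded by the strict "greater than" rule.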
Predictive modeling of malignant/benign state
To validate the collective prediction power of the candidate features, a random forest model was built for tissue samples in the training cohort of lung cancer patients. 2-fold cross-validation was repeated 10 times, and the top 1000 markers were selected by their importance scores in the random forest model. The performance of this model was evaluated on an independent test set using the receiver operating characteristic (ROC) curve method. For a chosen threshold, the sensitivity and specificity were calculated as follows:
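$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives at that threshold.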
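For readers re-implementing the evaluation, the threshold metrics and the area under the ROC curve can be computed without a machine-learning library. This sketch assumes labels coded 1 = malignant and 0 = benign/normal, calls a sample positive when its score is at or above the threshold (the text does not state the tie rule), and computes AUC as the Mann-Whitney probability that a random malignant sample outscores a random non-malignant one, with ties counted as one half. The function names are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called malignant (label 1)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        called = s >= threshold
        if y == 1:
            tp += int(called)
            fn += int(not called)
        else:
            fp += int(called)
            tn += int(not called)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic; O(n_pos * n_neg), fine for cohort-sized data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(sens_spec(scores, labels, 0.5))  # (0.5, 0.5)
print(roc_auc(scores, labels))         # 0.75
```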
After the collective prediction power was confirmed in tissue samples, the signal distribution of these candidate markers was further examined in plasma samples, and a total of 92 markers that preferentially discriminated malignant samples from benign and asymptomatic normal samples in the training set were identified. Next, the Least Absolute Shrinkage and Selection Operator (Lasso) method (Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288) was applied to select the top 9 markers (see Table 1) that appeared most frequently across 500 subsamples of the original dataset, each drawn at a 75% sampling rate without replacement. The Lasso model was determined according to the expected generalization error estimated from 10-fold cross-validation. Finally, a logistic regression model was trained with these 9 markers to discriminate the same malignant samples from benign samples in the training set. The performance of this classification model was evaluated in an independent test set using the ROC method.
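The subsampling step used to rank the 92 plasma markers can be sketched independently of the Lasso fit itself. Below, `select_fn` is a hypothetical callback standing in for fitting a Lasso model on a subsample and returning the markers with non-zero coefficients; the loop draws 500 subsamples of 75% of the samples without replacement and ranks markers by how often they are selected. The seed, alphabetical tie-breaking and all names are assumptions of this sketch.

```python
import random
from collections import Counter


def selection_frequency(n_samples, select_fn, n_subsamples=500, rate=0.75, seed=0):
    """Count how often each marker is selected across random subsamples (no replacement)."""
    rng = random.Random(seed)
    size = int(round(rate * n_samples))
    counts = Counter()
    for _ in range(n_subsamples):
        subsample = rng.sample(range(n_samples), size)  # sampling without replacement
        counts.update(set(select_fn(subsample)))        # each marker counted once per subsample
    return counts


def top_markers(counts, k=9):
    """The k most frequently selected markers (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [marker for marker, _ in ranked[:k]]


# Toy selector: "m1" is always chosen, "m2" only when sample 0 is in the subsample.
def toy_select(subsample):
    return ["m1", "m2"] if 0 in subsample else ["m1"]


freq = selection_frequency(100, toy_select)
print(top_markers(freq, k=2))  # ['m1', 'm2']
```

In a real analysis `toy_select` would be replaced by an L1-penalized fit on the selected rows, with its penalty chosen as described in the text.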
Limit of detection based on serial dilution experiment
Two replicates of serial dilutions were created by mixing lung cancer tissue gDNA and WBC gDNA at dilutions of 1:10, 1:30, 1:100, 1:300, 1:1000, and 1:3000. To achieve optimal quantitation, a set of 887 informative co-methylated CpG regions was selected from our 10K methylation panel using the following criteria: (1) the percentage of co-methylated reads (co-methylated reads / all mapped reads with at least 3 CpGs) in the undiluted tumor gDNA sample must be greater than 7.5%; (2) the percentage of co-methylated reads at the 1:100 dilution must be greater than the percentage of co-methylation at background in WBC gDNA; and (3) the ranking of the percentage of co-methylation across the undiluted, 1:10, 1:30, and 1:100 dilution samples must fully agree with the titration order in at least one dilution replicate. For a given dilution, whether significantly more regions with informative co-methylation signals can be detected than at background is determined by a Z-test to assign a p-value; p < 0.05 is considered statistically significant.
Tissue-level classification was tested by 10 bootstraps of 2-fold cross-validation, each time randomizing all IA and benign samples into training and test groups, and the classifier was modeled on these hypermethylated CpG markers using regularized logistic regression (Figure 4B, upper panel). The model achieved an overall sensitivity of 92.7% ± 4.4% and an overall specificity of 92.8% ± 3.5% for separating IA (n=65) from benign lesions (n=101), giving an overall AUC of 97.4% ± 1.0% (Figure 4B, lower panel).
Moreover, an independent cohort of 58 additional patients from a separate cancer center was enrolled, and a sensitivity of 89.2% and a specificity of 81.0% were achieved. As expected, the model yielded the highest sensitivity for IA specimens (100%), while sensitivities for other lung cancer subtypes were also high (Table 4).
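The region-selection criteria and the background comparison can be sketched as follows. Several details are assumptions because the text does not specify them: criteria (1) and (2) are applied to the mean across replicates, criterion (3) requires a strictly decreasing co-methylation fraction from undiluted to 1:100 in at least one replicate, and the Z-test is taken to be a one-sided two-proportion test comparing the fraction of the 887 regions detected at a dilution with the fraction detected in WBC gDNA. How a region counts as "detected" is left to the caller, and the counts below are hypothetical.

```python
import math

ORDER = ["undiluted", "1:10", "1:30", "1:100"]  # titration order, most tumor DNA first


def _mean(values):
    return sum(values) / len(values)


def passes_criteria(pct, wbc_pct, min_undiluted=0.075):
    """pct maps each level in ORDER to per-replicate co-methylated read fractions."""
    if _mean(pct["undiluted"]) <= min_undiluted:  # criterion (1): > 7.5% when undiluted
        return False
    if _mean(pct["1:100"]) <= wbc_pct:            # criterion (2): 1:100 above WBC background
        return False
    for rep in range(len(pct["undiluted"])):      # criterion (3): order holds in >= 1 replicate
        series = [pct[level][rep] for level in ORDER]
        if all(a > b for a, b in zip(series, series[1:])):
            return True
    return False


def one_sided_z_test(k_dil, n_dil, k_bg, n_bg):
    """Two-proportion Z-test of H1: the detection rate at a dilution exceeds background."""
    p_dil, p_bg = k_dil / n_dil, k_bg / n_bg
    pooled = (k_dil + k_bg) / (n_dil + n_bg)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_dil + 1.0 / n_bg))
    if se == 0.0:
        raise ValueError("Z-test undefined when no region or every region is detected")
    z = (p_dil - p_bg) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail p-value


region = {"undiluted": [0.40, 0.38], "1:10": [0.12, 0.15],
          "1:30": [0.05, 0.04], "1:100": [0.02, 0.03]}
print(passes_criteria(region, wbc_pct=0.01))  # True
z, p = one_sided_z_test(60, 887, 20, 887)     # hypothetical detection counts
print(p < 0.05)                               # True
```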
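The repeated 2-fold cross-validation can be sketched as a split generator plus a summary of the per-split metrics. The sketch assumes that splits are stratified by class so that both halves keep roughly the 65:101 IA-to-benign ratio, that each half serves once as training set and once as test set, and that the reported "±" values are sample standard deviations across splits; none of these choices is stated in the text, and the names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def repeated_stratified_two_fold(labels, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs from n_repeats random, class-stratified 2-fold splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        half_a, half_b = [], []
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            cut = len(shuffled) // 2
            half_a += shuffled[:cut]
            half_b += shuffled[cut:]
        # Each half is used once for training and once for testing.
        yield sorted(half_a), sorted(half_b)
        yield sorted(half_b), sorted(half_a)


def mean_sd(values):
    """Mean and sample standard deviation, as in an 'x% ± y%' summary."""
    return statistics.mean(values), statistics.stdev(values)


labels = [1] * 65 + [0] * 101  # 65 IA and 101 benign samples, as in the text
splits = list(repeated_stratified_two_fold(labels))
print(len(splits))  # 20
```

A classifier would be fitted on each `train_idx` and scored on the matching `test_idx`; `mean_sd` then summarizes, for example, the 20 per-split sensitivities.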
Table 4. Independent validation of the malignancy classifier performance for tissue samples using a separate cohort (Cohort 2) of patients. NLCTL, lung normal control tissue; EM, emphysema.
EXAMPLE 4
This example relates to validation of tissue derived methylation related markers for lung cancer detection in plasma samples.
Early cancer detection is by far the most economical and effective means of reducing cancer-specific mortality. As the largest cancer type in the world, lung cancer has long been challenging to screen early because of the high false-positive rate of LDCT screening and the difficulty of performing diagnostic biopsies. Therefore, a non-invasive yet sensitive diagnostic assay that can distinguish malignant pulmonary nodules from benign diseases would be particularly valuable for patients with positive LDCT results. Liquid biopsy of ctDNA has become one of the most attractive approaches for such clinical applications. However, a number of recent studies that attempted to detect ctDNA in early-stage lung cancer patients via PCR- or NGS-based somatic mutation profiling all reported limited sensitivity.
ctDNA detection via methylation profiling can achieve higher sensitivity and specificity than somatic mutation profiling in early-stage patients because 1) a much larger number of markers can be assessed simultaneously to increase sensitivity, and 2) multiple CpG loci within each selected region can be interrogated together to derive “cancer-specific methylation patterns” for increased specificity. Furthermore, methylation profiling can be used to differentiate tissue of origin and cancer subtypes. The release of gDNA from apoptotic/necrotic tumor cells into the blood provides an opportunity to use ctDNA for cancer detection.
Cancer classification using plasma DNA
To validate the tumor tissue-derived DNA methylation markers in the cfDNA pool, 33 pairs of tissue and plasma samples, each pair derived from the same patient, were studied. As described above, we focused only on co-methylated reads, and each co-methylation pattern was recorded. These pre-defined co-methylation patterns were then used to evaluate concordance between paired tissue and plasma samples, revealing an enrichment of tumor tissue-derived co-methylation patterns in the respective paired cfDNA pool (Figure 5).
To explore the clinical application of DNA methylation signatures carried in ctDNA for early-stage lung cancer detection, we enrolled 192 patients, from each of whom 10 ml of plasma was collected before surgery. 13 samples were excluded due to hemolysis, insufficient cfDNA yield (< 3 ng), or leukocyte DNA contamination; 9 samples were excluded due to low pre-library yield (< 400 ng); 18 samples were excluded due to failed sequencing quality control (QC); and another 18 samples were excluded due to the lack of pathological results, leaving a total of 134 samples qualified for subsequent analyses. We also included 128 asymptomatic normal participants who had never been diagnosed with any tumor type.
As an independent validation test, we applied the 9-marker panel to plasma samples (31 malignant, 27 benign, and 64 normal samples) (Table 5 and Figure 6A), achieving a sensitivity of 89.7% and an overall specificity of 80.2% (AUC = 91.5%) in separating malignant from benign and normal samples (Figure 6D and Table 5). The IRIS assay proved highly sensitive for early-stage lung cancers, with sensitivities of 85.0% and 100% for stage Ia and Ib lung cancers, respectively (Table 5). Consistent with a recent report that lung adenocarcinomas shed less ctDNA into the blood, the sensitivity for IA was 82.6%. A sensitivity of 100% was achieved for squamous cell lung cancers, MIAs, and other types of lung cancers that are associated with a higher cell turnover rate. The specificity for the normal population was 87.5% (Table 5).
Table 5: Performance of the malignancy classifier of plasma samples among various lung cancer subtypes and stages against benign and normal conditions in the independent test group
Smoking history and age have been shown to affect DNA methylation status and have been reported as risk factors for lung cancer development. We therefore performed univariable and multivariable analyses to determine which clinical risk factors may associate with pathological outcomes in the current setting and may provide better prediction power in combination with DNA methylation information. Univariable analyses showed that age, smoking history, and nodule size facilitated ctDNA detection, among which nodule size is the strongest risk factor other than DNA methylation in predicting malignancy (Figure 6E and Table 6) .
Table 6: Malignancy prediction performance of plasma samples according to nodule sizes
The multivariable analysis showed that only DNA methylation served as an independent predictor, while the other risk factors added little. This suggests that methylation signatures carried by cfDNA can serve as a stand-alone approach, independent of other clinical factors, for distinguishing malignant lung cancers from benign nodules.
Here we present the first study with a lung cancer-specific cohort, focusing primarily on early-stage lung adenocarcinomas, combined with a novel targeted methylation profiling assay that exhibits superior library conversion efficiency and assay sensitivity (Figure 1). The two major technical hurdles for bisulfite sequencing are 1) low library conversion efficiency, which limits the use of low-input samples such as plasma DNA, and 2) limited means of targeted enrichment. The IRIS assay combines high library conversion efficiency with a streamlined targeted enrichment workflow, which enables deep sequencing of pre-selected, highly informative regions from clinical samples. The average unique target coverage in our clinical cohort is >180X, which greatly increases the power to detect low-frequency ctDNA compared with the shallow sequencing approaches used in previous studies.
The abundance of ctDNA within total cfDNA is largely associated with tumor volume. A tumor with a volume of 1 cm³ is predicted to yield a ctDNA fraction between 0.001% and 0.03%; therefore, the limit of detection of a diagnostic assay is critical for detecting early-stage lung cancer. By combining several hundred pre-selected markers, the IRIS assay demonstrates a limit of detection of 0.0033%, which allows sensitive detection of malignancy in patients with tumors as small as 0.5 cm in diameter. Using the IRIS assay, we achieved an overall sensitivity of 82.1% in detecting malignancy from the plasma DNA of lung cancer patients. In particular, the sensitivities for stage Ia and Ib patients were 85.0% and 100%, respectively, superior to the performance of other ctDNA-based liquid biopsies via somatic mutation or DNA methylation profiling (Table 5).
In the present disclosure, we demonstrated the feasibility of using high-throughput targeted DNA methylation sequencing of ctDNA to detect sub-centimeter tumors non-invasively. This approach could potentially be applied to various aspects of cancer management, including early cancer screening, LDCT confirmatory diagnosis, minimal residual disease surveillance, and recurrence and treatment response monitoring.
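The value of combining many markers at very low tumor fractions can be illustrated with an idealized Poisson calculation. Assuming about 3.3 pg of DNA per haploid human genome, that each marker region independently receives a Poisson number of tumor-derived molecules, and that a single captured tumor molecule is recognizable, the chance of seeing at least one tumor molecule grows rapidly with the number of markers. The model ignores capture and sequencing losses and background co-methylation in normal cfDNA, so it is an optimistic upper bound rather than a performance prediction; the input amount, tumor fraction and the 20% efficiency in the example (chosen to match the minimum IRIS conversion rate reported in Example 1) are illustrative, and the names are hypothetical.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (assumption)


def genome_copies(input_ng):
    """Haploid genome equivalents in a given mass of DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def p_any_tumor_molecule(input_ng, tumor_fraction, n_markers, conversion_efficiency=1.0):
    """P(at least one tumor-derived molecule across n_markers independent loci), Poisson model."""
    per_locus = genome_copies(input_ng) * conversion_efficiency * tumor_fraction
    return 1.0 - math.exp(-per_locus * n_markers)


for n in (1, 10, 100, 500):
    p = p_any_tumor_molecule(10.0, 1e-4, n, conversion_efficiency=0.2)
    print(f"{n:>3} markers: {p:.3f}")
```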
While the disclosure has been particularly shown and described with reference to specific embodiments (some of which are preferred embodiments) , it should be understood by those having skill in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure as disclosed herein.

Claims (31)

  1. A method for in vitro diagnosis of lung cancer in a subject, comprising:
    measuring methylation level of CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B in a sample from the subject;
    comparing the measured methylation level of the CpG island to a corresponding predetermined reference level; and
    determining a likelihood of the subject having lung cancer,
    wherein, the measured methylation level of the CpG island in the sample comparable to the reference level is indicative of the subject having lung cancer.
  2. The method of claim 1, wherein the at least one gene is selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  3. The method of claim 1, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B.
  4. The method of claim 1, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  5. The method of claim 1, wherein the measuring step comprises measuring methylation level of CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E.
  6. The method of claim 1, wherein the measuring step comprises measuring methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B.
  7. The method of claim 1, wherein the measuring step comprises measuring methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, and CACNA1E.
  8. The method of claim 1, wherein the sample is a blood sample.
  9. The method of claim 8, wherein the blood sample comprises cell free DNA.
  10. The method of claim 9, wherein the blood sample is processed prior to the measuring step, wherein the step of processing comprises enrichment or purification of the cell free DNA.
  11. The method of claim 1, wherein the methylation levels are measured by an amplification assay, a hybridization assay, a sequencing assay or an array.
  12. The method of claim 1, wherein the comparing step is performed by a processor of a computing device.
  13. The method of claim 1, wherein the determining step is performed by a processor of a computing device.
  14. The method of claim 1, wherein the determining step comprises using a machine learning model.
  15. The method of claim 14, wherein the machine learning model is Least Absolute Shrinkage and Selection Operator.
  16. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to:
    retrieve methylation level of CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B in a sample from a subject;
    compare the retrieved methylation level of the CpG island to a corresponding predetermined reference level; and
    determine a likelihood of the subject having lung cancer,
    wherein the retrieved methylation level of the CpG island in the sample comparable to the reference level is indicative of the subject having lung cancer.
  17. The non-transitory computer readable medium of claim 16, wherein the at least one gene is selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  18. The non-transitory computer readable medium of claim 16, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B.
  19. The non-transitory computer readable medium of claim 16, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  20. The non-transitory computer readable medium of claim 16, wherein the retrieving comprises retrieving methylation level of CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E.
  21. The non-transitory computer readable medium of claim 16, wherein the retrieving comprises retrieving methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B.
  22. The non-transitory computer readable medium of claim 16, wherein the retrieving comprises retrieving methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, and CACNA1E.
  23. A kit, wherein the kit comprises methylation detection probes for detecting CpG island of at least one gene selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E, PRLR, ANKRD18B.
  24. The kit of claim 23, wherein the at least one gene is selected from ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, CACNA1E.
  25. The kit of claim 23, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, ANKRD18B.
  26. The kit of claim 23, wherein the at least one gene is selected from ZNF649, LHX5-AS1, RASGRF2, CACNA1E.
  27. The kit of claim 23, wherein the detecting comprises measuring methylation level of CpG islands of ZNF649, IRF8, XKR4, LHX5-AS1, RASGRF2, NKX2-2, and CACNA1E.
  28. The kit of claim 23, wherein the detecting comprises measuring methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, CACNA1E, PRLR, and ANKRD18B.
  29. The kit of claim 23, wherein the detecting comprises measuring methylation level of CpG islands of ZNF649, LHX5-AS1, RASGRF2, and CACNA1E.
  30. The kit of claim 23, wherein the kit is a microarray.
  31. A system for in vitro diagnosis of lung cancer in a test subject, the system comprising:
    a kit of claim 23; and
    a non-transitory computer readable medium of claim 16.
PCT/CN2018/079155 2018-03-15 2018-03-15 System and method for determining lung cancer WO2019174004A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/079155 WO2019174004A1 (en) 2018-03-15 2018-03-15 System and method for determining lung cancer

Publications (1)

Publication Number Publication Date
WO2019174004A1 (en) 2019-09-19

Family

ID=67908674

Country Status (1)

Country Link
WO (1) WO2019174004A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111254194A (en) * 2020-01-13 2020-06-09 Southeast University (东南大学) Cancer-related biomarkers based on sequencing and data analysis of cfDNA and application thereof in classification of cfDNA samples
WO2021072171A1 (en) * 2019-10-11 2021-04-15 Grail, Inc. Cancer classification with tissue of origin thresholding
CN112899359A (en) * 2021-01-27 2021-06-04 Guangzhou AnchorDx Medical Co., Ltd. (广州市基准医疗有限责任公司) Methylation marker for detecting benign and malignant lung nodules or combination and application thereof
CN115807095A (en) * 2022-12-07 2023-03-17 Eighth Medical Center of the Chinese PLA General Hospital (中国人民解放军总医院第八医学中心) Primer composition for detecting methylation sites of lung adenocarcinoma and application thereof

Non-Patent Citations (4)

Title
H CHEN ET AL.: "Aberrant methylation of RASGRF2 and RASSF1A in human non-small cell lung cancer", ONCOLOGY REPORTS, vol. 15, no. 5, 1 May 2006 (2006-05-01), pages 1281 - 1285, XP055636715 *
WB LIU ET AL.: "Epigenetic Regulation of ANKRD18B in lung cancer", EPIGENETIC REGULATION OF ANKRD18B IN LUNG CANCER, vol. 54, no. 4, 19 November 2013 (2013-11-19), pages 312 - 321, XP055636718 *
TA RAUCH ET AL.: "High-resolution mapping of DNA hypermethylation and hypomethylation in lung cancer", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 105, 8 January 2008 (2008-01-08), pages 252 - 257, XP055624844, doi:10.1073/pnas.0710735105 *
M SUZUKI ET AL.: "Aberrant methylation and silencing of IRF8 expression in non-small cell lung cancer", ONCOLOGY LETTERS, vol. 8, no. 3, 11 June 2014 (2014-06-11), pages 1025 - 1030, XP055636716 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
WO2021072171A1 (en) * 2019-10-11 2021-04-15 Grail, Inc. Cancer classification with tissue of origin thresholding
CN111254194A (en) * 2020-01-13 2020-06-09 Southeast University (东南大学) Cancer-related biomarkers based on sequencing and data analysis of cfDNA and application thereof in classification of cfDNA samples
CN111254194B (en) * 2020-01-13 2021-09-07 Southeast University (东南大学) Cancer-related biomarkers based on sequencing and data analysis of cfDNA and application thereof in classification of cfDNA samples
CN112899359A (en) * 2021-01-27 2021-06-04 Guangzhou AnchorDx Medical Co., Ltd. (广州市基准医疗有限责任公司) Methylation marker for detecting benign and malignant lung nodules or combination and application thereof
CN115807095A (en) * 2022-12-07 2023-03-17 Eighth Medical Center of the Chinese PLA General Hospital (中国人民解放军总医院第八医学中心) Primer composition for detecting methylation sites of lung adenocarcinoma and application thereof
CN115807095B (en) * 2022-12-07 2023-10-13 Eighth Medical Center of the Chinese PLA General Hospital (中国人民解放军总医院第八医学中心) Primer composition for detecting methylation sites of lung adenocarcinoma and application of primer composition


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18909709; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18909709; Country of ref document: EP; Kind code of ref document: A1