WO2023200404A2 - Method for determining cfdna fragment size ratio and fragment size distribution - Google Patents

Method for determining cfDNA fragment size ratio and fragment size distribution


Publication number
WO2023200404A2
Authority
WO
WIPO (PCT)
Prior art keywords
cfdna
base pairs
cancer
fragment size
fragments
Application number
PCT/SG2023/050252
Other languages
French (fr)
Other versions
WO2023200404A3 (en)
Inventor
Yukti CHOUDHURY
Min-Han Tan
Original Assignee
Lucence Life Sciences Pte. Ltd.
Application filed by Lucence Life Sciences Pte. Ltd.
Publication of WO2023200404A2
Publication of WO2023200404A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/686: Polymerase chain reaction [PCR]

Definitions

  • the present disclosure relates to the detection of fragment size ratio and fragment size distribution of nucleic acid.
  • the present disclosure relates to the determination of fragment size ratio and fragment size distribution of cfDNA.
  • Cell-free DNA (cfDNA) is composed of short DNA fragments found in plasma, urine and other bodily fluids, while circulating tumor DNA (ctDNA) is a subset of cfDNA of tumor origin. Circulating cell-free DNA fragments typically range between 120-220 base pairs in size, or multiples of this size range, with a maximum peak at 167 base pairs. This pattern coincides with the length of DNA wrapped around a single nucleosome, plus a short stretch of ~20 base pairs (linker DNA) bound to histone H1. Fragmentation patterns of normal cfDNA are therefore the result of nucleosomal DNA patterns that reflect the chromatin structure of normal blood cells.
  • Tumor-derived ctDNAs have been shown to be variable in size and are in general shorter than normal cfDNAs in healthy people. Interestingly, while tumor cfDNA fragments are shortened, they still retain the peak at 166 base pairs within the size distribution.
  • The modal size of tumor cfDNA is between 130 base pairs and 150 base pairs, while the overall cfDNA size distribution peaking at 166 base pairs is the consequence of low tumor cfDNA purity amid abundant cfDNA of non-neoplastic origin. Detection of fragment size differences, including small differences, could provide valuable insight into the physical attributes of ctDNA, and inform cancer origin and progression.
  • Targeted NGS panels with deep sequencing have also investigated cfDNA fragment size distribution in cancer and healthy plasma samples, and found that cfDNA fragment sizes are more variable and smaller in tumor plasma than in normal cfDNA.
  • qPCR-based methods and the described NGS-based methods differ fundamentally in their approaches to cfDNA fragment size analysis: the former predefine the fragment size of interest and quantify the target signal in that range, while the latter, based on WGS or targeted hybridization capture, rely on fragments of naturally occurring lengths being captured, and determine fragment lengths after capture and sequencing.
  • qPCR-based methods typically rely on the targeting of only two different fragment sizes of interest in order to calculate the DNA integrity index. As a result, data on other fragment sizes, which may provide significant information for disease prediction, are not available.
  • While WGS or targeted hybridization capture-based methods are able to capture a wider range of fragment sizes, the labour and cost associated with these methods limit their scalability and applications.
  • amplicon-based targeted sequencing in highly multiplex
  • the study of cfDNA fragment sizes has not been explored, and the utility of such panels has been strictly restricted to the sensitive detection of genomic alterations, including single nucleotide variants (SNVs), insertion-deletion mutations (indels), and copy number variations, among others.
  • Apparent limitations to the use of amplicon-based NGS assays for fragment size determination include the predetermined nature of fragment sizes captured (based on amplicon/primer design, which is in turn determined by the nature of specific genomic regions which are more suitable for the design of functional PCR primers), which do not naturally lend themselves to the range of cfDNA fragment sizes present in circulation, as described by WGS assays.
  • the present disclosure refers to a method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample, comprising:
  • each target region comprises a plurality of target gene sequences or variants thereof; wherein the plurality of target gene sequences comprises a plurality of short target sequences and/or a plurality of long target sequences; wherein each short target sequence comprises a short cfDNA fragment comprising a predetermined number of base pairs or less than a predetermined number of base pairs; wherein each long target sequence comprises a long cfDNA fragment comprising more than a predetermined number of base pairs; wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences or variants thereof, wherein the forward and reverse primers of each primer set specific to the plurality of target regions, are complementary to the plurality of short target sequences and/or the plurality of long target sequences; wherein each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality
  • step (b) purifying the plurality of amplicons from step (a);
  • step (c) amplifying the purified product from step (b) by using universal indexed adapter primers to generate a sequencing library
  • step (e) subjecting the purified sequencing library from step (d) to multiplex sequencing on a next-generation sequencing platform to obtain a plurality of sequencing reads;
  • step (f) deriving a consensus read of each sequence from the plurality of sequencing reads obtained from step (e);
  • step (g) performing a sequence alignment of each consensus read obtained from step (f) to a reference genome
  • the present disclosure refers to a kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample according to the method disclosed herein, comprising a plurality of primer sets specific to a plurality of target regions as defined in step (a)(I) of the first aspect, and instructions for use in the method disclosed herein.
  • Figure 1 illustrates the amplicon panel-wide distribution of fragment lengths in a lung cancer sample (A) and in a healthy plasma sample (B), showing a similar range of fragment sizes between 70-170 base pairs.
  • the overlay of the two histograms (C) shows the relative increase in the proportion of fragments shorter than 110 base pairs in the lung cancer sample.
  • the total panel size is 0.06 Mbp.
  • Figure 2 shows the Integrative Genomics Viewer (IGV) visualization of amplicons used to capture the full length of BRCA1 exon 10 (which is 3426 base pairs in length), requiring multiple consecutive amplicons for complete capture.
  • Figure 3 shows the distribution of fragment lengths in a lung cancer sample (A) and in a healthy plasma sample (B), showing a similar range of fragment sizes in base pairs, enriched for target regions with fragments of size >150 base pairs (in this example, the BRCA1 and BRCA2 genes).
  • In an overlay (C), enrichment of short fragments in the cancer sample is highlighted by a dashed arrow, and depletion of long fragments is highlighted by a solid arrow.
  • Figure 4 shows that the size ratio of fragments differs significantly between cfDNA from the plasma of healthy individuals and cfDNA from cancer samples, and is more variable among cancer samples. The size ratio is defined as the ratio of the number of short cfDNA fragments (70-150 base pairs) to the number of long cfDNA fragments (151-500 base pairs).
  • Figure 6 shows that among lung cancer cases, size ratio of cfDNA fragments is higher in metastatic cases relative to early stage (localized) lung cancer cases, showing an enrichment of small fragments (≤150 base pairs) in late-stage disease.
  • Figure 7 shows cfDNA fragment size ratio is correlated with plasma cfDNA concentration across all samples, indicating an increase in abundance of ctDNA fragments (measurable by the size ratio) as tumor burden increases in cancer samples.
  • Figure 8 shows cfDNA fragment size ratio generally increases as the measured highest mutant allele frequency (AF%) detected from the amplicon-based NGS assay increases.
  • ROC Receiver Operator Characteristic
  • Figure 10 shows examples of design of amplicons at different regions of the genome to capture fragments of varying lengths originating at the same target region.
  • the forward primer with respect to the gene is shown with a dashed arrow, and two reverse primers capturing short and long fragments are shown with solid arrows, for two illustrative examples of regions in chromosome 1 (chr1) and chromosome 2 (chr2).
  • these amplicons are termed “fragment size amplicons” (FSA).
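The size ratio defined for Figure 4 (short fragments of 70-150 base pairs over long fragments of 151-500 base pairs) can be illustrated with a minimal sketch. The function name `size_ratio` and the input format (a plain list of fragment lengths in base pairs) are assumptions made for illustration only; the disclosure does not specify an implementation.

```python
from typing import Iterable

# Size ranges in base pairs (inclusive), taken from the Figure 4 definition.
SHORT_RANGE = (70, 150)
LONG_RANGE = (151, 500)


def size_ratio(fragment_lengths: Iterable[int]) -> float:
    """Return (# short fragments) / (# long fragments).

    Hypothetical helper: fragments outside both ranges are ignored.
    """
    short = 0
    long_ = 0
    for length in fragment_lengths:
        if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
            short += 1
        elif LONG_RANGE[0] <= length <= LONG_RANGE[1]:
            long_ += 1
    if long_ == 0:
        raise ValueError("no long fragments observed; ratio is undefined")
    return short / long_


# Three short fragments (90, 120, 150) and two long fragments (151, 300).
example_ratio = size_ratio([90, 120, 150, 151, 300])
```

Under this definition a higher value indicates relative enrichment of short fragments, which is the direction the figure descriptions associate with cancer samples.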
  • the present disclosure describes a method which enables amplicon-based targeted NGS assays or panels to perform cfDNA fragment size distribution analysis.
  • the present disclosure refers to a method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample, comprising:
  • each target region comprises a plurality of target gene sequences or variants thereof; wherein the plurality of target gene sequences comprises a plurality of short target sequences and/or a plurality of long target sequences; wherein each short target sequence comprises a short cfDNA fragment comprising a predetermined number of base pairs or less than a predetermined number of base pairs; wherein each long target sequence comprises a long cfDNA fragment comprising more than a predetermined number of base pairs; wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences, wherein the forward and reverse primers of each primer set specific to the plurality of target regions, are complementary to the plurality of short target sequences and/or the plurality of long target sequences; wherein each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence
  • step (b) purifying the plurality of amplicons from step (a);
  • step (c) amplifying the purified product from step (b) by using universal indexed adapter primers to generate a sequencing library
  • step (e) subjecting the purified sequencing library from step (d) to multiplex sequencing on a next-generation sequencing platform to obtain a plurality of sequencing reads;
  • step (f) deriving a consensus read of each sequence from the plurality of sequencing reads obtained from step (e);
  • step (g) performing a sequence alignment of each consensus read obtained from step (f) to a reference genome
  • target regions already included in the amplicon panel with potential to form short and long fragments were considered for fragment size analysis (not deliberate design for fragment size study specifically), as described herein.
  • several (>30) amplicons deliberately designed throughout the genome to variably capture fragments of length 90 base pairs, 300 base pairs or other sizes would additionally provide fragment size information, as illustrated in Figure 10.
  • a common forward primer (arbitrary directionality) per region and two or more reverse primers would allow the capture of different fragment lengths.
  • With a substantially sized panel (>0.05 Mb) as shown, it is possible to discern fragment size information, and sensitivity of detection is anticipated to increase with increasing panel size.
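The idea of one common forward primer paired with two or more reverse primers can be sketched as follows. This is a simplified model under stated assumptions: the class `PrimerSet`, the function `amplifiable_lengths`, and all coordinates are hypothetical, and a fragment is treated as amplifiable only if it spans both the forward primer and the given reverse primer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PrimerSet:
    """Hypothetical primer set: one forward primer, several reverse primers."""
    region: str
    forward_start: int             # reference coordinate of the forward primer's 5' base
    reverse_ends: Tuple[int, ...]  # reference coordinate of each reverse primer's 5' base

    def amplicon_lengths(self) -> List[int]:
        # Each reverse primer defines one amplicon with the shared forward primer.
        return [end - self.forward_start + 1 for end in sorted(self.reverse_ends)]


def amplifiable_lengths(primer_set: PrimerSet, frag_start: int, frag_end: int) -> List[int]:
    """Amplicon lengths that a cfDNA fragment covering [frag_start, frag_end] could template."""
    if frag_start > primer_set.forward_start:
        return []  # fragment does not contain the forward primer site
    return [
        end - primer_set.forward_start + 1
        for end in sorted(primer_set.reverse_ends)
        if end <= frag_end  # fragment must also reach this reverse primer site
    ]


# Illustrative coordinates chosen to give amplicons of 90 and 300 base pairs.
example_set = PrimerSet("chr1_example", 1000, (1089, 1299))
```

In this model a 161 base pair fragment covering positions 990-1150 can only yield the 90 base pair amplicon, whereas a longer fragment covering 950-1320 can yield both, so the relative counts of the two amplicons carry fragment length information.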
  • the disclosed method further comprises determining fragment size distribution in a biological sample, by adding into the pool of plurality of primer sets of step (a) of the first aspect:
  • each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths, to generate a plurality of amplicons comprising cfDNA fragments of predefined lengths that are then subjected to steps (b)-(g) of the method of the first aspect to thereby determine the fragment size distribution of the cfDNA in the biological sample.
  • the disclosed method is used to determine cfDNA fragment size ratio in a biological sample. In another example, the method is used to determine cfDNA fragment size distribution in a biological sample. In a further example, the disclosed method is used to simultaneously determine cfDNA fragment size ratio and cfDNA fragment size distribution in a biological sample.
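A fragment size distribution of the kind shown in the histograms of Figures 1 and 3 can be summarised by binning fragment lengths. The sketch below is an assumption-laden illustration: the function name `size_distribution`, the 10 base pair bin width, and the input (fragment lengths already derived from aligned consensus reads) are not taken from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable


def size_distribution(fragment_lengths: Iterable[int], bin_width: int = 10) -> Dict[int, float]:
    """Return the fraction of fragments per length bin, keyed by the bin's lower edge."""
    lengths = list(fragment_lengths)
    if not lengths:
        return {}
    # Map each length to the lower edge of its bin, e.g. 78 -> 70 for bin_width=10.
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    total = len(lengths)
    return {start: counts[start] / total for start in sorted(counts)}


# Two fragments fall in the 70-79 bin, one in 90-99, one in 160-169.
example_distribution = size_distribution([72, 78, 95, 166])
```

Normalising to fractions rather than raw counts makes distributions from samples with different sequencing depth comparable, for example when overlaying a cancer sample and a healthy sample.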
  • short target sequences comprise short cfDNA fragments comprising 150 base pairs or less than 150 base pairs.
  • short target sequences comprise short cfDNA fragments comprising about 70 to 149 base pairs, or about 80 to 140 base pairs, or about 90 to 130 base pairs or about 100 to 120 base pairs, or about 70 base pairs, or about 80 base pairs or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs or about 145 base pairs.
  • long target sequences comprise long cfDNA fragments comprising more than 150 base pairs.
  • long target sequences comprise long cfDNA fragments comprising about 151 base pairs to about 500 base pairs, or about 160 to about 490 base pairs, or about 170 to about 480 base pairs, or about 180 to about 470 base pairs, or about 190 to about 460 base pairs, or about 200 to about 450 base pairs, or about 210 to about 440 base pairs, or about 220 to about 430 base pairs, or about 230 to about 420 base pairs, or about 240 to about 410 base pairs, or about 250 to about 400 base pairs, or about 260 to about 390 base pairs or about 270 to about 380 base pairs, or about 280 to about 370 base pairs, or about 290 to about 360 base pairs, or about 300 to about 350 base pairs, or about 310 to about 340 base pairs, or about 320 to about 330 base pairs, or about 151 base pairs, or about 152 base pairs, or about 153 base pairs
  • the plurality of primer sets specific to a plurality of target regions are as disclosed in step (a)(I) of the disclosed method.
  • the plurality of primer sets are specific to a plurality of target regions.
  • the plurality of primer sets are specific to a plurality of target regions that comprise a plurality of target gene sequences or variants thereof.
  • “plurality” means at least two. Therefore, in one example, the plurality of primer sets comprise at least two sets of primers that are specific to a plurality of target regions.
  • the plurality of target regions comprise at least two target regions.
  • the plurality of target regions comprise a plurality of target gene sequences or variants thereof.
  • the plurality of target gene sequences or variants thereof comprise at least two target gene sequences.
  • the “target region” may be any region of a nucleic acid that is to be analyzed, and the term “target sequence” within the target region may be construed accordingly.
  • the target region may also comprise one or more exons (which may form the target sequences) and/or one or more introns.
  • the “target region” comprises genes that are housekeeping genes, non-functional genes, functional genes or genes that are related to a disease
  • the “target gene sequence” is a sequence of a housekeeping gene, non-functional gene, functional gene or a gene that is related to a disease within the “target region”.
  • the target region is BRCA1 gene.
  • the target region may be AMPD2, OPTC, CRYZ, PPOX, AFF3, RIF1, PAX3, OLA1, ALAS1, RHO, FRMD4B, SOBP, AGK, ANK1, TG, FXN, CUTC, APOA4, PTS, APOF, PAH, GCH1, PSEN1, GATM, RTF1, ARMC5, BRCA1, FECH, PIGN, ADA, PANK2, APP, SOD1, BCR, SOX10, or APOO.
  • the primer sequences for any target region(s), such as the exemplary target region(s), can be determined by a person skilled in the art using publicly available gene databases, such as the NCBI database.
  • the primer design tool on the publicly available NCBI website can be used by the skilled person to design suitable primer sequences for the target region(s) of interest.
  • the plurality of target regions comprising a plurality of target gene sequences or variants thereof comprise a plurality of short target sequences and/or a plurality of long target sequences.
  • the plurality of short target sequences comprise at least two short target sequences.
  • the plurality of long target sequences comprise at least two long target sequences.
  • the plurality of short target sequences comprise a plurality of short cfDNA fragments comprising 150 base pairs or less than 150 base pairs.
  • the plurality of long target sequences comprise a plurality of long cfDNA fragments comprising more than 150 base pairs.
  • each primer set specific to a plurality of target regions comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences.
  • each primer set specific to a plurality of target regions comprises one forward primer and at least two consecutive reverse primers specific to the plurality of target gene sequences.
  • each primer set specific to a plurality of target regions comprises one forward primer and two or three consecutive reverse primers specific to the plurality of target gene sequences.
  • each primer set specific to a plurality of target regions comprises one forward primer and two consecutive reverse primers specific to the plurality of target gene sequences.
  • each primer set specific to a plurality of target regions comprises one forward primer and three consecutive reverse primers specific to the plurality of target gene sequences.
  • each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions is complementary to a different part of the plurality of short target sequences and/or long target sequences originating from the same or a different target region.
  • each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions is complementary to a different part of the plurality of short target sequences originating from the same target region.
  • each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions is complementary to a different part of the plurality of long target sequences originating from the same target region.
  • each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions is complementary to a different part of the plurality of short target sequences originating from a different target region.
  • each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions is complementary to a different part of the plurality of long target sequences originating from a different target region.
  • the plurality of primer sets described in (II) above are specific to a plurality of cfDNA fragments of predefined lengths of the disclosed method.
  • the plurality of cfDNA fragments of predefined lengths are different from the plurality of short cfDNA fragments and plurality of long cfDNA fragments described in (I) of the first aspect.
  • the plurality of primer sets described in (II) above specific to a plurality of cfDNA fragments of predefined lengths comprise at least two sets.
  • the plurality of cfDNA fragments of predefined lengths comprise at least two cfDNA fragments of predefined lengths.
  • each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and a plurality of consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
  • each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and at least two consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
  • each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and two or three consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
  • each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and two consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
  • each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and three consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
  • each forward primer of the plurality of primer sets comprises a barcode sequence on its 5’ end, wherein each barcode sequence is different.
  • barcode sequence refers to an encoded molecule or barcode that includes a variable amount of information within the nucleic acid sequence.
  • the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization-based assay, and the like.
  • the barcode sequence allows the pooled analysis of multiple unique target sequences, where the resulting sequence information from the pool can be later attributed back to each starting target sequence. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same barcode sequence. In some examples, the barcode sequence is an overhang that does not complement any sequence within the target region. As each forward primer carries on its 5’ end a randomly assigned barcode sequence as disclosed herein, the barcode sequence allows individual cfDNA molecules to be tagged uniquely in the step of sequencing library formation.
  • the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 random nucleotides which can be represented as NNNNNNNNNN (SEQ ID NO: 4).
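Grouping amplicons into families by barcode and deriving one consensus read per family can be sketched as below. This is an illustrative simplification, not the method of the disclosure: it assumes each read begins with the 10-nucleotide barcode, that reads within a family have equal length, and that a per-position majority vote is an acceptable consensus rule. The names `group_by_barcode` and `consensus` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

BARCODE_LENGTH = 10  # random nucleotides, as in the NNNNNNNNNN example

# Number of distinct barcodes available with four bases per position.
POSSIBLE_BARCODES = 4 ** BARCODE_LENGTH  # 1,048,576


def group_by_barcode(reads: List[str]) -> Dict[str, List[str]]:
    """Split each read into barcode + insert and collect inserts per barcode."""
    families: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        families[barcode].append(insert)
    return dict(families)


def consensus(sequences: List[str]) -> str:
    """Per-position majority vote over equal-length sequences of one family."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences in a family must have equal length")
    # zip(*sequences) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


example_reads = [
    "AAAAAAAAAA" + "ACGT",
    "AAAAAAAAAA" + "ACGT",
    "AAAAAAAAAA" + "ACCT",  # one read with a sequencing error at position 3
    "CCCCCCCCCC" + "TTTT",
]
example_families = group_by_barcode(example_reads)
```

Because every read in a family descends from the same tagged cfDNA molecule, a base seen in only a minority of the family is more likely an amplification or sequencing error than a true variant, which is why the vote suppresses it.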
  • the biological sample containing cfDNA is a bodily fluid selected from the group consisting of blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, and pancreatic juice.
  • the bodily fluid is blood.
  • the blood is plasma.
  • the biological sample is obtained from a subject having and/or suspected of having a disease.
  • the disease is cancer.
  • the disease is another pathological condition.
  • the cancer is selected from the group consisting of leukemia, lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, and gastrointestinal cancer.
  • the cancer is selected from the group consisting of pancreatic cancer, prostate cancer, breast cancer, lung cancer, colorectal cancer and liver cancer.
  • the pathological conditions include viral infections and neurological disorders.
  • the disease is an early-stage disease.
  • the disease is a late-stage disease.
  • an early-stage disease in the context of cancer is a localized cancer that has not spread into surrounding tissues.
  • an early-stage disease is stage 0 cancer or stage I cancer or stage II cancer.
  • a late-stage disease in the context of cancer is a cancer that has spread to distant tissue or organs.
  • a late-stage disease is stage III or stage IV cancer.
  • a late-stage disease is metastatic cancer.
  • the cfDNA present in the biological sample is tumor-derived cfDNA (ctDNA) and fetal-derived cfDNA.
  • the cfDNA is ctDNA.
  • a plurality of multiplexed PCR reactions are performed on cfDNA present in the biological sample as disclosed in step (a) of the first aspect, using a plurality of primer sets specific to a plurality of target regions as disclosed in (a)(I) and/or a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths as disclosed in (a)(II), wherein the plurality of primer sets specific to a plurality of target regions differ from the plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths.
  • the plurality of multiplexed PCR reactions are performed using a plurality of primer sets specific to a plurality of target regions as disclosed in step (a)(I) of the method of the first aspect.
  • the plurality of multiplexed PCR reactions are performed using a plurality of primer sets specific to a plurality of target regions as disclosed in step (a)(I) of the method of the first aspect and a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths.
  • the multiplexed amplicon-based panel covers 5-500 genes. In one example, the multiplexed amplicon-based panel has a total panel size of 0.01 Mb to 0.5 Mb. In one example, the multiplexed amplicon-based panel covers 5-500 genes and has a total panel size of 0.01 Mb to 0.5 Mb. In another example, the multiplexed amplicon-based panel covers 100 genes and has a total panel size of at least 0.06 Mb.
  • the plurality of multiplexed PCR reactions performed on the cfDNA comprises 3 to 15 PCR cycles.
  • the PCR reactions comprise 3 PCR cycles.
  • the PCR reactions comprise 4 PCR cycles.
  • the PCR reactions comprise 5 PCR cycles.
  • the PCR reactions comprise 6 PCR cycles.
  • the PCR reactions comprise 7 PCR cycles.
  • the PCR reactions comprise 8 PCR cycles.
  • the PCR reactions comprise 9 PCR cycles.
  • the PCR reactions comprise 10 PCR cycles.
  • the PCR reactions comprise 11 PCR cycles.
  • the PCR reactions comprise 12 PCR cycles.
  • the PCR reactions comprise 13 PCR cycles.
  • the PCR reactions comprise 14 PCR cycles.
  • the PCR reactions comprise 15 PCR cycles.
  • the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 70 base pairs to about 350 base pairs, or about 80 base pairs to about 340 base pairs, or about 90 base pairs to about 330 base pairs, or about 100 base pairs to about 320 base pairs, or about 110 base pairs to about 310 base pairs, or about 120 base pairs to about 300 base pairs, or about 130 base pairs to about 290 base pairs, or about 140 base pairs to about 280 base pairs, or about 150 base pairs to about 270 base pairs, or about 160 base pairs to about 260 base pairs, or about 170 base pairs to about 250 base pairs, or about 180 base pairs to about 240 base pairs, or about 190 base pairs to about 230 base pairs, or about 200 base pairs to about 220 base pairs, or about 70 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs
  • the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 90 base pairs, or about 300 base pairs. In one example, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 90 base pairs. In another example, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 300 base pairs.
  • the number of cfDNA fragments of predefined length to be captured by the plurality of primer sets of (II) is from about 10 to about 100, or about 20 to about 90, or about 30 to about 80, or about 40 to about 70, or about 50 to about 60, or about 10 or about 20, or about 30, or about 40, or about 50, or about 60, or about 70, or about 80, or about 90, or about 100.
  • the number of DNA fragments of predefined length to be captured by the plurality of primer sets of (II) is at least 30.
  • the plurality of amplicons generated from the plurality of primer sets of step (a)(1) comprise about 70 to 170 base pairs, or about 80 to 160 base pairs, or about 90 to 150 base pairs, or about 100 to 140 base pairs, or about 110 to 130 base pairs, or about 70 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs.
  • the plurality of amplicons derived from the cfDNA present in the biological sample are then purified, as disclosed in step (b) of the method of the first aspect.
  • the method of the present disclosure is designed to involve size-based separation (for example, magnetic bead-based separation) of smaller primer dimer artefacts to be removed and desired products to be retained.
  • the purification of cfDNA is performed using an agent such as paramagnetic beads.
  • the paramagnetic beads are selected from the group consisting of AMPure XP beads, SPRI beads, and Dynabeads.
  • the purified plurality of amplicons is amplified using universal indexed adapter primers to generate a plurality of sequencing library, as disclosed in step (c) of the method of the first aspect.
  • the amplification is performed using KAPA Hifi HotStart ReadyMix, Phusion U Hot Start DNA Polymerase (Thermo Scientific), ZymoTaq DNA Polymerase (Zymo Research) and Q5U Hot Start High-Fidelity DNA Polymerase (NEB), etc.
  • each universal indexed adapter primer as disclosed in step (c) comprises an adapter sequence.
  • the term “adapter sequence” refers to an oligonucleotide sequence bound to the 5’ and 3’ end of each DNA fragment in a sequencing library.
  • the adapter sequences are complementary to the plurality of oligonucleotides present on the surface of the flow cells of the sequencing tools thereby allowing the DNA fragment to attach to the sequencing tool.
  • an adapter sequence allows for the sequencing of the oligonucleotide of interest. Sequencing platform specific adapter sequences are known in the art, and include, for example, the Illumina P5/P7 adapter sequences.
  • the universal indexed adapter primers as disclosed in step (c) of the method of the first aspect comprise: a forward primer comprising the sequence of
  • ACGCTCTTCCGATC*T (SEQ ID NO: 5); and a reverse primer comprising the sequence of
  • the plurality of sequencing library formed is then purified, as disclosed in step (d) of the method of the first aspect.
  • the purification of the plurality of sequencing library is performed using an agent such as paramagnetic beads.
  • the paramagnetic beads are selected from the group consisting of AMPure XP beads, SPRI beads, and Dynabeads.
  • the plurality of purified sequencing library from step (d) are sequenced on a NGS platform to obtain a plurality of sequencing reads as disclosed in step (e) of the method of the first aspect.
  • the NGS platform is NextSeq 550, NovaSeq 6000, BGI MGISEQ-2000, DNBSEQ-G400 or DNBSEQ-T7.
  • a plurality of consensus reads is derived from each sequence of the plurality of sequencing reads obtained from step (e), as disclosed in step (f) of the method of the first aspect.
  • the derived consensus reads are aligned to a reference genome, as disclosed in step (g) of the method of the first aspect.
  • reference genome refers to DNA sequences known in the art that may be obtainable from public databases.
  • consensus read refers to a nucleotide sequence obtained from consensus calling.
  • consensus calling is performed by identifying the nucleotide at each position for each sequencing result within the subgroup, comparing the identity for the nucleotide at each position across the plurality of sequencing results, and determining a majority nucleotide at each position. If the majority nucleotide count is above a threshold set for determining majority for a specific position, the assignment for said position is the majority nucleotide. If the majority nucleotide count is below this threshold, no assignment is made for said position.
  • the threshold is variable for every position and is a function of the total number of sequencing results corresponding to a specific position.
  • step (g) of the disclosed method further comprises, if the sequence alignment results in a partial alignment to the reference genome of a short target region then: (i) determining the sequence alignment as a consensus read of a short cfDNA fragment (a cfDNA fragment comprising 150 base pairs or less than 150 base pairs), and (ii) counting/enumerating the consensus read pairs of a short cfDNA fragment.
  • step (g) of the disclosed method further comprises, if the sequence alignment results in a partial alignment to the reference genome of a long target region then: (i) determining the sequence alignment as a consensus read of a long cfDNA fragment (a cfDNA fragment comprising more than 150 base pairs), and (ii) counting/enumerating the consensus read pairs of a long cfDNA fragment.
  • the consensus read pairs are used to count the numbers of fragments, corresponding to specific primer target sets.
  • the numbers of fragments are then used to determine the cfDNA fragment size ratio and/or fragment size distribution in the biological sample, wherein: (i) the fragment size ratio is defined as the number of short cfDNA fragments to the number of long cfDNA fragments; and (ii) the fragment size distribution is derived by determining the number of fragments from an entire sample library or a subset of the sample library, and determining the size ranges of each fragment obtained. This results in a distribution of size ranges (i.e., the fragment size distribution).
  • step (g) of the disclosed method further comprises visualisation of amplicons.
  • the visualisation is performed using Integrative Genomics Viewer, Savant Genome Browser, etc.
  • the disclosed method comprises the use of Bioinformatic tools to capture fragment sizes from alignment files for each sample obtained from step (g).
  • the Bioinformatic tool used to capture fragment size from alignment files for each sample is the publicly available tool, Sequence Alignment Map (SAMtools).
  • the SAMtools specification is as follows:
  • the consensus read pairs with fragment size of 50 to 1000 are retained to obtain a representation of expected fragment sizes.
  • the disclosed method involves selection of target regions and binning fragments into size ranges.
  • the fragments mapping to specific targeted regions are extracted based on genomic location. This is done to enrich for fragments longer than 200 base pairs.
  • the fragment size ratio is then derived as number of cfDNA fragment comprising 150 base pairs or less than 150 base pairs over cfDNA fragment comprising more than 150 base pairs. This represents the approximate abundance of “short” fragments relative to “long” fragments.
  • the method of the present disclosure involves determining that the subject has a disease if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher or lower than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject).
  • the method of the present disclosure involves determining that the subject has cancer if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject).
  • the fragment size ratio of cfDNA fragments in the control sample obtained from the healthy subject is about 40 or less.
  • a healthy subject is an individual free from any known diagnosed disease, and has no significant health-related issues.
  • the plurality of primer sets used to determine if a subject has a disease, and the plurality of primer sets used to determine if a subject is disease-free, are the same.
  • the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as, higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease.
  • the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease.
  • the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease.
  • the method of the present disclosure further comprises determining that the cancer is an early-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer, and determining that the cancer is a late-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer.
  • the disease is a non-cancer disease.
  • an early-stage non-cancer disease may have a fragment size ratio of cfDNA fragments in the biological sample which is higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the non-cancer disease.
  • the fragment size ratios obtained using the disclosed method are correlated with clinical features such as cancer, type of cancer, plasma cfDNA concentration and mutant allelic frequency.
  • the correlation is performed using GraphPad Prism 8.0.1, RStudio (1.2.5033), pROC package in R, plotROC package in R, ROCR package in R, etc.
  • the present disclosure refers to a kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample according to the method of the first aspect, comprising a plurality of primer sets specific to a plurality of target regions as defined in step (a)(1) of the method of the first aspect, and instructions for use in the disclosed method.
  • the present disclosure refers to a kit for determining cfDNA fragment size distribution in a biological sample according to the method disclosed herein, further comprising a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths as defined in the method disclosed herein and instructions for use in the disclosed method.
  • the kit further comprises a buffer for performing a plurality of multiplexed PCR reactions, a DNA polymerase, a plurality of deoxy nucleoside triphosphates (dNTPs), and a reagent capable of removing excess primers.
  • the reagents provided in the kit as described herein may be provided in separate containers comprising the components independently distributed in one or more containers. As the method as described herein relates to sequencing (such as high-throughput sequencing), further components required in sequencing process could be easily determined by the person skilled in the art.
  • a primer includes a plurality of primers, including mixtures and combinations thereof.
  • the terms “increase” and “decrease” refer to the relative alteration of a chosen trait or characteristic in a subset of a population in comparison to the same trait or characteristic as present in the whole population. An increase thus indicates a change on a positive scale, whereas a decrease indicates a change on a negative scale.
  • the term “change”, as used herein, also refers to the difference between a chosen trait or characteristic of an isolated population subset in comparison to the same trait or characteristic in the population as a whole. However, this term is without valuation of the difference seen.
  • the term “about” in the context of concentration of a substance, size of a substance, length of time, or other stated values means +/- 5% of the stated value, or +/- 4% of the stated value, or +/- 3% of the stated value, or +/- 2% of the stated value, or +/- 1% of the stated value, or +/- 0.5% of the stated value.
  • certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • TGGAACCTACTTCATTAATATTGCT (SEQ ID NO: 1).
  • An example of a suitable “reverse primer” when the target region is BRCA1 is as follows: CATTCAATGTCACCTGAAAGAGAAA (SEQ ID NO: 2).
  • An example of yet another suitable “reverse primer” when the target region is BRCA1 is as follows: GAAAGATAAGCCAGTTGATAATGCC (SEQ ID NO: 3).
  • Plasma cfDNA samples from 115 cancer cases and 21 healthy cases were subjected to amplicon-based NGS testing. Mutation analysis was done routinely.
  • Fragments mapping to specific targeted regions were extracted based on genomic location. This was done to enrich for fragments longer than 200 base pairs. Regionally enriched fragments were categorized into 0-150 base pair and 151-500 base pair ranges. The fragment size ratio was then derived as (number of fragments of size ≤150 base pairs)/(number of fragments of size >150 base pairs). This represents the approximate abundance of “short” fragments relative to “long” fragments.
  • Fragment size ratios were correlated with clinical features, such as cancer, type of cancer, plasma cfDNA concentration, mutant allelic frequency. Pearson correlation analysis was performed for pairwise correlation and unpaired t-test analysis was done for group-wise analysis using GraphPad Prism 8.0.1. RStudio (1.2.5033) was used for logistic regression modelling and ROC analysis was done with ROCR package in R.
  • the present disclosure describes a methodology which enables amplicon-based targeted NGS assays/panels to perform cfDNA fragment size distribution analysis.
  • This approach has parallels with qPCR-based fragment size analysis methods, and relies on the readout of read counts from NGS, but importantly, in contrast to qPCR is enabled on a highly multiplex scale approaching total panel sizes of 0.01-0.5 Mb, and continues to yield mutational information simultaneous to fragment size information.
  • the method described herein is able to distinguish cancer samples from normal samples and further shows that different cancer types are characterized by different fragment size distributions.
  • An amplicon-based NGS panel of size ~0.06 Mb was used to sequence plasma cfDNA from 115 cancer samples (including 25 early-stage cancers) and 21 normal samples (from healthy individuals). Routine detection of genomic alterations was performed. It was reasoned that the overall distribution of cfDNA fragments would be subject to variations among samples in line with the prevailing knowledge for WGS studies. For an amplicon-based NGS assay, this variation would be reflected as increasing or decreasing read counts within predetermined fragment length size bins (representing the sizes of fragments expected to be captured due to the amplicon design).
  • Consecutive design of amplicons allows the capture of longer amplicons (from subsequence of primers), if target cfDNA fragments of larger size exist in the sample.
  • cfDNA fragment size distribution for such target regions with consecutive amplicon design are shown in Figure 3 in a lung cancer sample (A) and in a normal plasma (B), highlighting both the relative increase in short fragments and decrease in long fragments in cancer sample, therefore representing an overall shift to the smaller fragment sizes in cancer sample (Figure 3C).
  • Plasma cfDNA concentration (cfDNA per ml plasma) is a known prognostic and predictive biomarker in multiple cancer types.
  • cfDNA concentration potentially reflects the tumour burden and the aggressiveness potential of the disease. Therefore, the fragment size ratio was compared to plasma cfDNA concentration among all samples, as a proxy of tumor burden, and a significant increasing trend of size ratio (p < 0.0001) with increasing plasma cfDNA concentration was noted (Figure 7).
  • the most important features of the method disclosed herein include (1) a highly multiplexed amplicon-based NGS panel, (2) consecutive design of amplicons for full capture of large target regions and (3) determination of appropriate fragment size ranges to compare for determination of size ratios.
  • PCR-based and even hybridization-based cfDNA profiling techniques are developed for the detection of particular DNA modifications, making them unsuitable for generalizable exploratory analysis.
  • size profiling of plasma cfDNA with whole-genome sequencing was shown to follow a fragment size pattern that depended on whether plasma was from a healthy individual or from a cancer patient, and on the cancer type. In general, there was an enrichment of fragments in the range of 90-150 base pairs in plasma from cancer patients.
  • the present disclosure looks to extract cfDNA fragment size information - shown to be relevant by qPCR and WGS for the characterization of cfDNA from cancer samples relative to healthy samples - from a highly multiplexed amplicon-based NGS assay, while not sacrificing or compromising the accompanying detection sensitivity advantages afforded by amplicon-based NGS.
  • the method described herein is generalizable as part of the NGS assay process which leads to the detections of mutations and copy number changes, and other genomic alterations, and does not require additional sample or laboratory manipulations.
  • the method described in the present disclosure does not require the same sequencing resources as 4x WGS to capture fragment size information, and at the same time allows for the sensitive capture of genomic alterations down to 0.1% variant allele frequency.
  • WGS is only generally informative of mutations when ctDNA content is ~10% or greater, and hence not generally suitable for the purpose of detection of genomic alterations in the cfDNA setting.
  • the method described herein is able to simultaneously establish fragment size patterns and perform the primary intended function of detection of genomic alterations sensitively.
  • WGS-based methods or qPCR-based methods meant for fragment size analysis would not afford the same ability to detect genomic alterations, necessitating a separate analytical test (requiring additional sample, cost, time and manpower) to gather information on genomic alterations.
  • the disclosed method showed that these features can be used to extract cfDNA fragment size measures that are correlated with clinical features of cancers and healthy plasma samples (as presented in past studies using different approaches to measuring cfDNA fragment sizes), and can be incorporated as a feature to detect presence of cancer.
  • This provides validation of the utility of the present disclosure.
  • the features of the present disclosure can be applied to any amplicon-based NGS panel for fragment size analysis, and are not limited to findings from the particular panel described herein. It is envisioned that the features of this disclosure will be routinely employed as part of diagnostic testing using amplicon-based NGS assays designed primarily for the detection of genomic alterations, without any additional cost or time.
  • the disclosed method can be used to independently estimate the presence of cancer, and quantitate the tumor fraction in circulation, based on the fragment size ratios measured. Further, it is envisioned that additional features derived from specific size ranges (beyond the two ranges used currently, ≤150 base pairs and >150 base pairs) can be used to inform cancer type in the setting of early cancer detection.
  • the method of the present disclosure uses multiplexed amplicon-based NGS assays comprising large panels (>0.05 Mb) with predetermined amplicon sizes ranging from 70-170 bp. Such panel design allows for standard sequencing and is compatible with cfDNA-like material, thereby allowing fragment size analysis.
  • the method of the present disclosure allows for determination of both fragment size ratio and fragment size distribution, and simultaneous detection of significant mutations which could occur anywhere through target regions, for the characterization of cfDNA from cancer samples relative to healthy samples using a multiplexed amplicon-based NGS method with high sensitivity, without additional steps, sampling, cost, time and manpower.
  • the method of the present disclosure allows for determination of fragment size ratio and fragment size distribution without compromising the inherent functions of amplicon-based NGS, such as detection of genomic alterations.
  • the technological significance lies in the consecutive design of amplicons for full capture of large target regions in a highly multiplexed amplicon-based NGS panel, which allows the detection of short and long fragments to be considered for fragment size analysis and simultaneous detection of significant mutations which could occur anywhere through target regions comprising long coding exons of genes.
  • fragment size amplicons are deliberately designed throughout the genome to variably capture fragments with length of 90 base pairs, 300 base pairs or other sizes, which would additionally provide fragment size information. Such amplicons are additional to those designed for target regions forming part of the panel primarily of interest for detection of genomic alterations, which can continue to be simultaneously detected.
  • the method of the present disclosure allows for determination of appropriate fragment size ranges to derive size ratios using a targeted multiplex NGS panel.
  • the analysis of fragment size ratio is another feature measurable by the amplicon-based NGS assay, providing information on the global physical state of cfDNA, which has been shown to be highly informative of cancer origin and progression.
  • leveraging yet another biological property of cfDNA i.e. its fragmentation profile was investigated in order to improve the detection of ctDNA specifically, and by inference, the presence of cancer.
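
As an illustration of the size-ratio calculation described in the list above, the following is a minimal Python sketch, not the implementation used in the disclosure. It assumes that fragment sizes (in base pairs) have already been extracted from the alignment files for a sample. The function names, the example values, the 50 bp bin width, and the choice to treat the 50 to 1000 base pair filter as inclusive are assumptions made for illustration only.

```python
from collections import Counter

SHORT_MAX_BP = 150  # fragments of 150 bp or less are counted as "short"


def retain_expected_sizes(sizes, lo=50, hi=1000):
    """Keep fragment sizes between lo and hi bp (inclusive bounds are assumed)."""
    return [s for s in sizes if lo <= s <= hi]


def fragment_size_ratio(sizes, short_max=SHORT_MAX_BP):
    """Return (# fragments <= short_max) / (# fragments > short_max)."""
    short = sum(1 for s in sizes if s <= short_max)
    long_ = sum(1 for s in sizes if s > short_max)
    if long_ == 0:
        raise ValueError("no long fragments; the ratio is undefined")
    return short / long_


def size_distribution(sizes, bin_width=50):
    """Count fragments per size bin; each key is the lower edge of a bin."""
    return Counter((s // bin_width) * bin_width for s in sizes)


# Hypothetical fragment sizes, for illustration only.
example_sizes = [30, 90, 120, 150, 151, 166, 300, 1200]
kept = retain_expected_sizes(example_sizes)   # drops 30 and 1200
ratio = fragment_size_ratio(kept)             # 3 short / 3 long
dist = size_distribution(kept)                # counts per 50 bp bin
```

In this sketch the ratio is undefined when no long fragments are present, so the function raises an error rather than returning a value; a real pipeline might handle that case differently.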

Abstract

Disclosed is a method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample. Also disclosed is a kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample.

Description

METHOD FOR DETERMINING CFDNA FRAGMENT SIZE RATIO AND FRAGMENT SIZE DISTRIBUTION
FIELD OF INVENTION
[0001] The present disclosure relates to the detection of fragment size ratio and fragment size distribution of nucleic acid. In particular, the present disclosure relates to the determination of fragment size ratio and fragment size distribution of cfDNA.
BACKGROUND
[0002] Cell-free DNA (cfDNA) is composed of short DNA fragments found in plasma, urine and other bodily fluids, while circulating tumor DNA (ctDNA) is a subset of cfDNA of tumor origin. Circulating cell-free DNA fragments typically range between 120-220 base pairs in size, or multiples of this size range, with a maximum peak at 167 base pairs. This pattern coincides with the length of DNA wrapped around a single nucleosome, plus a short stretch of ~20 base pairs (linker DNA) bound to a histone H1. Fragmentation patterns of normal cfDNA are therefore the result of nucleosomal DNA patterns that reflect the chromatin structure of normal blood cells. Meanwhile, circulating DNA of tumor origin (ctDNA) and fetal-derived cfDNA are commonly thought to have higher degrees of fragmentation than cfDNA that is shed by non-neoplastic or maternal tissues, respectively. Tumor-derived ctDNAs have been shown to be variable in size and are in general shorter than normal cfDNAs in healthy people. Interestingly, while tumor cfDNA fragments are shortened, they still retain the peak at 166 base pairs within the size distribution. The prevailing understanding is that the modal size of tumor cfDNA is between 130 base pairs and 150 base pairs, while the overall cfDNA size distribution peaking at 166 base pairs is the consequence of low tumor cfDNA purity in the abundance of cfDNA from non-neoplastic origin. Detection of fragment size differences, including small differences, could provide valuable insight into the physical attributes of ctDNA, and inform cancer origin and progression.
[0003] Indeed, in the past, the quantitation and comparison of cfDNA fragment sizes in clinical plasma or serum specimens have been demonstrated to predict cancer progression, distinguish healthy subjects from colorectal cancer (CRC) patients, to enhance the detection of tumor DNA, and genome-wide fragmentation cfDNA profiles have been applied to classify healthy vs. cancer, in the setting of screening and early detection. Notably, the earliest methods of cfDNA size assessment used quantitative PCR with primers deliberately designed to target variable lengths of DNA fragments.
[0004] Targeted NGS panels with deep sequencing (for example, 382 genes or 58 genes), in which target capture was done using hybridization-based methods, have also investigated cfDNA fragment size distribution in cancer and healthy plasma samples, and found that cfDNA fragment sizes are more variable and smaller in tumor plasma than in normal cfDNA. qPCR-based methods and the described NGS-based methods (WGS or targeted by hybridization capture) differ fundamentally in their approaches towards cfDNA fragment size analysis, in that the former predefines the fragment size of interest and quantifies target signal in that range, while the latter NGS methods based on WGS or targeted hybridization capture rely on fragments of naturally occurring lengths being captured, and determine fragment lengths after capture and sequencing. As discussed above, qPCR-based methods typically rely on the targeting of only two different fragment sizes of interest in order to calculate the DNA integrity index. As a result, data on other fragment sizes, which may provide significant information in disease prediction, are not available. On the other hand, although WGS or targeted hybridization capture-based methods are able to capture a wider range of fragment sizes, the labour and cost associated with these methods limit their scalability and applications.
[0005] Highly multiplexed amplicon-based target sequencing offers a highly scalable, sensitive approach for the detection of genomic alterations and is deployed for plasma cfDNA characterization, similar to the application of hybridization-based target capture. Broadly, it has been argued that plasma cfDNA sequencing approaches that focus only on genomic alterations do not take advantage of the potential differences in chromatin organization or fragment sizes of ctDNA, which are progressively being attributed a role in the identification of tumor versus normal condition. In the context of highly multiplexed amplicon-based targeted sequencing, the study of cfDNA fragment sizes has not been explored, and the utility of such panels has been strictly restricted to the sensitive detection of genomic alterations, including single nucleotide variants (SNVs), insertion-deletion mutations (indels), and copy number variations, among others. Apparent limitations to the use of amplicon-based NGS assays for fragment size determination include the predetermined nature of fragment sizes captured (based on amplicon/primer design, which is in turn determined by the nature of specific genomic regions which are more suitable for the design of functional PCR primers), which do not naturally lend themselves to the range of cfDNA fragment sizes present in circulation, as described by WGS assays.
[0006] Thus, there is a need to provide a method to determine fragment size ratio and fragment size distribution of cfDNA in a biological sample that overcomes the limitations described above. There is a need to provide a method to detect a broader range of cfDNA fragment sizes present in the biological sample, without compromising the inherent functions of amplicon-based NGS, such as detection of genomic alterations.
SUMMARY
[0007] In a first aspect, the present disclosure refers to a method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample, comprising:
(a) performing a plurality of multiplexed PCR reactions on cfDNA present in the biological sample using:
(I) a plurality of primer sets specific to a plurality of target regions; wherein each target region comprises a plurality of target gene sequences or variants thereof; wherein the plurality of target gene sequences comprises a plurality of short target sequences and/or a plurality of long target sequences; wherein each short target sequence comprises a short cfDNA fragment comprising a predetermined number of base pairs or less than a predetermined number of base pairs; wherein each long target sequence comprises a long cfDNA fragment comprising more than a predetermined number of base pairs; wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences or variants thereof, wherein the forward and reverse primers of each primer set specific to the plurality of target regions, are complementary to the plurality of short target sequences and/or the plurality of long target sequences; wherein each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence and/or long target sequence originating from the same or different target region; to generate a plurality of amplicons comprising cfDNA fragments originating from the plurality of short target sequences and/or long target sequences;
(b) purifying the plurality of amplicons from step (a);
(c) amplifying the purified product from step (b) by using universal indexed adapter primers to generate a sequencing library;
(d) purifying the sequencing library obtained from step (c);
(e) subjecting the purified sequencing library from step (d) to multiplex sequencing on a next-generation sequencing platform to obtain a plurality of sequencing reads;
(f) deriving a consensus read of each sequence from the plurality of sequencing reads obtained from step (e);
(g) performing a sequence alignment of each consensus read obtained from step (f) to a reference genome,
(A) if the sequence alignment results in a partial alignment to the reference genome of a short target sequence then:
(i) determining the sequence alignment as a consensus read of a short cfDNA fragment (a cfDNA fragment comprising a predetermined number of base pairs or less than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a short cfDNA fragment;
(B) if the sequence alignment results in a partial alignment to the reference genome of a long target sequence then:
(i) determining the sequence alignment as a consensus read of a long cfDNA fragment (a cfDNA fragment comprising more than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a long cfDNA fragment; to thereby determine the cfDNA fragment size ratio and/or fragment size distribution in the biological sample, wherein the fragment size ratio is defined as the number of short cfDNA fragments to the number of long cfDNA fragments.
[0008] In a second aspect, the present disclosure refers to a kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample according to the method disclosed herein, comprising a plurality of primer sets specific to a plurality of target regions as defined in step (a)(I) of the first aspect, and instructions for use in the method disclosed herein.
BRIEF DESCRIPTION OF DRAWINGS
[0009] The disclosure will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
[0010] Figure 1 illustrates the amplicon panel-wide distribution of fragment lengths in a lung cancer sample (A) and in a healthy plasma sample (B), showing a similar range of fragment sizes between 70-170 base pairs. The overlay of the two histograms (C) shows the relative increase in the distribution of fragments shorter than 110 base pairs in the lung cancer sample. The total panel size is 0.06 Mbp.
[0011] Figure 2 shows the Integrated Genome Viewer (IGV) visualization of amplicons used to capture the full length of BRCA1 exon 10 (which is 3426 base pairs in length), requiring multiple consecutive amplicons for complete capture. Relatively, exons 9 and 11 which are only 77 and 89 base pairs in length, respectively, can be targeted by a single amplicon designed to completely flank the target region with insert size of about 110 base pairs, optimized for primer design and functionality in a multiplex reaction.
[0012] Figure 3 shows the distribution of fragment lengths in a lung cancer sample (A), and in a healthy plasma sample (B), showing similar range of fragment sizes in base pairs enriched for target regions with fragments of size >150 base pairs, in this example, BRCA1 and BRCA2 genes. In an overlay (C), enrichment of short fragments in the cancer sample is highlighted by a dashed arrow, and depletion of long fragments is highlighted by a solid arrow.
[0013] Figure 4 shows that the size ratio of fragments differs significantly between cfDNA from the plasma of healthy individuals and cfDNA from cancer samples, and is more variable among cancer samples. Size ratio is defined as the ratio of the number of short cfDNA fragments (70-150 base pairs) to the number of long cfDNA fragments (151-500 base pairs).
[0014] Figure 5 shows that size ratio of cfDNA fragments is higher in cancer samples relative to healthy plasma samples. The enrichment of small fragments (<150 base pairs) is greatest in pancreatic cancer, colorectal cancer and cholangiocarcinoma samples. ChC = cholangiocarcinoma.
[0015] Figure 6 shows that among lung cancer cases, size ratio of cfDNA fragments is higher in metastatic cases relative to early stage (localized) lung cancer cases, showing an enrichment of small fragments (<150 base pairs) in late-stage disease.
[0016] Figure 7 shows cfDNA fragment size ratio is correlated with plasma cfDNA concentration across all samples, indicating an increase in abundance of ctDNA fragments (measurable by the size ratio) as tumor burden increases in cancer samples.
[0017] Figure 8 shows cfDNA fragment size ratio generally increases as the measured highest mutant allele frequency (AF%) detected from the amplicon-based NGS assay increases.
[0018] Figure 9 is a Receiver Operator Characteristic (ROC) analysis comparing the classification of plasma samples from patients with cancer (n=115) and plasma samples from healthy controls (n=21); using size ratio only yielded an area under curve (AUC) of 0.84 (A), and using size ratio and plasma cfDNA concentration yielded an AUC of 0.93 (B).
[0019] Figure 10 shows examples of design of amplicons at different regions of the genome to capture fragments of varying lengths originating at the same target region. The forward primer with respect to the gene is shown with a dashed arrow, and two reverse primers capturing short and long fragments are shown with solid arrows, for two illustrative examples of regions in chromosome 1 (chr1) and chromosome 2 (chr2). In the multiplex amplicon panel, these amplicons are termed “fragment size amplicons” (FSA).
DETAILED DESCRIPTION
[0020] The present disclosure describes a method which enables amplicon-based targeted NGS assays or panels to perform cfDNA fragment size distribution analysis.
[0021] In the first aspect, the present disclosure refers to a method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample, comprising:
(a) performing a plurality of multiplexed PCR reactions on cfDNA present in the biological sample using:
(I) a plurality of primer sets specific to a plurality of target regions; wherein each target region comprises a plurality of target gene sequences or variants thereof; wherein the plurality of target gene sequences comprises a plurality of short target sequences and/or a plurality of long target sequences; wherein each short target sequence comprises a short cfDNA fragment comprising a predetermined number of base pairs or less than a predetermined number of base pairs; wherein each long target sequence comprises a long cfDNA fragment comprising more than a predetermined number of base pairs; wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences, wherein the forward and reverse primers of each primer set specific to the plurality of target regions, are complementary to the plurality of short target sequences and/or the plurality of long target sequences; wherein each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequences and/or long target sequences originating from the same or different target region; to generate a plurality of amplicons comprising cfDNA fragments originating from the plurality of short target sequences and/or long target sequences;
(b) purifying the plurality of amplicons from step (a);
(c) amplifying the purified product from step (b) by using universal indexed adapter primers to generate a sequencing library;
(d) purifying the sequencing library obtained from step (c);
(e) subjecting the purified sequencing library from step (d) to multiplex sequencing on a next-generation sequencing platform to obtain a plurality of sequencing reads;
(f) deriving a consensus read of each sequence from the plurality of sequencing reads obtained from step (e);
(g) performing a sequence alignment of each consensus read obtained from step (f) to a reference genome,
(A) if the sequence alignment results in a partial alignment to the reference genome of a short target sequence then:
(i) determining the sequence alignment as a consensus read of a short cfDNA fragment (a cfDNA fragment comprising a predetermined number of base pairs or less than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a short cfDNA fragment;
(B) if the sequence alignment results in a partial alignment to the reference genome of a long target sequence then:
(i) determining the sequence alignment as a consensus read of a long cfDNA fragment (a cfDNA fragment comprising more than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a long cfDNA fragment; to thereby determine the cfDNA fragment size ratio and/or fragment size distribution in the biological sample, wherein the fragment size ratio is defined as the number of short cfDNA fragments to the number of long cfDNA fragments.
[0022] In one example of the present disclosure, target regions already included in the amplicon panel with potential to form short and long fragments were considered for fragment size analysis (not deliberate design for fragment size study specifically), as described herein. In another example of the present disclosure, it was envisioned that several (>30) amplicons deliberately designed throughout the genome to variably capture fragments of length 90 base pairs, 300 base pairs or other sizes, would additionally provide fragment size information, as illustrated in Figure 10. To minimize variability due to specific primer performance in multiplex, a common forward (arbitrary directionality) per region and two or more reverse primers would allow the capture of different fragment lengths. With a substantially sized panel >0.05 Mb, as shown, it is possible to discern fragment size information, and sensitivity of detection is anticipated to increase with increasing panel size.
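The primer layout described in paragraph [0022] can be illustrated with a minimal sketch. The coordinates, class name and function name below are hypothetical and chosen only for illustration; they are not taken from the disclosure. The sketch also assumes, as a simplification, that a cfDNA fragment can template a given amplicon only if the fragment spans both the shared forward primer site and the binding site of that reverse primer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSet:
    """One shared forward primer and several reverse primers (hypothetical coordinates)."""
    forward_start: int    # leftmost base covered by the forward primer
    reverse_ends: tuple   # rightmost base covered by each reverse primer

    def amplicon_lengths(self):
        # Amplicon length runs from the forward primer start to the reverse primer end, inclusive.
        return [end - self.forward_start + 1 for end in self.reverse_ends]


def amplifiable(primer_set, frag_start, frag_end):
    """Return the amplicon lengths that a fragment [frag_start, frag_end] could template."""
    if frag_start > primer_set.forward_start:
        return []  # the fragment does not contain the forward primer site
    return [end - primer_set.forward_start + 1
            for end in primer_set.reverse_ends
            if end <= frag_end]


# Illustrative region with reverse primers placed to give 90 bp and 300 bp amplicons.
fsa = PrimerSet(forward_start=1000, reverse_ends=(1089, 1299))
print(fsa.amplicon_lengths())        # [90, 300]
print(amplifiable(fsa, 990, 1150))   # [90]
print(amplifiable(fsa, 950, 1400))   # [90, 300]
```

Under this simplified model, a short fragment contributes only to the short amplicon, whereas a long fragment can contribute to both, which is why sharing one forward primer per region keeps the comparison between lengths less dependent on individual primer performance.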
[0023] In one example, the disclosed method further comprises determining fragment size distribution in a biological sample, by adding into the pool of plurality of primer sets of step (a) of the first aspect:
(II) a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths, wherein the plurality of cfDNA fragments of predefined lengths is different from the plurality of short cfDNA fragments and plurality of long cfDNA fragments described in (a)(I) of the method of the first aspect, wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths, to generate a plurality of amplicons comprising cfDNA fragments of predefined lengths that are then subjected to steps (b)-(g) of the method of the first aspect to thereby determine the fragment size distribution of the cfDNA in the biological sample.
[0024] In one example, the disclosed method is used to determine cfDNA fragment size ratio in a biological sample. In another example, the method is used to determine cfDNA fragment size distribution in a biological sample. In a further example, the disclosed method is used to simultaneously determine cfDNA fragment size ratio and cfDNA fragment size distribution in a biological sample.
[0025] In one example, the predetermined number of base pairs is 150. Accordingly, in one example, short target sequences comprise short cfDNA fragments comprising 150 base pairs or less than 150 base pairs. In another example, short target sequences comprise short cfDNA fragments comprising about 70 to 149 base pairs, or about 80 to 140 base pairs, or about 90 to 130 base pairs or about 100 to 120 base pairs, or about 70 base pairs, or about 80 base pairs or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs or about 145 base pairs.
[0026] In another example, long target sequences comprise long cfDNA fragments comprising more than 150 base pairs. In another example, long target sequences comprise long cfDNA fragments comprising about 151 base pairs to about 500 base pairs, or about 160 to about 490 base pairs, or about 170 to about 480 base pairs, or about 180 to about 470 base pairs, or about 190 to about 460 base pairs, or about 200 to about 450 base pairs, or about 210 to about 440 base pairs, or about 220 to about 430 base pairs, or about 230 to about 420 base pairs, or about 240 to about 410 base pairs, or about 250 to about 400 base pairs, or about 260 to about 390 base pairs or about 270 to about 380 base pairs, or about 280 to about 370 base pairs, or about 290 to about 360 base pairs, or about 300 to about 350 base pairs, or about 310 to about 340 base pairs, or about 320 to about 330 base pairs, or about 151 base pairs, or about 152 base pairs, or about 153 base pairs, or about 154 base pairs, or about 155 base pairs, or about 156 base pairs, or about 157 base pairs, or about 158 base pairs, or about 159 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs, or about 230 base pairs, or about 240 base pairs, or about 250 base pairs, or about 260 base pairs, or about 270 base pairs, or about 280 base pairs, or about 290 base pairs, or about 300 base pairs, or about 400 base pairs or about 500 base pairs. In one example, the long target sequences comprise long cfDNA fragments comprising 3426 base pairs. In another example, there is no upper limit on the number of base pairs in a long cfDNA fragment.
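As a minimal sketch of how the 150 base pair example threshold separates short from long fragments, the following computes the fragment size ratio (number of short fragments to number of long fragments) and a simple binned size distribution. The function names, the 10 base pair bin width and the input lengths are assumptions made for illustration and are not part of the disclosure.

```python
from collections import Counter


def size_ratio(fragment_lengths, threshold=150):
    """Ratio of short fragments (<= threshold bp) to long fragments (> threshold bp)."""
    short = sum(1 for n in fragment_lengths if n <= threshold)
    long_ = sum(1 for n in fragment_lengths if n > threshold)
    if long_ == 0:
        raise ValueError("no long fragments counted; ratio is undefined")
    return short / long_


def size_distribution(fragment_lengths, bin_width=10):
    """Histogram of fragment lengths, keyed by the lower edge of each bin."""
    counts = Counter((n // bin_width) * bin_width for n in fragment_lengths)
    return dict(sorted(counts.items()))


# Invented fragment lengths in base pairs, for illustration only.
lengths = [90, 110, 145, 150, 151, 300]
print(size_ratio(lengths))          # 2.0
print(size_distribution(lengths))   # {90: 1, 110: 1, 140: 1, 150: 2, 300: 1}
```

A fragment of exactly 150 base pairs is counted as short here, following the example in which short fragments comprise "150 base pairs or less than 150 base pairs".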
[0027] In another example, the plurality of primer sets specific to a plurality of target regions are as disclosed in step (a)(I) of the disclosed method. In one example, the plurality of primer sets are specific to a plurality of target regions. In another example, the plurality of primer sets are specific to a plurality of target regions that comprise a plurality of target gene sequences or variants thereof. In one example, “plurality” means at least two. Therefore, in one example, the plurality of primer sets comprise at least two sets of primers that are specific to a plurality of target regions. In another example, the plurality of target regions comprise at least two target regions. In another example, the plurality of target regions comprise a plurality of target gene sequences or variants thereof. In yet another example, the plurality of target gene sequences or variants thereof comprise at least two target gene sequences. The “target region” may be any region of a nucleic acid that is to be analyzed, and the term “target sequence” within the target region may be construed accordingly. The target region may also comprise one or more exons (which may form the target sequences) and/or one or more introns. In one example, the “target region” comprises genes that are housekeeping genes, non-functional genes, functional genes or genes that are related to a disease, and the “target gene sequence” is a sequence of a housekeeping gene, non-functional gene, functional gene or a gene that is related to a disease within the “target region”. In one example, the target region is the BRCA1 gene. In one example, the target region may be AMPD2, OPTC, CRYZ, PPOX, AFF3, RIF1, PAX3, OLA1, ALAS1, RHO, FRMD4B, SOBP, AGK, ANK1, TG, FXN, CUTC, APOA4, PTS, APOF, PAH, GCH1, PSEN1, GATM, RTF1, ARMC5, BRCA1, FECH, PIGN, ADA, PANK2, APP, SOD1, BCR, SOX10, or APOO.
The primer sequences for any target region(s), such as the exemplary target region(s), can be determined by a person skilled in the art using publicly available gene databases, such as the NCBI database. The primer design tool on the publicly available NCBI website (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) can be used by the skilled person to design suitable primer sequences for the target region(s) of interest. In another example, the plurality of target regions comprising a plurality of target gene sequences or variants thereof comprise a plurality of short target sequences and/or a plurality of long target sequences. In another example, the plurality of short target sequences comprise at least two short target sequences. In yet another example, the plurality of long target sequences comprise at least two long target sequences. In one example, the plurality of short target sequences comprise a plurality of short cfDNA fragments comprising 150 base pairs or less than 150 base pairs. In another example, the plurality of long target sequences comprise a plurality of long cfDNA fragments comprising more than 150 base pairs.
[0028] In one example, each primer set specific to a plurality of target regions comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences. In another example, each primer set specific to a plurality of target regions comprises one forward primer and at least two consecutive reverse primers specific to the plurality of target gene sequences. In another example, each primer set specific to a plurality of target regions comprises one forward primer and two or three consecutive reverse primers specific to the plurality of target gene sequences. In a further example, each primer set specific to a plurality of target regions comprises one forward primer and two consecutive reverse primers specific to the plurality of target gene sequences. In yet another example, each primer set specific to a plurality of target regions comprises one forward primer and three consecutive reverse primers specific to the plurality of target gene sequences. In one example, each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence and/or long target sequence originating from the same or different target region. In one example, each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence originating from the same target region. In another example, each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of long target sequence originating from the same target region. 
In another example, each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence originating from different target region. In another example, each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of long target sequence originating from different target region.
[0029] In one example, the plurality of primer sets described in (II) above are specific to a plurality of cfDNA fragments of predefined lengths of the disclosed method. In one example, the plurality of cfDNA fragments of predefined lengths are different from the plurality of short cfDNA fragments and plurality of long cfDNA fragments described in (I) of the first aspect. In one example, the plurality of primer sets described in (II) above specific to a plurality of cfDNA fragments of predefined lengths comprise at least two sets. In another example, the plurality of cfDNA fragments of predefined lengths comprise at least two cfDNA fragments of predefined lengths.
[0030] In one example, each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and a plurality of consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths. In another example, each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and at least two consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths. In another example, each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and two or three consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths. In a further example, each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and two consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths. In another example, each primer set specific to a plurality of cfDNA fragments of predefined lengths comprises one forward primer and three consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths.
[0031] In one example, each forward primer of the plurality of primer sets comprises a barcode sequence on its 5’ end, wherein each barcode sequence is different. As used herein, the term “barcode sequence” refers to an encoded molecule or barcode that includes variable amount of information within the nucleic acid sequence. For example, the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization-based assay, and the like.
[0032] The barcode sequence allows the pooled analysis of multiple unique target sequences, where the resulting sequence information from the pool can be later attributed back to each starting target sequence. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same barcode sequence. In some examples, the barcode sequence is an overhang that does not complement any sequence within the target region. As each forward primer carries on its 5’ end a randomly assigned barcode sequence as disclosed herein, the barcode sequence allows individual cfDNA molecules to be tagged uniquely in the step of sequencing library formation.
[0033] In one example, the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides. In one example, the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides. In one example, the barcode sequence is an oligonucleotide comprising 10 random nucleotides. In one specific example, the barcode sequence is an oligonucleotide comprising 10 random nucleotides which can be represented as NNNNNNNNNN (SEQ ID NO: 4).
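To illustrate why a 10-nucleotide random barcode can tag individual cfDNA molecules, and how reads sharing a barcode form a family, a minimal sketch follows. It assumes, purely for illustration, that the barcode occupies the first 10 bases of each read; real reads would first need adapter and primer handling, and the function names and example reads are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 10  # the specific example in the text: 10 random nucleotides (NNNNNNNNNN)


def barcode_space(length=BARCODE_LEN):
    """Number of distinct barcodes available with four nucleotides per position."""
    return 4 ** length


def group_by_barcode(reads, length=BARCODE_LEN):
    """Group reads into families keyed by their leading barcode (simplifying assumption)."""
    families = defaultdict(list)
    for read in reads:
        families[read[:length]].append(read[length:])
    return dict(families)


# Invented reads: barcode followed by a short insert.
reads = [
    "AAAAAAAAAAGATTACA",
    "AAAAAAAAAAGATTACA",
    "CCCCCCCCCCGATTACA",
]
families = group_by_barcode(reads)
print(barcode_space())   # 1048576
print(len(families))     # 2
```

Each family can then be collapsed to a single consensus read, so that PCR duplicates of one tagged molecule are counted once.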
[0034] In one example, the biological sample containing cfDNA is a bodily fluid selected from the group consisting of blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, and pancreatic juice. In one example, the bodily fluid is blood. In another example, the blood is plasma.
[0035] In another example, the biological sample is obtained from a subject having and/or suspected of having a disease. In another example, the disease is cancer. In yet another example, the disease is other pathological conditions. In one example, the cancer is selected from the group consisting of leukemia, lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, and gastrointestinal cancer. In yet another example, the cancer is selected from the group consisting of pancreatic cancer, prostate cancer, breast cancer, lung cancer, colorectal cancer and liver cancer. In one example, the pathological conditions include viral infections and neurological disorders. In one example, the disease is an early-stage disease. In another example, the disease is a late-stage disease. In another example, an early-stage disease in the context of cancer is a localized cancer that has not spread into surrounding tissues. In one example, an early-stage disease is stage 0 cancer or stage I cancer or stage II cancer. In another example, a late-stage disease in the context of cancer is a cancer that has spread to distant tissue or organs. In one example, a late-stage disease is stage III or stage IV cancer. In yet another example, a late-stage disease is metastatic cancer.
[0036] In some examples of the disclosed method, the cfDNA present in the biological sample is tumor-derived cfDNA (ctDNA) and fetal-derived cfDNA. In one example, the cfDNA is ctDNA.
[0037] A plurality of multiplexed PCR reactions are performed on cfDNA present in the biological sample as disclosed in step (a) of the first aspect, using a plurality of primer sets specific to a plurality of target regions as disclosed in (a)(I) and/or a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths as disclosed in (a)(II), wherein the plurality of primer sets specific to a plurality of target regions differ from that of a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths.
[0038] In one example, the plurality of multiplexed PCR reactions are performed using a plurality of primer sets specific to a plurality of target regions as disclosed in step (a)(I) of the method of the first aspect. In another example, the plurality of multiplexed PCR reactions are performed using a plurality of primer sets specific to a plurality of target regions as disclosed in step (a)(I) of the method of the first aspect and a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths.
[0039] In one example, the multiplexed amplicon-based panel covers 5-500 genes. In one example, the multiplexed amplicon-based panel has a total panel size of 0.01 Mb to 0.5 Mb. In one example, the multiplexed amplicon-based panel covers 5-500 genes and has a total panel size of 0.01 Mb to 0.5 Mb. In another example, the multiplexed amplicon-based panel covers 100 genes and has a total panel size of at least 0.06 Mb.
[0040] In one example, the plurality of multiplexed PCR reactions performed on the cfDNA comprises 3 to 15 PCR cycles. In one example, the PCR reactions comprise 3 PCR cycles. In one example, the PCR reactions comprise 4 PCR cycles. In one example, the PCR reactions comprise 5 PCR cycles. In one example, the PCR reactions comprise 6 PCR cycles. In one example, the PCR reactions comprise 7 PCR cycles. In one example, the PCR reactions comprise 8 PCR cycles. In one example, the PCR reactions comprise 9 PCR cycles. In one example, the PCR reactions comprise 10 PCR cycles. In one example, the PCR reactions comprise 11 PCR cycles. In one example, the PCR reactions comprise 12 PCR cycles. In one example, the PCR reactions comprise 13 PCR cycles. In one example, the PCR reactions comprise 14 PCR cycles. In one example, the PCR reactions comprise 15 PCR cycles.
[0041] In the disclosed method, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 70 base pairs to about 350 base pairs, or about 80 base pairs to about 340 base pairs, or about 90 base pairs to about 330 base pairs, or about 100 base pairs to about 320 base pairs, or about 110 base pairs to about 310 base pairs, or about 120 base pairs to about 300 base pairs, or about 130 base pairs to about 290 base pairs, or about 140 base pairs to about 280 base pairs, or about 150 base pairs to about 270 base pairs, or about 160 base pairs to about 260 base pairs, or about 170 base pairs to about 250 base pairs, or about 180 base pairs to about 240 base pairs, or about 190 base pairs to about 230 base pairs, or about 200 base pairs to about 220 base pairs, or about 70 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs, or about 230 base pairs, or about 240 base pairs, or about 250 base pairs, or about 260 base pairs, or about 270 base pairs, or about 280 base pairs, or about 290 base pairs, or about 300 base pairs, or about 310 base pairs, or about 320 base pairs, or about 330 base pairs, or about 340 base pairs, or about 350 base pairs. In one example, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 90 base pairs, or about 300 base pairs. In one example, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 90 base pairs. In another example, the cfDNA fragments of predefined length to be captured using primer sets of (II) comprise a length of about 300 base pairs.
[0042] In one example, the number of cfDNA fragments of predefined length to be captured by the plurality of primer sets of (II) is from about 10 to about 100, or about 20 to about 90, or about 30 to about 80, or about 40 to about 70, or about 50 to about 60, or about 10 or about 20, or about 30, or about 40, or about 50, or about 60, or about 70, or about 80, or about 90, or about 100. In another example, the number of DNA fragments of predefined length to be captured by the plurality of primer sets of (II) is at least 30.
[0043] In one example, the plurality of amplicons generated from the plurality of primer sets of step (a)(I) comprise about 70 to 170 base pairs, or about 80 to 160 base pairs, or about 90 to 150 base pairs, or about 100 to 140 base pairs, or about 110 to 130 base pairs, or about 70 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs.
[0044] The plurality of amplicons derived from the cfDNA present in the biological sample are then purified, as disclosed in step (b) of the method of the first aspect.
[0045] The method of the present disclosure involves size-based separation (for example, magnetic bead-based separation), which removes smaller primer-dimer artefacts and retains the desired products. In one example, the purification of cfDNA is performed using an agent such as paramagnetic beads. In one example, the paramagnetic beads are selected from the group consisting of AMPure XP beads, SPRI beads, and Dynabeads.
[0046] Next, the purified plurality of amplicons is amplified using universal indexed adapter primers to generate a plurality of sequencing libraries, as disclosed in step (c) of the method of the first aspect.
[0047] In one example, the amplification is performed using KAPA HiFi HotStart ReadyMix, Phusion U Hot Start DNA Polymerase (Thermo Scientific), ZymoTaq DNA Polymerase (Zymo Research) or Q5U Hot Start High-Fidelity DNA Polymerase (NEB), etc.
[0048] In one example, each universal indexed adapter primer as disclosed in step (c) comprises an adapter sequence. In one example, the term “adapter sequence” refers to an oligonucleotide sequence bound to the 5’ and 3’ end of each DNA fragment in a sequencing library. The adapter sequences are complementary to the plurality of oligonucleotides present on the surface of the flow cells of the sequencing tools thereby allowing the DNA fragment to attach to the sequencing tool. In some examples, an adapter sequence allows for the sequencing of the oligonucleotide of interest. Sequencing platform specific adapter sequences are known in the art, and include, for example, the Illumina P5/P7 adapter sequences.
[0049] In one example, the universal indexed adapter primers as disclosed in step (c) of the method of the first aspect comprise: a forward primer comprising the sequence of
AATGATACGGCGACCACCGAGATCTACACCTAGCGCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T (SEQ ID NO: 5); and a reverse primer comprising the sequence of
CAAGCAGAAGACGGCATACGAGATAACCGCGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T (SEQ ID NO: 6), wherein * represents a phosphorothioate bond, and wherein the underlined sequences are the barcode sequences.
[0050] The plurality of sequencing libraries formed is then purified, as disclosed in step (d) of the method of the first aspect.
[0051] In one example, the purification of the plurality of sequencing libraries is performed using an agent such as paramagnetic beads. In one example, the paramagnetic beads are selected from the group consisting of AMPure XP beads, SPRI beads, and Dynabeads.
[0052] In one example, the plurality of purified sequencing libraries from step (d) are sequenced on an NGS platform to obtain a plurality of sequencing reads as disclosed in step (e) of the method of the first aspect. In some examples, the NGS platform is NextSeq 550, NovaSeq 6000, BGI MGISEQ-2000, DNBSEQ-G400 or DNBSEQ-T7.
[0053] Next, a plurality of consensus reads is derived from each sequence of the plurality of sequencing reads obtained from step (e), as disclosed in step (f) of the method of the first aspect.
[0054] Subsequently, the derived consensus reads are aligned to a reference genome, as disclosed in step (g) of the method of the first aspect. In one example, the term “reference genome” refers to DNA sequences known in the art that may be obtainable from public databases. In one example, the term “consensus read” refers to a nucleotide sequence obtained from consensus calling. In one example, consensus calling is performed by identifying the nucleotide at each position for each sequencing result within the subgroup, comparing the identity for the nucleotide at each position across the plurality of sequencing results, and determining a majority nucleotide at each position. If the majority nucleotide count is above a threshold set for determining majority for a specific position, the assignment for said position is the majority nucleotide. If the majority nucleotide count is below this threshold, no assignment is made for said position. The threshold is variable for every position and is a function of the total number of sequencing results corresponding to a specific position.
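The consensus calling described above can be illustrated with a minimal Python sketch. The threshold function (60% of the observations at a position, rounded up), the treatment of a count equal to the threshold as a call, the `-` placeholder for a missing base and the `N` no-call symbol are all assumptions made for illustration; the disclosure states only that the threshold is a function of the number of sequencing results at each position.

```python
from collections import Counter
from math import ceil


def majority_threshold(depth, fraction=0.6):
    """Hypothetical threshold: minimum count needed to call a base.

    The disclosure only says the threshold depends on the number of
    sequencing results at the position; the 60% rule is an assumption.
    """
    return max(1, ceil(fraction * depth))


def call_consensus(reads, no_call="N"):
    """Call a per-position majority consensus for one subgroup of reads.

    reads: equal-length strings; '-' marks a position a read does not cover.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads in a subgroup must have the same length")

    consensus = []
    for i in range(length):
        # Collect the bases observed at this position.
        bases = [r[i] for r in reads if r[i] != "-"]
        if not bases:
            consensus.append(no_call)
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Assign the majority base only if it reaches the threshold
        # (equality counts as a call here, which is an assumption).
        if count >= majority_threshold(len(bases)):
            consensus.append(base)
        else:
            consensus.append(no_call)
    return "".join(consensus)
```

With three reads `ACGT`, `ACGA` and `ACGT`, the last position has two of three observations agreeing, which meets the assumed threshold, so the consensus is `ACGT`.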
[0055] In one example, step (g) of the disclosed method further comprises, if the sequence alignment results in a partial alignment to the reference genome of a short target region then: (i) determining the sequence alignment as a consensus read of a short cfDNA fragment (a cfDNA fragment comprising 150 base pairs or less than 150 base pairs), and (ii) counting/enumerating the consensus read pairs of a short cfDNA fragment. In another example, step (g) of the disclosed method further comprises, if the sequence alignment results in a partial alignment to the reference genome of a long target region then: (i) determining the sequence alignment as a consensus read of a long cfDNA fragment (a cfDNA fragment comprising more than 150 base pairs), and (ii) counting/enumerating the consensus read pairs of a long cfDNA fragment. The consensus read pairs are used to count the numbers of fragments, corresponding to specific primer target sets. The numbers of fragments (defined by sizes or size ranges or by primer pair identities) are then used to determine the cfDNA fragment size ratio and/or fragment size distribution in the biological sample, wherein: (i) the fragment size ratio is defined as the number of short cfDNA fragments to the number of long cfDNA fragments; and (ii) the fragment size distribution is derived by determining the number of fragments from an entire sample library or a subset of the sample library, and determining the size ranges of each fragment obtained. This results in a distribution of size ranges (i.e., the fragment size distribution).
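As a hedged illustration of the two quantities defined above, the following Python sketch computes a fragment size ratio and a fragment size distribution from a list of fragment lengths. The 150 base pair boundary is taken from the disclosure; the function names, the 10 base pair bin width and the choice to return `None` when no long fragments are present are assumptions.

```python
from collections import Counter

SHORT_MAX_BP = 150  # short: 150 bp or less; long: more than 150 bp


def fragment_size_ratio(lengths, short_max=SHORT_MAX_BP):
    """Number of short fragments divided by number of long fragments."""
    short = sum(1 for n in lengths if n <= short_max)
    long_count = sum(1 for n in lengths if n > short_max)
    if long_count == 0:
        return None  # ratio undefined without long fragments (assumption)
    return short / long_count


def fragment_size_distribution(lengths, bin_width=10):
    """Count fragments per size range, keyed by inclusive (start, end).

    bin_width is a hypothetical choice; any size ranges could be used.
    """
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return {
        (start, start + bin_width - 1): counts[start]
        for start in sorted(counts)
    }
```

For example, lengths of 90, 120, 150, 151 and 300 base pairs give three short and two long fragments, so the ratio is 1.5.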
[0056] In one example, step (g) of the disclosed method further comprises visualisation of amplicons. In one example, the visualisation is performed using Integrative Genomics Viewer (IGV), Savant Genome Browser, etc.
[0057] In one example, the disclosed method comprises the use of bioinformatic tools to capture fragment sizes from alignment files for each sample obtained from step (g). In another example, the bioinformatic tool used to capture fragment size from alignment files for each sample is the publicly available SAMtools suite for Sequence Alignment/Map (SAM) files. In yet another example, the SAMtools specification is as follows:
• samtools view -f65 -F2048 SampleXYZ_consensus.bam | cut -f 1,4,9 > SampleXYZ_f65F2048.txt
• f65 = filter in read paired, first in pair
• Reads which are paired and fragment sizes for first read in read pair (second read in read pair will have same fragment size with “negative” length)
• F2048 = filter out reads with supplementary alignment
• Removes chimeric reads
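The three-column table written by the command above (read name, leftmost mapping position and template length, i.e. SAM columns 1, 4 and 9) could be loaded as in the following Python sketch. The function name and the record layout are assumptions. The 50 to 1000 base pair window follows the retention range stated in the disclosure. The absolute value is taken because, in the SAM format, the sign of the template length depends on which mate is leftmost, so a first-in-pair read mapped to the reverse strand carries a negative value.

```python
def parse_fragment_table(lines, min_bp=50, max_bp=1000):
    """Parse 'name<TAB>pos<TAB>tlen' lines into (name, pos, size) tuples.

    Only fragments with min_bp <= size <= max_bp are kept.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, pos, tlen = fields
        size = abs(int(tlen))  # TLEN is signed; its magnitude is the size
        if min_bp <= size <= max_bp:
            records.append((name, int(pos), size))
    return records


# Usage with the file produced by the samtools command (not run here):
#   with open("SampleXYZ_f65F2048.txt") as handle:
#       records = parse_fragment_table(handle)
```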
[0058] In another example, the consensus read pairs with fragment size of 50 to 1000 base pairs are retained to obtain a representation of expected fragment sizes. In one example, the disclosed method involves selection of target regions and binning fragments into size ranges. In another example, the fragments mapping to specific targeted regions are extracted based on genomic location. This is done to enrich for fragments longer than 200 base pairs. In a further example, the fragment size ratio is then derived as the number of cfDNA fragments comprising 150 base pairs or less over the number of cfDNA fragments comprising more than 150 base pairs. This represents the approximate abundance of “short” fragments relative to “long” fragments.
[0059] In one example, the method of the present disclosure involves determining that the subject has a disease if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher or lower than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject). In one example, the method of the present disclosure involves determining that the subject has cancer if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject). In another example, the fragment size ratio of cfDNA fragments in the control sample obtained from the healthy subject is about 40 or less. In one example, a healthy subject is an individual free from any known diagnosed disease, and has no significant health-related issues. In one example, the plurality of primer sets used to determine if a subject has a disease, and the plurality of primer sets used to determine if a subject is disease-free, is the same.
[0060] In one example, the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as, higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease. In one example, the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease. In one example, the method of the present disclosure further comprises determining that the disease is an early-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease, and determining that the disease is a late-stage disease if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the disease.
In one example, wherein the disease is cancer, the method of the present disclosure further comprises determining that the cancer is an early-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer, and determining that the cancer is a late-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer. In one example, the disease is a non-cancer disease. In one example, wherein the disease is a non-cancer disease, an early-stage non-cancer disease may have a fragment size ratio of cfDNA fragments in the biological sample which is higher or lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the non-cancer disease.
[0061] In one example, the fragment size ratios obtained using the disclosed method are correlated with clinical features such as cancer, type of cancer, plasma cfDNA concentration and mutant allelic frequency. In one example, the correlation is performed using GraphPad Prism 8.0.1, RStudio (1.2.5033), pROC package in R, plotROC package in R, ROCR package in R, etc.
[0062] In a second aspect, the present disclosure refers to a kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample according to the method of the first aspect, comprising a plurality of primer sets specific to a plurality of target regions as defined in step (a)(1) of the method of the first aspect, and instructions for use in the disclosed method.
[0063] In one example, the present disclosure refers to a kit for determining cfDNA fragment size distribution in a biological sample according to the method disclosed herein, further comprising a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths as defined in the method disclosed herein and instructions for use in the disclosed method.
[0064] In one example, the kit further comprises a buffer for performing a plurality of multiplexed PCR reactions, a DNA polymerase, a plurality of deoxynucleoside triphosphates (dNTPs), and a reagent capable of removing excess primers. In some examples, the reagents provided in the kit as described herein may be provided in separate containers comprising the components independently distributed in one or more containers. As the method as described herein relates to sequencing (such as high-throughput sequencing), further components required in the sequencing process could be easily determined by the person skilled in the art.
[0065] As used in this application, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a primer” includes a plurality of primers, including mixtures and combinations thereof.
[0066] As used herein, the terms “increase” and “decrease” refer to the relative alteration of a chosen trait or characteristic in a subset of a population in comparison to the same trait or characteristic as present in the whole population. An increase thus indicates a change on a positive scale, whereas a decrease indicates a change on a negative scale. The term “change”, as used herein, also refers to the difference between a chosen trait or characteristic of an isolated population subset in comparison to the same trait or characteristic in the population as a whole. However, this term is without valuation of the difference seen.
[0067] As used herein, the term “about” in the context of concentration of a substance, size of a substance, length of time, or other stated values means +/- 5% of the stated value, or +/- 4% of the stated value, or +/- 3% of the stated value, or +/- 2% of the stated value, or +/- 1% of the stated value, or +/- 0.5% of the stated value.
[0068] Throughout this disclosure, certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
[0069] The present disclosure illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms "comprising", "including", "containing", etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the disclosure claimed. Thus, it should be understood that although the present disclosure has been specifically disclosed by preferred embodiments and optional features, modification and variation of the present disclosure embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this present disclosure.
[0070] The disclosure has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the present disclosure. This includes the generic description of the present disclosure with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
[0071] Other embodiments are within the following claims and non-limiting examples.
EXAMPLES
[0072] Materials
[0073] Exemplary molecular tag complex or primers when target is BRCA1
[0074] An example of a suitable “forward primer” when the target region is BRCA1 is as follows: TGGAACCTACTTCATTAATATTGCT (SEQ ID NO: 1).
[0075] An example of a suitable “reverse primer” when the target region is BRCA1 is as follows: CATTCAATGTCACCTGAAAGAGAAA (SEQ ID NO: 2).
[0076] An example of yet another suitable “reverse primer” when the target region is BRCA1 is as follows: GAAAGATAAGCCAGTTGATAATGCC (SEQ ID NO: 3).
[0077] Methods
[0078] Plasma cfDNA samples from 115 cancer cases and 21 healthy cases were subjected to amplicon-based NGS testing. Mutation analysis was done routinely.
[0079] Extraction of fragments/insert size information
[0080] The publicly available utility SAMtools was used to capture fragment sizes from alignment files for each sample with the following specifications: samtools view -f65 -F2048 SampleXYZ_consensus.bam | cut -f 1,4,9 > SampleXYZ_f65F2048.txt where f65 = filter in read paired, first in pair (reads which are paired and insert sizes for first read in read pair (second read in read pair will have same insert size with “negative” length)); F2048 = filter out reads with supplementary alignment (removes chimeric reads). Finally, inserts of size 50 to 1000 base pairs were retained to get representation of expected fragment sizes.
[0081] Selection of target regions and binning fragments into size ranges
[0082] Fragments mapping to specific targeted regions were extracted based on genomic location. This was done to enrich for fragments longer than 200 base pairs. Regionally enriched fragments were categorized into 0-150 base pairs and 151-500 base pairs ranges. The fragment size ratio was then derived as (number of fragments of size ≤150 base pairs)/(number of fragments of size >150 base pairs). This represents the approximate abundance of “short” fragments relative to “long” fragments.
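The regional selection and binning step can be sketched in Python as follows. Several points are assumptions: each fragment is represented as a hypothetical `(chromosome, start, size)` tuple, a fragment is kept if it overlaps a target region at all, and the region coordinates in the usage note are placeholders rather than real gene coordinates. The `cut -f 1,4,9` table described earlier does not include the reference sequence name (SAM column 3), so that column would have to be added before fragments could be selected by genomic location in this way.

```python
def overlaps_any(chrom, start, size, regions):
    """True if the fragment [start, start + size - 1] overlaps any region."""
    end = start + size - 1
    return any(
        chrom == r_chrom and start <= r_end and end >= r_start
        for r_chrom, r_start, r_end in regions
    )


def bin_regional_fragments(fragments, regions):
    """Count region-overlapping fragments in the 0-150 and 151-500 bp ranges."""
    counts = {"short_0_150": 0, "long_151_500": 0}
    for chrom, start, size in fragments:
        if not overlaps_any(chrom, start, size, regions):
            continue
        if size <= 150:
            counts["short_0_150"] += 1
        elif size <= 500:
            counts["long_151_500"] += 1
        # fragments above 500 bp fall outside both ranges and are ignored
    return counts


def regional_size_ratio(counts):
    """Short-to-long ratio; None if no long fragments were counted."""
    if counts["long_151_500"] == 0:
        return None
    return counts["short_0_150"] / counts["long_151_500"]
```

With a single placeholder region `("chr17", 1000, 5000)`, two overlapping fragments of 120 and 140 base pairs and one of 210 base pairs yield counts of 2 and 1, and a ratio of 2.0.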
[0083] Correlation with clinical features of samples
[0084] Fragment size ratios were correlated with clinical features, such as cancer, type of cancer, plasma cfDNA concentration, mutant allelic frequency. Pearson correlation analysis was performed for pairwise correlation and unpaired t-test analysis was done for group-wise analysis using GraphPad Prism 8.0.1. RStudio (1.2.5033) was used for logistic regression modelling and ROC analysis was done with ROCR package in R.
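For readers without access to the software named above, the two statistics can be sketched in plain Python. This is not the GraphPad Prism implementation. The pooled-variance (Student) form of the unpaired t statistic is assumed because the disclosure does not state which variant was used, and p-values are not computed.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def unpaired_t(a, b):
    """Unpaired two-sample t statistic with pooled variance (assumed form)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
```

In this setting, `xs` might hold per-sample size ratios and `ys` the matching plasma cfDNA concentrations, while `a` and `b` would hold size ratios for two groups such as cancer and healthy samples.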
[0085] Results
[0086] The present disclosure describes a methodology which enables amplicon-based targeted NGS assays/panels to perform cfDNA fragment size distribution analysis. This approach has parallels with qPCR-based fragment size analysis methods, and relies on the readout of read counts from NGS, but importantly, in contrast to qPCR, it is enabled on a highly multiplex scale approaching total panel sizes of 0.01-0.5 Mb, and continues to yield mutational information simultaneously with fragment size information. The method described herein is able to distinguish cancer samples from normal samples and further shows that different cancer types are characterized by different fragment size distributions.
[0087] Validation of amplicon-based assay for fragment size distribution in cancer and healthy samples
[0088] An amplicon-based NGS panel of size ~0.06 Mb was used to sequence plasma cfDNA from 115 cancer samples (including 25 early-stage cancers) and 21 normal samples (from healthy individuals). Routine detection of genomic alterations was performed. It was reasoned that the overall distribution of cfDNA fragments would be subject to variations among samples in line with the prevailing knowledge for WGS studies. For an amplicon-based NGS assay, this variation would be reflected as increasing or decreasing read counts within predetermined fragment length size bins (representing the sizes of fragments expected to be captured due to the amplicon design). The overall fragment size distribution was shown to be between 70-160 base pairs in both cancer and normal samples (Figures 1A and 1B), but a relatively higher abundance of fragments of length 70-110 base pairs was seen in a representative cancer sample compared to a normal sample (Figure 1C), 38% vs. 17.5% of total, respectively.
[0089] Validation of primers designed to capture both short and long cfDNA fragments
[0090] Typically, due to the design of the amplicon-based capture panel, the vast majority of fragments were in the range of 120-130 base pairs when examining the full panel-wide size distribution. This is because the majority of the panel is designed to most optimally capture specific variants within exons, which are naturally short (relative to intervening introns), and can be captured by a single discrete amplicon of average size. Therefore, any relevant changes in abundance of fragment sizes >150 base pairs, as reported before (refs. 1-3), would be obscured (as a relative change), due to their limited relative abundance in the whole panel, and would not result in meaningful proportion measurement if comparing small and large fragments as defined by <150 base pairs and >150 base pairs, respectively (ref. 5).
[0091] Therefore, in order to enrich long fragments for comparison, analysis was focused on those parts of the panel where consecutive amplicons had been designed in order to cover the entirety of long target regions of genes, including exon 10 of BRCA1 (Figure 2). Specific exons of these genes, which are several times longer than the capture design of a typical amplicon, necessitate the design of consecutive amplicons to capture the full exon, significant mutations in which could occur anywhere through the exons. These genes are representative examples of target regions in the panel where amplicons larger than 150 base pairs can be expected to be formed due to design of consecutive primers, and the method described herein is not limited to these specific genes. Consecutive design of amplicons allows the capture of longer amplicons (from primers of subsequent amplicons), if target cfDNA fragments of larger size exist in the sample. cfDNA fragment size distributions for such target regions with consecutive amplicon design are shown in Figure 3 in a lung cancer sample (A) and in a normal plasma (B), highlighting both the relative increase in short fragments and decrease in long fragments in the cancer sample, therefore representing an overall shift to the smaller fragment sizes in the cancer sample (Figure 3C).
[0092] cfDNA fragment size ratio calculation and its correlation to clinical features of cancer
[0093] To generalize and quantify the shift in fragment size, a “size ratio” was calculated for all samples studied, defined by the ratio of the number of short cfDNA fragments (70-150 base pairs) to the number of long cfDNA fragments (151-500 base pairs), among the fragment data as enriched for analysis for larger fragments described herein. There was a significant difference (p<0.0001) in the size ratio between healthy plasma cfDNA and cfDNA from cancer samples (Figure 4). When analyzed by cancer type, the median size ratio was higher in all cancer types relative to healthy plasma, and seen to be highest in colorectal cancer and cholangiocarcinoma (and liver) cancer cases (Figure 5). The size ratio was in partial agreement with previous findings based on WGS-based fragment size analysis, which showed colorectal cancer and cholangiocarcinoma samples had the highest proportion of fragments below 150 base pairs (ref. 4).
[0094] Among the lung cancer samples, further analysis by cancer stage (early stage vs metastatic) showed that the size ratio of early stage lung cancer cases (n=22) is smaller than that of metastatic cases (n=50), suggesting a progressive shift to shorter fragments (<150 base pairs) with increasing stage of disease (Figure 6), as may be expected with increasing tumor burden which would lead to an enrichment of ctDNA fragments, known to have shorter length than normal cfDNA fragments in circulation.
[0095] Plasma cfDNA concentration (cfDNA per ml plasma) is a known prognostic and predictive biomarker in multiple cancer types. In cancers, as a significant proportion of cfDNA comes from tumour cells, cfDNA concentration potentially reflects the tumour burden and the aggressiveness potential of the disease. Therefore, the fragment size ratio was compared to plasma cfDNA concentration among all samples, as a proxy of tumor burden, and a significant increasing trend of size ratio (p<0.0001) with increasing plasma cfDNA concentration was noted (Figure 7). As a direct (but not complete) measure of the tumor content or purity of cancer samples, the highest mutant allele frequency (AF%) measured from the same amplicon-based NGS assay was also correlated (p=0.0001) to size ratio of samples (Figure 8). This is in contrast to a previous study which reported that tumor cfDNA purity as quantified by the maximum somatic allele frequency was poorly correlated with the occurrence of shorter (133/144 base pairs vs. 166 base pairs) fragments (ref. 6). Nonetheless, the validity of the method described herein remains, as among cancer samples, any shift to 134/144 base pairs dominance was shown to occur globally across the genome instead of being limited to specific regions of the genome, suggesting that extending the analysis to wider target regions (in the amplicon-based panel or any amplicon-based panel) encompassing short and long fragments irrespective of the identity of the target should continue to retain the correlations observed.
[0096] Finally, to determine if cfDNA fragment ratio could be used as a feature to determine if a given cfDNA sample has the characteristics of a patient with cancer or a healthy individual, a logistic regression model was implemented. The performance of this model was assessed as the area under curve (AUC) of Receiver operating characteristic (ROC) analysis. The AUC was 0.84 when only size ratio was used as a selection feature (Figure 9A) and was 0.93 when both size ratio and plasma cfDNA concentration were incorporated as features (Figure 9B). The accuracy of the model in predicting cancer/normal was 0.75 when only size ratio was used as a feature and was 0.89 when both size ratio and plasma cfDNA concentration were used. Therefore, cancer samples could be distinguished from healthy individuals on the basis of the fragment size ratio, as quantified from a targeted amplicon-based NGS panel, described in the present disclosure.
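The ROC evaluation described above can be illustrated without the R packages used in the study. The following Python sketch computes the area under the ROC curve for a single score as the probability that a randomly chosen cancer sample scores higher than a randomly chosen healthy sample, with ties counting one half. For a logistic model with one feature and a positive coefficient, the predicted probability is a monotonic function of that feature, so ranking by the raw size ratio gives the same AUC as ranking by the model output. The function names and the cutoff-based accuracy rule are assumptions, and no values from the study are reproduced.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy_at_cutoff(scores_pos, scores_neg, cutoff):
    """Fraction classified correctly when score > cutoff means 'cancer'."""
    true_pos = sum(1 for p in scores_pos if p > cutoff)
    true_neg = sum(1 for n in scores_neg if n <= cutoff)
    return (true_pos + true_neg) / (len(scores_pos) + len(scores_neg))
```

Here `scores_pos` would hold the size ratios (or model probabilities) of cancer samples and `scores_neg` those of healthy samples.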
[0097] Discussion
[0098] A search of the past literature for cfDNA fragment size shows that while it is a significant aspect of a cfDNA sample to quantify, this information is not routinely captured in NGS tests (except in limited assays specifically designed for this). As described, past studies describing cfDNA fragment size used qPCR or WGS or hybridization-based targeted sequencing. Thus far, there is no description of the use of a highly multiplexed amplicon-based targeted NGS panel in the application of cfDNA fragment size study. The present disclosure describes for the first time:
1. The possibility of fragment size analysis to be done using an amplicon-based NGS method, which is typically not amenable to fragment size analysis;
2. Use of amplicon-based NGS assay for cfDNA fragment size distribution as relative abundance changes within the range of amplicon sizes that are detected by sequencing;
3. Exploiting the consecutive amplicon design in a large amplicon panel to extract long amplicon abundance information; and
4. The incorporation of amplicons (separate from target regions of interest for mutation detection) for short and long amplicons to additionally quantify the fragment size distribution.
[0099] The most important features of the method disclosed herein include (1) a highly multiplexed amplicon-based NGS panel, (2) consecutive design of amplicons for full capture of large target regions and (3) determination of appropriate fragment size ranges to compare for determination of size ratios.
[0100] Typically, PCR-based and even hybridization-based cfDNA profiling techniques are developed for the detection of particular DNA modifications, making them unsuitable for generalizable exploratory analysis. Recently, size profiling of plasma cfDNA with whole-genome sequencing was shown to follow a fragment size pattern that depended on whether plasma was from a healthy individual or from a cancer patient, and on the cancer type. In general, there was an enrichment of fragments in the range of 90-150 base pairs in plasma from cancer patients.
[0101] The present disclosure looks to extract cfDNA fragment size information - shown to be relevant by qPCR and WGS for the characterization of cfDNA from cancer samples relative to healthy samples - from a highly multiplexed amplicon-based NGS assay, while not sacrificing or compromising the accompanying detection sensitivity advantages afforded by amplicon-based NGS. The method described herein is generalizable as part of the NGS assay process which leads to the detection of mutations and copy number changes, and other genomic alterations, and does not require additional sample or laboratory manipulations. Importantly, the method described in the present disclosure does not require the same sequencing resources as 4x WGS to capture fragment size information, and at the same time allows for the sensitive capture of genomic alterations down to 0.1% variant allele frequency. In contrast, WGS is only generally informative of mutations when ctDNA content is ~10% or greater, and hence not generally suitable for the purpose of detection of genomic alterations in the cfDNA setting. The method described herein is able to simultaneously establish fragment size patterns and perform the primary intended function of detection of genomic alterations sensitively. WGS-based methods or qPCR-based methods meant for fragment size analysis would not afford the same genomic alteration detection ability, necessitating the performing of a separate analytical test (requiring additional sample, cost, time and manpower) to additionally gather information on genomic alterations.
[0102] It has been shown in a comparative study of amplicon-based sequencing that the analytical sensitivity with target amplicons in the size range of 140-170 base pairs is identical and that NGS sequencing of plasma cfDNA performs similarly with amplicons in the range of 120 to 170 base pairs. This finding alleviates the need to deliberately design shorter amplicon sizes for multiplexed PCR-based NGS applications, which can be complicated by restrictions from GC content, matching melting temperatures and preventing primer dimerization. Therefore, panel design can be focused on optimizing compatibility of primers, rather than minimizing amplicon lengths for multiplexed cfDNA applications. This obviates the need to deliberately design short amplicons in order to maximize sensitivity. Therefore, the knowledge of the shorter length of ctDNA fragments does not compel a change in panel design for targets of interest for genomic alteration detection - which if implemented would not be amenable to the type of fragment size ratio analysis described herein, as all the amplicons would be of the same average size, and no shifts in abundances by fragment sizes would be discernible. In other words, precisely due to the range of fragment sizes in the panel design, prioritized by primer performance and nature of target (and not by a fixed amplicon length), the study of fragment size distribution in an amplicon-based NGS panel is made possible.
[0103] It is noted that similar to WGS-based fragment size analysis, there is a need to establish the normal pattern or size distribution of cfDNA as measured by the particular assay, to compare cancer samples to, and neither WGS-based method nor amplicon-based method is free from this requirement.
[0104] It is envisioned that with a normal healthy reference established for any biological fluid, including urine and saliva, differences in cfDNA fragmentation patterns for any biological fluid with cfDNA can be determined for cancer detection as well as other pathological conditions.
[0105] Importantly, the disclosed method showed that these features can be used to extract cfDNA fragment size measures that are correlated with clinical features of cancers and healthy plasma samples (as presented in past studies using different approaches to measuring cfDNA fragment sizes), and can be incorporated as a feature to detect presence of cancer. This provides validation of the utility of the present disclosure. Broadly, the features of the present disclosure can be applied to any amplicon-based NGS panel for fragment size analysis, and is not limited to findings from the particular panel described herein. It is envisioned that the features of this disclosure will be routinely employed as part of diagnostic testing using amplicon-based NGS assays designed primarily for the detection of genomic alterations, without any additional cost or time. The disclosed method can be used to independently estimate the presence of cancer, and quantitate the tumor fraction in circulation, based on the fragment size ratios measured. Further, it is envisioned, that additional features derived from specific size ranges (beyond the two ranges used currently, <150base pairs and >150base pairs), can be used to inform cancer type in the setting of early cancer detection.
[0106] Due to the inherent nature of amplicon-based panel design - with predetermined amplicon sizes within a narrow size range (120-170 base pairs) made amenable to standard sequencing and compatible with cfDNA-like material - it is non-trivial to determine size differences of cfDNA fragments using amplicon-based NGS assays. This is because the principal application of highly multiplexed amplicon-based NGS assays has been the detection of genomic alterations, and not fragment size analysis. On the face of it, amplicon-based assays do not lend themselves easily to fragment size analysis.
[0107] In the present disclosure, this apparent limitation of amplicon-based NGS assays is overcome by:
1. Interrogating a large (>0.05 Mb) panel with a range of amplicon sizes from 70-170 base pairs;
2. Enriching long fragment/amplicon information from regions of the panel where multiple consecutive primer pairs are designed to capture full sequences of naturally long target regions (for example, BRCA1 exon 10, as described in Figure 2), which would then inadvertently capture long fragments at some measurable frequency (determined by read count from sequencing for fragments larger than a set length, say 150 base pairs) should such long fragments exist in the sample; and
3. Incorporating design of amplicons intended to capture short and long fragments, for example 90 base pairs, 200 base pairs, 300 base pairs, additional to the target regions which are part of the panel primarily of interest for genomic alterations. These amplicons are termed “fragment size amplicons” (Figure 10).
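The "measurable frequency" of long fragments per target region, as described in step 2 above, can be sketched as a simple read count. The input format (pairs of region name and fragment length), the function name `long_fragment_frequency` and the region labels used in the usage note are assumptions for illustration; the disclosure does not prescribe a data structure.

```python
from collections import defaultdict


def long_fragment_frequency(reads, threshold_bp=150):
    """Fraction of reads per region whose fragment exceeds threshold_bp.

    `reads` is an iterable of (region_name, fragment_length_bp) pairs.
    This input format is hypothetical and chosen for illustration.
    """
    totals = defaultdict(int)
    longs = defaultdict(int)
    for region, length in reads:
        totals[region] += 1
        if length > threshold_bp:
            longs[region] += 1
    # Regions with no long reads report a frequency of 0.0
    return {region: longs[region] / totals[region] for region in totals}
```

For instance, calling the function on reads labelled with a hypothetical region name such as `"BRCA1_exon10"` would report how often that consecutively tiled region yields fragments longer than 150 base pairs.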
[0108] Research results from the analysis of plasma cfDNA from 115 cancer samples and 21 normal samples are described in Figures 4-9 of the Results section in the present disclosure, encompassing the observation of fragment size variation among samples, the selection of target regions for determination of fragment size ratios, the clinically relevant correlation of fragment size ratio to cancers, plasma cfDNA concentration and tumor purity, and the performance of a cancer prediction model incorporating a fragment size feature.
[0109] The method of the present disclosure has the following advantages:
1. One limitation of conventional amplicon-based NGS methods is that they are not amenable to fragment size analysis. The method of the present disclosure uses multiplexed amplicon-based NGS assays comprising large panels (>0.05 Mb) with predetermined amplicon sizes ranging from 70-170 bp. Such a panel design allows for standard sequencing and is compatible with cfDNA-like material, thereby allowing fragment size analysis.
2. The method of the present disclosure allows for determination of both fragment size ratio and fragment size distribution, and simultaneous detection of significant mutations which could occur anywhere through target regions, for the characterization of cfDNA from cancer samples relative to healthy samples using a multiplexed amplicon-based NGS method with high sensitivity, without additional steps, sampling, cost, time and manpower. The method of the present disclosure allows for determination of fragment size ratio and fragment size distribution without compromising the inherent functions of amplicon-based NGS, such as detection of genomic alterations.
3. The technological significance lies in the consecutive design of amplicons for full capture of large target regions in a highly multiplexed amplicon-based NGS panel, which allows the detection of short and long fragments to be considered for fragment size analysis and the simultaneous detection of significant mutations which could occur anywhere through target regions comprising long coding exons of genes.
4. The amplicons termed “fragment size amplicons” are deliberately designed throughout the genome to variably capture fragments with lengths of 90 base pairs, 300 base pairs or other sizes, which additionally provide fragment size information. Such amplicons are additional to those designed for target regions forming part of the panel primarily of interest for detection of genomic alterations, which can continue to be simultaneously detected.
5. The method of the present disclosure allows for determination of appropriate fragment size ranges to derive size ratios using a targeted multiplex NGS panel.
6. The method of the present disclosure may be used in the following applications:
• Detection and quantification of tumor fraction in cfDNA
• Detection of response to treatment based on tumor fraction
• Determination of cancer stage and type based on distribution of fragment sizes in multiple size ranges
• A clinical laboratory service/kit which provides accurate quantitation of cfDNA fragment size to inform clinical decisions.
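The fragment size ratio referred to in the advantages above is defined in the claims as the number of short cfDNA fragments to the number of long cfDNA fragments. A minimal sketch of that calculation follows, assuming that fragment lengths have already been derived from aligned consensus reads; the function name `fragment_size_ratio` is hypothetical, and the default threshold of 150 base pairs is the value given for the predetermined number of base pairs.

```python
def fragment_size_ratio(fragment_lengths, threshold_bp=150):
    """Ratio of short fragment count to long fragment count.

    Short: threshold_bp or fewer base pairs.
    Long: more than threshold_bp base pairs.
    """
    short_count = 0
    long_count = 0
    for length in fragment_lengths:
        if length <= threshold_bp:
            short_count += 1
        else:
            long_count += 1
    if long_count == 0:
        # Assumption: an undefined ratio is treated as an error here
        raise ValueError("no long fragments observed; ratio is undefined")
    return short_count / long_count
```

How a sample with no long fragments should be handled is not addressed in the text; raising an error is one possible choice, made here only to keep the sketch well defined.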
[0110] In the present disclosure, in addition to the sensitive detection of genomic alterations in cfDNA - SNVs, indels, copy number alterations, microsatellite instability, viruses, tumour mutational burden (TMB) - which are detected as specific changes to the sequence of the genome with respect to a reference, the analysis of fragment size ratio (or a similar measure) is another feature measurable by the amplicon-based NGS assay, providing information on the global physical state of cfDNA, which has been shown to be highly informative of cancer origin and progression. In summary, besides the detection of mutations or alterations, the present disclosure investigated leveraging yet another biological property of cfDNA, namely its fragmentation profile, in order to improve the detection of ctDNA specifically and, by inference, the presence of cancer.
REFERENCES:
1. Mouliere, F. et al. High Fragmentation Characterizes Tumour-Derived Circulating DNA. PLOS ONE 6, e23418 (2011).
2. Ellinger, J. et al. Cell-Free Circulating DNA: Diagnostic Value in Patients With Testicular Germ Cell Cancer. J. Urol. 181, 363-371 (2009).
3. Ellinger, J. et al. Noncancerous PTGS2 DNA fragments of apoptotic origin in sera of prostate cancer patients qualify as diagnostic and prognostic indicators. Int. J. Cancer 122, 138-143 (2008).
4. Mouliere, F. et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci. Transl. Med. 10, eaat4921 (2018).
5. Cristiano, S. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385-389 (2019).
6. Guo, J. et al. Quantitative characterization of tumor cell-free DNA shortening. BMC Genomics 21, 473 (2020).
SEQUENCE LISTING:
Table of exemplary primer sequences
Figure imgf000032_0001
Table of other sequences
Figure imgf000033_0001

Claims

What is claimed is:
1. A method of determining cell-free DNA (cfDNA) fragment size ratio and/or fragment size distribution in a biological sample, comprising:
(a) performing a plurality of multiplexed PCR reactions on cfDNA present in the biological sample using:
(I) a plurality of primer sets specific to a plurality of target regions; wherein each target region comprises a plurality of target gene sequences or variants thereof; wherein the plurality of target gene sequences comprises a plurality of short target sequences and/or a plurality of long target sequences; wherein each short target sequence comprises a short cfDNA fragment comprising a predetermined number of base pairs or less than a predetermined number of base pairs; wherein each long target sequence comprises a long cfDNA fragment comprising more than a predetermined number of base pairs; wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to the plurality of target gene sequences or variants thereof, wherein the forward and reverse primers of each primer set specific to the plurality of target regions, are complementary to the plurality of short target sequences and/or the plurality of long target sequences; wherein each consecutive reverse primer of the plurality of consecutive reverse primers of each primer set specific to the plurality of target regions, is complementary to a different part of the plurality of short target sequence and/or long target sequence originating from the same or different target region; to generate a plurality of amplicons comprising cfDNA fragments originating from the plurality of short target sequences and/or long target sequences;
(b) purifying the plurality of amplicons from step (a);
(c) amplifying the purified product from step (b) by using universal indexed adapter primers to generate a sequencing library;
(d) purifying the sequencing library obtained from step (c);
(e) subjecting the purified sequencing library from step (d) to multiplex sequencing on a next-generation sequencing platform to obtain a plurality of sequencing reads;
(f) deriving a consensus read of each sequence from the plurality of sequencing reads obtained from step (e);
(g) performing a sequence alignment of each consensus read obtained from step (f) to a reference genome,
(A) if the sequence alignment results in a partial alignment to the reference genome of a short target sequence then:
(i) determining the sequence alignment as a consensus read of a short cfDNA fragment (a cfDNA fragment comprising a predetermined number of base pairs or less than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a short cfDNA fragment;
(B) if the sequence alignment results in a partial alignment to the reference genome of a long target sequence then:
(i) determining the sequence alignment as a consensus read of a long cfDNA fragment (a cfDNA fragment comprising more than the predetermined number of base pairs), and
(ii) counting/enumerating the consensus read pairs of a long cfDNA fragment;
to thereby determine the cfDNA fragment size ratio and/or fragment size distribution in the biological sample, wherein the fragment size ratio is defined as the number of short cfDNA fragments to the number of long cfDNA fragments.
2. The method of claim 1, wherein step (a) further comprises using:
(II) a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths, wherein the plurality of cfDNA fragments of predefined lengths are different from the plurality of short cfDNA fragments and plurality of long cfDNA fragments in claim 1(a)(I), wherein each primer set comprises one forward primer and a plurality of consecutive reverse primers specific to a plurality of cfDNA fragments of predefined lengths, to generate a plurality of amplicons comprising cfDNA fragments of predefined lengths that are subjected to steps (b)-(g) of claim 1 to thereby determine the fragment size distribution of the cfDNA in the biological sample.
3. The method of claim 1 or 2, wherein the predetermined number of base pairs is 150.
4. The method of any one of claims 1-3, wherein the biological sample is a bodily fluid selected from the group consisting of blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, and pancreatic juice.
5. The method of claim 4, wherein the bodily fluid is blood, and wherein optionally the blood is plasma.
6. The method of any one of the preceding claims, wherein the biological sample is to be obtained from a subject having and/or suspected of having a disease.
7. The method of claim 6, comprising determining that the subject has or is suspected of having a disease if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher or lower than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject).
8. The method of claim 6 or 7, comprising determining that the subject has or is suspected of having cancer if the fragment size ratio of cfDNA fragments in the biological sample obtained from the subject is higher than the fragment size ratio of cfDNA fragments in a control sample (e.g. a sample obtained from a healthy subject).
9. The method of claim 8, wherein the cancer is selected from the group consisting of leukemia, lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, and gastrointestinal cancer.
10. The method of claim 8 or 9, wherein the cancer is selected from the group consisting of pancreatic cancer, prostate cancer, breast cancer, lung cancer, colorectal cancer and liver cancer.
11. The method of any one of claims 8-10, wherein the cfDNA is tumor-derived cfDNA (ctDNA).
12. The method of any one of claims 8-11, comprising determining that the cancer is an early-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is lower than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer, and determining that the cancer is a late-stage cancer if the fragment size ratio of cfDNA fragments in the biological sample is approximately the same as or higher than the fragment size ratio of cfDNA fragments from a sample predetermined to be from late-stage of the cancer.
13. The method of any one of claims 6-12, wherein the plurality of primer sets of claim 1(a)(I) used to determine if a subject has or is suspected of having a disease, and the plurality of primer sets of claim 1(a)(I) used to determine if a subject is disease-free, is the same.
14. The method of any one of the preceding claims, wherein the number of consecutive reverse primers of the plurality of primer sets of claim 1(a)(I) and claim 2(II) is two or three.
15. The method of any one of the preceding claims, wherein the plurality of multiplexed PCR reactions performed on the cfDNA comprises 3 to 15 PCR cycles.
16. The method of any one of claims 2-15, wherein the cfDNA fragments of predefined length to be captured using primer sets of claim 2(II) comprise a length of about 70 base pairs to about 350 base pairs, wherein optionally the cfDNA fragments of predefined length comprise about 90 base pairs, or about 300 base pairs.
17. The method of any one of claims 2-16, wherein the number of cfDNA fragments of predefined length to be captured by the plurality of primer sets of claim 2(II) is about 10 to about 100, wherein optionally the number of cfDNA fragments of predefined length to be captured is at least 30.
18. The method of any one of the preceding claims, wherein the plurality of amplicons generated from the plurality of primer sets of step (a) comprise 70 to 170 base pairs.
19. A kit for determining cfDNA fragment size ratio and/or fragment size distribution in a biological sample according to the method of claim 1, comprising a plurality of primer sets specific to a plurality of target regions as defined in claim 1(a)(I), and instructions for use in the method of claim 1.
20. The kit according to claim 19, further comprising a plurality of primer sets specific to a plurality of cfDNA fragments of predefined lengths as defined in claim 2(II) and instructions for use in the method of claim 2.
21. The kit according to claim 19 or 20, wherein the kit further comprises: a buffer for performing a plurality of multiplexed PCR reactions, a DNA polymerase, a plurality of deoxynucleoside triphosphates (dNTPs), and a reagent capable of removing excess primers.
PCT/SG2023/050252 2022-04-13 2023-04-13 Method for determining cfdna fragment size ratio and fragment size distribution WO2023200404A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202203861P 2022-04-13

Publications (2)

Publication Number Publication Date
WO2023200404A2 true WO2023200404A2 (en) 2023-10-19
WO2023200404A3 WO2023200404A3 (en) 2023-11-23

Family

ID=88330447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2023/050252 WO2023200404A2 (en) 2022-04-13 2023-04-13 Method for determining cfdna fragment size ratio and fragment size distribution

Country Status (1)

Country Link
WO (1) WO2023200404A2 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6586177B1 (en) * 1999-09-08 2003-07-01 Exact Sciences Corporation Methods for disease detection
US6919174B1 (en) * 1999-12-07 2005-07-19 Exact Sciences Corporation Methods for disease detection
US6911308B2 (en) * 2001-01-05 2005-06-28 Exact Sciences Corporation Methods for detecting, grading or monitoring an H. pylori infection
KR20210045953A (en) * 2018-05-18 2021-04-27 더 존스 홉킨스 유니버시티 Cell-free DNA for the evaluation and/or treatment of cancer
EP3927838A4 (en) * 2019-02-22 2022-11-16 AccuraGen Holdings Limited Methods and compositions for early cancer detection

Also Published As

Publication number Publication date
WO2023200404A3 (en) 2023-11-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23788704

Country of ref document: EP

Kind code of ref document: A2