EP3810805A1 - Method for detecting and quantifying genetic alterations - Google Patents

Method for detecting and quantifying genetic alterations

Info

Publication number
EP3810805A1
Authority
EP
European Patent Office
Prior art keywords
double stranded
primer
nucleotides
dna
base pairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP19825310.6A
Other languages
German (de)
English (en)
Other versions
EP3810805A4 (fr
Inventor
Yukti CHOUDHURY
Hao Chen
Min-Han Tan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lucence Life Sciences Pte Ltd
Original Assignee
Lucence Life Sciences Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucence Life Sciences Pte Ltd
Publication of EP3810805A1
Publication of EP3810805A4

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 - Nucleic acid amplification reactions
    • C12Q1/6858 - Allele-specific amplification
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 - Nucleic acid amplification reactions
    • C12Q1/6853 - Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 - Ligating adaptors
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 - Transferases (2.)
    • C12N9/12 - Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 - Nucleotidyltransferases (2.7.7)
    • C12N9/1252 - DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase

Definitions

  • the present invention relates to the measuring or testing processes involving nucleic acid.
  • the present invention relates to the detection, quantification, and identification of DNA.
  • Detection and quantification of rare genetic events is complicated by nature.
  • high-throughput detection methodologies, which are characterized by an error rate of 0.1-1% (that is, 1 in every 100 to 1000 bases is called incorrectly due to artifacts introduced during sample preparation and sequencing), are needed to detect and quantify rare genetic events.
  • High-throughput detection methodologies known in the art require repeated sampling or deep sequencing of a large number of molecules, that may not be readily possible due to limitations of sample input amount.
  • the person skilled in the art typically would have to amplify the nucleic acid sequences present in the sample.
  • amplification methods known in the art are not reliable and do not retain the degree of accuracy demanded for the detection of genomic alterations that occur at extremely low frequencies (i.e. <1%) in the background of otherwise unchanged DNA.
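The arithmetic behind this difficulty can be sketched in a few lines. The following is an illustrative calculation only, not part of the patent: the function name, the sequencing depth, and the simplification that every sequencing error at a position counts against the variant call are all assumptions made for the example.

```python
def expected_read_counts(depth, variant_fraction, error_rate):
    """Expected reads at one position: (true variant reads, erroneous reads).

    Simplifying assumption: every miscalled base at the position is counted
    as a potential false variant read.
    """
    true_reads = depth * variant_fraction
    error_reads = depth * error_rate
    return true_reads, error_reads


# Hypothetical depth and variant fraction, with the 0.1-1% error range from the text.
for error_rate in (0.01, 0.001):
    true_reads, error_reads = expected_read_counts(
        depth=10_000, variant_fraction=0.001, error_rate=error_rate
    )
    print(f"error rate {error_rate:.1%}: "
          f"~{true_reads:.0f} true variant reads vs ~{error_reads:.0f} erroneous reads")
```

Under these assumptions, a variant present at 0.1% yields no more supporting reads than the sequencing errors themselves, which is why raw read counts alone cannot separate the two.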
  • hybridization-based approach capture methods which tend to capture off-target regions besides (or in addition to) sequences targeted by capture probes. These off-target regions consume sequencing capacity which is undesirable from the viewpoint of cost-reduction and simplification of analytical methods.
  • Hybridization methods also take much longer for library preparation and have lower specificity of target capture with off-target regions being captured by the hybridization probes.
  • conventional methods for target capture using forward and reverse primers flanking the target loci are limited to being able to capture only structural variants with previously known or characterized breakpoints. For the detection of genomic rearrangements with unknown fusion partners, the conventional method (e.g. a pure PCR-based approach) is therefore not applicable. Therefore, there is a need for an alternative method for capturing and identifying distinct targets within a DNA sample. The method should seek to retain specificity of target capture while being able to identify targets of multiple classes.
  • the method of the present invention seeks to impart specificity of target capture while not being limited to capturing target regions with previously known sequence changes.
  • the present invention also seeks to provide an alternative method of detecting and/or quantifying genetic alterations that address reliable detection and a system of verification to ensure errors that occur during amplifications are removed from further processing.
  • the present invention provides a method of simultaneously capturing and identifying distinct targets within a DNA sample, wherein the distinct targets comprise a defined target region and an undefined target region, wherein the undefined target region comprises structural variations or rearrangement or fusion, comprising the steps of:
  • a plurality of double stranded DNA fragments A, a plurality of double stranded DNA fragments B, a polymerase, a primer A, and a primer B, wherein:
  • the double stranded DNA fragment A is a double stranded DNA fragment comprising a part of the defined target region
  • the double stranded DNA fragment B is a double stranded DNA fragment comprising a part of the undefined target region
  • the primer A comprises a barcode sequence and a target-specific sequence A, wherein the target-specific sequence A is an oligonucleotide complementary to a sequence at/close to the 3’ end of a single strand of the double stranded DNA fragment A;
  • the primer B comprises a separation molecule, a barcode sequence, and a target-specific sequence B,
  • target-specific sequence B is an oligonucleotide complementary to a sequence within a single strand of the double stranded DNA fragment B
  • the double stranded product A is a single stranded elongated primer A that is annealed to the single stranded DNA fragment A;
  • the double stranded product B is a single stranded elongated primer B that is annealed to the single stranded DNA fragment B;
  • the mixture B comprises the double stranded complex B; f. adding a primer C to the mixture A, wherein the primer C comprises a target-specific sequence C, wherein the target-specific sequence C is an oligonucleotide complementary to a sequence at/close to the 3’ end of the single stranded elongated primer A;
  • the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 random nucleotides.
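To give a sense of scale for the barcode lengths listed above, the sketch below counts how many distinct barcodes a run of random nucleotides can encode and draws one example barcode. It is an illustration only: the function names are hypothetical, and uniform, independent base choice is an assumption about how "random nucleotides" are synthesized.

```python
import random


def barcode_space(length):
    """Number of distinct barcodes of the given length over the alphabet A, C, G, T."""
    return 4 ** length


def random_barcode(length=10, rng=None):
    """Draw one barcode, assuming each base is chosen uniformly and independently."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


for n in (10, 13, 16):
    print(f"{n} random nucleotides -> {barcode_space(n):,} possible barcodes")

print("example 10-nt barcode:", random_barcode(10, random.Random(0)))
```

Even the shortest length in the range, 10 nucleotides, gives 4^10 = 1,048,576 possible barcodes.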
  • the primer A, the primer B, the primer C and/or the double stranded oligonucleotide further comprises an adapter sequence.
  • the structural variation is selected from the group consisting of deletion, duplication, insertion, inversion, transversion, and translocation.
  • the sequencing result is further used to detect a point mutation within the undefined target regions.
  • step o further comprises:
  • the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is from 16 nucleotides to 30 nucleotides, or from 19 nucleotides to 29 nucleotides, or from 20 nucleotides to 28 nucleotides, or from 21 nucleotides to 27 nucleotides, or from 22 nucleotides to 26 nucleotides, or 16 nucleotides, or 17 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides, or 27 nucleotides, or 28 nucleotides, or 29 nucleotides, or 30 nucleotides.
  • the separation molecule is selected from the group consisting of biotin, digoxigenin (DIG), and Fluorescein isothiocyanate (FITC). In yet another embodiment, the separation molecule is biotin.
  • the bead that binds the separation molecule comprises streptavidin, anti-digoxigenin, or anti-FITC. In yet another embodiment, the bead that binds the separation molecule comprises streptavidin.
  • the DNA sample is obtained from a subject having and/or suspected of having a disease.
  • the disease is cancer or infectious disease.
  • the cancer is selected from the group consisting of lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, and gastrointestinal cancer.
  • the infectious disease is viral infection and bacterial infection.
  • the DNA sample is a liquid sample, a tissue sample, or a cell sample.
  • the liquid sample is bodily fluids selected from the group consisting of blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, and pancreatic juice.
  • the bodily fluid is blood.
  • the tissue sample is a frozen tissue sample or a fixed tissue sample.
  • the length of the DNA fragment A and/or the DNA fragment B is from 80 base pairs to 220 base pairs, or from 90 base pairs to 210 base pairs, or from 100 base pairs to 200 base pairs, or from 110 base pairs to 190 base pairs, or from 120 base pairs to 180 base pairs, or from 130 base pairs to 170 base pairs, or from 140 base pairs to 160 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs. In yet another embodiment, the length of the DNA fragment A and/or the DNA fragment B is about 150 base pairs.
  • the amount of DNA sample is from 10 ng to 200 ng, or from 20 ng to 190 ng, or from 30 ng to 180 ng, or from 40 ng to 170 ng, or from 50 ng to 160 ng, or from 60 ng to 150 ng, or from 70 ng to 140 ng, or from 80 ng to 130 ng, or from 90 ng to 120 ng, or from 100 ng to 110 ng, or about 10 ng, or about 20 ng, or about 30 ng, or about 40 ng, or about 50 ng, or about 60 ng, or about 70 ng, or about 80 ng, or about 90 ng, or about 100 ng, or about 110 ng, or about 120 ng, or about 130 ng, or about 140 ng, or about 150 ng, or about 160 ng, or about 170 ng, or about 180 ng, or about 190 ng, or about 200 ng.
  • the amount of DNA sample is about 100 ng.
  • the DNA sample is selected from the group consisting of a eukaryotic DNA sample, a prokaryotic DNA sample, a viral DNA sample, and a mixture thereof.
  • the prokaryotic DNA sample is a bacterial DNA sample.
  • the eukaryotic DNA sample is selected from the group consisting of a protozoa DNA sample, a fungal DNA sample, an algae DNA sample, a plant DNA sample, and an animal DNA sample.
  • the animal DNA sample is a mammalian DNA sample.
  • the mammalian DNA sample is a human DNA sample.
  • the DNA sample is a cell free DNA or DNA of a lysed cell.
  • the method described herein allows for simultaneous capture and identification of both defined target regions and undefined target regions within a DNA sample, which increases efficiency of the detection, quantification, and identification of DNA.
  • the method described herein does not require initial splitting of the sample at the target capture step, and a single sample is used for capturing both the defined target region and the undefined target region.
  • the copy number of the DNA fragments that can be accessed by both the primer that targets the defined target region (i.e. primer A) and the primer that targets the undefined target region (i.e. primer B) is not reduced. Accordingly, the method achieves high sensitivity and specificity.
  • the method described herein is able to achieve simultaneous detection of: 1) Viral DNA; 2) Microsatellite instability; 3) Structural rearrangements; 4) SNVs and INDELs from samples ranging from cfDNA from plasma (or cerebrospinal fluid, pleural effusion) or DNA from fixed tissue.
  • the present invention provides a kit comprising a plurality of primer A as defined herein, a plurality of primer B as defined herein, a plurality of primer C as defined herein, a bead that binds the separation molecule as defined herein, and a double stranded oligonucleotide as defined herein.
  • the kit further comprises a DNA polymerase, a Taq polymerase, a ligase, and a plurality of deoxyribonucleotide triphosphates (dNTPs).
  • Figure 1 is a schematic diagram of the method as described herein. That is, Figure 1a describes steps a and b of the method as described herein, which are:
  • a plurality of double stranded DNA fragments A, a plurality of double stranded DNA fragments B, a polymerase, a primer A, and a primer B
  • the double stranded DNA fragment A is a double stranded DNA fragment comprising a part of the defined region
  • the double stranded DNA fragment B is a double stranded DNA fragment comprising a part of the undefined region
  • the primer A comprises a barcode sequence and a target-specific sequence A, wherein the target-specific sequence A is an oligonucleotide complementary to a sequence at/close to the 3’ end of a single strand of the double stranded DNA fragment A;
  • the primer B comprises a separation molecule, a barcode sequence, and a target-specific sequence B,
  • target-specific sequence B is an oligonucleotide complementary to a sequence within a single strand of the double stranded DNA fragment B
  • b. denaturing the double stranded DNA fragment A and the double stranded DNA fragment B thereby allowing the primer A to anneal to a single stranded DNA fragment A and the primer B to anneal to a single stranded DNA fragment B.
  • the double stranded product A is a single stranded elongated primer A that is annealed to the single stranded DNA fragment A and
  • the double stranded product B is a single stranded elongated primer B that is annealed to the single stranded DNA fragment B; d. adding a bead that binds the separation molecule to the main mixture and allowing the separation molecule in the double stranded product B to bind to the bead thereby forming a double stranded complex B;
  • the mixture A comprises the double stranded product A
  • the mixture B comprises the double stranded complex B.
  • Figure 1c illustrates steps f to h as follows:
  • target-specific sequence C is an oligonucleotide complementary to a sequence at/close to the 3’ end of the single stranded elongated primer A;
  • Figure 1d illustrates the addition of one single nucleic acid overhang, which represents step i as follows:
  • Figure 1e illustrates the addition and ligation of a double stranded oligonucleotide comprising a nucleic acid that is complementary to the single nucleic acid overhang in Figure 1d (or step i), as follows:
  • Figure 1f illustrates the sequencing and data processing process of the method as described herein, which refers to steps l to n as follows: l. combining the double stranded product C and the double stranded product D;
  • Figure 2a shows illustrative examples of library generation for target amplicons generated from the same starting DNA with both ends being defined by primers. Amplicon generation is achieved by the use of a pair of primers.
  • Figure 2b shows illustrative examples of library generation for target amplicons generated from the same starting DNA with only one primer-defined end. Amplicon generation is achieved by the single-ended ligation of a double-stranded oligo adapter.
  • Figure 3a shows an illustration of the sequencing reads mapping to the reference for amplicons generated with one primer and one ligated adapter.
  • the design of target capture primers is to capture with a multiplicity of primers the region of ALK intron 19. These primers correspond to primer B in Figure 1.
  • Figure 3b shows an illustration of the sequencing reads mapping to the reference for amplicons generated with both ends defined by primers.
  • the captured region is defined by a pair of primers designed to capture a hotspot region in ESR1. The pair of primers corresponds to primer A and primer C from Figure 1.
  • Figure 4a shows a summary of variant allele frequency (VAF) observed using the method of the present invention vs. expected frequency of variants in the Horizon Discovery™ cfDNA standards.
  • the amount of DNA used in library preparation was 50-100 ng.
  • Figure 4b shows observed frequencies averaged across variants in the Horizon Discovery™ cfDNA standards.
  • Figure 4c shows the sensitivity of detection of true variants in the Horizon Discovery™ cfDNA standards and the specificity reported as the per-base specificity across the target panel (detection of true negatives).
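The quantities reported in Figures 4a to 4c follow standard definitions, which can be sketched as below. This is a generic illustration, not the patent's analysis code: the function names are hypothetical and the read and base counts are invented purely to show the calculation, not taken from the figures.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """VAF = reads supporting the variant / all reads covering the position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def sensitivity(true_positives, false_negatives):
    """Fraction of true variants that were detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of true-negative positions (per base) that were called negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts, for illustration only.
print(f"VAF: {variant_allele_frequency(alt_reads=5, total_reads=1000):.2%}")
print(f"sensitivity: {sensitivity(true_positives=9, false_negatives=1):.1%}")
print(f"per-base specificity: {specificity(true_negatives=99_990, false_positives=10):.2%}")
```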
  • Figure 5 shows an example of a primer B, wherein the primer B comprises a separation molecule, an adapter, a barcode, and a target-specific sequence B.
  • Figure 6 shows an example of a product A with a very short target captured region shown for illustrative purposes.
  • Figure 7 shows an example of a double stranded oligonucleotide comprising a nucleotide overhang.
  • Figure 8a shows an example of a Product D, which is obtained when a captured target goes through adapter ligation for amplicon generation as illustrated in Figure 1e.
  • Figure 8b shows an example of Product C, which is obtained when a captured target is converted to amplicon with a second primer as shown in Figure 1c.
  • Figure 9 shows an example of the amplification result of either Product C or Product D. Only a single strand of a double-stranded product is shown for illustrative purposes.
  • Figure 10 is an example of using the sequencing results for the detection of fusion.
  • Figure 10a is a paired mate mapping for ROS1 gene region known to undergo fusion/rearrangement.
  • the darker and lighter grey reads represent paired reads which have distinct mapping locations in the human genome.
  • the right panel is the region of interest in ROS1 gene with known rearrangement.
  • the left panel is the region that the paired read for the lighter grey read maps to and is identified as SLC34A2.
  • Figure 10b shows that the location of the paired read in SLC34A2 is chr4: 256666465, which is on a distinct chromosome from the location of ROS1, which is chr6: 117658151.
  • Figure 10 shows that the method of the present invention is able to detect the fusion of SLC34A2 gene to ROS1 gene or other genes that are known to undergo rearrangements including fusions.
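The paired-mate reasoning of Figure 10 can be sketched as a small filter over read pairs: a pair supports a candidate fusion when one mate maps inside the region of interest and the other mate maps to a different chromosome. This is a minimal sketch under stated assumptions, not the patent's pipeline. The `ReadPair` type, the `fusion_candidates` function, the region boundaries, the `min_support` threshold, and all mate positions are hypothetical; a real workflow would read mate positions from an alignment file rather than from hand-written tuples.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    name: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def fusion_candidates(pairs, target_chrom, target_start, target_end, min_support=2):
    """Count partner chromosomes for pairs with one mate inside the target
    region and the other mate on a different chromosome."""
    partners = Counter()
    for pair in pairs:
        mates = [(pair.chrom1, pair.pos1), (pair.chrom2, pair.pos2)]
        # Check both orientations: either mate may be the one in the target region.
        for (chrom_a, pos_a), (chrom_b, _pos_b) in (mates, mates[::-1]):
            in_target = chrom_a == target_chrom and target_start <= pos_a <= target_end
            if in_target and chrom_b != target_chrom:
                partners[chrom_b] += 1
    return {chrom: n for chrom, n in partners.items() if n >= min_support}


# Hypothetical read pairs; the window on chr6 is an assumed region around the
# ROS1 position quoted in the text.
pairs = [
    ReadPair("r1", "chr6", 117_658_100, "chr4", 25_666_400),
    ReadPair("r2", "chr4", 25_666_450, "chr6", 117_658_151),
    ReadPair("r3", "chr6", 117_658_200, "chr4", 25_666_500),
    ReadPair("r4", "chr6", 117_658_120, "chr6", 117_658_300),  # concordant pair
    ReadPair("r5", "chr6", 117_658_130, "chr2", 1_000_000),    # single stray pair
]
print(fusion_candidates(pairs, "chr6", 117_600_000, 117_750_000))
```

Requiring a minimum number of supporting pairs is one simple way to keep isolated mis-mapped reads from being reported; the threshold of 2 here is arbitrary.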
  • Figure 11a is a schematic description of a structural variant described as an inversion, at the level of a chromosome.
  • Figure 11b is a schematic description of an inversion, compared to the wild-type condition.
  • Figure 12 shows the results of detecting an exemplary inversion in a DNA sample with known inversion variant, using the method of the invention.
  • Figure 13 is a schematic description of a structural variant described as a translocation, at the level of a chromosome.
  • Figure 14 shows the results of detecting an exemplary translocation in a DNA sample with known translocation variant, using the method of the invention.
  • the platform technology allows the simultaneous capture of targeted regions of the human and/or viral genome, as defined by pairs of primers, and of regions not defined by primer pairs, allowing the capture of genomic regions undergoing alterations at unspecified locations within a defined region of interest.
  • a unique molecular tag (i.e. barcode sequence)
  • the molecular tag (i.e. barcode sequence)
  • the presence of a molecular tag (i.e. barcode sequence) is detected using bioinformatics methods known in the art to count and assign each target DNA sequence from high-throughput sequencing to an original DNA molecule from the sample, carrying the same molecular tag (i.e. barcode sequence).
  • the molecular tags (i.e. barcode sequence)
  • the molecular tags are used to define molecular families, each member of which should carry the exact same sequence unaltered by the processes of capture and conversion to DNA library. Molecular families are then considered together for each region of interest to identify deviations from the expected DNA sequence.
  • tags (i.e. barcode sequence)
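The idea of molecular families can be sketched as follows: reads sharing a barcode and a target are grouped, and a per-position majority vote gives a consensus for the original molecule, so an error present in only one family member is outvoted. This is a minimal sketch, not the bioinformatics method used in the patent. The function names, the 0.6 agreement threshold, the assumption that reads in a family are already aligned to equal length, and the example reads are all hypothetical.

```python
from collections import Counter, defaultdict


def build_families(reads):
    """Group reads by (barcode, target). `reads` is an iterable of
    (barcode, target, sequence) tuples."""
    families = defaultdict(list)
    for barcode, target, sequence in reads:
        families[(barcode, target)].append(sequence)
    return families


def consensus(sequences, min_agreement=0.6):
    """Per-position majority vote across a family; 'N' where no base reaches
    the agreement threshold. Assumes all sequences have equal length."""
    called = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) >= min_agreement else "N")
    return "".join(called)


# Hypothetical reads: the third read of the first family carries one error.
reads = [
    ("ACGTACGTAC", "ESR1", "ACGT"),
    ("ACGTACGTAC", "ESR1", "ACGT"),
    ("ACGTACGTAC", "ESR1", "ACTT"),
    ("TTGCAAGGCT", "ESR1", "ACGT"),
]
families = build_families(reads)
print("original molecules counted:", len(families))
for key, members in families.items():
    print(key, "->", consensus(members))
```

The number of families, rather than the number of reads, is what counts original DNA molecules in this scheme.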
  • the method as described herein is also capable of detecting non-human genomic sequences such as microbial DNA in a mixture with human DNA.
  • the present invention can also be broadly illustrated by the following features. Firstly, a group of primers will bind to DNA fragments comprising the defined (or fully defined) target regions and another group of primers will bind to DNA fragments comprising the undefined (or partly defined) target regions.
  • the primers that annealed to the DNA fragments comprising part of the defined target region (i.e. product A) are separated from the primers that annealed to the DNA fragments comprising part of the undefined target region (i.e. product B).
  • for product A, a reverse primer will be added.
  • for product B, a double stranded oligonucleotide is added and ligated to the end that is not connected to the separation molecule that binds the separation beads in an earlier separation step.
  • product A and product B that have been processed are recombined, amplified together, and the resulting amplicons are sequenced.
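The flow of material through the features just described can be modelled as simple bookkeeping: extension products carrying a separation molecule are pulled to the beads (mixture B) and receive a ligated adapter, the remainder (mixture A) are extended with a second primer, and both sets are then recombined. The sketch below is only a conceptual model of that routing. The class, function names, and string labels are hypothetical, and nothing here simulates the underlying biochemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionProduct:
    target: str
    barcode: str
    has_separation_molecule: bool  # True for primer B products (e.g. biotinylated)


def separate(products):
    """Bead-bound products form mixture B; everything else stays in mixture A."""
    mixture_a = [p for p in products if not p.has_separation_molecule]
    mixture_b = [p for p in products if p.has_separation_molecule]
    return mixture_a, mixture_b


def prepare_library(products):
    """Route each product through its branch and recombine for joint amplification."""
    mixture_a, mixture_b = separate(products)
    product_c = [(p.target, p.barcode, "second end defined by primer C") for p in mixture_a]
    product_d = [(p.target, p.barcode, "second end defined by ligated adapter") for p in mixture_b]
    return product_c + product_d


# Hypothetical products from a single, unsplit sample.
products = [
    ExtensionProduct("ESR1 hotspot", "ACGTACGTAC", False),
    ExtensionProduct("ALK intron 19", "TTGCAAGGCT", True),
]
for entry in prepare_library(products):
    print(entry)
```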
  • the method of the present invention is advantageous because it allows for simultaneous capture and identification of both the defined (or fully defined) target regions and the undefined (or partly defined) target regions (i.e. target regions that are prone to undergo sequence changes which are not previously characterized).
  • the simultaneous capture allows a smaller amount of DNA sample to be used.
  • the reason for having a separate method for the undefined (or partly defined) target regions is that these regions cannot be captured by a pair of primers because the sequence changes can happen at positions within the target that cannot be known when the target capture is being performed (i.e. the precise location and sequence change are unknown). Because the location and the sequence change are unknown, it is not possible to use a pair of primers flanking the target region, as happens in conventional methods. Further, the use of primers and polymerase-mediated extension affords greater specificity of target capture, compared to conventional methods based on probe hybridization.
  • another advantage that the present invention has is that despite separate workflows for converting the defined (or the fully defined) targets and the undefined (or the partly defined) targets into sequencing libraries, the method does not require initial splitting of the sample. By not requiring such splitting, the copy number of the DNA fragments that can be accessed by both the primer that targets the defined target region (i.e. primer A) and the primer that targets the undefined target region (i.e. primer B) is not reduced.
  • the present invention provides a method of simultaneously capturing and identifying distinct targets within a DNA sample, wherein the distinct targets comprise a defined (or a fully defined) target region and an undefined (or a partly defined) target region, wherein the undefined (or the partly defined) target region comprises structural variations or rearrangement or fusion, comprising the steps of:
  • the double stranded DNA fragment A is a double stranded DNA fragment comprising a part of the defined (or the fully defined) target region;
  • the double stranded DNA fragment B is a double stranded DNA fragment comprising a part of the undefined (or the partly defined) target region;
  • the primer A comprises a barcode sequence and a target-specific sequence A, wherein the target-specific sequence A is an oligonucleotide complementary to a sequence at/close to the 3’ end of a single strand of the double stranded DNA fragment A;
  • the primer B comprises a separation molecule, a barcode sequence, and a target-specific sequence B,
  • target-specific sequence B is an oligonucleotide complementary to a sequence within a single strand of the double stranded DNA fragment B
  • the double stranded product A is a single stranded elongated primer A that is annealed to the single stranded DNA fragment A and
  • the double stranded product B is a single stranded elongated primer B that is annealed to the single stranded DNA fragment B; d. adding a bead that binds the separation molecule to the main mixture and allowing the separation molecule in the double stranded product B to bind to the bead thereby forming a double stranded complex B;
  • the mixture A comprises the double stranded product A
  • the mixture B comprises the double stranded complex B; f. adding a primer C to the mixture A, wherein the primer C comprises a target-specific sequence C,
  • target-specific sequence C is an oligonucleotide complementary to a sequence at/close to the 3’ end of the single stranded elongated primer A;
  • allowing the polymerase to elongate the primer C thereby obtaining a double stranded product C, wherein the double stranded product C is a single stranded elongated primer C that is annealed to the single stranded elongated primer A; i. connecting a single nucleotide to the 3’ end of the single stranded elongated primer B of the double stranded complex B in the mixture B;
  • k. ligating the double stranded oligonucleotide to the double stranded complex B at the 3’ end of the single stranded elongated primer B and 5’ end of the single stranded DNA fragment B thereby obtaining a double stranded product D; l. combining the double stranded product C and the double stranded product D; m. amplifying the double stranded product C and the double stranded product D thereby obtaining a plurality of amplicons (or DNA molecules for sequencing); n. sequencing the plurality of amplicons thereby obtaining a plurality of sequencing results;
  • the present invention provides a method of simultaneously identifying a defined region and an undefined region within a DNA sample, wherein the undefined region comprises a structural variation, comprising the steps of:
  • a plurality of double stranded DNA fragments A, a plurality of double stranded DNA fragments B, a polymerase, a primer A, and a primer B
  • the double stranded DNA fragment A is a double stranded DNA fragment comprising a part of the defined region
  • the double stranded DNA fragment B is a double stranded DNA fragment comprising a part of the undefined region
  • the primer A comprises a barcode sequence and a target-specific sequence A, wherein the target-specific sequence A is an oligonucleotide complementary to a sequence at/close to the 3’ end of a single strand of the double stranded DNA fragment A;
  • the primer B comprises a separation molecule, a barcode sequence, and a target-specific sequence B,
  • the target-specific sequence B is an oligonucleotide complementary to a sequence within a single strand of the double stranded DNA fragment B
  • b. denaturing the double stranded DNA fragment A and the double stranded DNA fragment B thereby allowing the primer A to anneal to a single stranded DNA fragment A and the primer B to anneal to a single stranded DNA fragment B
  • c. allowing the polymerase to elongate the primer A and the primer B thereby obtaining a double stranded product A and a double stranded product B
  • the double stranded product A is a single stranded elongated primer A that is annealed to the single stranded DNA fragment A and
  • the double stranded product B is a single stranded elongated primer B that is annealed to the single stranded DNA fragment B;
  • the mixture A comprises the double stranded product A
  • the mixture B comprises the double stranded complex B
  • target-specific sequence C is an oligonucleotide complementary to a sequence at/close to the 3’ end of the single stranded elongated primer A;
  • Figure 1a describes steps a and b of the method as described herein, which are:
  • the DNA fragment A is a DNA fragment comprising a part of the defined region
  • the DNA fragment B is a DNA fragment comprising a part of the undefined region
  • the primer A comprises a barcode sequence and a target-specific sequence A, wherein the target-specific sequence A is an oligonucleotide complementary to a sequence at/close to the 3’ end of a DNA fragment A;
  • the primer B comprises a separation molecule, a barcode sequence, and a target-specific sequence B,
  • target-specific sequence B is an oligonucleotide complementary to a sequence within a DNA fragment B
  • the double stranded product A is a single stranded elongated primer A that is annealed to the DNA fragment A and the double stranded product B is a single stranded elongated primer B that is annealed to the DNA fragment B;
  • the mixture A comprises the double stranded product A
  • the mixture B comprises the double stranded complex B.
  • Figure 1c illustrates steps f to h as follows:
  • target-specific sequence C is an oligonucleotide complementary to a sequence at/close to the 3’ end of the single stranded elongated primer A;
  • Figure 1d illustrates the addition of one single nucleic acid overhang, which represents step i as follows:
  • Figure 1e illustrates the addition and ligation of a double stranded oligonucleotide comprising a nucleic acid that is complementary to the single nucleic acid overhang in Figure 1d (or step i), as follows:
  • Figure 1f illustrates the sequencing and data processing process of the method as described herein, which refers to steps l to n as follows:
  • the term “defined region” is defined as a region in a DNA fragment that is free of structural variations that may be found in the undefined region (i.e. structural variations that are not previously characterized). That is, the “defined region” comprises a region of DNA fragment that structurally is identical to or substantially the same as DNA fragments from a reference sequence.
  • a “fully defined target region” is a target for which the sequence identity (i.e. the start and end of the target) is fully defined prior to capture.
  • the terms “defined region”, “defined target region”, and “fully defined target region” are used interchangeably.
  • “undefined region” would encompass a region of DNA fragment that has structural variations that are not previously characterized.
  • “partly defined target region” is a target for which the sequence identity is not fully defined prior to target capture and comprises target region prone to undergo sequence changes (such as structural rearrangements). It is appreciated that the precise sequence composition of a“partly defined target region” cannot be predetermined and thus it may be impossible to design a pair of defining primers for such region.
  • the sequence definition of an “undefined region”, or a“partly defined target region”, such as detection of genomic rearrangements with unknown fusion partners, is determinable only once the sequencing results are obtained.
  • the target specific sequence A and the target specific sequence B do not overlap.
  • the term “undefined target region” does not mean that 100% of the DNA sequence within the target region is unknown in the art.
  • the “undefined target region” refers to a target region wherein about 5%, or about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 80%, or about 90%, or about 95% of the DNA sequence within the target region is unknown in the art.
  • the terms “undefined region”, “undefined target region”, and “partly defined target region” are used interchangeably.
  • the term “barcode sequence” is a commonly used term in the art of nucleic acid sequencing and used within the definition as known in the art.
  • the term “barcode sequence” refers to the encoded molecules or barcodes that include a variable amount of information within the nucleic acid sequence.
  • the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization based assay, and the like.
  • the barcode sequence is used in the method as described herein to append different target specific sequences, such that when the barcode sequence and target specific sequence anneal to the (target) DNA fragment, each different (target) DNA fragment would then have a unique barcode sequence that is attached to it and read out with the sequence of the (target) DNA fragment from that sample.
  • the barcode sequence allows the pooled analysis of multiple unique DNA fragments, where the resulting sequence information from the pool can be later attributed back to each starting DNA fragment. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same oligonucleotide with a randomly assigned nucleic acid sequence (i.e.
  • the barcode sequence is an overhang that does not complement any sequence within DNA fragment A and DNA fragment B.
  • the barcode sequence may be an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 8 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 random nucleotides.
  • the barcode sequence may be defined as NNNNNNNNNN (SEQ ID NO: 1), which may have the sequences such as, but is not limited to, CATTACATAC (SEQ ID NO: 2), GCGTGGACAA (SEQ ID NO: 3), TTTTTAGACA (SEQ ID NO: 4), TAAGAGGTCC (SEQ ID NO: 5), and the like.
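  • The barcode definition above can be illustrated with a minimal Python sketch. It assumes barcodes are drawn uniformly from A, C, G and T; the helper names `random_barcode` and `is_valid_barcode` are illustrative and do not appear in the disclosure.

```python
import random
import re

BARCODE_LENGTH = 10  # 10 random nucleotides, as in SEQ ID NO: 1 (NNNNNNNNNN)
BARCODE_PATTERN = re.compile(r"[ACGT]{%d}" % BARCODE_LENGTH)


def random_barcode(rng, length=BARCODE_LENGTH):
    """Return one random barcode of the given length (hypothetical helper)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def is_valid_barcode(seq):
    """True if seq is a concrete instance of the NNNNNNNNNN pattern."""
    return BARCODE_PATTERN.fullmatch(seq) is not None


# The example barcodes listed in the text (SEQ ID NO: 2 to 5) match the pattern.
examples = ["CATTACATAC", "GCGTGGACAA", "TTTTTAGACA", "TAAGAGGTCC"]
assert all(is_valid_barcode(s) for s in examples)

# Size of the barcode space for 10 random nucleotides: 4**10 = 1048576.
print(4 ** BARCODE_LENGTH)
```

    With roughly one million possible 10-nucleotide barcodes, two distinct DNA fragments captured by the same primer are unlikely to receive the same barcode by chance, which is what allows amplicons to be grouped into families later on.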
  • the term “at the 3’ end” corresponds to the last nucleotide of a single DNA strand.
  • the term“close to the 3’ end” corresponds to a distance of from 1 to 100 nucleotides, or from 5 to 90 nucleotides, or from 10 to 80 nucleotides, or from 15 to 70 nucleotides, or from 20 to 60 nucleotides, or about 1 nucleotides, or about 5 nucleotides, or about 10 nucleotides, or about 15 nucleotides, or about 20 nucleotides, or about 25 nucleotides, or about 30 nucleotides, or about 35 nucleotides, or about 40 nucleotides, or about 50 nucleotides, or about 60 nucleotides, or about 70 nucleotides, or about 80 nucleotides, or about 90 nucleotides, or about 100 nucleotides from the 3’ end of a single
  • the binding site of the reverse primer (for example, primer C) is predetermined such that the overall length of the target region defined by combination of the forward primer (for example primer A) and the reverse primer is from 80 base pairs (bp) to 200 bp, or from 100 bp to 180 bp, or from 120 bp to 160 bp, or from 140 bp to 150 bp, or about 80 bp, or about 90 bp, or about 100 bp, or about 110 bp, or about 120 bp, or about 130 bp, or about 140 bp, or about 150 bp, or about 160 bp, or about 170 bp, or about 190 bp, or about 200 bp.
  • step i of the present invention i.e. the step of connecting a single nucleotide to the 3’ end of the single stranded elongated primer B of the double stranded complex B in the mixture B
  • the single nucleotide that is to be connected with the 3’ end of the single stranded elongated primer B can be any nucleotide.
  • the single nucleotide may include, but is not limited to, adenine (A), cytosine (C), guanine (G), thymine (T), and the like.
  • Taq polymerase is used and the connecting step is known as “A-tailing”.
  • the A-tailing step exploits the intrinsic terminal transferase activity of Taq polymerase by which it catalyzes the template-independent addition of an adenine residue to the 3' end of both strands of DNA molecules.
  • dA is added preferentially to 3' end of DNA molecule by Taq polymerase.
  • Other nucleotides can be added but would require differing reaction conditions for Taq activity. Therefore, under standard reaction conditions, in the presence of dNTPs, Taq polymerase will preferentially incorporate dA to the 3’ end of the DNA molecules.
  • the method as described herein utilises sequencing platforms/methods known in the art, it would be apparent to the person skilled in the art that the DNA fragment processed through the steps of the method as described herein may have to be prepared to comprise additional nucleic acid sequences recognised by the sequencing platforms/methods (i.e. adapter sequences).
  • the primer A, the primer B, the primer C and/or the double stranded oligonucleotide further comprises an adapter sequence.
  • the term “adapter sequence” refers to an oligonucleotide sequence bound to the 5' and 3' end of each DNA fragment in a sequencing library.
  • the adapter sequences are complementary to the plurality of oligonucleotide present on the surface of flow cells of the sequencing tools thereby allowing the DNA fragment to attach to the sequencing tools.
  • the adapter may be a universal P5 adapter as follows: AATGATACGGCGACCACCGAGATCT (SEQ ID NO: 13), and/or an indexed P7 adapter as follows: CAAGCAGAAGACGGCATACGAGAT (SEQ ID NO: 14) (see Table 1).
  • the distinct targets within the DNA sample that can be simultaneously captured and identified by the method of the present invention comprise a defined target region (or a fully defined target region) and an undefined target region (or a partly defined target region).
  • the undefined target region (or the partly defined target region) comprises structural variations or rearrangement or fusion, which are not previously characterized.
  • the undefined target region (or the partly defined target region) is prone to undergo a structural rearrangement or sequence changes.
  • the term “structural variations” refers to variations in the structure of the genome, i.e. in the order of sections of the DNA (as opposed to smaller variations to the sequence alone, which maintain the overall order of the DNA sections with respect to the genome).
  • the term “rearrangement” refers to rearrangements in the order of sections of the DNA (interchangeable with “structural variations”).
  • the term “fusion” refers to structural variants produced through interchromosomal or intrachromosomal rearrangements.
  • the structural variations may include, but are not limited to, deletion, duplication, insertion, inversion, transversion, translocation, and the like.
  • the term “deletion” refers to a sequence change where more than 50 nucleotides are removed.
  • the term “duplication” refers to a sequence change where a copy of one or more nucleotides is inserted directly 3'-flanking the original copy.
  • the term “insertion” refers to a sequence change where more than 50 nucleotides are inserted between two nucleotides but where the insertion is not a copy of a sequence immediately 5'-flanking.
  • the term “inversion” refers to a sequence change where the more than one nucleotide replacing the original sequence are the reverse complement of the original sequence.
  • the term “translocation” refers to rearrangement of parts between non-homologous chromosomes, which can result in “fusion”.
  • the method as described herein can also be used to detect single nucleotide variations such as substitution.
  • the sequencing result is further used to detect a single nucleotide variation.
  • the sequencing result is further used to detect a single nucleotide variation within the undefined target region (or the partly defined target region).
  • the sequencing result is further used to detect a single nucleotide variation within the defined target region (or the fully defined target region).
  • the terms “single nucleotide variation”, “single nucleotide sequence variation”, and “point mutation” may be used interchangeably.
  • the defined target region (or the fully defined target region) comprises single nucleotide sequence variations, small insertion, small deletion, genomic copy number alteration, deletion of homopolymeric region, foreign DNA sequences (e.g. wherein the DNA sample is human DNA, microbial DNA sequences are considered foreign DNA sequence), polymorphisms or single-nucleotide variations in microbial DNA sequence, and the like.
  • the deletion of homopolymeric region may include, but is not limited to, microsatellite instability.
  • the term “single nucleotide sequence variations” or “single nucleotide variations” refers to variation in a single nucleotide that occurs at a specific position in the genome, differing from the nucleotide defining the position in the reference genome.
  • the term “small insertion” refers to a sequence change where less than 50 nucleotides are inserted between two nucleotides but where the insertion is not a copy of a sequence immediately 5'-flanking.
  • the term “small deletion” refers to a sequence change where less than 50 nucleotides are removed.
  • the term “copy number alteration” refers to the repetition of sections of the genome (duplication) or loss of sections of the genome (deletion).
  • the term “deletions of homopolymeric regions” refers to the shortening of a homopolymeric tract in the genome.
  • An example of “deletions of homopolymeric region” is GCGAAAAAAAAAAAAATA becoming GCGAAATA; this is a deletion of 12 A’s from the homopolymeric tract of 15 A’s.
  • the term “polymorphism” refers to a variation in a single nucleotide that occurs at a specific position in the genome, and is a variation in all copies of the organism’s genome, differing from the nucleotide defining the position in the organism’s population (reference).
  • the term “microsatellite instability” refers to genetic instability in short nucleotide repeats or microsatellites, which are tracts of tandemly repeated (i.e. adjacent) DNA motifs ranging from one to six or up to ten nucleotides, with each motif repeated 5 to 50 times.
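  • The size-based definitions above (small deletion or insertion below 50 nucleotides, deletion or insertion above 50 nucleotides) and the homopolymer example can be sketched in Python as follows. The function names are illustrative, and the example sequences are constructed here with a tract of 15 A's and 3 A's respectively, as described in the text; a change of exactly 50 nucleotides is left unclassified because the definitions do not assign it.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Return (base, length) of the longest run of a single repeated base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq):
        n = len(list(run))
        if n > best_len:
            best_base, best_len = base, n
    return best_base, best_len


def classify_indel(kind, n):
    """Classify a change by size; kind is 'deletion' or 'insertion'."""
    if n < 50:
        return "small " + kind
    if n > 50:
        return kind
    return "unclassified"  # exactly 50 nt is not covered by the definitions


# Constructed example: a tract of 15 A's shortened to 3 A's.
reference = "GCG" + "A" * 15 + "TA"
sample = "GCG" + "A" * 3 + "TA"

ref_base, ref_len = longest_homopolymer(reference)
_, sample_len = longest_homopolymer(sample)
removed = ref_len - sample_len
print(ref_base, ref_len, sample_len, removed, classify_indel("deletion", removed))
```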
  • TMB total mutation (or variant) load or tumour mutational burden
  • the method of the present disclosure can also be used to detect certain diseases.
  • the DNA sample for the method of the present disclosure is obtained from a subject having and/or suspected of having a disease.
  • the disease may include, but is not limited to cancer, infectious disease, and the like.
  • the cancer may include, but is not limited to, lung cancer, colorectal cancer, breast cancer, pancreatic cancer, prostate cancer, nasopharyngeal cancer, liver cancer, cholangiocarcinoma, esophageal cancer, urothelial cancer, gastrointestinal cancer, and the like.
  • the infectious diseases may include, but is not limited to, viral infection, bacterial infection, and the like.
  • step n further comprises:
  • the mutation may be single nucleotide variations. In another example, the mutation may be small INDELs. In another example, the mutation may be microsatellite instability.
  • reference sequence refers to nucleotide sequences (such as DNA sequences or RNA sequences) known in the art that may be obtainable from public databases.
  • the term “consensus sequence” refers to a nucleotide sequence obtained from consensus calling.
  • consensus calling is performed by identifying the nucleotide at each position for each sequencing result within the subgroup, comparing the identity for the nucleotide at each position across the plurality of sequencing results, and determining a majority nucleotide at each position. If the majority nucleotide count is above a threshold set for determining majority for specific position, the assignment for said position is the majority nucleotide. If the majority nucleotide count is below this threshold, no assignment is made for said position.
  • the threshold is variable for every position and is a function of the total number of sequencing results corresponding to a specific position.
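  • The consensus calling described above can be sketched in Python as follows. The sketch assumes the sequencing results in a subgroup are already aligned to equal length, and it uses a hypothetical threshold of 60% of the reads at a position (rounded up); the disclosure only states that the threshold is a function of the number of sequencing results, so this value is an assumption for illustration. Positions without an assignment are written as N.

```python
from collections import Counter


def majority_threshold(depth):
    """Hypothetical threshold: ceil(0.6 * depth) reads must agree.

    The source only says the threshold depends on the number of
    sequencing results at the position; 60% is an illustrative choice.
    """
    return (3 * depth + 4) // 5  # integer form of ceil(3 * depth / 5)


def call_consensus(reads, threshold_fn=majority_threshold):
    """Per-position majority consensus over equal-length, pre-aligned reads."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Assign the majority base only if it reaches the threshold.
        called.append(base if count >= threshold_fn(len(column)) else "N")
    return "".join(called)


print(call_consensus(["ACGT", "ACGT", "ACGA", "ACCT", "TCGT"]))
print(call_consensus(["AC", "AG", "AT"]))
```

    In the second call, the three reads disagree at the second position, so no base reaches the threshold and that position receives no assignment.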
  • the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is from 17 nucleotides to 31 nucleotides, or from 19 nucleotides to 29 nucleotides, or from 20 nucleotides to 28 nucleotides, or from 21 nucleotides to 27 nucleotides, or from 22 nucleotides to 26 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides, or 27 nucleotides, or 28 nucleotides, or 29 nucleotides, or 30 nucleotides.
  • the length of the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C is 22 nucleotides.
  • a person skilled in the art is also aware that in order to determine the length of the primer A, the primer B, the primer C, the target-specific sequence A, the target-specific sequence B, and/or the target-specific sequence C, other primer properties will also have to be considered, including, but not limited to, melting temperature (or Tm), GC-content (or guanine-cytosine content or GC%) and the propensity of a primer to dimerize with other primers and itself.
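  • Two of the primer properties named above can be estimated with a short Python sketch. It uses the Wallace rule of thumb, Tm = 2 × (A + T) + 4 × (G + C) in °C, which is only a rough approximation for short oligonucleotides and is not stated in the disclosure. The example sequence is the 22 bases that follow the barcode in SEQ ID NO: 6, assumed here to be the target-specific portion. Dimerization propensity is not covered by this sketch.

```python
def gc_percent(seq):
    """GC content as a percentage of the oligonucleotide length."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Assumed target-specific portion of SEQ ID NO: 6 (22 nucleotides).
target_specific = "GGTGACCCTTGTCTCTGTGTTC"
print(len(target_specific), round(gc_percent(target_specific), 1), wallace_tm(target_specific))
```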
  • a “separation molecule” refers to a tag or molecule that is capable of binding to a bead to thereby allow for the separation of the nucleotide that is connected to the separation molecule.
  • the separation molecule may be, but is not limited to biotin, digoxigenin (DIG), Fluorescein isothiocyanate (FITC), and the like.
  • the separation molecule is biotin.
  • the bead that binds to the separation molecule may comprise, but is not limited to a substrate linked with streptavidin, anti-digoxigenin, anti-FITC, and the like.
  • the bead that binds to the separation molecule comprises magnetic beads linked to streptavidin.
  • the bead that binds to the separation molecule may be magnetic beads that have been functionalized with streptavidin, anti-digoxigenin, anti-FITC, and the like.
  • the method as described herein is compatible with multiple sources of DNA material, including circulating DNA from blood plasma or cerebrospinal fluid (CSF), fragmented formalin-fixed paraffin embedded DNA (FFPE DNA), genomic DNA from leukocytes and from other cells.
  • the method as described herein could also cover more than 50 targeted genes, over 500 targeted regions in the human genome and 15 DNA virus families, and is readily expandable for future inclusion of target regions.
  • the sequencing library is based on the use of primers for the capture of target regions, it works with equivalent specifications on multiple sample types such as circulating DNA and FFPE DNA.
  • primer-based capture of FFPE DNA is not hindered by fragmentation, as long as the expected amplicon size as defined by primers is limited to a reasonably short length of about 160 bp.
  • Up to eight classes of target regions such as single-nucleotide variations or fusions can also be simultaneously captured using the first set of primers from a single sample of DNA.
  • steps are taken for the completion of amplicons or ends with sequencing adapters and final amplification before high-throughput sequencing.
  • the combination of primer and PCR-based methods for sequencing analysis allows for a smaller input DNA to be worked with without losing sensitivity.
  • the method as described herein can be performed in a liquid sample or tissue sample.
  • the sample is a liquid sample, a tissue sample, or a cell sample.
  • the liquid sample may include, but is not limited to, bodily fluids such as, but is not limited to, blood, bone marrow, cerebral spinal fluid, peritoneal fluid, pleural fluid, lymph fluid, ascites, serous fluid, sputum, lacrimal fluid, stool, urine, saliva, ductal fluid from breast, gastric juice, pancreatic juice, and the like.
  • the bodily fluid is blood.
  • the liquid sample that is useful for the method of the present technology is a liquid that comprises DNA which is circulating and not contained within cells (or cell free DNA). The DNA within the liquid can be isolated from the liquid in a form that is free from impurities (or pure form).
  • the tissue sample may include, but is not limited to frozen tissue sample, fixed tissue sample (such as formalin-fixed tissue sample).
  • the method of the present invention is optimized for DNA fragments having certain sizes.
  • a person skilled in the art is aware that when the DNA sample comprises full-length DNA, the full-length DNA can be processed and fragmented to a certain length that is suitable for the method of the present invention.
  • the length of the DNA fragment A and/or the DNA fragment B is from 80 base pairs to 220 base pairs, or from 90 base pairs to 210 base pairs, or from 100 base pairs to 200 base pairs, or from 110 base pairs to 190 base pairs, or from 120 base pairs to 180 base pairs, or from 130 base pairs to 170 base pairs, or from 140 base pairs to 160 base pairs, or about 80 base pairs, or about 90 base pairs, or about 100 base pairs, or about 110 base pairs, or about 120 base pairs, or about 130 base pairs, or about 140 base pairs, or about 150 base pairs, or about 160 base pairs, or about 170 base pairs, or about 180 base pairs, or about 190 base pairs, or about 200 base pairs, or about 210 base pairs, or about 220 base pairs. In one example, the length of the DNA fragment A and/or the DNA fragment B is about 150 base pairs.
  • the amount of DNA sample may be from 10 ng to 200 ng, or from 20 ng to 190 ng, or from 30 ng to 180 ng, or from 40 ng to 170 ng, or from 50 ng to 160 ng, or from 60 ng to 150 ng, or from 70 ng to 140 ng, or from 80 ng to 130 ng, or from 90 ng to 120 ng, or from 100 ng to 110 ng, or about 10 ng, or about 20 ng, or about 30 ng, or about 40 ng, or about 50 ng, or about 60 ng, or about 70 ng, or about 80 ng, or about 90 ng, or about 100 ng, or about 110 ng, or about 120 ng, or about 130 ng, or about 140 ng, or about 150 ng, or about 160 ng, or about
  • the DNA sample to be used in the method as described herein may include, but is not limited to, a eukaryotic DNA sample, a prokaryotic DNA sample, a viral DNA sample, and a mixture thereof.
  • the prokaryotic DNA sample is a bacterial DNA sample.
  • the eukaryotic DNA sample may include, but is not limited to, a protozoa DNA sample, a fungal DNA sample, an algae DNA sample, a plant DNA sample, an animal DNA sample, and the like.
  • the animal DNA sample is a mammalian DNA sample (such as human DNA sample).
  • the DNA sample may be a cell free DNA or DNA of a lysed cell.
  • the present invention provides for a kit comprising a plurality of primer A as defined herein, a plurality of primer B as defined herein, a plurality of primer C as defined herein, a bead that binds the separation molecule as defined herein, and a double stranded oligonucleotide as defined herein.
  • the kit of the present invention further comprises a DNA polymerase, a Taq polymerase, a ligase, a plurality of deoxyribonucleotide triphosphate (dNTPs).
  • the reagents provided in the kit as described herein may be provided in separate containers comprising the components independently distributed in one or more containers. As the method as described herein relates to sequencing (such as high-throughput sequencing), further components required in sequencing processes could be easily determined by the person skilled in the art.
  • a primer includes a plurality of primers, including mixtures and combinations thereof.
  • the terms “increase” and “decrease” refer to the relative alteration of a chosen trait or characteristic in a subset of a population in comparison to the same trait or characteristic as present in the whole population. An increase thus indicates a change on a positive scale, whereas a decrease indicates a change on a negative scale.
  • the term “change”, as used herein, also refers to the difference between a chosen trait or characteristic of an isolated population subset in comparison to the same trait or characteristic in the population as a whole. However, this term is without valuation of the difference seen.
  • the term “about” in the context of concentration of a substance, size of a substance, length of time, or other stated values means +/- 5% of the stated value, or +/- 4% of the stated value, or +/- 3% of the stated value, or +/- 2% of the stated value, or +/- 1% of the stated value, or +/- 0.5% of the stated value.
  • certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • ACACGACGCTCTTCCGATCTNNNNNNNNNNGGTGACCCTTGTCTCTGTGTTC (SEQ ID NO: 6), wherein the bases in italic and underline are an example of an adapter sequence, the bases in bold represent the barcode sequence and the bases in underline are an example of a target-specific sequence.
  • GACGTGTGCTCTTCCGATCTGAGCCCAGCACTTTGATCTTTTT (SEQ ID NO: 7), where bases in underline are target-specific primers.
  • Universal amplification primer 1 (an example of the primer for amplifying product C or D): AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T (SEQ ID NO: 10)
  • Universal amplification primer 2 (indexed) (an example of the primer for amplifying product C or D, the index is the bases in bold and italic font):
  • the workflow for preparing DNA library is divided into three major steps.
  • target DNA were captured using a multiplex pool of primers.
  • Each primer is a molecular tag complex comprising an oligonucleotide with 10 random nucleotides (a molecular tag/barcode sequence) linked to a target-specific primer, which functions in target capture.
  • Some of the primers in primer pool for target capture are 5’ biotin-labeled (B).
  • An example of the primer that is biotin-labeled is shown on Figure 5.
  • each captured target DNA had a random molecular tag linked to it.
  • Excess unused primers were removed by purification with 1.5x AMPure XP beads in two rounds. This means the eluate from the first round of purification was bound to 1.5x beads and subjected to a second round of purification. Final elution was done in 10 to 30 µl of buffer EB.
  • targets with defined (specified) ends and those with undefined ends were subjected to distinct treatments to complete the structure of a target-specific amplicon which could then be amplified to generate a sequencing platform-specific DNA library molecule.
  • targets with undefined ends were separated from other targets using the biotin tags incorporated in the target capture primers.
  • the 10 to 30 µl eluate from step 1 (target capture) was mixed with an equal volume of washed MyOne Streptavidin C1 beads, and the bead mix was incubated at room temperature with intermittent mixing for 1 hour to allow the binding of biotin to streptavidin.
  • target DNA that was captured with biotin-labeled primers becomes immobilized on the streptavidin-coated beads. Meanwhile, target DNA captured with unlabeled primers remains in the supernatant (or bead solution mix). At the end of one hour, the supernatant containing target DNA captured with unlabeled primers was collected separately, while the target DNA captured with biotin-labeled primers remained on the beads, thus achieving separation of captured DNA intended for different treatments in step 2 for amplicon generation.
  • Targets captured on beads were washed briefly with bead wash (B&W) solution, followed by an “on-bead” A-tailing reaction. Briefly, the beads with immobilized targets were resuspended in a 10 µl reaction mixture containing 6.4 µl water, 1 µl 10X buffer for KOD-Plus-Neo, 1 µl of 2 mM dNTPs, 0.6 µl of 25 mM MgSO4, and 1 µl of 10X A-attachment mix
  • the mixture was incubated at 25°C for 1 hr, with intermittent shaking. At the end of the hour, the mixture was chilled on ice. The beads were then washed three times with 1X B&W buffer. At the end of this step, target DNA captured on the beads would have undergone amplicon generation by the one-sided ligation of the partial adapter. Adapter ligation on the other (immobilized) end was inhibited due to the overhang tail introduced during target capture, and the presence of the biotin-streptavidin complex.
  • the completed amplicons were eluted from the streptavidin beads by disrupting the biotin-streptavidin bonds: the beads were incubated in 10 µl of elution solution (10 mM EDTA pH 8.2 and 95% formamide) at 65°C for 5 mins to elute biotin-labelled targets from the beads.
  • the eluate was collected following magnetic separation of streptavidin beads.
  • the eluate containing captured DNA targets (converted to amplicons) was collected and purified once with 1.5x AMPure XP beads to remove the formamide solution and replace it with EB buffer. DNA was eluted in 11.5 µl Buffer EB.
  • Amplicon generation was done using the following thermocycling conditions: Denaturation at 94°C for 1 min, followed by 1 to 3 cycles of 98°C for 1 min, 60°C for 6 mins, and 68°C for 5 mins. The completed amplicons were purified twice from the PCR mix with 1.5x AMPure beads. DNA was eluted in 11.5 µl Buffer EB.
  • An example of the product after step 2 if target captured goes through adapter ligation for amplicon generation is shown on Figure 8a and an example of the product after step 2 if target captured is converted to amplicon with a second primer is shown on Figure 8b.
  • in the third step (Figure 1f), a final amplification was performed to amplify the targets and to complete the library structure required for sequencing on the Illumina platform, by introducing sequencing adapters.
  • the purified targets amplicons from step 2
  • those with undefined (unknown) ends from the starting DNA material are recombined and pooled into one final PCR reaction.
  • the PCR was carried out with the following profile: Denaturation at 98°C for 45 s, followed by 22-26 cycles of 98°C for 15 s, 60°C for 30 s, and 72°C for 30 s, with a final extension at 72°C for 1 min.
  • the amplified library was purified twice with 0.6-0.8x AMPure XP beads to remove non-specific products.
  • the quality and quantity of the sequencing library were assessed using the 4200 TapeStation system (Agilent Technologies, USA) and the KAPA Library Quantification Kit for Illumina® Platforms (Kapa Biosystems Inc., USA), respectively.
  • An example of the product after step 3 is shown on Figure 9.
  • FASTQ files were processed using a custom pipeline. First, expected amplicons were identified and labeled in the FASTQ files based on the expected primer sequences in Read 1 and paired Read 2. For amplicons with one unknown end, only primers in Read 1 were used for identification and labeling. Primer sequences and upstream molecular tag sequences were trimmed using cutadapt, and the primer-trimmed sequences were mapped to the reference genome using bwa-mem. For “primer” trimmed FASTQ files, the name of the primer which had the best match to a read was concatenated to the name of the mapped output reads (for both Read 1 and Read 2).
  • the primer name assigned to Read 1 might not always match that of Read 2, which could be due to overlapping amplicons or non-specific binding.
  • An “amplicon_name” was assigned to each paired read by combining the matching primer name of Read 1 and Read 2 (concatenated by semicolon).
  • Molecular tag (or barcode) sequences were included in the trimmed “primer” sequences of Read 1, and could be extracted given the unique structure of primer sequences in Read 1.
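  • The labeling and tag extraction steps above can be illustrated with a minimal Python sketch. It assumes that Read 1, after adapter removal, starts with the 10-nucleotide molecular tag followed directly by the target-specific primer, and it uses exact prefix matching instead of the best-match search performed with cutadapt in the described pipeline. The primer names `primerA_1` and `primerC_1`, the insert bases and the function names are hypothetical.

```python
TAG_LENGTH = 10  # length of the molecular tag (barcode) in Read 1


def split_read1(read, primers):
    """Split Read 1 into (tag, primer_name, insert), or return None.

    primers maps a primer name to its target-specific sequence.
    Only exact matches are accepted in this simplified sketch.
    """
    if len(read) < TAG_LENGTH:
        return None
    tag, rest = read[:TAG_LENGTH], read[TAG_LENGTH:]
    for name, primer in primers.items():
        if rest.startswith(primer):
            return tag, name, rest[len(primer):]
    return None


def amplicon_name(read1_primer, read2_primer):
    """Combine the primer names of Read 1 and Read 2 with a semicolon."""
    return f"{read1_primer};{read2_primer}"


# Hypothetical primer table; the sequence is taken from SEQ ID NO: 6.
primers = {"primerA_1": "GGTGACCCTTGTCTCTGTGTTC"}
read1 = "CATTACATAC" + "GGTGACCCTTGTCTCTGTGTTC" + "ACGTACGT"

print(split_read1(read1, primers))
print(amplicon_name("primerA_1", "primerC_1"))
```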
  • the extracted molecular tag sequences are clustered in two steps: (1) initial grouping by exact match of the combination of amplicon_name + barcode sequence, and (2) cluster reassignment.
  • Cluster reassignment: in each group with the same amplicon_name, barcodes were further reassigned using global pairwise alignment with a maximum of 2 base differences between barcodes. Barcode clusters with fewer than 3 associated reads (after cluster reassignment) were considered unreliable clusters and removed from downstream analysis.
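  • The two clustering steps above can be sketched in Python as follows. Two simplifications are assumed and are not taken from the disclosure: barcode differences are counted with a Hamming distance (substitutions only) in place of global pairwise alignment, and reassignment is greedy, with each barcode joining the most abundant existing cluster within 2 differences. The function names are illustrative.

```python
from collections import Counter, defaultdict

MAX_DIFF = 2   # maximum base differences between barcodes in one cluster
MIN_READS = 3  # clusters with fewer reads are discarded as unreliable


def hamming(a, b):
    """Count mismatches; barcodes of unequal length are treated as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(records):
    """records: iterable of (amplicon_name, barcode), one entry per read.

    Returns {(amplicon_name, centre_barcode): read_count}.
    """
    # Step 1: initial grouping by exact match of amplicon_name + barcode.
    exact = Counter(records)
    by_amplicon = defaultdict(list)
    for (amp, barcode), n in exact.items():
        by_amplicon[amp].append((barcode, n))

    clusters = {}
    for amp, items in by_amplicon.items():
        centres = {}  # centre barcode -> total reads
        # Step 2: reassign, visiting the most abundant barcodes first.
        for barcode, n in sorted(items, key=lambda t: (-t[1], t[0])):
            for centre in centres:
                if hamming(barcode, centre) <= MAX_DIFF:
                    centres[centre] += n
                    break
            else:
                centres[barcode] = n
        for centre, n in centres.items():
            if n >= MIN_READS:
                clusters[(amp, centre)] = n
    return clusters


reads = (
    [("amp1", "AAAAAAAAAA")] * 4
    + [("amp1", "AAAAAAAAAT")]      # one mismatch, merged into the cluster above
    + [("amp1", "CCCCCCCCCC")] * 2  # only 2 reads, removed
    + [("amp2", "AAAAAAAAAA")] * 3
)
print(cluster_barcodes(reads))
```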
  • Consensus Calling was done for each molecular tag (or barcode) cluster, by first performing global alignment among all associated reads using MAFFT.
  • the consensus base in each aligned position was called by determining the majority representative base type, the percentage of which is no less than an automatically determined threshold.
  • the threshold was a function of the total number of reads for that barcode sequence. If no representative base could be called, the position was assigned N (as opposed to one of A, C, T, G).
  • a new quality score was assigned to each position, which was either the 90th percentile of all the quality values from the representative base type in that position (if a consensus base was found), or the 10th percentile of all quality values in that position (if no consensus base was found).
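  • The quality score assignment above can be sketched in Python as follows. The percentile convention is not specified in the text; the nearest-rank method is assumed here, and the function names are illustrative.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def consensus_quality(bases, quals, consensus_base):
    """Quality for one aligned position of a barcode cluster.

    bases and quals hold the base and quality value of each read at the
    position; consensus_base is the called base, or 'N' if none was called.
    """
    if consensus_base == "N":
        return percentile(quals, 10)
    supporting = [q for b, q in zip(bases, quals) if b == consensus_base]
    return percentile(supporting, 90)


bases = ["A", "A", "A", "C"]
quals = [30, 35, 40, 20]
print(consensus_quality(bases, quals, "A"))
print(consensus_quality(bases, quals, "N"))
```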
  • the consensus reads were written to a new FASTQ file. An exemplary result of the consensus reads mapped to the reference is shown on Figure 3.
  • Exemplary results for variant detection and frequency of clinical samples are shown on Table 2 and exemplary results for detection of Epstein Barr Virus (EBV) microbial DNA targets in clinical samples are shown on Table 3.
  • clinical samples which had been previously characterized for EGFR mutations (positive or negative) and EBV DNA (present or absent) by orthogonal methods (such as Quantitative PCR) were identified.
  • Cell-free DNA (cfDNA) was extracted from the same samples which had been selected to have had sufficient plasma. The extracted cfDNA was quantified and processed with the method as described herein to determine if similar results of detection (of EGFR mutations and EBV DNA) with orthogonal methods (such as Quantitative PCR) could be achieved.
  • Tables 2 and 3 summarize the findings of orthogonal methods (such as Quantitative PCR) presented together with findings from the method as described herein.
  • 16 clinical samples were tested by the method as described herein and by quantitative PCR, respectively, for detection of EGFR mutations (such as small nucleotide variants, and small INDELs) and determination of the frequency of mutations. The result showed 98% concordance of mutation detection and agreement of mutant allele frequency by both methods.
  • sample numbers in the first column of Table 2 which showed concordance between the conventional method (quantitative PCR) and the method of the present invention are: 1, 2, 3, 4, 5 (for L858R), 6, 7, 8, 9, 10, 11 (for EGFR c.2236_2250del), 12, 13, 14, 15 (for E746_A750delELRA and EGFR T790M), and 16 (for KRAS G12D).
  • In contrast to quantitative PCR, which is used to detect various mutations in separate reactions (each reaction is used to detect one mutation, i.e. each row in column 2 (labeled "Mutation reported by AS-PCR") of Table 2 corresponds to one single reaction), the method of the present invention is able to simultaneously detect multiple mutations in the same sample, in one single reaction (i.e. all the mutations listed in all the rows in column 6 (labeled "Variant identified by Hallmark") of Table 2 are detected in one single reaction).
  • The sample numbers in the first column of Table 3 which showed concordance between the conventional method (quantitative PCR) and the method of the present invention are: 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34 and 35. Additionally, mutations in human DNA were detected. Also, in serial samples from the same individual, matched mutations (such as small nucleotide variants, and small INDELs) were present. Serial samples from the same individual are depicted within a black box and are shaded in grey. Thus, the method of the present invention is able to simultaneously detect viral DNA and mutations in human DNA. In addition, in contrast to quantitative PCR, which is used to detect various mutations and the viral DNA in separate reactions (each reaction is used to detect one mutation or viral DNA, i.e. each row in column 2 (labeled "EBV BamHI-W") of Table 3 corresponds to one single reaction), the method of the present invention is able to simultaneously detect multiple mutations in human DNA and the viral DNA in the same sample, in one single reaction (i.e. all the mutations listed in all the rows in column 11 (labeled "Mutations") of Table 3 are detected in one single reaction).
  • Table 2 Mutation detection and frequency in 16 clinical samples (plasma) tested by the method of the present invention and by quantitative PCR.
  • Exemplary use of the sequencing results obtained using the method of the present invention for the detection of fusion is shown in Figure 10.
  • The process for detection of fusion as shown in Figure 10 is described as follows:
  • a sample was obtained from cell line DNA with known structural variations, for the purpose of validating the method of the present invention.
  • the DNA was fragmented to generate fragments with sizes ranging from 20-400 bp;
  • Mapped reads were inspected in the Integrative Genomics Viewer (IGV) for the presence of a) soft-clips, b) insertions and/or c) mapping of Reads 1 and 2 of a paired sequencing read to physically separated regions of the genome. Two or more such supporting paired reads, carrying the breakpoint or mapping to distant regions of the genome, were required to support a structural variant call.
  • The "partner" of the structural variant was identified by the mate read location or by aligning (BLASTing) an insertion or soft-clip sequence against the human genome to identify the origin of the insertion sequence.
  • the above process may be used for detecting structural variation in any target region known to undergo structural variation without prior knowledge of the precise location of the breakpoint.
  • the above process may also be applied to DNA from fixed tissue (which is already fragmented to varying degrees) or cfDNA from plasma, pleural fluid or cerebrospinal fluid.
  • An example of detection of a structural variant described as an inversion, in which a DNA sequence is reversed end to end, is shown in Figure 11(a) at the level of a chromosome.
  • The resulting inversion in a smaller target region of interest is represented in Figure 11(b), with sequence directionality indicated by arrows in the wild-type condition and in the condition with the inversion.
  • the inversion may involve a large part of the genome or a relatively small part resulting in two breakpoints.
  • An example of an inversion involving a region of chromosome 9, with breakpoints determined at exactly chr9:5,467,953 and chr9:6,557,405, was detected by the method of the invention (Figure 12).
  • The inversion shown in Figure 12 is one which results in a portion of the genome from chr9:6,557,405 becoming adjacent, in inverted form, to chr9:5,467,953.
  • Figure 12 depicts paired reads from sequencing results, Reads 1 and 2, which map to different non-contiguous locations of the genome, as derived from the mapping of the reads sequence.
  • An example of a translocation involving a region of chromosome 6 and chromosome 4 is shown in Figure 13. Breakpoints are deducible from sequencing results obtained by the method of the invention.
  • Figure 14 depicts paired reads from sequencing results, Reads 1 and 2, which map to different non-contiguous locations of the genome, as derived from the mapping of the reads sequence.
  • Structural variants are detectable by the method of the invention, as long as a breakpoint in a target region of interest is captured among the sequencing reads.
  • the non-contiguous nature of the alignment of the sequencing read allows for the detection of any form of structural variant.
  • If the method of the invention incorporates capture primers/probes for a target region of interest known to undergo any one of the structural variations mentioned above, that type of structural variant can be detected. This is because, once sequencing reads are available, detection of structural variants may be done by the alignment of the reads to two different non-contiguous parts of the genome. Based on the method of the invention, it is not critical whether the non-contiguous part of the read comes from the same chromosome and is inverted (i.e. inversion) or from another chromosome (i.e. translocation).
  • The method of the invention achieved more than 99% sensitivity and specificity for detecting small nucleotide variations (SNVs) at all the mutant allele frequencies tested; more than 83.3% sensitivity and specificity for detecting INDELs at 0.1% mutant allele frequency and more than 99% sensitivity and specificity for detecting INDELs at the 1%, 5% and 10% mutant allele frequencies tested; and more than 50% sensitivity and specificity for detecting fusions at 1% mutant allele frequency and more than 99% sensitivity and specificity for detecting fusions at the 5% and 10% mutant allele frequencies tested.
  • the various mutations listed in Figure 4 are detected by the method of the present invention simultaneously, in one single reaction.
  • the method of the invention possesses unexpected advantages.
  • The method of the invention is able to achieve simultaneous detection of: 1) viral DNA; 2) microsatellite instability; 3) structural rearrangements; 4) SNVs and INDELs, from samples ranging from cfDNA from plasma (or cerebrospinal fluid, pleural effusion) to DNA from fixed tissue.
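
The barcode clustering and consensus-calling steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the described method, and it makes several simplifying assumptions: barcodes are compared by Hamming distance rather than global pairwise alignment, reads within a cluster are assumed to be already aligned and of equal length (the description uses MAFFT for this), and the automatically determined threshold is replaced by a hypothetical fixed parameter `min_fraction`, because the description does not give the threshold function. The function names and the nearest-rank percentile are likewise illustrative choices.

```python
from collections import Counter, defaultdict
from math import ceil


def hamming(a, b):
    """Mismatch count between two barcodes (stand-in for global alignment)."""
    if len(a) != len(b):
        return max(len(a), len(b))  # treat unequal lengths as far apart
    return sum(x != y for x, y in zip(a, b))


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cluster_reads(reads, max_diff=2, min_reads=3):
    """reads: iterable of (amplicon_name, barcode, sequence, qualities)."""
    # Step 1: exact-match grouping on amplicon_name + barcode.
    groups = defaultdict(list)
    for amplicon, barcode, seq, quals in reads:
        groups[(amplicon, barcode)].append((seq, quals))

    # Step 2: visit clusters from largest to smallest and fold each one into
    # the first already-kept cluster of the same amplicon whose barcode
    # differs by at most max_diff bases.
    merged = {}
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        amplicon, barcode = key
        target = next(
            (m for m in merged
             if m[0] == amplicon and hamming(m[1], barcode) <= max_diff),
            None,
        )
        merged.setdefault(target or key, []).extend(groups[key])

    # Clusters with too few reads are considered unreliable and dropped.
    return {k: v for k, v in merged.items() if len(v) >= min_reads}


def call_consensus(cluster, min_fraction=0.6):
    """cluster: list of (sequence, qualities), pre-aligned and equal length."""
    bases, quals = [], []
    for pos in range(len(cluster[0][0])):
        column = [(seq[pos], q[pos]) for seq, q in cluster]
        base, count = Counter(b for b, _ in column).most_common(1)[0]
        if count / len(column) >= min_fraction:
            # Consensus found: 90th percentile of the agreeing qualities.
            bases.append(base)
            quals.append(percentile([q for b, q in column if b == base], 90))
        else:
            # No consensus: call N with the 10th percentile of all qualities.
            bases.append("N")
            quals.append(percentile([q for _, q in column], 10))
    return "".join(bases), quals


reads = [
    ("amp1", "AAAAAAAA", "ACGT", [30, 32, 34, 36]),
    ("amp1", "AAAAAAAA", "ACGT", [20, 22, 24, 26]),
    ("amp1", "AAAAAAAT", "ACGA", [10, 12, 14, 16]),  # 1 mismatch: reassigned
    ("amp1", "CCCCCCCC", "TTTT", [30, 30, 30, 30]),  # lone read: dropped
]
clusters = cluster_reads(reads)
consensus = {key: call_consensus(members) for key, members in clusters.items()}
```

Processing clusters from largest to smallest means that a rare barcode carrying a sequencing error is absorbed by its more abundant neighbour rather than the other way round; this ordering is a design choice of the sketch, not something stated in the description.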
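
The evidence rule used for structural variant calls in the fusion-detection process above (soft-clips, insertions, or mates mapping to physically separated regions, with at least two supporting read pairs) can be sketched in Python as follows. This is only an illustration under stated assumptions: the description relies on visual inspection in IGV, whereas this sketch works on a hypothetical simplified `Mate` record, and the `max_distance` cut-off that defines "physically separated" is an assumed value not given in the description. The read positions and CIGAR strings in the example are invented; only the chromosome 9 region is taken from the inversion example.

```python
import re
from dataclasses import dataclass

# One SAM CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Mate:
    chrom: str   # reference sequence name, e.g. "chr9"
    pos: int     # 1-based leftmost mapping position
    cigar: str   # SAM CIGAR string, e.g. "60M40S"


def has_clip_or_insertion(cigar):
    """True if the alignment contains a soft-clip (S) or an insertion (I)."""
    return any(op in "SI" for _, op in CIGAR_OP.findall(cigar))


def is_supporting_pair(read1, read2, max_distance=1000):
    """A pair supports a structural variant if either mate is soft-clipped or
    carries an insertion, or if the mates map to different chromosomes or
    further apart than max_distance (an assumed threshold)."""
    if has_clip_or_insertion(read1.cigar) or has_clip_or_insertion(read2.cigar):
        return True
    return read1.chrom != read2.chrom or abs(read1.pos - read2.pos) > max_distance


def supports_structural_variant(pairs, min_pairs=2):
    """Require at least min_pairs supporting read pairs, as in the description."""
    return sum(is_supporting_pair(a, b) for a, b in pairs) >= min_pairs


pairs = [
    # Mates mapping about 1.09 Mb apart on chr9.
    (Mate("chr9", 5467900, "100M"), Mate("chr9", 6557300, "100M")),
    # Read 1 is soft-clipped, as expected for a read spanning a breakpoint.
    (Mate("chr9", 5467920, "60M40S"), Mate("chr9", 5468000, "100M")),
    # Ordinary concordant pair: no support.
    (Mate("chr9", 5467000, "100M"), Mate("chr9", 5467150, "100M")),
]
call = supports_structural_variant(pairs)
```

A rule this simple also counts small insertions unrelated to a rearrangement, so in practice candidate pairs would still need the follow-up described above, such as aligning the clipped or inserted sequence against the genome to identify the partner region.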

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a method for simultaneously capturing and identifying a defined target region and a partially defined target region within a DNA sample, wherein the partially defined target region comprises a structural variation, rearrangement or fusion. First, a barcoded primer binds to fragments comprising the defined target region, and another barcoded primer comprising a separation molecule binds to fragments comprising the partially defined target region. Second, the primers hybridized to the defined target region (i.e. product A) are separated from the primers hybridized to the partially defined target region (i.e. product B). Third, the two products are processed differently. For product A, a reverse primer is added. For product B, a double-stranded oligonucleotide is ligated to the end that is not attached to the separation molecule. Fourth, the processed product A and product B are recombined and amplified together, and the resulting amplicons are sequenced.
EP19825310.6A 2018-06-25 2019-06-25 Method for detection and quantification of genetic alterations Pending EP3810805A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10201805450Y 2018-06-25
PCT/SG2019/050317 WO2020005159A1 (fr) 2018-06-25 2019-06-25 Method for detection and quantification of genetic alterations

Publications (2)

Publication Number Publication Date
EP3810805A1 (fr) 2021-04-28
EP3810805A4 (fr) 2022-03-23

Family

ID=68985923

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19825310.6A Pending EP3810805A4 (fr) 2018-06-25 2019-06-25 Method for detection and quantification of genetic alterations

Country Status (4)

Country Link
EP (1) EP3810805A4 (fr)
CN (1) CN112639127A (fr)
SG (1) SG11202012687SA (fr)
WO (1) WO2020005159A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436679B (zh) * 2020-03-23 2024-05-10 北京合生基因科技有限公司 Method and system for determining the variation rate of a nucleic acid sample to be tested
WO2022025823A1 (fr) * 2020-07-29 2022-02-03 Lucence Life Sciences Pte. Ltd Methods and kits for the detection of RNA viruses

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011019964A1 (fr) * 2009-08-12 2011-02-17 Nugen Technologies, Inc. Methods, compositions, and kits for generating nucleic acid products substantially free of template nucleic acid
SG10202009015SA (en) * 2012-12-10 2020-10-29 Resolution Bioscience Inc Methods for targeted genomic analysis
WO2015039006A1 (fr) * 2013-09-16 2015-03-19 The General Hospital Corporation Methods of treating cancer
CN114214314A (zh) * 2014-06-24 2022-03-22 生物辐射实验室股份有限公司 Digital PCR barcoding
WO2016081798A1 (fr) * 2014-11-20 2016-05-26 Children's Medical Center Corporation Methods relating to the detection of recurrent and non-specific double-strand breaks in the genome
EP3551756A4 (fr) * 2016-12-12 2020-07-15 Dana Farber Cancer Institute, Inc. Compositions and methods for molecular barcoding of DNA molecules prior to mutation enrichment and/or mutation detection

Also Published As

Publication number Publication date
CN112639127A (zh) 2021-04-09
WO2020005159A1 (fr) 2020-01-02
SG11202012687SA (en) 2021-01-28
EP3810805A4 (fr) 2022-03-23

Similar Documents

Publication Publication Date Title
US11535889B2 (en) Use of transposase and Y adapters to fragment and tag DNA
US11072819B2 (en) Methods of constructing small RNA libraries and their use for expression profiling of target RNAs
JP7379418B2 (ja) Deep sequencing profiling of tumors
RU2708337C2 (ru) Methods and compositions for DNA profiling
US20120003657A1 (en) Targeted sequencing library preparation by genomic dna circularization
JP2020182493A (ja) Methods for variant detection
US20240052408A1 (en) Single end duplex dna sequencing
US10465241B2 (en) High resolution STR analysis using next generation sequencing
EP3702457A1 (fr) Reagents, kits and methods for molecular barcoding
KR20240069835A (ko) Improved methods and kits for generating DNA libraries for massively parallel sequencing
US20180305683A1 (en) Multiplexed tagmentation
EP3390671B1 (fr) Method for direct target sequencing using nuclease protection
EP3810805A1 (fr) Method for detection and quantification of genetic alterations
US20210180125A1 (en) Method for the detection and quantification of genetic alterations
WO2024117970A1 (fr) Method for efficient multiplex detection and quantification of genetic alterations
WO2023170151A1 (fr) Method for detecting a target nucleic acid sequence in a single reaction vessel
Kuiper et al. Reliable Next-Generation Sequencing of Formalin-Fixed, Paraffin-Embedded Tissue Using Single Molecule Tags

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20201223

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40045286

Country of ref document: HK

A4 Supplementary search report drawn up and despatched

Effective date: 20220223

RIC1 Information provided on ipc code assigned before grant

Ipc: C12Q 1/6858 20180101ALI20220217BHEP

Ipc: C12Q 1/6855 20180101ALI20220217BHEP

Ipc: C12Q 1/6869 20180101AFI20220217BHEP