WO2024073136A1 - Rapid reconstruction of large nucleic acids - Google Patents

Rapid reconstruction of large nucleic acids

Info

Publication number
WO2024073136A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
restriction enzymes
fragments
dna molecule
chimeric
Application number
PCT/US2023/034301
Other languages
French (fr)
Inventor
Chun-Chieh Lin
Original Assignee
Mary Hitchcock Memorial Hospital, For Itself And On Behalf Of Dartmouth-Hitchcock Clinic
Application filed by Mary Hitchcock Memorial Hospital, For Itself And On Behalf Of Dartmouth-Hitchcock Clinic
Publication of WO2024073136A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups

Definitions

  • Nanopore sequencing (Oxford Nanopore Technologies) is a channel-based device that allows real-time interpretation of DNA/RNA sequences (A, T, C, G, and U) by sensing changes in the ionic current.
  • With upgrades to detect epigenetic base modifications such as 5-methylcytosine
  • intraoperative molecular diagnosis has recently been made possible by methylation profiling of brain tumors using nanopore sequencing.
  • intraoperative genetic characterization remains challenging due to lengthy library preparations.
  • Attorney Docket No.593850 SUMMARY Disclosed is a method for reconstructing large nucleic acid molecules within a short time period. The method comprises simultaneously reacting nucleic acids with restriction enzymes and a DNA ligase.
  • the method for constructing a plurality of chimeric DNA molecules comprises: a) providing a source DNA molecule; and b) simultaneously reacting the source DNA molecule with: i) two or more restriction enzymes; and ii) a DNA ligase.
  • the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller fragments, and the DNA ligase assembles the smaller fragments into the plurality of chimeric DNA molecules.
  • the two or more restriction enzymes are two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
  • the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
  • the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
  • step (b) comprises two reaction processes (b1) and (b2), wherein in process (b1), the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and in process (b2), the DNA ligase ligates the smaller DNA fragments having complementary overhangs into a larger DNA fragment, wherein the processes (b1) and (b2) are allowed to repeat for N cycles to generate the chimeric DNA molecule, N is an integer greater than 5.
  • process (b2) may comprise two types of ligations, Type I and Type II: a Type I ligation joins two sticky ends generated by the same restriction enzyme and can be cut again by that enzyme, whereas a Type II ligation joins two sticky ends generated by two different restriction enzymes and cannot be cut again by either of them.
  • the plurality of chimeric DNA molecules comprises more than 5, or more than 7, or between 5 and 15, or between 5-10, or between 7 and 8 smaller DNA fragments ligated together through Type II ligation.
  • small fragments comprise an average length of about 80-100 bp, or about 90 bp when MseI, BfaI, CviQI, and NdeI are used.
  • the reaction of step (b) is completed within about 30 minutes.
  • the method may further comprise a step (c) of sequencing the plurality of chimeric DNA molecules.
  • the chimeric DNA molecules are sequenced using nanopore sequencing.
  • the source DNA molecule is obtained from a mammal, such as a human. In one aspect, the source DNA molecule is obtained from a tissue sample, such as a tumor sample from the mammal.
  • the source DNA molecule is obtained during a surgery being performed on the mammal.
  • a chimeric DNA molecule comprising a plurality of fragments is produced by cutting a source DNA fragment with two or more restriction enzymes, and the plurality of fragments are stochastically ligated together to form the chimeric DNA molecule, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule.
  • said chimeric DNA molecule is not digestible by said two or more restriction enzymes.
  • the average size of the chimeric molecule is about 1-10 kb.
  • the average size of the chimeric molecule is about 1-2 kb, or about 1-10 kb, or about 4-10 kb.
  • an average size of the plurality of fragments is about 90 bp when MseI, BfaI, CviQI and NdeI are used.
  • a kit for constructing a plurality of chimeric DNA molecules is disclosed.
  • the kit may comprise i) two or more restriction enzymes; ii) a DNA ligase; and iii) an instruction to react a source DNA simultaneously with the two or more restriction enzymes and the DNA ligase to generate the plurality of chimeric DNA molecules.
  • the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.
  • the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
  • the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
  • a method for reconstructing DNA for single cell genomics comprises: a) lysing single isolated cells to expose genomic DNA; and b) simultaneously reacting the genomic DNA with: i) two or more restriction enzymes, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller fragments; ii) short barcodes, wherein the barcodes comprise compatible overhangs to the overhangs produced by the two or more restriction enzymes; and iii) a DNA ligase, wherein the DNA ligase assembles the smaller fragments and barcodes into the chimeric DNA molecule; c) amplifying the chimeric DNA molecule; and d) sequencing the chimeric DNA molecule.
  • the single cells are prepared by a process comprising: a) dissociating and suspending a cell population; b) diluting the cell population to form a cell solution; and c) transferring the cell solution to a container such that single cells are isolated.
  • Figure 1 shows an integrated workflow for intraoperative histological and molecular diagnosis.
  • Figures 2A-2C depict the iSCORED reaction.
  • FIG. 2A shows an example of iSCORED reactions using restriction enzymes generating compatible 5’ CTAG overhangs: 75% (12/16) of random ligations reach irreversible end products (black line) and 25% of them remain susceptible for further digestion (brown line).
  • FIG. 2B shows a list of detailed ligation possibilities (shaded boxes show futile ligations).
  • Figures 3A-3D show copy number variation (CNV) detection using the iSCORED method of the present disclosure.
  • FIG. 3D shows that the sequenced units from 30 mins of reaction align to the GRCh38 reference genome. Zoom-in fragment distribution in EGFR gene region (green).
  • Figures 4A-4C show the target mutation detection workflow.
  • FIG. 4A shows three hot spot mutation amplicons (black, green, and blue) to which Type IIS sites (BsaI, recognition site in purple; cutting site in orange) are added for the iSCORED reaction.
  • FIG. 4B shows randomly assembled amplicons that are sequenced and mapped to reference sequences.
  • FIG. 4C shows the alignment of forward sequencing results (IDH1 R132 and BRAF V600) to reference.
  • Figure 5 shows the proposed workflow of single-cell CNV with iSCORED
  • Figure 6 shows the iSCORED method for ultrafast copy number variation analysis.
  • (b) Long stochastically concatenated DNA molecules are analyzed with Nanopore device and aligned to the reference for genome-wide quantitative measurement.
  • the reconstruction efficiencies of four iSCORED cocktail combinations are compared. The reaction is incubated at 37 °C for 30 mins.
  • CATG cocktail: NcoI (C^CATGG), PciI (A^CATGT), BspHI (T^CATGA).
  • CTAG cocktail: NheI (G^CTAGC), SpeI (A^CTAGT), AvrII (C^CTAGG), XbaI (T^CTAGA).
  • CG cocktail: MspI (C^CGG), HinP1I (G^CGC), HpyCH4IV (A^CGT) and TaqI-v2 (T^CGA). EcoRV is employed as a control since it generates blunt ends upon restriction digestion. (d) Optimization of iSCORED reaction by adjusting various experimental parameters, such as incubation periods, DNA ligases and intermittent mixing and cooling.
  • Figure 7 shows comprehensive analysis of candidate REs for iSCORED.
  • Figure 8 shows distribution of concatenation numbers and sequenced lengths in iSCORED reactions utilizing cocktail combinations generating 2-nt and 4-nt 5’ overhangs.
  • Figure 9 shows normalization of variable mapped fragments in predefined bins for accurate copy number detection.
  • the number of mapped fragments per bin fluctuates across the wild-type genome (intrinsic regional variability, IRV), yielding a relatively high coefficient of variation (CoV) of 0.68 and hampering detection of true outliers.
  • Extensive sequencing does not address the fluctuation due to IRV (left panel). When the samples are normalized with the control wild-type dataset, the CoV drops dramatically and stabilizes at ~1 million mapped fragments (right panel).
  • Figure 11 shows minimum thresholds and mapped fragments to exclude false positive hits.
  • False positive hits were eliminated from the full datasets of control gDNA samples by setting a minimum threshold of 0.2 percentile.
  • the mapped fragments count of 200k or higher, in conjunction with the 0.2 percentile threshold is sufficient to eliminate all false positive hits.
  • Figure 12 shows the method of determining the detection threshold for regions with low amplification. Focal gene amplification (11x) was detected in chromosome 2 of an adenosquamous carcinoma.
  • Figure 13 shows the iSCORED pipeline which allows simultaneous methylation classification of primary CNS tumors.
  • Minimal methylation classification features are acquired within 45 minutes of MinION sequencing.
  • Calibrated methylation classification scores for glioblastoma, medulloblastoma and oligodendrogliomas with high tumor purity were calculated across multiple time points from the initiation of sequencing.
  • Correct methylation classification of CNS tumors depends on high tumor purity.
  • Figure 14 shows methylation classification of primary CNS tumors.
  • Figure 15 shows workflow of iSCORED for ultrafast molecular diagnosis.
  • Figure 16 shows assessment of reused MinION flowcells. a) Total sequenced data (Mb) and mapped fragments from MinION and flongle runs.
  • FIG. 17 shows the workflow of our intraoperative analysis pipeline. Before sequencing begins, a configuration file is modified to set the appropriate run parameters. Once sequencing begins, the analysis pipeline is initiated. It periodically gathers the new fast5 files, basecalls them while extracting modification information, filters them, and aligns them with the appropriate parameters for methylation and CNV/amplification analysis.
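The periodic gathering step described for FIG. 17 could be sketched as below. This is only an illustration, assuming the fast5 files are written under a single run directory; the function name `new_fast5_files` and the `seen` set are hypothetical, and the basecalling, filtering and alignment stages are deliberately left out because the text does not name the tools or their options.

```python
from pathlib import Path


def new_fast5_files(run_dir, seen):
    """Return fast5 files under run_dir that were not returned by an earlier call.

    `seen` is a caller-owned set of paths that is updated in place, so repeated
    calls only hand each file to the downstream steps once (hypothetical helper).
    """
    current = set(Path(run_dir).rglob("*.fast5"))
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def poll_once(run_dir, seen, process):
    """One polling cycle: collect new files and pass each to `process`.

    `process` stands in for basecalling, filtering and alignment, which are
    assumptions left to the caller.
    """
    for path in new_fast5_files(run_dir, seen):
        process(path)
```

A scheduler or a simple timed loop would call `poll_once` at a fixed interval. A production version would also need to skip files that the sequencer is still writing, which this sketch does not attempt.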
  • Figure 18 shows EGFR amplification of glioblastomas in current study.
  • Figure 19 shows Table 1 showing comparison of CNV results between iSCORED and clinically validated assays.
  • Figure 20 shows Table 2 showing comparison of CNV assays with iSCORED.
  • the present disclosure relates to a method (also referred to herein as iSCORED) of reconstructing large nucleic acid molecules by employing simultaneous restriction endonuclease and DNA ligase reactions in vitro.
  • the reconstructed nucleic acids generated by the methods of the present disclosure may be sequenced following the reactions.
  • the present disclosure comprises a method for rapid reconstruction of large nucleic acids by simultaneously employing restriction endonucleases and a DNA ligase in an in vitro reaction.
  • a mixture of restriction enzymes capable of generating compatible overhangs are utilized to fragment large nucleic acid molecules into small fragments (units). Each unit exhibits compatible cohesive ends that are amenable to random ligation with other counterparts.
  • DNA ligase catalyzes random re-ligation of all digested units.
  • When the cohesive ends are produced by different restriction enzymes, the “hybrid” ligation produces a stable, irreversible result (FIG. 3A).
  • This unidirectional reaction is possible because of the staggered nature of DNA recognition sequence and actual phosphodiester bond breakage.
  • XbaI and SpeI recognize TCTAGA and ACTAGT, respectively.
  • both release cohesive 5’CTAG overhangs which could re-ligate to TCTAGA, ACTAGT, TCTAGT, and ACTAGA.
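The distinction between re-cleavable and irreversible ligations can be checked by enumeration. The sketch below is a minimal illustration, assuming the four CTAG-cocktail recognition sites given in this disclosure and that each enzyme cuts after the first base of its site; the helper names `junction` and `classify_ligations` are hypothetical.

```python
from itertools import product

# Recognition sites of the CTAG cocktail; each enzyme is assumed to cut after
# the first base, leaving a 5' CTAG overhang.
SITES = {"NheI": "GCTAGC", "SpeI": "ACTAGT", "AvrII": "CCTAGG", "XbaI": "TCTAGA"}


def junction(left_enzyme, right_enzyme):
    """Six-base junction formed when an end cut by left_enzyme is ligated
    to an end cut by right_enzyme."""
    return SITES[left_enzyme][0] + "CTAG" + SITES[right_enzyme][-1]


def classify_ligations():
    """Split all ordered enzyme pairs into re-cleavable (Type I) and
    irreversible (Type II) junctions."""
    type_i, type_ii = [], []
    for left, right in product(SITES, repeat=2):
        site = junction(left, right)
        target = type_i if site in SITES.values() else type_ii
        target.append((left, right, site))
    return type_i, type_ii


type_i, type_ii = classify_ligations()
print(len(type_i), "re-cleavable;", len(type_ii), "irreversible")
```

Under these assumptions the enumeration yields 4 re-cleavable and 12 irreversible junctions out of 16, which agrees with the 75% (12/16) figure stated for FIG. 2A, and the XbaI/SpeI hybrids come out as TCTAGT and ACTAGA.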
  • FIG. 3B shows the integrated workflow for histological and molecular diagnosis.
  • genomic DNA is extracted (20 mins; QIAmp DNA Micro Kit), followed by quality (Nanodrop, Thermo Fisher) and quantity (Qubit, Thermo Fisher) assessment.
  • selected diagnostic and targetable mutations are amplified by ultra-fast PCR (~25 mins, blue arrow), followed by iSCORED reaction (30 mins) and library preparation (10 mins).
  • Generated data is analyzed in real-time to render a molecular diagnosis.
  • the entire molecular diagnostics workflow takes 120 mins. The simplicity and time-effectiveness of this reaction allow for rapid library preparation, which makes intraoperative molecular diagnosis possible.
  • the randomly assembled DNA fragments acquire uniqueness and could be used as unique molecular identifiers (intrinsic barcodes) to trace their original templates, circumventing the long-standing PCR amplification bias problem in single cell whole genome amplification.
  • the iSCORED method circumvents sequential experimental steps for stochastic DNA construction (i.e. DNA fragmentation by mechanical shearing and/or enzymatic digestion, followed by purification, and ligation).
  • Sequential experimental steps could reach similar DNA reconstruction results; however, they cannot be completed within the short periods of time required (120-150 mins in the intraoperative molecular diagnosis application). Also, the multistep procedures prohibit their application at the cellular level given the minute amount of genomic DNA in single cells (5-10 picograms).
  • Item 1 A method for constructing a plurality of chimeric DNA molecules, comprising: a) providing a source DNA molecule; and b) simultaneously reacting the source DNA molecule with: i) two or more restriction enzymes; and ii) a DNA ligase to obtain the plurality of chimeric DNA molecules, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.
  • Item 2 The method of Item 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
  • Item 3 The method of Item 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
  • Item 4 The method of any preceding Items, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
  • step (b) comprises two reaction processes (b1) and (b2), wherein in process (b1), the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and in process (b2), the DNA ligase ligates the smaller DNA fragments having complementary overhangs into a larger DNA fragment, wherein the processes (b1) and (b2) are allowed to repeat for N cycles to generate the chimeric DNA molecule, N is an integer greater than 5.
  • Item 6 The method of any preceding Items, wherein process (b2) comprises two types of ligations, Type I and Type II ligations, wherein Type I ligation results from ligation of two sticky ends generated by the same restriction enzyme and can be cut by the same restriction enzyme again, and Type II ligation results from ligation of two sticky ends generated by two different restriction enzymes and cannot be cut by either of the two different restriction enzymes again.
  • Item 7 The method of any preceding Items, wherein the plurality of chimeric DNA molecules comprises more than 5 smaller DNA fragments ligated together through Type II ligation.
  • Item 8 The method of any preceding Items, wherein the smaller DNA fragments have an average length of about 80-100 bp, or about 90 bp (base pairs) when using MseI, BfaI, CviQI and NdeI.
  • Item 9 The method of any preceding Items, wherein the reaction of step (b) is completed within 30 minutes or less.
  • Item 10 The method of any preceding Items, further comprising a step (c) of sequencing the plurality of chimeric DNA molecules.
  • Item 11 The method of any preceding Items, wherein the sequencing is performed by using nanopore sequencing.
  • Item 12 The method of any preceding Items, wherein a result from the sequencing is analyzed to obtain information selected from the group consisting of copy number variation (CNV), methylation classification, and a combination thereof.
  • Item 13 The method of any preceding Items, wherein the source DNA molecule is obtained from a mammal.
  • Item 14 The method of any preceding Items, wherein the source DNA molecule is obtained from a tumor sample of the mammal.
  • Item 15 The method of any preceding Items, wherein the source DNA molecule is obtained during a surgery being performed on the mammal.
  • Item 16 The method of any preceding Items, wherein steps (a)-(c) are performed before the surgery is completed and results from steps (a)-(c) are used to guide a surgeon performing the surgery or for implementation of molecularly targeted therapies.
  • Item 17 A chimeric DNA molecule comprising a plurality of DNA fragments produced by cutting a source DNA fragment with two or more restriction enzymes, wherein the plurality of fragments is stochastically ligated together to form the chimeric DNA molecule, and wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein said chimeric DNA molecule is not digestible by said two or more restriction enzymes.
  • Item 18 The chimeric DNA molecule of Item 17, wherein the chimeric DNA molecule comprises more than 5 of the plurality of DNA fragments.
  • Item 19 The chimeric DNA molecule of any of Items 17-18, wherein an average length of the plurality of fragments is between 80-100 bp when using MseI, BfaI, CviQI and NdeI.
  • Item 20 The chimeric DNA molecule of any of Items 17-19, wherein an average length of the plurality of fragments is 90 bp when using MseI, BfaI, CviQI and NdeI.
  • Item 21 A method for reconstructing DNA for single-cell genomic analysis, comprising: a) lysing single isolated cells to expose genomic DNA in the cells; b) simultaneously reacting the genomic DNA with: i) two or more restriction enzymes, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments; ii) short barcodes, wherein the barcodes are unique DNA sequences comprising compatible overhangs to the overhangs produced by the two or more restriction enzymes; and iii) a DNA ligase, wherein the DNA ligase ligates the smaller DNA fragments and barcodes into a chimeric DNA molecule; c) amplifying the chimeric DNA molecule; and d) sequencing the chimeric DNA molecule.
  • Item 22 The method of Item 21 wherein the single cells are prepared by a process comprising a) dissociating and suspending a cell population; b) diluting the cell population to form a cell solution, and c) transferring the cell solution to a container where single cells are isolated.
  • Item 23 A kit for constructing a plurality of chimeric DNA molecules, comprising: i) two or more restriction enzymes; ii) a DNA ligase; and iii) an instruction to react a source DNA simultaneously with the two or more restriction enzymes and the DNA ligase to generate the plurality of chimeric DNA molecules; wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.
  • Item 24 The kit of Item 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
  • Item 25 The kit of Item 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
  • Item 26 The kit of any of Items 23-25, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
  • the iSCORED method of the present disclosure could fragment the genome into short DNA units and concatenate them into long molecules within 30 minutes.
  • 4-mer and 6-mer Type IIP restriction enzymes were selected to digest genomic DNA into the smallest sufficient units for unique genomic mapping (FIGs. 3A-3B). Specifically, restriction enzymes producing 5’TA overhangs are used: MseI, BfaI, CviQI and NdeI collectively generate DNA fragments with a mean length of 90.03 bp (total cleavage sites of 34,624,810 with CHM13 reference genome).
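The mean fragment length for a given cocktail can be estimated by in silico digestion. The sketch below is a toy illustration on a short made-up sequence, not the CHM13 analysis itself; the cut offsets assume the commonly listed cleavage positions MseI (T^TAA), BfaI (C^TAG), CviQI (G^TAC) and NdeI (CA^TATG), and the function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset for each enzyme (assumed from
# standard enzyme listings; all four leave a 5' TA overhang).
CUTTERS = {
    "MseI": ("TTAA", 1),
    "BfaI": ("CTAG", 1),
    "CviQI": ("GTAC", 1),
    "NdeI": ("CATATG", 2),
}


def cut_positions(seq):
    """Sorted top-strand cut coordinates for all enzymes in the cocktail."""
    cuts = set()
    for site, offset in CUTTERS.values():
        # A lookahead lets overlapping occurrences of a site be found.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq):
    """Lengths of the fragments left after complete digestion."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


toy = "GGTTAACCCTAGAAGTACTTCATATGCC"  # made-up sequence with one site per enzyme
lengths = fragment_lengths(toy)
print(cut_positions(toy), sum(lengths) / len(lengths))
```

As a consistency check on the stated figures, 34,624,810 cleavage sites at a mean spacing of 90.03 bp correspond to roughly 3.12 Gb of sequence, which is in line with the size of a complete human genome assembly.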
  • the DNA reconstruction efficiency peaks at around 20 to 30 mins, at which one read consists of up to 10 units (FIG. 3C).
  • MinION flowcells (R9.4.1) generate approximately 100,000-150,000 reads/hour (length ~5,000 bp; ONT). Normalizing to 100,000 reads (1 hour of sequencing), this platform generates an average genomic resolution of 3.3 kb covering 5.4% of the entire genome (FIG. 3D). An even higher resolution could be reached by continuing sequencing. This result is superior to the current SNP-based Affymetrix OncoScan, which has an average genomic resolution of 9.6 kb and takes 2-3 days.
  • the amplified DNAs are incorporated with Type IIS restriction enzyme sites during PCR reaction (i.e., BsaI; FIG. 4A).
  • Type IIS restriction enzymes recognize asymmetric DNA sequences and cut outside of recognition sequences at a defined distance (NEB).
  • each PCR product was purposefully designed to expose a 5’ CATG overhang after Type IIS enzyme digestion - each amplicon loses Type IIS recognition sites and gains the ligation ability (FIG. 4A and FIG. 4B).
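How an amplicon can lose its Type IIS sites while gaining a ligatable end can be illustrated on the top strand alone. The sketch below assumes BsaI recognizes GGTCTC and cuts 1 nt downstream on the strand carrying the site and 5 nt downstream on the opposite strand; the single-base spacer, the tail layout and the function names are illustrative assumptions, not the primer design used in this disclosure.

```python
BSAI_SITE = "GGTCTC"      # BsaI recognition sequence (top strand, left tail)
BSAI_SITE_RC = "GAGACC"   # same site as read on the top strand of the right tail
OVERHANG = "CATG"         # desired 5' overhang (its own reverse complement)
SPACER = "A"              # assumed single base between site and cut position


def add_bsai_tails(amplicon):
    """Top strand of a PCR product flanked by inward-facing BsaI sites."""
    return (BSAI_SITE + SPACER + OVERHANG + amplicon
            + OVERHANG + SPACER + BSAI_SITE_RC)


def digest_top_strand(tailed):
    """Top strand remaining after BsaI digestion of both tails.

    Left tail: the top strand is cut 1 nt after the site, so the product
    starts with the CATG overhang. Right tail: the top strand is the strand
    opposite the site and is cut 5 nt away, so it ends at the last amplicon
    base while the bottom strand keeps a 5' CATG overhang.
    """
    left_cut = tailed.index(BSAI_SITE) + len(BSAI_SITE) + len(SPACER)
    right_cut = tailed.rindex(BSAI_SITE_RC) - len(SPACER) - len(OVERHANG)
    return tailed[left_cut:right_cut]


product_top = digest_top_strand(add_bsai_tails("AAATTT"))
print(product_top)
```

The digested product no longer contains either recognition sequence, so once two such amplicons are ligated the junction cannot be re-cut by BsaI. The sketch ignores amplicons that carry an internal BsaI site, which would need to be excluded or redesigned.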
  • All single hot spot mutation genes were successfully amplified within 25 mins (compared to 45 mins by C1000 Bio-Rad Thermal Cycler with the same protocol).
  • the digestion/ligation reaction reaches peak ligation efficiency at 30 mins by electrophoresis analysis.
  • Preliminary alignment results showed >3,000x coverage of sequence 25-30 potential genetic mutations at 1,000x coverage within 120 mins upon receipt of the specimen.
  • iSCORED for single-cell genomics
  • iSCORED may be applied to single-cell genomics for unbiased quantification of genetic elements.
  • two molecular barcode systems are employed: first, short internal barcodes with compatible 5’ overhangs are used in the single-cell iSCORED reaction to uniquely identify reads from the same cell. The process is compatible with the existing ligation reaction in iSCORED and will allow demultiplexing of sequencing reads for cellular origins.
  • the intrinsic barcode information is naturally embedded in the stochasticity of DNA reconstruction.
  • Each chimeric DNA molecule after the iSCORED is essentially unique, and the feature will be used to deconvolute sequenced amplicons to the unique original templates.
  • the genetic quantitative uncertainty due to amplification bias is circumvented. For instance, if gene X is duplicated, during stochastic DNA reconstruction, the two copies of gene X are ligated with different partners and form unique chimeric DNA molecules.
  • the traceable information is used to compile the same sequenced amplicons to generate original chimeric templates (deconvolution). Therefore, the necessary DNA amplification only increases the sequencing depth and accuracy of the chimeric molecules but does not create quantitative bias for the CNV analysis.
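The deconvolution idea can be sketched as collapsing amplified reads that share the same ordered series of fragments. This is a simplified illustration that assumes each read has already been reduced to an ordered list of mapped fragment labels and that copies of one template match exactly; real data would need tolerance for sequencing errors and truncated reads. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def deconvolute(amplicon_reads):
    """Collapse PCR duplicates: reads with an identical ordered fragment list
    are treated as copies of one original chimeric template."""
    return {tuple(read) for read in amplicon_reads}


def count_templates_per_locus(templates):
    """Count, for each locus, how many distinct templates contain it."""
    counts = Counter()
    for template in templates:
        counts.update(set(template))  # each locus counted once per template
    return counts


reads = [
    ["geneX", "chr1:100", "chr5:200"],  # template 1
    ["geneX", "chr1:100", "chr5:200"],  # PCR copy of template 1
    ["geneX", "chr1:100", "chr5:200"],  # PCR copy of template 1
    ["geneX", "chr2:50", "chr9:10"],    # template 2 (second copy of gene X)
]
templates = deconvolute(reads)
counts = count_templates_per_locus(templates)
print(len(templates), counts["geneX"])
```

In this toy input gene X is counted twice, once per distinct template, regardless of how many amplified copies of each template were sequenced.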
  • homogenous cell populations are dissociated and suspended, such as fresh human buccal mucosal cells and frozen glioblastoma cells.
  • the suspended cells are serially diluted and manually pipetted into 96-well plates. Each well contains 2.5 µl of lysis buffer, and individually isolated cells are lysed at 50 °C for 1 hour and 70 °C for 15 minutes.
  • 7.5 µl of iSCORED solution with internal barcode mixtures is added for stochastic DNA reconstruction and individual cell barcoding.
  • the barcoded chimeric DNA molecules are pooled and purified with phenol-chloroform extraction.
  • End repair and dA-tailing are completed to enable ligation to adaptors with a single T overhang.
  • established Illumina NGS reagents are used for library amplification. Briefly, the NEBNext hairpin adaptors are ligated to the pooled dA-tailed chimeric DNA molecules, followed by USER® enzyme treatment to remove uracils and PCR enrichment. The results are analyzed with quantitative electrophoresis to assess amplicon size distribution.
  • Deep Vent (exo-) DNA polymerase (NEB) is used, which exhibits dA-tailing activity and 5x higher fidelity than Taq polymerase.
  • Example 4 Nanopore-based random genomic sampling for intraoperative molecular diagnosis
  • CNVs are also involved in a wide range of biological processes, including human evolution 4–6 , neurodegeneration 7,8 , and developmental disorders 4,9–12 .
  • Current approaches rely on nucleotide hybridization 5–7,10,11,13–15 and next-generation sequencing 1–3,8 , with a turnaround time of several days to weeks which could delay clinical therapeutic plans 16,17 .
  • Nanopore sequencing (Oxford Nanopore Technologies, ONT) is a channel-based device that provides real-time interpretation of long-read nucleotide sequences.
  • Ultrafast high-resolution CNV detection can be achieved by analyzing randomly concatenated DNA fragments. The approach enables the identification of multiple mappable DNA fragments in one sequencing read, thus optimizing sequencing efficacy. By sequencing a fraction of randomly assembled genomic fragments, the genome-wide chromosomal integrity can be quantitatively assessed. While previous attempts involving quantitative genomic analysis have been introduced, the methods are lengthy and require sequential mechanical shearing and/or enzymatic digestion, purification, and ligation 20,21 .
  • Buffer A (200 µl) and pure ethanol (200 µl) were then subsequently added to the reactions. The final reaction was mixed thoroughly by vortexing and added to the spin column inserted to the vacuum to facilitate the extraction procedures. Sequential buffer AW1 (500 µl) and AW2 (500 µl) were added to wash the column, which was centrifuged at 20,000 ×g for 30 seconds for final cleanup. The DNA was eluted with 5
  • iSCORED Reaction: Approximately 200-400 ng of input gDNA was used for the iSCORED reaction, followed by bead purification for Nanopore sequencing.
  • the reaction mixture JUSVXPYLK )- cR PT ZUZHR$ ]OPJO PTJR[KLK W[PJQ RPNHYL I[MMLX ",&- cR#$ W[PJQ RPNHYL ”) cR$ ?74 7.(-.# HTK ?E3?
  • reaction was agitated at 900 rpm on the 18 °C pad at reaction time points of 9-10, 14-15, 19-20 and 24-25 minutes. End repair/dA tailing buffer (1.75 µL) and enzyme mix (0.75 µL, NEB E7546) were added to the mixture, which was then incubated at 20 °C for 5 minutes and 65 °C for 5 minutes.
  • reaction solution was removed from the waste port and storage buffer (500 µL) was loaded into the priming port before storing at 4 °C for next use.
  • a minimum of 800 active pores by flowcell check were required for a successful run.
  • a typical MinION flowcell R9.4.1
  • Each one of these datasets was segmented into independent data subsets (with no overlapping fragments) that vary in the number of mapped fragments.
  • the number of fragments in these datasets were 70k, 200k, 300k, 400k, 500k, 600k, 700k, 800k, 900k, 1M, 1.25M, 1.75M, 2M.
  • the coefficient of variation (CoV) across the genome was calculated to assess the variability.
  • the behavior of the CoV as the number of fragments change was assessed with the first order derivative of the CoV function.
  • CNV analysis was performed using the Smurf-seq analysis pipeline [pmid: 31287019]. Counts of uniquely mapped fragments to the 5,000 bins in the human genome were normalized for biases in GC content; finally, an implementation of DNAcopy 38 (v1.74.1) using circular binary segmentation identified breakpoints in bin counts.
  • Output table and graph [00106] The output table contained a list of at least two consecutive statistically significant outliers (i.e., bins) to minimize the potential of identifying isolated/noisy outliers due to individual genome variation. The table displayed the corresponding position, ratio, Z score at the chromosomal level of each sample, along with commonly amplified gene(s) found in these bins of interest.
  • the -C command allowed the SAM tags for methylation to be moved from the fastq header back into the SAM file.
  • the -Y command turned off soft-clipping which interferes with downstream methylation extraction.
  • the per site methylation is extracted using mbtools (https://github.com/jts/mbtools).
  • a custom python script converted the bedfile to make it compatible with Rapid-CNS 2 .
  • the called methylations were processed using Rapid-CNS 2 which uses a random forest classifier trained on Illumina BeadChip 450 K methylation array from the Heidelberg reference cohort of brain tumor methylation profiles 39 .
  • the resulting reads were in silico admixed with datasets from a medulloblastoma, an oligodendroglioma, or a glioblastoma, all of which had >90% tumor percentage and calibrated scores of ⁇ 0.99 in methylation classification.
  • Reads equivalent to an hour of sequencing on the MinION at different ratios of tumor to control (tumor percentages of: 0-100 in intervals of 10) were used for methylation classification using the Rapid-CNS 2 pipeline.
  • Our computer system included an Intel® Core™ i9-12900K Processor, 24 cores, 64 GB of RAM, 2 TB of storage and an RTX 3090Ti.
  • Bin-specific normalization helped to effectively mitigate the effects of intrinsic regional variability and detect copy number variation.
  • EGFR amplification is a molecularly defining alteration in glioblastomas. In the six EGFR-amplified glioblastomas we examined, the average amplification regions spanned 1.66 ± 0.44 Mb with an average of 150.5 ± 47 copies.
  • MinION sequencing typically requires 60 minutes to generate sufficient data.
  • P2 Solo ONT
  • four of our most recent specimens were analyzed using PromethION technology (R10.4.1).
  • R10.4.1 PromethION technology
  • a mere 25 minutes of sequencing yielded 395 Mb (±55 Mb) of data on average, which corresponds to an average of 1.69 million (±37x10^6) mapped fragments (Fig 16).
  • the iSCORED platform and the real-time processing pipeline automatically generate a genome-wide copy number report and methylation classification within 5 minutes and 20 minutes of completing MinION and PromethION sequencing, respectively (140 and 120 minutes after receipt of specimen, Fig 15).
  • the iSCORED platform ensures an accurate, fast and inexpensive method for widespread clinical application.
  • we present iSCORED, a novel method that can rapidly and affordably generate CNV profiles, detect gene amplifications, and classify tumors by their methylation status. While traditional NGS has had a significant impact on our understanding of the underlying molecular mechanisms of disease, in a clinical setting these technologies, platforms and chemistries are still limited by lengthy turnaround times 16,17 .
  • Nanopore sequencing is capable of sequencing DNA in real time at speeds of 400 bases per second. While Nanopore has well known capabilities in long-read sequencing, it is not inherently optimized for shorter reads, such as the 120 bp long genomic fragments found in our study during pre-reconstruction. This is primarily due to the potential time wasted on sequencing the adapter regions or waiting for a new molecule to reload into a pore. We were able to overcome this limitation by concatenating small fragments in the same reaction, significantly improving sequencing efficiency. The iSCORED-based assay produced CNV results that were 100% concordant with clinically validated results in all of our cases.
  • Nanopore sequencing can detect 5-methylcytosine at CpG sites, facilitating tumor methylation classification without any additional sample preparation.
  • the low cost per sample ($125)
  • ease of infrastructure setup ($6,000-8,000)
  • unmatched turnaround time 140 minutes
  • the iSCORED platform demonstrates a high accuracy in detecting gene amplifications, with the thresholds set at 5-20% of tumor purities for high and low copy number amplifications, respectively.
  • Nanopore sequencing technology is rapidly improving.
  • the iSCORED genomic resolution is defined and validated at 60 kb, based on the data generated by one hour of MinION sequencing.
  • PromethION flowcells could generate 3-4 times more data in the same timeframe, thereby potentially achieving a 3-4 times higher resolution (15-20 kb per bin). ONT has also announced additional improvements, such as flow cells that can sequence native DNA without adapter ligation and flow cells with 2.5x faster speeds and lower costs.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Plant Pathology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for rapid reconstruction of nucleic acids is described. The method comprises the stochastic reconstruction of nucleic acid by simultaneous reaction of restriction enzymes and a DNA ligase. Copy number variations (CNVs) are almost ubiquitous in cancer. In many cases, somatic CNV analysis has led to the identification of oncogenic pathways and suggested molecular-defined therapeutic targets. However, current CNV analysis is laborious and time-consuming. Here, we develop iSCORED, a one-step random genomic DNA reconstruction method that enables efficient and unbiased quantitative assessment of CNV using a real-time Nanopore sequencer. By leveraging the long concatenated reads, we generate approximately 1-2 million genomic fragments within one hour of sequencing. Analyzing a cohort of 26 brain tumors, we demonstrate 100% concordance in CNV detection when compared to clinically validated assays. In addition, concurrent tumor methylation classification is achieved without additional tissue preparation. The entire workflow is completed within a timeframe of 120-140 minutes, with an automatically generated CNV and methylation report. The ultrafast molecular analysis could be applied in the intra-operative setting and will be crucial for making informed decisions and modifying surgical plans.

Description

PATENT Docket No. 593850
RAPID RECONSTRUCTION OF LARGE NUCLEIC ACIDS
RELATED APPLICATION
[0001] This application claims priority to US Provisional Application No. 63/412,179 filed on September 30, 2022, and claims priority to US Provisional Application No. 63/525,897 filed on July 20, 2023, the contents of which are incorporated herein by reference in their entireties for all purposes.
BACKGROUND
[0002] Molecular characterization of tumors has reshaped clinical medicine in cancer diagnosis and prognostic prediction. More importantly, the information guides therapeutic selection on a personalized basis. Many aggressive neoplasms occur locally. For instance, despite multimodal treatment, approximately 90% of glioblastoma patients recur/progress locally (median time = six months). This is partially due to therapeutic obstacles of drug penetration into lesional tissue (i.e., central nervous system). Surgery provides a unique opportunity for direct visualization and access to the tumor resection margins, making topical intraoperative molecular-based therapy an intriguing possibility. However, current molecular diagnostic platforms are laborious and time-consuming, which makes intraoperative therapy difficult to implement, and leaves systemic administration as the only option. These drawbacks add further complexities to therapeutic plans, including bioavailability, systemic side effects, and drug interactions.
[0003] Nanopore sequencing (Oxford Nanopore Technologies) is a channel-based device that allows real-time interpretation of DNA/RNA sequences (A, T, C, G, and U) by sensing changes in the ionic current. With upgrades to detect epigenetic base modifications (such as 5-methylcytosine), intraoperative molecular diagnosis has recently been made possible by methylation profiling of brain tumors using nanopore sequencing. However, intraoperative genetic characterization remains challenging due to lengthy library preparations.
SUMMARY
[0004] Disclosed is a method for reconstructing large nucleic acid molecules within a short time period. The method comprises simultaneously reacting nucleic acids with restriction enzymes and a DNA ligase. The disclosed method is also referred to herein as iSCORED (Irreversible Sticking Compatible Overhangs to REconstruct DNA).
[0005] In an aspect, the method for constructing a plurality of chimeric DNA molecules comprises: a) providing a source DNA molecule; and b) simultaneously reacting the source DNA molecule with: i) two or more restriction enzymes; and ii) a DNA ligase. In an embodiment, the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller fragments, and the DNA ligase assembles the smaller fragments into the plurality of chimeric DNA molecules.
[0006] In an embodiment, the two or more restriction enzymes are two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
[0007] In an embodiment, the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
[0008] In an embodiment, the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
[0009] In some embodiments, step (b) comprises two reaction processes (b1) and (b2), wherein in process (b1), the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and in process (b2), the DNA ligase ligates the smaller DNA fragments having complementary overhangs into a larger DNA fragment, wherein the processes (b1) and (b2) are allowed to repeat for N cycles to generate the chimeric DNA molecule, wherein N is an integer greater than 5.
[0010] In one aspect, process (b2) may comprise two types of ligations, Type I and Type II ligations: Type I ligation results from ligation of two sticky ends generated by the same restriction enzyme and can be cut by the same restriction enzyme again, and Type II ligation results from ligation of two sticky ends generated by two different restriction enzymes and cannot be cut again by either of the two different restriction enzymes.
[0011] In another aspect, the plurality of chimeric DNA molecules comprises more than 5, or more than 7, or between 5 and 15, or between 5-10, or between 7 and 8 smaller DNA fragments ligated together through Type II ligation.
[0012] In an embodiment, small fragments comprise an average length of about 80-100 bp, or about 90 bp when MseI, BfaI, CviQI, and NdeI are used.
[0013] In an embodiment, the reaction of step (b) is completed within about 30 minutes.
[0014] In an embodiment, the method may further comprise a step (c) of sequencing the plurality of chimeric DNA molecules.
[0015] In another embodiment, the chimeric DNA molecules are sequenced using nanopore sequencing.
[0016] In some embodiments, the source DNA molecule is obtained from a mammal, such as a human. In one aspect, the source DNA molecule is obtained from a tissue sample, such as a tumor sample from the mammal. In one aspect, the source DNA molecule is obtained during a surgery being performed on the mammal.
[0017] In an aspect, a chimeric DNA molecule comprising a plurality of fragments is produced by cutting a source DNA fragment with two or more restriction enzymes, and the plurality of fragments are stochastically ligated together to form the chimeric DNA molecule, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule. In an embodiment, said chimeric DNA molecule is not digestible by said two or more restriction enzymes.
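The Type I / Type II distinction can be checked by enumeration. The following Python sketch is illustrative only and is not part of the disclosed method: it takes the four CTAG-overhang enzymes named above, builds the junction formed by every ordered pair of cut ends, and counts how many junctions no longer match any recognition site in the cocktail. The helper names `junction` and `classify` are hypothetical, and the sketch assumes every enzyme contributes ends equally.

```python
from itertools import product

# Recognition sites of the CTAG cocktail; each enzyme cuts after the first
# base of its site, leaving a 5'-CTAG overhang (e.g. XbaI: T^CTAGA).
SITES = {"XbaI": "TCTAGA", "SpeI": "ACTAGT", "AvrII": "CCTAGG", "NheI": "GCTAGC"}


def junction(left_enzyme: str, right_enzyme: str) -> str:
    """Sequence formed when an end cut by left_enzyme is ligated to an end cut by right_enzyme."""
    left_base = SITES[left_enzyme][0]       # base left upstream of the cut
    right_tail = SITES[right_enzyme][1:]    # CTAG overhang plus the final base
    return left_base + right_tail


def classify(left_enzyme: str, right_enzyme: str) -> str:
    """Type I if the junction recreates a recognition site in the cocktail, else Type II."""
    recreated = junction(left_enzyme, right_enzyme) in SITES.values()
    return "Type I" if recreated else "Type II"


pairs = list(product(SITES, repeat=2))
type_ii = [p for p in pairs if classify(*p) == "Type II"]
print(f"{len(type_ii)}/{len(pairs)} ligations are irreversible")
print(junction("XbaI", "SpeI"), classify("XbaI", "SpeI"))
```

Under these assumptions 12 of the 16 ordered pairings are Type II, which agrees with the 75% (12/16) figure given for the CTAG cocktail in the description of FIG. 2A.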
[0018] In an embodiment, the average size of the chimeric molecule is about 1-10 kb. In another embodiment, the average size of the chimeric molecule is about 1-2 kb, or about 1-10 kb, or about 4-10 kb.
[0019] In an embodiment, an average size of the plurality of fragments is about 90 bp when MseI, BfaI, CviQI and NdeI are used.
[0020] In an embodiment, a kit for constructing a plurality of chimeric DNA molecules is disclosed. The kit may comprise i) two or more restriction enzymes; ii) a DNA ligase; and iii) an instruction to react a source DNA simultaneously with the two or more restriction enzymes and the DNA ligase to generate the plurality of chimeric DNA molecules. In one aspect, the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules. In one embodiment, the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI. In another embodiment, the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI. In another embodiment, the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
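The fragment sizes quoted for the MseI, BfaI, CviQI and NdeI cocktail can be explored with a small in silico digestion. The Python sketch below is a simplified illustration, not the analysis used in the disclosure: it scans only the top strand (sufficient here because all four sites are palindromic), ignores methylation sensitivity and incomplete digestion, and uses a made-up 26-base test sequence. The function names and the uniform-base-composition estimate in `expected_spacing` are assumptions introduced for this example.

```python
import re

# TA cocktail: recognition site -> offset of the top-strand cut
# (MseI T^TAA, BfaI C^TAG, CviQI G^TAC, NdeI CA^TATG).
TA_COCKTAIL = {"TTAA": 1, "CTAG": 1, "GTAC": 1, "CATATG": 2}


def cut_positions(seq: str) -> list[int]:
    """Return sorted top-strand cut coordinates for every cocktail site in seq."""
    seq = seq.upper()
    cuts = set()
    for site, offset in TA_COCKTAIL.items():
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq: str) -> list[str]:
    """Split seq at every cut position (overhang structure is not modelled)."""
    bounds = [0, *cut_positions(seq), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def expected_spacing() -> float:
    """Mean distance between sites in a random sequence with equal base frequencies."""
    return 1 / sum(0.25 ** len(site) for site in TA_COCKTAIL)


fragments = digest("GGGTTAACCCCATATGAAACTAGTTT")
print(fragments)
print(round(expected_spacing(), 1))
```

For a uniform random sequence the estimate is 4096/49, or about 84 bp between sites, which is in the same range as the roughly 90 bp average stated above. The human genome is not a uniform random sequence, so the estimate is only a rough plausibility check.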
[0021] In an aspect, a method for reconstructing DNA for single cell genomics comprises: a) lysing single isolated cells to expose genomic DNA; and b) simultaneously reacting the genomic DNA with: i) two or more restriction enzymes, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller fragments; ii) short barcodes, wherein the barcodes comprise compatible overhangs to the overhangs produced by the two or more restriction enzymes; and iii) a DNA ligase, wherein the DNA ligase assembles the smaller fragments and barcodes into the chimeric DNA molecule; c) amplifying the chimeric DNA molecule; and d) sequencing the chimeric DNA molecule. In an embodiment, the single cells are prepared by a process comprising: a) dissociating and suspending a cell population; b) diluting the cell population to form a cell solution; and c) transferring the cell solution to a container such that single cells are isolated.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] Figure 1 shows an integrated workflow for intraoperative histological and molecular diagnosis.
[0023] Figures 2A-2C depict the iSCORED reaction. FIG. 2A shows an example of iSCORED reactions using restriction enzymes generating compatible 5’ CTAG overhangs: 75% (12/16) of random ligations reach irreversible end products (black line) and 25% of them remain susceptible to further digestion (brown line). FIG. 2B shows a list of detailed ligation possibilities (shaded boxes show futile ligations). FIG. 2C shows a comprehensive analysis of digestion frequency, generated fragments, and normalized ligation efficiencies across entire human genomes (Blue = T2T-CHM13, Brown = GRCh38).
[0024] Figures 3A-3D show copy number variation (CNV) detection using the iSCORED method of the present disclosure.
FIGs. 3A-3B show the schematic reaction using 5’ TA cocktail restriction enzymes to create irreversible hybrid products. Sequencing the assembled products provides a structural overview of the entire genome (dashed lines = no alignment). FIG. 3C shows that reaction efficiency peaks at 30 mins. N50 = weighted median of sequencing length. FIG. 3D shows that the sequenced units from 30 mins of reaction align to the GRCh38 reference genome. Zoom-in fragment distribution in EGFR gene region (green).
[0025] Figures 4A-4C show the target mutation detection workflow. FIG. 4A shows three hot spot mutation amplicons (black, green, and blue) to which Type IIS sites (BsaI, recognition site in purple; cutting site in orange) are added for the iSCORED reaction. FIG. 4B shows randomly assembled amplicons that are sequenced and mapped to reference sequences. FIG. 4C shows the alignment of forward sequencing results (IDH1 R132 and BRAF V600) to reference.
[0026] Figure 5 shows the proposed workflow of single-cell CNV with iSCORED.
[0027] Figure 6 shows the iSCORED method for ultrafast copy number variation analysis. (a) iSCORED schematic showing simultaneous compatible end ligation with TA enzyme cocktail (T^TAA, C^TAG, G^TAC and CA^TATG by MseI, BfaI, CviQI and NdeI, respectively). (b) Long stochastically concatenated DNA molecules are analyzed with Nanopore device and aligned to the reference for genome-wide quantitative measurement. (c) The reconstruction efficiency of four iSCORED cocktail combinations is compared. The reaction is incubated at 37 °C for 30 mins. CATG cocktail: NcoI (C^CATGG), PciI (A^CATGT), BspHI (T^CATGA). CTAG cocktail: NheI (G^CTAGC), SpeI (A^CTAGT), AvrII (C^CTAGG), XbaI (T^CTAGA). CG cocktail: MspI (C^CGG), HinP1I (G^CGC), HpyCH4IV (A^CGT) and TaqI-V2 (T^CGA). EcoRV is employed as a control since it generates blunt ends upon restriction digestion.
(d) Optimization of iSCORED reaction by adjusting various experimental parameters, such as incubation periods, DNA ligases and intermittent mixing and cooling. (e) An oligodendroglioma sample was processed either sequentially (digestion, purification, and ligation), with iSCORED, or sequenced as native gDNA. Samples were normalized to contain the same amount of sequencing data. The number of unique fragments mapped per genomic bin is shown for each sample (left panels). The resulting CNV plots are shown in the right panels (resolution = 600 kb per bin). CoV for sequential approach, iSCORED and native gDNA sequencing are 0.57, 0.54 and 3.3, respectively. (f) Comparison of library preparation times across three methods. The sequential DNA digestion and ligation required 150 minutes, while the iSCORED required 75 minutes and the native DNA method required 45 minutes. The goal of intraoperative molecular diagnosis is achieved within 120-150 minutes of receiving the resected specimen.
[0028] Figure 7 shows comprehensive analysis of candidate REs for iSCORED. (a) List of all enzymatic combinations for iSCORED (4 nt overhang in left panel and 2 nt overhang in right panel). The numbers denote available REs to produce such overhangs and numbers in the parentheses denote REs sensitive to CpG methylation. (b) In silico estimation of NTAN cocktail mix using the complete human genome database as a reference (T2T-CHM13). (c) Quantitative electrophoresis (Agilent) to assess the digestion efficiencies of restriction endonuclease generating TA overhang and iSCORED reaction.
[0029] Figure 8 shows distribution of concatenation numbers and sequenced lengths in iSCORED reactions utilizing cocktail combinations generating 2-nt and 4-nt 5’ overhangs.
[0030] Figure 9 shows normalization of variable mapped fragments in predefined bins for accurate copy number detection.
(a) The number of mapped fragments per bin fluctuates across the wild-type genome (intrinsic regional variability, IRV), yielding a relatively high coefficient of variation (CoV) of 0.68 and hampering detection of true outliers. (b) Extensive sequencing does not address the fluctuation due to IRV (left panel). After normalizing the samples with the control wild-type dataset, the CoV dramatically drops and stabilizes at ~1 million mapped fragments (right panel). (c) The control genome data displayed CoV of 0.09 after normalization (upper panel). Application of this approach allows for detecting regions of amplification in both chromosome 2 and chromosome 19 (defined as ratio>5). (d) Mixture of tumor with wild-type gDNAs shows that the amplified copies increase as the tumor percentage increases. Using Z values of 10 as cutoff, the genetic amplification of CCNE1 in chromosome 19 could be reliably detected at 5% tumor purity with 500,000 mapped fragments.
[0031] Figure 10 shows the correlation of regional variability between wild-type datasets. The X-axis in all panels represents the proportion of NA12878, which is used for normalization in all samples. The color in the scatter plots reflects the kernel density map where the blue and red data-points correspond to low and high density, respectively.
[0032] Figure 11 shows minimum thresholds and mapped fragments to exclude false positive hits. (a) False positive hits were eliminated from the full datasets of control gDNA samples by setting a minimum threshold of 0.2 percentile. (b) A mapped fragment count of 200k or higher, in conjunction with the 0.2 percentile threshold, is sufficient to eliminate all false positive hits.
[0033] Figure 12 shows the method of determining the detection threshold for regions with low amplification. Focal gene amplification (11x) was detected in chromosome 2 of an adenosquamous carcinoma.
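The bin-wise normalization described for Figure 9 can be sketched in a few lines of Python. This is a simplified illustration rather than the pipeline used in the disclosure: the bin counts are toy values invented for the example, the function names are hypothetical, and the requirement of at least two adjacent bins above the ratio cutoff is borrowed from the output-table rule of reporting consecutive outliers. GC correction and segmentation are left out.

```python
from statistics import mean, pstdev


def proportions(counts):
    """Convert per-bin fragment counts to fractions of all mapped fragments."""
    total = sum(counts)
    return [c / total for c in counts]


def normalized_ratios(sample_counts, control_counts):
    """Per-bin ratio of sample proportion to control proportion."""
    sample = proportions(sample_counts)
    control = proportions(control_counts)
    return [s / c for s, c in zip(sample, control)]


def cov(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def consecutive_outliers(ratios, ratio_cutoff=5.0, min_run=2):
    """Indices of bins that sit in runs of at least min_run bins above ratio_cutoff."""
    hits, run = [], []
    for i, r in enumerate(list(ratios) + [0.0]):   # sentinel closes a trailing run
        if r > ratio_cutoff:
            run.append(i)
        else:
            if len(run) >= min_run:
                hits.extend(run)
            run = []
    return hits


control = [100, 50, 200, 150] * 10       # toy wild-type counts for 40 bins
wild_type = [2 * c for c in control]     # same regional pattern, deeper coverage
tumor = list(control)
tumor[10] *= 10                          # two adjacent amplified bins
tumor[11] *= 10

print(round(cov(wild_type), 2))                               # before normalization
print(round(cov(normalized_ratios(wild_type, control)), 2))   # after normalization
print(consecutive_outliers(normalized_ratios(tumor, control)))
```

In this toy case the regional pattern shared by sample and control cancels out after normalization, so only the two amplified bins stand out. Real data would not cancel exactly, which is why the description reports a residual CoV after normalization.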
[0034] Figure 13 shows the iSCORED pipeline, which allows simultaneous methylation classification of primary CNS tumors. (a) Minimal methylation classification features are acquired within 45 minutes of MinION sequencing. (b) Calibrated methylation classification scores for glioblastoma, medulloblastoma and oligodendrogliomas with high tumor purity were calculated across multiple time points from the initiation of sequencing. (c) Correct methylation classification of CNS tumors depends on high tumor purity. (d) In silico mixture of glioblastoma, medulloblastoma and oligodendrogliomas with control brain tissue dataset at various ratios (total data quantity after one hour of sequencing).
[0035] Figure 14 shows methylation classification of primary CNS tumors. (a) Calibrated scores across various intervals from the initiation of sequencing. (b) Calibrated scores of the six correctly classified glioblastomas and corresponding tumor percentages by histological estimation. (c) Comparison of methylation classification with intact gDNA (non-iSCORED) and iSCORED datasets with three oligodendroglioma cases.
[0036] Figure 15 shows workflow of iSCORED for ultrafast molecular diagnosis.
[0037] Figure 16 shows assessment of reused MinION flowcells. (a) Total sequenced data (Mb) and mapped fragments from MinION and flongle runs. The usable MinION flowcells generate (1.38 ± 0.08) x 10^6 mapped fragments within one hour whereas flongles generate (2.9 ± 0.23) x 10^6 mapped fragments during 24 hours of sequencing. (b) Exact sequencing data and mapped fragments of all samples in the reused MinION flowcells.
[0038] Figure 17 shows the workflow of our intraoperative analysis pipeline. Before beginning to sequence, a configuration file is modified to the appropriate run parameters. Once sequencing begins, the analysis pipeline is initiated.
It periodically gathers the new fast5 files, basecalls them, extracts modification information, filters them and aligns them with the appropriate parameters for methylation and CNV/amplification analysis. At 40 minutes the aligned files for methylation are merged, the methylation call pileup is produced and used to classify the methylation class of the sample. At 55 minutes the aligned files for the CNV/amplification analysis are merged, one read from each duplex pair is removed, and the fragments per bin are calculated and used in the generation of a CNV plot and amplification analysis.
[0039] Figure 18 shows EGFR amplification of glioblastomas in the current study.
[0040] Figure 19 shows Table 1 showing comparison of CNV results between iSCORED and clinically validated assays.
[0041] Figure 20 shows Table 2 showing comparison of CNV assays with iSCORED.
DETAILED DESCRIPTION
[0042] The present disclosure relates to a method (also referred to herein as iSCORED) of reconstructing large nucleic acid molecules by employing simultaneous restriction endonuclease and DNA ligase reactions in vitro. The reconstructed nucleic acids generated by the methods of the present disclosure may be sequenced following the reactions.
[0043] The present disclosure comprises a method for rapid reconstruction of large nucleic acids by simultaneously employing restriction endonucleases and a DNA ligase in an in vitro reaction. A mixture of restriction enzymes capable of generating compatible overhangs is utilized to fragment large nucleic acid molecules into small fragments (units). Each unit exhibits compatible cohesive ends that are amenable to random ligation with other counterparts. Within the same reaction, DNA ligase catalyzes random re-ligation of all digested units. When the cohesive ends are produced from different restriction enzymes, the “hybrid” ligation produces a stable irreversible result (FIG. 3A). This unidirectional reaction is possible because of the
This unidirectional reaction is possible because of the Attorney Docket No.593850 staggered nature of DNA recognition sequence and actual phosphodiester bond breakage. As a non-limiting example, XbaI and SpeI recognize TCTAGA and ACTAGT, respectively. Upon enzymatic digestion, both release cohesive 5’CTAG overhangs, which could re-ligate to TCTAGA, ACTAGT, TCTAGT, and ACTAGA. While the first two products are susceptible to further digestion (by XbaI and SpeI, respectively, brown line in FIG. 3A), TCTAGT and ACTAGA are the final end products. The likelihood of forming such irreversible ligations increases with the number of different restriction enzymes with compatible ends (FIG. 3B). [0044] The methods described herein along with subsequent Nanopore sequencing allows for the capability to detect copy number variations (CNVs) and molecular mutations that are (1) diagnostically essential and/or (2) therapeutically targetable. [0045] In another embodiment, the use of Nanopore sequencing with the method of the present disclosure helps identify key molecular alterations of neoplasms intraoperatively (in less than 120 minutes). FIG. 1 shows the integrated workflow for histological and molecular diagnosis. Fresh tissue acquired from the operation room is bisected for histological and molecular analysis. The result of conventional frozen section diagnosis provides valuable quality assurance of analyzed genomic DNA (green arrow). Molecularly, genomic DNA is extracted (20 mins; QIAmp DNA Micro Kit), followed by quality (Nanodrop, Thermo Fisher) and quantity (Qubit, Thermo Fisher) assessment. Approximately 300-500 ng of high-quality DNA (OD260/280 = 1.8-2.0) is reconstructed with iSCORED method, followed by sequencing adaptor ligation - and sequencing (55 mins). In parallel, selected diagnostic and targetable mutations are amplified by ultra-fast PCR (<25 mins, blue arrow), followed by iSCORED reaction (30 mins) and library preparation (10 mins). 
Generated data is analyzed in real-time to render a molecular diagnosis. The entire molecular diagnostics workflow takes 120 mins. The simplicity and time-effectiveness of this reaction allows for rapid library preparation, which makes intraoperative molecular diagnosis possible. The randomly assembled DNA fragments acquire uniqueness and could be used as unique molecular identifiers (intrinsic barcodes) to trace their original templates, circumventing the long-standing PCR amplification bias problem in single cell whole genome amplification.
[0046] The iSCORED method circumvents sequential experimental steps for stochastic DNA construction (i.e. DNA fragmentation by mechanical shearing and/or enzymatic digestion, followed by purification, and ligation). Sequential experimental steps (mechanical shearing and/or enzymatic digestion, purification and ligation) could reach similar DNA reconstruction results; however, it is not possible to accomplish this within short periods of time (120-150 mins in the intraoperative molecular diagnosis application). Also, the multistep procedures prohibit its application to the cellular level given the minute amount of genomic DNA in single cells (5-10 picograms).
[0047] The instant disclosure may be further illustrated by the following items:
[0048] Item 1: A method for constructing a plurality of chimeric DNA molecules, comprising:
[0049] a) providing a source DNA molecule; and
[0050] b) simultaneously reacting the source DNA molecule with: i) two or more restriction enzymes; and ii) a DNA ligase to obtain the plurality of chimeric DNA molecules, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.
[0051] Item 2: The method of Item 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
[0052] Item 3: The method of Item 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
[0053] Item 4: The method of any preceding Items, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
[0054] Item 5: The method of any preceding Items, wherein step (b) comprises two reaction processes (b1) and (b2), wherein in process (b1), the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and in process (b2), the DNA ligase ligates the smaller DNA fragments having complementary overhangs into a larger DNA fragment, wherein the processes (b1) and (b2) are allowed to repeat for N cycles to generate the chimeric DNA molecule, wherein N is an integer greater than 5.
[0055] Item 6: The method of any preceding Items, wherein process (b2) comprises two types of ligations, Type I and Type II ligations, wherein Type I ligation results from ligation of two sticky ends generated by the same restriction enzyme and can be cut by the same restriction enzyme again, and Type II ligation results from ligation of two sticky ends generated by two different restriction enzymes and cannot be cut again by either of the two different restriction enzymes.
[0056] Item 7: The method of any preceding Items, wherein the plurality of chimeric DNA molecules comprises more than 5 smaller DNA fragments ligated together through Type II ligation.
[0057] Item 8: The method of any preceding Items, wherein the smaller DNA fragments have an average length of about 80-100 bp, or about 90 bp (base pairs) when using MseI, BfaI, CviQI and NdeI.
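Items 5 and 6 together imply that re-cuttable Type I junctions become rarer with every cut/ligate cycle. The Python sketch below works through that arithmetic under a deliberately simple toy model that is an assumption of this example, not a statement from the disclosure: in each cycle every Type I junction is cut and its ends are re-ligated to randomly chosen partners, and `shares` gives the fraction of cut ends contributed by each enzyme. Both function names are hypothetical.

```python
from fractions import Fraction


def type_i_probability(shares):
    """Chance that two randomly paired ends were cut by the same enzyme (Type I junction)."""
    return sum(Fraction(s) ** 2 for s in shares)


def still_reversible(shares, cycles):
    """Fraction of junctions that came out as Type I in every one of `cycles` rounds."""
    return type_i_probability(shares) ** cycles


# Four enzymes contributing equal shares of ends (a simplifying assumption).
equal_four = [Fraction(1, 4)] * 4
print(type_i_probability(equal_four))    # 1/4
print(still_reversible(equal_four, 5))   # 1/1024
```

With four equally represented enzymes, one ligation in four is Type I, and after five cycles about 0.1% of junctions would still be re-cuttable in this model. Unequal site frequencies in a real genome raise the per-cycle Type I probability, so the decline would be slower in practice.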
[0058] Item 9: The method of any preceding Items, wherein the reaction of step (b) is completed within 30 minutes or less.
[0059] Item 10: The method of any preceding Items, further comprising a step (c) of sequencing the plurality of chimeric DNA molecules.
[0060] Item 11: The method of any preceding Items, wherein the sequencing is performed by using nanopore sequencing.
[0061] Item 12: The method of any preceding Items, wherein results from the sequencing are analyzed to obtain information selected from the group consisting of copy number variation (CNV), methylation classification, and a combination thereof.
[0062] Item 13: The method of any preceding Items, wherein the source DNA molecule is obtained from a mammal.
[0063] Item 14: The method of any preceding Items, wherein the source DNA molecule is obtained from a tumor sample of the mammal.
[0064] Item 15: The method of any preceding Items, wherein the source DNA molecule is obtained during a surgery being performed on the mammal.
[0065] Item 16: The method of any preceding Items, wherein steps (a)-(c) are performed before the surgery is completed and results from steps (a)-(c) are used to guide a surgeon performing the surgery or for implementation of molecularly targeted therapies.
[0066] Item 17: A chimeric DNA molecule comprising a plurality of DNA fragments produced by cutting a source DNA fragment with two or more restriction enzymes, wherein the plurality of fragments is stochastically ligated together to form the chimeric DNA molecule, and wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein said chimeric DNA molecule is not digestible by said two or more restriction enzymes.
[0067] Item 18: The chimeric DNA molecule of Item 17, wherein the chimeric DNA molecule comprises more than 5 of the plurality of DNA fragments.
[0068] Item 19: The chimeric DNA molecule of any of Items 17-18, wherein an average length of the plurality of fragments is between 80-100 bp when using MseI, BfaI, CviQI and NdeI.
[0069] Item 20: The chimeric DNA molecule of any of Items 17-19, wherein an average length of the plurality of fragments is 90 bp when using MseI, BfaI, CviQI and NdeI.
[0070] Item 21: A method for reconstructing DNA for single-cell genomic analysis, comprising:
[0071] a) lysing single isolated cells to expose genomic DNA in the cells;
[0072] b) simultaneously reacting the genomic DNA with: i) two or more restriction enzymes, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments; ii) short barcodes, wherein the barcodes are unique DNA sequences comprising compatible overhangs to the overhangs produced by the two or more restriction enzymes; and iii) a DNA ligase, wherein the DNA ligase ligates the smaller DNA fragments and barcodes into a chimeric DNA molecule;
[0073] c) amplifying the chimeric DNA molecule; and
[0074] d) sequencing the chimeric DNA molecule.
[0075] Item 22: The method of Item 21, wherein the single cells are prepared by a process comprising a) dissociating and suspending a cell population; b) diluting the cell population to form a cell solution; and c) transferring the cell solution to a container where single cells are isolated.
[0076] Item 23: A kit for constructing a plurality of chimeric DNA molecules, comprising: i) two or more restriction enzymes; ii) a DNA ligase; and iii) an instruction to react a source DNA simultaneously with the two or more restriction enzymes and the DNA ligase to generate the plurality of chimeric DNA molecules; wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.
[0077] Item 24: The kit of Item 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.
[0078] Item 25: The kit of Item 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.
[0079] Item 26: The kit of any of Items 23-25, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
[0080] It will be readily apparent to those skilled in the art that the methods described herein may be modified and substitutions may be made using suitable equivalents without departing from the scope of the embodiments disclosed herein. Having now described certain embodiments in detail, the same will be more clearly understood by reference to the following examples, which are included for purposes of illustration only and are not intended to be limiting.
EXAMPLES
[0081] Example 1: Copy number variation analysis of tumor genome
[0082] Sequencing randomly assembled genomic molecules provides insight into chromosomal integrity at a genome-wide level. This allows “sampling” the genome without actually sequencing it in its entirety.
The concept has been proposed and tested; however, the lengthy protocols make it incompatible with intraoperative application. The iSCORED method of the present disclosure can fragment the genome into short DNA units and concatenate them into long molecules within 30 minutes.
[0083] To maximize random sampling without overburdening the sequencing capacity, 4-mer and 6-mer Type IIP restriction enzymes were selected to digest genomic DNA into the smallest sufficient units for unique genomic mapping (FIGs. 3A-3B). Specifically, restriction enzymes producing 5’ TA overhangs are used: MseI, BfaI, CviQI and NdeI collectively generate DNA fragments with a mean length of 90.03 bp (34,624,810 total cleavage sites in the CHM13 reference genome). The DNA reconstruction efficiency peaks at around 20 to 30 mins, at which point one read consists of up to 10 units (FIG. 3C). MinION flowcells (R9.4.1) generate approximately 100,000-150,000 reads/hour (length < 5,000 bp; ONT). Normalizing to 100,000 reads (1 hour of sequencing), this platform generates an average genomic resolution of 3.3 kb covering 5.4% of the entire genome (FIG. 3D). An even higher resolution could be reached by continuing sequencing. This result is superior to the current SNP-based Affymetrix OncoScan, which has an average genomic resolution of 9.6 kb and takes 2-3 days.
[0084] Example 2: Point mutations of diagnostically and therapeutically important targets
[0085] A panel of diagnostically important and/or therapeutically targetable mutations was selected (Table 1a, bold = available molecularly targeted drugs; number = hot spot mutation sites). Published NGS primer sets are used to increase the success rate (Thermo Fisher #4475346). First, an ultra-fast thermocycler (UF-100, Victory Scientific) with a high ramping rate (8.0 °C/sec) is used to amplify uniformly short PCR fragments (150-400 bp), aiming to minimize reaction time (<25 mins) and incorporate them into multiplex PCR.
Second, the mixed amplicons are concatenated into long DNA fragments with the digestion/ligation protocol. This increases the coverage of individual genes with fewer required adaptors, taking full advantage of the long-read capacity of Nanopore sequencing.
Table 1a. Mutation targets
Figure imgf000017_0001
[0086] To optimize irreversible reactions, Type IIS restriction enzyme sites (e.g., BsaI; FIG. 4A) are incorporated into the amplified DNA during the PCR reaction. As opposed to the conventional Type IIP restriction enzymes used for CNV analysis above, Type IIS restriction enzymes recognize asymmetric DNA sequences and cut outside of their recognition sequences at a defined distance (NEB). Thus, the overhangs are not constrained by the enzyme recognition sequence, allowing completely irreversible ligation of digested PCR products. Unique overhangs are also used for complex cloning in the synthetic biology field (e.g., Golden Gate cloning).
[0087] Here, each PCR product was purposefully designed to expose a 5’ CATG overhang after Type IIS enzyme digestion; each amplicon loses its Type IIS recognition sites and gains the ability to ligate (FIG. 4A and FIG. 4B). All single hot spot mutation genes were successfully amplified within 25 mins (compared to 45 mins by C1000 Bio-Rad Thermal Cycler with the same protocol). The digestion/ligation reaction reaches peak ligation efficiency at 30 mins by electrophoresis analysis. Preliminary alignment results showed >3,000x coverage of sequence 25-30 potential genetic mutations at 1,000x coverage within 120 mins upon receipt of the specimen.
[0088] Example 3: iSCORED for single-cell genomics
[0089] iSCORED may be applied to single-cell genomics for unbiased quantification of genetic elements. To achieve this, two molecular barcode systems are employed. First, short internal barcodes with compatible 5’ overhangs are used in the single-cell iSCORED reaction to uniquely identify reads from the same cell. The process is compatible with the existing ligation reaction in iSCORED and will allow demultiplexing of sequencing reads by cellular origin. Second, the intrinsic barcode information is naturally embedded in the stochasticity of DNA reconstruction.
Each chimeric DNA molecule after the iSCORED reaction is essentially unique, and this feature will be used to deconvolute sequenced amplicons to their unique original templates. Thus, the quantitative uncertainty due to amplification bias is circumvented. For instance, if gene X is duplicated, during stochastic DNA reconstruction the two copies of gene X are ligated with different partners and form unique chimeric DNA molecules.
[0090] The traceable information (intrinsic barcodes) is used to compile the same sequenced amplicons to generate original chimeric templates (deconvolution). Therefore, the necessary DNA amplification only increases the sequencing depth and accuracy of the chimeric molecules but does not create quantitative bias for the CNV analysis.
[0091] Several steps are employed to adapt iSCORED for single-cell genomics (FIG. 5). First, homogeneous cell populations, such as fresh human buccal mucosal cells and frozen glioblastoma cells, are dissociated and suspended. The suspended cells are serially diluted and manually pipetted into 96-well plates. Each well contains 2.5 µl of lysis buffer, and individually isolated cells will be lysed at 50 ºC for 1 hour and 70 ºC for 15 minutes. After that, 7.5 µl of iSCORED solution with internal barcode mixtures is added for stochastic DNA reconstruction and individual cell barcoding. Second, the barcoded chimeric DNA molecules are pooled and purified with phenol-chloroform extraction. End repair and dA-tailing are completed to enable ligation to adaptors with a single T overhang. Third, established Illumina NGS reagents are used for library amplification. Briefly, the NEBNext hairpin adaptors are ligated to the pooled dA-tailed chimeric DNA molecules, followed by USER® enzyme treatment to remove uracils and PCR enrichment. The results are analyzed with quantitative electrophoresis to assess amplicon size distribution.
To seamlessly ligate to Nanopore sequencing adaptors, Deep Vent (exo-) DNA polymerase (NEB) is used, which exhibits dA-tailing property and 5x higher fidelity than Taq polymerase. Finally, the amplicon library is purified and ligated to sequencing adaptors for Nanopore sequencing. The sequencing reads are deconvoluted to original chimeric templates and aligned to the reference genome (FIG. 5).
[0092] Example 4: Nanopore-based random genomic sampling for intraoperative molecular diagnosis
[0093] In order to determine whether the instant methodology would work in a clinical setting, nanopore-based random genomic sampling was performed for intraoperative molecular diagnosis.
[0094] Copy number variations (CNVs) contribute to cancer development and progression by activating oncogenes and inactivating tumor suppressor genes [1–3]. As a predominant class of genomic alterations, CNVs are also involved in a wide range of biological processes, including human evolution [4–6], neurodegeneration [7,8], and developmental disorders [4,9–12]. Despite the importance of CNVs in genomic biology, characterizing CNVs remains laborious and time-consuming. Current approaches rely on nucleotide hybridization [5–7,10,11,13–15] and next-generation sequencing [1–3,8], with a turnaround time of several days to weeks, which could delay clinical therapeutic plans [16,17]. In contrast, Nanopore sequencing (Oxford Nanopore Technologies, ONT) uses a channel-based device that provides real-time interpretation of long-read nucleotide sequences. Despite some success in applying Nanopore sequencing to ultrafast CNV diagnostics [18,19], the genomic resolution is limited to 10 Mb due to low numbers of aligned DNA fragments within a short sequencing timeframe [19].
[0095] Ultrafast high-resolution CNV detection can be achieved by analyzing randomly concatenated DNA fragments.
The approach enables the identification of multiple mappable DNA fragments in one sequencing read, thus optimizing sequencing efficacy. By sequencing a fraction of randomly assembled genomic fragments, the genome-wide chromosomal integrity can be quantitatively assessed. While previous attempts involving quantitative genomic analysis have been introduced, the methods are lengthy and require sequential mechanical shearing and/or enzymatic digestion, purification, and ligation [20,21]. An important rate-limiting factor has been the lack of a highly efficient method to reliably process genomic DNA in one reaction.
[0096] Here, we introduce iSCORED, irreversible Sticking Compatible Overhang to Reconstruct DNA, a method to stochastically reconstruct genomic DNA by a simultaneous digestion and ligation reaction. The technology features a simple and time-effective sequencing library preparation that allows ultrafast assessment of genome-wide copy number variations in the intraoperative setting (2-2.5 hours after receiving the specimen). Furthermore, by leveraging the epigenetic modifications available through Nanopore sequencing, we demonstrated the feasibility of concurrent methylation classification in primary CNS tumors. The pipeline was applied to a cohort of 26 intracranial neoplasms, consisting of 17 primary CNS tumors and 9 metastatic tumors. The results were compared to clinically validated next-generation sequencing and chromosomal microarray test results [22] generated in a Clinical Laboratory Improvement Amendments (CLIA) certified laboratory.
Methods
DNA extraction from OCT embedded frozen tissue
[0097] The study was approved by the Institutional Review Board (IRB) of Dartmouth-Hitchcock Medical Center (DHMC, STUDY02001960).
[0098] The banked samples were retrieved from the institutional biorepository at the DHMC.
Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen #69504) with minor modifications for ultrafast extraction. Briefly, 5-10 scrolls of tissue (>5 mm x 5 mm) were sectioned at 5 µm thickness onto blank slides in a cryostat machine. A premixed solution containing 180 µl of tissue lysis buffer (buffer ATL) and 2 µl of RNase A (Thermo Scientific #01236994) was added onto the slide and the tissue was scraped off, then transferred to a 1.5 ml Eppendorf tube. The tube was incubated at 37 ºC for 2 minutes, after which 20 µl of protease K was added to the reaction, followed by an additional incubation at 56 ºC for 8 minutes. Buffer AL (200 µl) and pure ethanol (200 µl) were then subsequently added to the reactions. The final reaction was mixed thoroughly by vortexing and added to the spin column inserted to the vacuum to facilitate the extraction procedures. Sequential buffer AW1 (500 µl) and AW2 (500 µl) were added to wash the column, which was centrifuged at 20,000 x g for 30 seconds for final cleanup. The DNA was eluted with 50-75 µl of AE buffer and its quantity and quality were checked using Nanodrop and Qubit instruments (ThermoFisher).
iSCORED Reaction
[0099] Approximately 200-400 ng of input gDNA was used for the iSCORED reaction, followed by bead purification for Nanopore sequencing. The reaction mixture comprised 15 µl in total, which included quick ligase buffer (4.5 µl), quick ligase (1 µl, NEB E6056) and NTAN cocktail mix (1 µl; equal amounts of MseI, BfaI, CviQI and NdeI from NEB). The reaction was incubated at 37 ºC for 30 minutes with intermittent cooling/agitation to enhance ligation. Specifically, the reaction was agitated at 900 rpm on the 18 ºC pad at reaction time points of 9-10, 14-15, 19-20 and 24-25 minutes. End repair/dA tailing buffer (1.75 µl) and enzyme mix (0.75 µl, NEB E7546) were added to the mixture, which was then incubated at 20 ºC for 5 minutes and 65 ºC for 5 minutes.
For final ligation to the Nanopore motor proteins, a freshly made premix of ligation buffer (LNB, 4 µl), adaptor protein (AMF, 1.5 µl, ONT LSK110 for R9.4.1 flowcells; LA, 1.5 µl, ONT LSK114 for R10.4.1 flowcells) and quick ligase (1.5 µl) were added to the reaction solution and incubated for 10 minutes at room temperature (20-22 ºC) with intermittent mixing. Finally, 10 µl of ddH2O and 14 µl of PEG-deprived AMPure XP beads (Beckman Coulter A63881) were added for standard magnetic bead purification. Of note, the AMPure XP beads were re-suspended in 2.5 M NaCl to eliminate PEG 8000 in the original solution. The beads were washed twice with long fragment buffer (LFB, 80 µl) before eluting into 12 µl elution buffer. Approximately 30-50 ng of ligated DNA were loaded for flongle flowcells (R9.4.1), 100 ng were used for MinION flowcells (R9.4.1) and 50-75 ng for PromethION flowcells (R10.4.1).
Flowcell reuse
[00100] The MinION (R9.4.1) and PromethION (R10.4.1) flowcells were washed for sequential runs by using the flowcell wash kit (WSH004-XL). Briefly, 400 µl of flowcell wash mix (398 µl of wash diluent and 2 µl of wash mix) were loaded to the priming port to allow a DNase I reaction for 60 minutes at room temperature. The reaction solution was removed from the waste port and storage buffer (500 µl) was loaded into the priming port before storing at 4 ºC for next use. A minimum of 800 active pores by flowcell check were required for a successful run. Following the protocol, a typical MinION flowcell (R9.4.1) could be re-used 5-7 times.
Normalization of intrinsic regional variability
[00101] Control human genomic DNA (gDNA; NA12878 from Coriell Institute) was first processed with iSCORED and sequenced extensively (16,564,873 mapped fragments) to establish a reference dataset for normalization to equivalent bins.
The bins with counts below 0.2th percentile were removed to eliminate false positive bins as tested in four WT datasets (NA20967, NA12878, NA24385 and NA24631 from Coriell Institute; Fig2_ S2). An array of the proportion was segmented at the chromosomal level to optimize the resolution of the normalization and increase the sensitivity at which outliers are detected. A normalized vector of ratios (r1), identical in size to the reference array, was created. The distribution of this vector, at the chromosomal level, was employed to detect outliers using three components: i) a ratio threshold of 5, which filtered out the elements in r1 that were not greater than 5, ii) Z scores to determine the statistical significance of the deviation from the distribution, and iii) the presence of surrounding outlier bins (a minimum of two consecutive bins must be present for a set of datapoints to be considered as outliers).
Coefficient of variation calculation across genome
[00102] Similar to what was described above, a normalized vector was created using NA12878 as a reference. For this approach, we used three other commonly used control gDNAs (NA20296, NA24385 and NA24631 from Coriell Institute). Each one of these datasets was segmented into independent data subsets (with no overlapping fragments) that vary in the number of mapped fragments. The numbers of fragments in these subsets were 70k, 200k, 300k, 400k, 500k, 600k, 700k, 800k, 900k, 1M, 1.25M, 1.75M and 2M. For each dataset, the coefficient of variation (CoV) across the genome was calculated to assess the variability. In addition, the behavior of the CoV as the number of fragments changes was assessed with the first order derivative of the CoV function.
Basecalling and read filtering
[00103] Fast5 files were converted to pod5 with ONT’s pod5-file-format (https://github.com/nanoporetech/pod5-file-format) and basecalled with ONT’s Dorado v0.2.4
using dna_r9.4.1_e8_fast@v3.4 for R9.4.1 or dna_r10.4.1_e8.2_400bps_sup@v3.5.2 for R10.4.1 with the following settings: --modified-bases 5mCG --emit-moves. The resulting unmapped SAM (uSAM) files were converted to fastq using samtools fastq -T MM,ML to carry the methylation information forward into the fastq header. The resulting fastq files were first processed with Porechop (v0.2.1) to trim adapter sequences and split reads with internal
adapters. The reads were then filtered with NanoFilt (v2.8.0 [pmid: 29547981]) to remove rare reads greater than 15 kb, which represented native genomic reads that did not contribute to our analysis.
Read processing into aligned fragments
[00104] The following was adapted from Prabakar et al. [36]. Briefly, filtered reads were aligned to GRCh37/hg19 using bwa-mem [37] (v0.7.17) with the following settings: -x ont2d -k 12 -W 12 -A 4 -B 10 -O 6 -E 3 -T 120. These settings allowed the segmentation of the concatenated reads into individual fragments that were aligned to their respective genomic regions. To ensure accurate quantitative CNV analysis, the duplex reads were identified and excluded from the original uSAM with ONT-Duplex Tools (https://github.com/nanoporetech/duplex-tools). A single member of each pair was then removed from the SAM file using a list of readIDs and Picard (https://github.com/broadinstitute/picard). The genome was then subdivided into either 5,000 or 50,000 genomic bins for CNV and amplification analysis, respectively, and mapped fragments per bin were calculated.
CNV analysis
[00105] CNV analysis was performed using the Smurf-seq analysis pipeline [pmid: 31287019]. Counts of uniquely mapped fragments in the 5,000 bins of the human genome were normalized for GC-content bias; finally, an implementation of DNAcopy [38] (v1.74.1) using circular binary segmentation identified breakpoints in bin counts.
Output table and graph
[00106] The output table contained a list of at least two consecutive statistically significant outliers (i.e., bins) to minimize the potential of identifying isolated/noisy outliers due to individual genome variation. The table displayed the corresponding position, ratio and Z score at the chromosomal level of each sample, along with commonly amplified gene(s) found in these bins of interest. All annotated genes from the hg19 reference genome were included in the table, with 75 commonly amplified genes highlighted in red.
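The three outlier criteria described in the Methods (a ratio threshold of 5, a Z score, and at least two consecutive bins) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name is hypothetical, the Z score is assumed to be computed against the mean and population standard deviation of the same chromosome's ratio vector, and the default Z cutoff of 10 is borrowed from the tumor purity analysis in the Results.

```python
from statistics import mean, pstdev


def flag_outlier_runs(ratios, ratio_cutoff=5.0, z_cutoff=10.0, min_run=2):
    """Return (start, end) inclusive index pairs of consecutive outlier bins.

    ratios: normalized sample/reference ratios for the bins of one chromosome.
    A bin is an outlier when its ratio exceeds ratio_cutoff AND its Z score
    (assumed here: relative to the chromosome's mean and population SD)
    exceeds z_cutoff. Runs shorter than min_run bins are discarded.
    """
    mu = mean(ratios)
    sigma = pstdev(ratios)
    if sigma == 0:
        return []
    is_outlier = [r > ratio_cutoff and (r - mu) / sigma > z_cutoff for r in ratios]

    runs, start = [], None
    # A trailing False closes any run that reaches the last bin.
    for i, flag in enumerate(is_outlier + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy chromosome: 1,000 bins near ratio 1, one isolated spike, one 2-bin spike.
ratios = [0.9 if i % 2 else 1.1 for i in range(1000)]
ratios[100] = 70.0
ratios[500] = ratios[501] = 70.0
print(flag_outlier_runs(ratios))
```

In the toy data the isolated spike at bin 100 passes the ratio and Z-score tests but is dropped by the consecutive-bin rule, which is the stated purpose of that rule.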
The graph was automatically generated if there were significant bins in the sample of interest.
Methylation calling and tumor classification
[00107] The following was adapted from the Rapid-CNS2 pipeline of Patel et al. [34]. The filtered reads were aligned to GRCh38/hg38 using bwa-mem [37] (v0.7.17) with the following settings: -x ont2d -k 12 -W 12 -A 4 -B 10 -O 6 -E 3 -T 120 -C -Y. As in the CNV analysis, this broke up the reads into individual fragments. The -C option allowed the SAM tags for methylation to be moved from the fastq header back into the SAM file. The -Y option made bwa-mem use soft clipping rather than hard clipping for supplementary alignments, since hard clipping interferes with downstream methylation extraction. The per-site methylation was extracted using mbtools (https://github.com/jts/mbtools). A custom Python script converted the bed file to make it compatible with Rapid-CNS2. The called methylations were processed using Rapid-CNS2, which uses a random forest classifier trained on Illumina BeadChip 450K methylation array data from the Heidelberg reference cohort of brain tumor methylation profiles [39].
[00108] To determine the minimum time required for methylation classification, we simulated the collection of methylation data over time using samples that had been sequenced for >60 minutes. The data were subdivided into several time bins (10, 20, 30, 45 and 60 minutes). Sequencing start time was recovered from the uSAM read header and the aligned SAM was filtered accordingly. The data were then processed using our standard analysis pipeline as described above to extract the number of detected methylation features, the methylation classification, and the calibrated scores at each time point.
[00109] To study the role of tumor percentage in methylation classification, gDNA from control human frontal lobe was processed with iSCORED.
The resulting reads were admixed in silico with datasets from a medulloblastoma, an oligodendroglioma, or a glioblastoma, all of which had >90% tumor percentage and calibrated scores of ~0.99 in methylation classification. Reads equivalent to an hour of sequencing on the MinION at different ratios of tumor to control (tumor percentages of 0-100 in intervals of 10) were used for methylation classification using the Rapid-CNS2 pipeline.
Computer setting
[00110] Our computer system included an Intel® Core™ i9-12900K Processor, 24 cores, 64 GB of RAM, 2 TB of storage and an RTX 3090Ti.
Results
Concurrent fragmentation and concatenation of DNA molecules
[00111] The central concept of iSCORED is simultaneous digestion and ligation of DNA molecules by utilizing a panel of restriction endonucleases (REs) capable of generating compatible cohesive ends. Within the same reaction, DNA ligase catalyzes random re-ligation of the digested fragments (concatemer reconstruction). Irreversible ligation products are generated when the joined cohesive ends are produced by different restriction enzymes. This unidirectional reaction is possible because of the staggered nature of the DNA recognition sequences and the actual phosphodiester bond breakage sites (Fig 6a). Using restriction enzymes capable of generating CTAG overhangs as an example, the digested fragments were concatenated into larger chimeric molecules in the presence of DNA ligase. The likelihood of forming such irreversible ligations increases with the number of different restriction enzymes producing compatible ends.
[00112] Systematic analysis and optimization of all overhang candidates
[00113] We next examined all existing 4-mer and 6-mer Type IIP REs capable of generating 2-nucleotide and 4-nucleotide overhangs (Fig 7). Given the palindromic nature of Type IIP REs, there are 16 (= 4^2) and 4 (= 4^1) possible combinations for 4-nucleotide and 2-nucleotide overhangs, respectively.
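The counts of 16 and 4 follow because a palindromic (self-complementary) overhang of length 2k is fully determined by its first k bases, giving 4^k possibilities. A short enumeration confirms this; the helper names below are illustrative only and are not part of the disclosed pipeline.

```python
from itertools import product

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_overhangs(length):
    """List every DNA sequence of the given length equal to its reverse complement."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=length))
    return sorted(s for s in candidates if s == reverse_complement(s))


print(palindromic_overhangs(2))       # the four 2-nt palindromic overhangs
print(len(palindromic_overhangs(4)))  # number of 4-nt palindromic overhangs
```

The 2-nt list includes TA, the overhang of the cocktail used for CNV analysis, and the 4-nt list includes CTAG and CATG, the overhangs mentioned elsewhere in this disclosure.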
Depending on the RE recognition sequence (4 or 6 bp), the same overhang could be generated by 4^1 or 4^2 different enzymes; however, some of the theoretical combinations do not exist, and some are partially or completely blocked by DNA methylation (Fig 7a).
[00114] We tested the top four overhang candidates that had the highest number of RE combinations while exhibiting the least possibility of methylation inhibition (Fig 6c). To quantitatively measure the reconstruction efficiency (i.e., the number of uniquely mapped fragments per sequencing read), we compared combinations generating 4-nt overhangs with those generating 2-nt overhangs. Surprisingly, we found that the efficiency of 4-nt overhang combinations is not superior to that of 2-nt overhang combinations (Fig 6c, Fig 8). This is presumably due to the much higher frequency of RE recognition sites. With the reconstruction efficiency defined as the number of DNA fragments per read, the most efficient combination was the TA overhang cocktail mix that consisted of MseI, BfaI, CviQI, and NdeI, resulting in a mean reconstruction efficiency of 4.6 and a mapped fragment length of 120 bp. To further optimize the iSCORED reaction, we tested various incubation periods and DNA ligases (Fig 6d). Our experiments revealed that an incubation period of 30 minutes at 37 ºC with intermittent agitation at 18 ºC (900 rpm) yielded the highest mean reconstruction efficiency of 8.7 (Fig 6d). This experimental condition was thus utilized for the remainder of the study.
[00115] Genomewide aneuploidy detection in tumors
[00116] To detect large CNV (>10 Mb) and aneuploidy, the sequenced reads were first segmented into individual fragments. These uniquely mapped fragments were then filtered for quality (alignment scores ≥ 120; see Methods for details) and assigned to predefined genomic bins (600 kb) for quantitative analysis.
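Returning to the TA-overhang cocktail selected above, the distinction between re-cuttable and irreversible junctions can be checked directly. The sketch below is illustrative and makes two assumptions that are not stated in this disclosure: the recognition sequences are the ones commonly listed by enzyme suppliers (MseI T^TAA, BfaI C^TAG, CviQI G^TAC, NdeI CA^TATG, with ^ marking the top-strand cut) and should be verified against vendor documentation, and only the bases contributed by the two half-sites are considered, ignoring flanking genomic sequence that could recreate a site by chance.

```python
# Each entry: (bases 5' of the cut, bases 3' of the cut) on the top strand.
# All four enzymes leave the same 5'-TA overhang. Sequences are assumed
# from common supplier listings, not taken from the source text.
ENZYMES = {
    "MseI": ("T", "TAA"),
    "BfaI": ("C", "TAG"),
    "CviQI": ("G", "TAC"),
    "NdeI": ("CA", "TATG"),
}


def junction(left_enzyme, right_enzyme):
    """Sequence formed when an end cut by left_enzyme joins one cut by right_enzyme."""
    return ENZYMES[left_enzyme][0] + ENZYMES[right_enzyme][1]


def recutting_enzymes(left_enzyme, right_enzyme):
    """Names of cocktail enzymes whose full recognition site is present in the junction."""
    site = junction(left_enzyme, right_enzyme)
    return [name for name, (left, right) in ENZYMES.items() if left + right in site]


def ligation_type(left_enzyme, right_enzyme):
    """'Type I' if the junction can be re-cut by the cocktail, otherwise 'Type II'."""
    return "Type I" if recutting_enzymes(left_enzyme, right_enzyme) else "Type II"


for a in ENZYMES:
    for b in ENZYMES:
        print(f"{a:>5} + {b:<5} -> {junction(a, b):<6} {ligation_type(a, b)}")
```

Under these assumptions, every junction between ends made by the same enzyme regenerates that enzyme's site, while every junction between ends made by different enzymes is resistant to the whole cocktail. If all end types were equally frequent, the irreversible fraction would be 1 - 1/n for n enzymes, which is consistent with the statement that irreversible ligation becomes more likely as more compatible enzymes are added; in practice the end frequencies differ, since a 6-bp site such as NdeI's occurs less often than the 4-bp sites.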
High numbers of mapped fragments per bin generated low variability between bins, and this helped ensure high confidence in the resulting CNV plot. Finally, circular binary segmentation [23] through DNAcopy [24] was employed to identify copy number alterations across genomic bins.
[00117] The performance of the iSCORED pipeline was compared to the conventional sequential approach (i.e., digestion, purification and ligation) and unprocessed native gDNA. After normalizing the datasets to the same amount of total DNA sequence, iSCORED exhibited a 16-fold increase in the number of fragments compared to native gDNA sequencing (Fig 6e). The significant increase in fragment count resulted in low variability, which was critical for detecting copy number changes with high confidence. Specifically, the coefficients of variation (CoV) for the sequential approach, iSCORED and native gDNA sequencing were 0.57, 0.54 and 3.3, respectively. Detection of large CNV and aneuploidy by iSCORED showed 100% concordance with clinically validated chromosomal microarray data (Table 1) and also demonstrated a much higher resolution than short-read based analysis [19] within a comparable timeframe of 2 hours (Fig 6f, Table 2).
[00118] Refinement of quantitative measures to detect copy number variations
[00119] While large bins are effective in detecting aneuploidy, most clinically relevant gene amplifications occur within a range of hundreds of kilobases to a few megabases [25,26]. In such cases, using a large bin could average out the dose change, leading to decreased accuracy in detecting small amplifications. Thus, we refined the bin size to 60 kb, which was similar to the genomic resolution of clinically validated chromosomal microarray analysis (CMA). When using the control human genome (NA12878) to quantify the total mapped fragments in refined 60-kb bins, the numbers of mapped fragments fluctuated substantially across the predefined bins (Fig. 7a).
Since iSCORED is a restriction enzyme-based method, this finding was presumably due to variations in the density and distribution of the restriction enzymes’ cutting sites. We termed the phenomenon intrinsic regional variability (IRV) (Fig 10). This background noise might mask outliers driven by true underlying copy number changes and thereby affect detection accuracy. In addition, this fluctuation behavior was inversely related to the amount of data acquired. Hence, this finding was characterized in the context of the corresponding number of fragments in the genome.
[00120] For this purpose, we utilized the coefficient of variation (CoV) [27] as a quantitative index and performed time-lapsed analysis of the sequenced control samples. After comparing the sub-datasets with varying numbers of mapped fragments, we found that the fluctuations reached a plateau around a CoV of 0.68, even with extensive sequencing (Fig 9b, left panel). To address this issue, we performed bin-specific normalization by calculating a ratio of the mapped fragments in the sample of interest to those in the commonly used control reference genome (NA12878). This significantly reduced the observed genomic fluctuation by approximately four-fold (Fig 9b, right panel). The CoV of the normalized data was substantially reduced to 0.09, down from 0.68 in the corresponding non-normalized data.
[00121] This design also allowed us to infer the required number of total mapped fragments to reliably identify regions of copy number change. When investigating the slope [28] of CoV as a function of the acquired fragments, the inflection point was at a datapoint with mapped fragments of <500k, while the first order derivative function approached a value of zero (0) at about one million mapped fragments (Fig. 9b, right panel). Thus, we determined that acquiring approximately one million mapped fragments was sufficient to reliably detect gene amplification.
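The effect of bin-specific normalization on the coefficient of variation can be illustrated with a small sketch. The function names and toy counts are hypothetical, and scaling each dataset by its total fragment count before taking the per-bin ratio is an assumption; the text above specifies only that a ratio of sample to reference fragments is computed per bin.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CoV = population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def normalize_to_reference(sample_counts, reference_counts):
    """Per-bin ratio of sample to reference fragment fractions.

    Counts are first converted to fractions of each dataset's total so that
    sequencing depth cancels out (an assumption of this sketch). Reference
    bins are expected to be non-zero, since low-count bins are removed
    beforehand according to the Methods.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    return [
        (s / sample_total) / (r / reference_total)
        for s, r in zip(sample_counts, reference_counts)
    ]


# Toy example: bin counts differ widely because cut-site density differs
# between bins, but the sample tracks the reference closely in every bin.
reference = [100, 400, 50, 250, 200]
sample = [52, 196, 24, 130, 98]

ratios = normalize_to_reference(sample, reference)
print(round(coefficient_of_variation(sample), 3))  # raw counts: high CoV
print(round(coefficient_of_variation(ratios), 3))  # normalized: low CoV
```

Because the regional variability is shared by sample and reference, it cancels in the ratio, leaving mostly sampling noise and any true copy number change.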
Bin-specific normalization helped to effectively mitigate the effects of intrinsic regional variability and detect copy number variation.

[00122] Tumor purity in detecting gene amplification

[00123] The iSCORED pipeline was first used to analyze a metastatic adenosquamous carcinoma from the head and neck region (M9 in Table 1). A CCNE1 gene amplification was detected in chromosome 19 (ratio=70, inset in Fig 9c), accurately identifying a known finding originally detected by next generation sequencing.

[00124] To determine the minimum tumor percentage required to reliably detect CCNE1 amplification, we assessed different compositions of control gDNA mixed with tumor gDNA. This revealed an increasing amplification signal as the tumor percentage increased. To define the accuracy of detection, we implemented a Z-score cutoff of 10. We were able to detect CCNE1 amplification in samples with a tumor percentage as low as 5%, using only 500k mapped fragments (Fig 9d). Additionally, low copy number gain (ratio=11) was also reliably detected with the same cutoff, albeit at a higher tumor percentage and with more fragments (20% and 1.5 million fragments, respectively, Fig 12). Overall, our results demonstrate the effectiveness of the iSCORED pipeline in detecting gene amplifications even in samples with low tumor purity.

[00125] Comprehensive analysis of the cohort

[00126] We simulated intraoperative molecular diagnosis by performing blind testing of a cohort of 26 intracranial neoplasms, including 17 primary CNS tumors and 9 metastatic tumors. The performance was timed and the findings were compared to the results from clinically validated next-generation sequencing (TruSight® Tumor 170) and chromosomal microarray analysis (Affymetrix OncoScan®)²².

[00127] Within one hour of MinION sequencing, an average of 344 Mb (± 24 Mb) of data was generated, corresponding to 1.38 million (± 0.08 × 10⁶) mapped fragments (Fig 16).
This is higher than the predetermined data quantity required for confident CNV detection (1 × 10⁶ mapped fragments, Fig 9b). Across the 26 investigated samples, the diagnostic accuracy of our approach was 100% in detecting gene amplification of more than 10 copies (95% confidence interval²⁹: 91%-100%, Table 1). In one sample (case M8), the iSCORED pipeline detected a MYB amplification (ratio of 10.5), a finding that was not originally uncovered by the TST 170 panel but was later verified by a whole exome NGS study (13 copies).

[00128] The output genomic graph from the iSCORED pipeline provided precise information on amplified regions and the confidence of detected outliers. EGFR amplification is a defining molecular alteration in glioblastomas. In the six EGFR-amplified glioblastomas we examined, the amplified regions spanned 1.66 ± 0.44 Mb on average, with an average of 150.5 ± 47 copies. These samples also exhibited varying regions of amplification and doses, which is consistent with the known heterogeneity of glioblastoma³⁰,³¹ (Fig 18).

[00129] Cost-effectiveness

[00130] To reduce per-test costs, we evaluated the feasibility of re-using flowcells by applying the ONT flow cell wash protocol (WSH004) after running each flow cell. Our results indicated that MinION flowcells can be re-used up to 7 times before the number of active pores drops below the minimum requirement of 800. With the wash protocol, our results still demonstrated 100% consistency with independent experiments using new flongle flowcells, indicating the absence of detectable carryover between experiments (Fig 16). The ability to reuse flow cells resulted in a sequencing cost of $125 per sample (Table 2).

[00131] Simultaneous methylation classification by iSCORED

[00132] Methylation classification of tumor types has emerged as an important diagnostic tool in clinical practice.
Brain tumors in particular have benefited from the Heidelberg methylation classifier, which defined 91 classes across 2801 samples³². Given that ONT sequencing can identify 5-methylcytosine (5mC) from native DNA with no additional sample preparation, we extracted methylation information from our sequencing data and classified it with Rapid-CNS2, a machine learning-based classifier that has been trained on the Heidelberg dataset.

[00133] To evaluate the reliability of methylation classification over time, we processed MinION data at five timepoints (10, 20, 30, 45, and 60 minutes) and extracted the number of methylation features that overlapped with the 100k most variable features from the Heidelberg dataset (Fig. 13a). Within 45 minutes, more than 1000 CpG features had been identified in all samples, which is the cut-off determined by the original authors³³, and 10 out of 14 samples were correctly classified into tumor subclasses (Fig 13b and Fig 14). To investigate whether the poor classification scores and misclassifications were due to the short fragments generated in iSCORED, three oligodendroglioma samples were processed using iSCORED and the results were compared against unprocessed native DNA sequencing results, as in the original publication³⁴. The results revealed comparable classification scores (Fig 14), indicating that fragmentation in iSCORED did not appear to affect the accuracy of methylation classification.

[00134] As the majority of glial neoplasms exhibit an infiltrative growth pattern, resected tumors are often mixed with normal brain parenchyma and inflammatory cells, which could significantly affect the accuracy of methylation classification. This is supported by a positive correlation between histologically assessed tumor purity and classification score (Fig 14).
To further assess the impact of tumor purity on classification accuracy, we admixed in silico the data from the three tumors with the highest classification scores with data from control CNS tissue (frontal cortex, Fig 8d). Our analysis revealed a rapid drop in classification scores as tumor purity decreased. Specifically, when the tumor purity was 60%, all classification scores fell below the commonly accepted classification threshold (0.6). Furthermore, when the tumor purity was below 40%, the samples were consistently assigned to incorrect classes. Notably, the medulloblastoma, which had the highest tumor purity, was least affected by control tissue mixing. Therefore, tumor purity is critical for successful tumor methylation classification.

[00135] Timeframe for iSCORED pipeline

[00136] To facilitate intraoperative molecular diagnosis, an analysis pipeline was created that runs in conjunction with sequencing and finishes within minutes of sequencing completion. This pipeline includes a real-time basecalling process along with periodic filtering and alignment of samples for CNV, amplification, and methylation analysis. Once sufficient data is accumulated, the separate files are merged to finalize the analysis (Fig 17). The pipeline is designed to function on standard computers, eliminating the need for complex and expensive infrastructure (see Methods for specific setup).

[00137] MinION sequencing typically requires 60 minutes to generate sufficient data. With the introduction of the P2 Solo (ONT), four of our most recent specimens were analyzed using PromethION technology (R10.4.1). Remarkably, a mere 25 minutes of sequencing yielded 395 Mb (±55 Mb) of data on average, which corresponds to an average of 1.69 million (±37x106) mapped fragments (Fig 16).
Finally, the iSCORED platform and the real-time processing pipeline automatically generate a genome-wide copy number report and methylation classification within 5 minutes and 20 minutes of completing MinION and PromethION sequencing, respectively (140 and 120 minutes after receipt of specimen, Fig 15). Thus, the iSCORED platform provides an accurate, fast and inexpensive method for widespread clinical application.

[00138] In this study, we present iSCORED, a novel method that can rapidly and affordably generate CNV profiles, detect gene amplifications, and classify tumors by their methylation status. While traditional NGS has had a significant impact on our understanding of the underlying molecular mechanisms of disease, in a clinical setting these technologies, platforms and chemistries are still limited by lengthy turnaround times¹⁶,¹⁷. Our approach leverages the power of Nanopore sequencing, which is capable of sequencing DNA in real time at speeds of 400 bases per second. While Nanopore has well-known capabilities in long-read sequencing, it is not inherently optimized for shorter reads, such as the 120-bp genomic fragments present in our study before reconstruction. This is primarily due to the time potentially wasted on sequencing the adapter regions or waiting for a new molecule to load into a pore. We were able to overcome this limitation by concatenating small fragments in the same reaction, significantly improving sequencing efficiency. The iSCORED-based assay produced CNV results that were 100% concordant with clinically validated results in all of our cases. Furthermore, Nanopore sequencing can detect 5-methylcytosine at CpG sites, facilitating tumor methylation classification without any additional sample preparation.
Finally, the low cost per sample ($125), ease of infrastructure setup ($6,000-8,000) and unmatched turnaround time (140 minutes) collectively position our method as a robust and invaluable tool for broad-scale clinical applications (Table 2).

[00139] The iSCORED platform demonstrates high accuracy in detecting gene amplifications, with tumor purity thresholds of 5% and 20% for high and low copy number amplifications, respectively. However, it is important to note that methylation classification of primary CNS tumors is more sensitive to tumor purity. Using a calibrated score of 0.6 as the cutoff, accurate classification typically requires tumor content exceeding 60%. This phenomenon is not unique to the iSCORED platform but is also present in DNA methylation arrays³². Thus, it is crucial to assess the tissue quality and estimate the tumor percentage during the morphology-based intraoperative diagnosis (Fig 15).

[00140] Somatic copy number alterations represent a major type of genetic mutation in cancer initiation, progression and treatment resistance. Among the various cancer types, ovarian carcinoma and sarcoma bear the highest burden of CNVs, accounting for approximately 80% of cases. Following closely are uterine carcinosarcoma and esophageal carcinomas, with approximately 75% presenting CNVs³⁵. These findings underscore the importance of CNV analysis and the applicability of the iSCORED platform in other cancer types. Accurate identification and comprehensive understanding of the prevalence and implications of CNVs in these cancers are pivotal for advancing diagnostic accuracy, prognostic evaluation, and the development of tailored therapeutic interventions. Nanopore sequencing technology is rapidly improving. Currently, the iSCORED genomic resolution is defined and validated at 60 kb, based on the data generated by one hour of MinION sequencing.
However, utilizing the recently released P2 Solo device, PromethION flowcells could generate 3-4 times more data in the same timeframe, thereby potentially achieving a 3-4 times higher resolution (15-20 kb per bin). Additional improvements have been announced by ONT, such as flow cells that can sequence native DNA without adapter ligation and flow cells with 2.5x faster speeds and lower costs. These forthcoming Nanopore sequencing enhancements may enable even more rapid and versatile versions of the iSCORED pipeline.

[00141] The contents of all cited references (including literature references, patents, patent applications, and websites) that may be cited throughout this application or listed below are hereby expressly incorporated by reference in their entirety for any purpose into the present disclosure. The disclosure may employ, unless otherwise indicated, conventional techniques of microbiology, molecular biology and cell biology, which are well known in the art.

[00142] The disclosed methods may be modified without departing from the scope hereof. It should be noted that the matter contained in the above description or shown in the accompanying drawings should be interpreted as illustrative and not in a limiting sense.
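As a back-of-the-envelope companion to the dilution experiment in paragraph [00124], the sketch below shows how admixture with control DNA would be expected to shrink an amplified bin's ratio, and how a Z-score cutoff of 10 could then be applied. The linear mixing model, the treatment of the reported ratios (70 and 11) as those of pure tumor, the way the Z-score is computed, and the background standard deviation (set equal to the normalized CoV of 0.09) are all assumptions of this example; the description does not spell out these details.

```python
def expected_ratio(purity, tumor_ratio, normal_ratio=1.0):
    """Linear mixing model (assumption): bin ratio of a tumor/control blend."""
    return purity * tumor_ratio + (1.0 - purity) * normal_ratio


def z_score(ratio, background_mean=1.0, background_sd=0.09):
    """Distance of a bin ratio from the copy-neutral background, in SD units.

    The default background values are illustrative assumptions.
    """
    return (ratio - background_mean) / background_sd


Z_CUTOFF = 10  # cutoff stated in paragraph [00124]

for tumor_ratio in (70, 11):
    for purity in (0.05, 0.20):
        ratio = expected_ratio(purity, tumor_ratio)
        z = z_score(ratio)
        call = "amplified" if z >= Z_CUTOFF else "not called"
        print(f"tumor ratio {tumor_ratio:>2}, purity {purity:.0%}: "
              f"ratio {ratio:.2f}, Z {z:.1f} -> {call}")
```

Under these assumptions the high-level gain clears the cutoff at 5% purity, while the lower-level gain does so only at the higher purity, in line with the trend reported in paragraph [00124]. In practice the background noise also depends on the number of mapped fragments, which this sketch does not model.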
References

The following references are incorporated herein in their entirety:

1. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
2. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
3. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
4. Levchenko, A., Kanapin, A., Samsonova, A. & Gainetdinov, R. R. Human Accelerated Regions and Other Human-Specific Sequence Variations in the Context of Evolution and Their Relevance for Brain Development. Genome Biol. Evol. 10, 166–188 (2018).
5. Dumas, L. et al. Gene copy number variation spanning 60 million years of human and primate evolution. Genome Res. 17, 1266–1277 (2007).
6. Marques-Bonet, T. et al. A burst of segmental duplications in the genome of the African great ape ancestor. Nature 457, 877–881 (2009).
7. Heinzen, E. L. et al. Genome-wide scan of copy number variation in late-onset Alzheimer’s disease. J. Alzheimers Dis. JAD 19, 69–77 (2010).
8. Lee, W.-P. et al. Copy Number Variation Identification on 3,800 Alzheimer’s Disease Whole Genome Sequencing Data from the Alzheimer’s Disease Sequencing Project. Front. Genet. 12, 752390 (2021).
9. Hastings, P. J., Lupski, J. R., Rosenberg, S. M. & Ira, G. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10, 551–564 (2009).
10. Pinto, D. et al. Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. Am. J. Hum. Genet. 94, 677–694 (2014).
11. Pinto, D. et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466, 368–372 (2010).
12. Malhotra, D. & Sebat, J. CNVs: harbingers of a rare variant revolution in psychiatric genetics. Cell 148, 1223–1241 (2012).
13. Kallioniemi, A. et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821 (1992).
14. Wang, D. G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
15. Pinkel, D. et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 20, 207–211 (1998).
16. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).
17. Malone, E. R., Oliva, M., Sabatini, P. J. B., Stockley, T. L. & Siu, L. L. Molecular profiling for precision cancer therapies. Genome Med. 12, 8 (2020).
18. Gorzynski, J. E. et al. Ultrarapid Nanopore Genome Sequencing in a Critical Care Setting. N. Engl. J. Med. 386, 700–702 (2022).
19. Wei, S. et al. Rapid Nanopore Sequencing-Based Screen for Aneuploidy in Reproductive Care. N. Engl. J. Med. 387, 658–660 (2022).
20. Wang, Z. et al. SMASH, a fragmentation and sequencing method for genomic copy number analysis. Genome Res. 26, 844–851 (2016).
21. Prabakar, R. K., Xu, L., Hicks, J. & Smith, A. D. SMURF-seq: efficient copy number profiling on long-read sequencers. Genome Biol. 20, 134 (2019).
22. Jung, H.-S., Lefferts, J. & Tsongalis, G. Utilization of the oncoscan microarray assay in cancer diagnostics. Appl. Cancer Res. 37, (2017).
23. Olshen, A. B., Venkatraman, E. S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostat. Oxf. Engl. 5, 557–572 (2004).
24. Seshan, V. E. & Olshen, A. DNAcopy: DNA copy number data analysis.
25. Hung, K. L. et al. Targeted profiling of human extrachromosomal DNA by CRISPR-CATCH. Nat. Genet. 54, 1746–1754 (2022).
26. Helmsauer, K. et al. Enhancer hijacking determines extrachromosomal circular MYCN amplicon architecture in neuroblastoma. Nat. Commun. 11, 5823 (2020).
27. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
28. Quarteroni, A., Sacco, R. & Saleri, F. Numerical Mathematics. (Springer New York Springer e-books Imprint: Springer, 2007).
29. Newcombe, R. G. Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat. Med. 17, 873–890 (1998).
30. Parker, N. R., Khong, P., Parkinson, J. F., Howell, V. M. & Wheeler, H. R. Molecular heterogeneity in glioblastoma: potential clinical implications. Front. Oncol. 5, 55 (2015).
31. Louis, D. N. et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro-Oncol. 23, 1231–1251 (2021).
32. Capper, D. et al. DNA methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018).
33. Euskirchen, P. et al. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol. (Berl.) 134, 691–703 (2017).
34. Patel, A. et al. Rapid-CNS2: rapid comprehensive adaptive nanopore-sequencing of CNS tumors, a proof-of-concept study. Acta Neuropathol. (Berl.) 143, 609–612 (2022).
35. Harbers, L. et al. Somatic Copy Number Alterations in Human Cancers: An Analysis of Publicly Available Data From The Cancer Genome Atlas. Front. Oncol. 11, 700568 (2021).
36. Prabakar, R. K., Xu, L., Hicks, J. & Smith, A. D. SMURF-seq: efficient copy number profiling on long-read sequencers. Genome Biol. 20, 134 (2019).
37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 25, 1754–1760 (2009).
38. Venkatraman E. Seshan, A. O. DNAcopy. (2017) doi:10.18129/B9.BIOC.DNACOPY.
39. Kuschel, L. P. et al. Robust methylation-based classification of brain tumours using nanopore sequencing. Neuropathol. Appl. Neurobiol. (2022) doi:10.1111/nan.12856.

Claims

CLAIMS

What is claimed is:

1. A method for constructing a plurality of chimeric DNA molecules, comprising: a) providing a source DNA molecule; and b) simultaneously reacting the source DNA molecule with: i) two or more restriction enzymes; and ii) a DNA ligase; to obtain the plurality of chimeric DNA molecules, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.

2. The method of claim 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.

3. The method of claim 1, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.

4. The method of claim 1, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.

5. The method of claim 1, wherein step (b) comprises two reaction processes (b1) and (b2), wherein in process (b1), the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and in process (b2), the DNA ligase ligates the smaller DNA fragments having complementary overhangs into a larger DNA fragment, wherein the processes (b1) and (b2) are allowed to repeat for N cycles to generate the chimeric DNA molecule, wherein N is an integer greater than 5.

6. The method of claim 5, wherein process (b2) comprises two types of ligations, Type I and Type II ligations, wherein Type I ligation results from ligation of two sticky ends generated by the same restriction enzyme and can be cut by the same restriction enzyme again, and Type II ligation results from ligation of two sticky ends generated by two different restriction enzymes and cannot be cut by either of said two different restriction enzymes again after Type II ligation is completed.

7. The method of claim 6, wherein the plurality of chimeric DNA molecules comprises more than 5 smaller DNA fragments ligated together through Type II ligation.

8. The method of claim 1, wherein the smaller DNA fragments have an average length of about 80-100 bp (base pairs), or about 90 bp, when using MseI, BfaI, CviQI and NdeI.

9. The method of claim 1, wherein the reaction of step (b) is completed within 30 minutes or less.

10. The method of claim 1, further comprising a step (c) of sequencing the plurality of chimeric DNA molecules.

11. The method of claim 10, wherein the sequencing is performed by using nanopore sequencing.

12. The method of claim 10, wherein a result from the sequencing is analyzed to obtain information selected from the group consisting of copy number variation (CNV), methylation classification, and a combination thereof.

13. The method of claim 10, wherein the source DNA molecule is obtained from a mammal.

14. The method of claim 13, wherein the source DNA molecule is obtained from a tumor sample of the mammal.

15. The method of claim 13, wherein the source DNA molecule is obtained during a surgery being performed on the mammal.

16. The method of claim 15, wherein steps (a)-(c) are performed before the surgery is completed and results from steps (a)-(c) are used to guide a surgeon performing the surgery or for implementation of molecularly targeted therapies.

17. A chimeric DNA molecule comprising a plurality of DNA fragments produced by cutting a source DNA fragment with two or more restriction enzymes, wherein the plurality of DNA fragments is stochastically ligated together to form the chimeric DNA molecule, and wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein said chimeric DNA molecule is not digestible by said two or more restriction enzymes.

18. The chimeric DNA molecule of claim 17, wherein the chimeric DNA molecule comprises more than 5 of the plurality of DNA fragments.

19. The chimeric DNA molecule of claim 17, wherein an average length of the plurality of fragments is between 80-100 bp when using MseI, BfaI, CviQI and NdeI.

20. The chimeric DNA molecule of claim 17, wherein an average length of the plurality of fragments is 90 bp when using MseI, BfaI, CviQI and NdeI.

21. A method for reconstructing DNA for single-cell genomic analysis, comprising: a) lysing single isolated cells to expose genomic DNA in the cells; b) simultaneously reacting the genomic DNA with: i) two or more restriction enzymes, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments; ii) short barcodes, wherein the barcodes are unique DNA sequences comprising compatible overhangs to the overhangs produced by the two or more restriction enzymes; and iii) a DNA ligase, wherein the DNA ligase ligates the smaller DNA fragments and barcodes into a chimeric DNA molecule; c) amplifying the chimeric DNA molecule; and d) sequencing the chimeric DNA molecule.

22. The method of claim 21, wherein the single cells are prepared by a process comprising a) dissociating and suspending a cell population; b) diluting the cell population to form a cell solution, and c) transferring the cell solution to a container where single cells are isolated.

23. A kit for constructing a plurality of chimeric DNA molecules, comprising: i) two or more restriction enzymes; ii) a DNA ligase; and iii) an instruction to react a source DNA simultaneously with the two or more restriction enzymes and the DNA ligase to generate the plurality of chimeric DNA molecules, wherein the two or more restriction enzymes have different restriction recognition sites but produce identical overhangs when cutting a DNA molecule, wherein the two or more restriction enzymes cut the source DNA molecule into smaller DNA fragments, and the DNA ligase ligates the smaller DNA fragments into the plurality of chimeric DNA molecules.

24. The kit of claim 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of MseI, BfaI, CviQI, and NdeI.

25. The kit of claim 23, wherein the two or more restriction enzymes comprise two or more members selected from the group consisting of XbaI, SpeI, AvrII and NheI.

26. The kit of claim 23, wherein the DNA ligase is selected from the group consisting of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase.
PCT/US2023/034301 2022-09-30 2023-10-02 Rapid reconstruction of large nucleic acids WO2024073136A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263412179P 2022-09-30 2022-09-30
US63/412,179 2022-09-30
US202363525897P 2023-07-10 2023-07-10
US63/525,897 2023-07-10

Publications (1)

Publication Number Publication Date
WO2024073136A1 (en) 2024-04-04

Family

ID=90479032

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/034301 WO2024073136A1 (en) 2022-09-30 2023-10-02 Rapid reconstruction of large nucleic acids

Country Status (1)

Country Link
WO (1) WO2024073136A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0141484A2 (en) * 1983-06-10 1985-05-15 Biogen, Inc. Methods of producing hybrid DNA sequences and hybrid polypeptides and DNA sequences produced by them
US20180326388A1 (en) * 2013-08-05 2018-11-15 Twist Bioscience Corporation De novo synthesized gene libraries
WO2021188889A1 (en) * 2020-03-20 2021-09-23 Mission Bio, Inc. Single cell workflow for whole genome amplification
US20220146382A1 (en) * 2016-11-29 2022-05-12 S2 Genomics, Inc. Method for processing tissue samples
US20220170113A1 (en) * 2015-12-17 2022-06-02 Illumina, Inc. Distinguishing methylation levels in complex biological samples

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23873700

Country of ref document: EP

Kind code of ref document: A1