EP3959331A1 - Multipore determination of the fractional abundance of polynucleotide sequences in a sample - Google Patents

Multipore determination of the fractional abundance of polynucleotide sequences in a sample

Info

Publication number
EP3959331A1
Authority
EP
European Patent Office
Prior art keywords
target
nanopore
sample
parameters
analytes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19925663.7A
Other languages
German (de)
English (en)
Inventor
Yanan Zhao
William Mckenna
William B. Dunbar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ontera Inc
Original Assignee
Ontera Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ontera Inc
Publication of EP3959331A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416: Systems
    • G01N27/447: Systems using electrophoresis
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416: Systems
    • G01N27/49: Systems involving the determination of the current at a single specific value, or small range of values, of applied voltage for producing selective measurement of one or more particular ionic species
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483: Physical analysis of biological material
    • G01N33/487: Physical analysis of biological material of liquid biological material
    • G01N33/48707: Physical analysis of biological material of liquid biological material by electrical means
    • G01N33/48721: Investigating individual macromolecules, e.g. by translocation through nanopores
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2565/00: Nucleic acid analysis characterised by mode or means of detection
    • C12Q2565/60: Detection means characterised by use of a special device
    • C12Q2565/631: Detection means characterised by use of a special device being a biochannel or pore

Definitions

  • Characterization of a liquid sample by determining a relative abundance of components present in the sample can provide valuable information for many scientific fields and applications. For example, a relative abundance of a point mutation in circulating cell-free DNA can be used to diagnose or monitor progression of cancer in a patient. As another example, determining the fractional amount of a transgenic sequence of a genetically modified organism (GMO) relative to the non-GMO reference sequence within genomic DNA, obtained from a collection of seeds for example, is important for regulatory and economic reasons.
  • GMO: genetically modified organism
  • qPCR: quantitative real-time PCR
  • Nanopore devices have emerged as a sensitive tool for single molecule identification, wherein individual molecules are identified upon translocation through the nanopore under an applied voltage. Nanopore devices are amenable to point-of-use applications, and can be sufficiently inexpensive and efficient for routine daily use cases, in human health, agriculture, or anywhere else. However, data from a nanopore can be subject to errors that affect quantitative estimates of analytes in a sample, to the point that the data cannot be used reliably.
  • a method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device comprising applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore separately for each of: a control sample comprising a known relative abundance of target analytes to reference analytes, and a mixed unknown sample comprising said target analytes and said reference analytes, wherein the relative abundance of said target analytes in said sample is to be determined; generating a plurality of event signatures generated by translocation of said target analytes or said reference analytes through said nanopore for each sample; identifying a quantity of first event signatures associated with said target analyte and a quantity of second event signatures associated with said reference analyte from said plurality of event signatures to determine a detected relative abundance of first and second event
  • control sample is a target control sample comprising said target analytes, but not said reference analytes. In some embodiments, the control sample is a reference control sample comprising said reference analytes, but not said target analytes.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor for a target control sample comprising said target analytes, but not said reference analytes.
  • the adjustment of said detected relative abundance of said first and second event signatures in said unknown sample comprises using the detected relative abundance of said first and second event signatures in said target control sample and in said reference control sample to correct for said error in the detected relative abundance.
  • the error comprises a false positive or a false negative detection error of said target analyte.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor for a mixed control sample comprising said target analytes and said reference analytes, wherein the relative abundance of said target analytes and said reference analytes is known.
  • the adjustment of said detected relative abundance of said first and second event signatures in said unknown sample comprises using the detected relative abundance of said first and second event signatures in said target control sample, said reference control sample, and said mixed control sample to correct for said error in the detected relative abundance.
  • the error comprises a false positive target analyte detection error, a false negative target analyte detection error, a capture rate constant differential between said target analyte and said reference analyte, or any combination thereof.
  • control sample is a mixed control sample comprising said target analytes and said reference analytes, wherein the relative abundance of said target analytes and said reference analytes is known.
  • error comprises a capture rate constant differential between said target analyte and said reference analyte.
  • the mixed control sample comprises a relative abundance of said target analytes to said reference analytes that differs by no more than a factor of 1.2, a factor of 1.5, a factor of 2, a factor of 5, or a factor of 10 relative to said mixed unknown sample.
  • the estimate of the true relative abundance is an estimate of the true ratio of said target analyte to said reference analyte in said mixed unknown sample.
  • the parameter α is an estimate of the ratio of the reference analyte capture rate divided by the target analyte capture rate.
  • the estimate of the true relative abundance is an estimate of the true fraction of said target analytes in a population of said reference analytes and said target analytes in said mixed unknown sample.
  • the parameter α can be used to compensate for a capture rate constant differential between said target analyte and said reference analyte.
  • the parameter α is an estimate of the ratio of the reference analyte capture rate divided by the target analyte capture rate.
  • the parameter QX:Y is the fraction of said first event signature observed in said mixed control
  • the parameter Qmix is the fraction of said first event signature observed in said mixed unknown sample.
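As a rough illustration of how these parameters could fit together, the sketch below assumes a linear mixing model: the tagged fraction of a mixed sample (Qmix) is taken to be a weighted average of the tagged fractions of a target-only control (Qtarg) and a reference-only control (Qref). Under that assumption, which this excerpt does not state, the detected target:reference ratio is (Qmix − Qref)/(Qtarg − Qmix), and multiplying it by the capture-rate parameter (written `alpha` here) gives the ratio estimate. All function and variable names are hypothetical.

```python
def detected_ratio(q_mix, q_ref, q_targ):
    """Detected target:reference event ratio, corrected for tagging errors.

    Assumes q_mix = f * q_targ + (1 - f) * q_ref, where f is the fraction
    of captured events that came from target analytes (an assumption).
    """
    if q_targ == q_mix:
        raise ValueError("q_mix equals q_targ: ratio is unbounded")
    return (q_mix - q_ref) / (q_targ - q_mix)


def alpha_from_mixed_control(q_xy, q_ref, q_targ, known_ratio):
    """Capture-rate compensation derived from a mixed control.

    known_ratio is the true target:reference ratio of the control; the
    result estimates (reference capture rate) / (target capture rate).
    """
    return known_ratio / detected_ratio(q_xy, q_ref, q_targ)


def estimate_ratio(q_mix, q_ref, q_targ, alpha=1.0):
    """Estimate of the true target:reference ratio in the unknown sample."""
    return detected_ratio(q_mix, q_ref, q_targ) * alpha


def estimate_fraction(q_mix, q_ref, q_targ, alpha=1.0):
    """Estimate of the true target fraction, R / (R + 1)."""
    ratio = estimate_ratio(q_mix, q_ref, q_targ, alpha)
    return ratio / (ratio + 1.0)
```

With `alpha=1.0` the capture rates are treated as equal, which corresponds to the embodiments in which no mixed control is run.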
  • the unknown or control sample is prepared by nucleic acid amplification. In some embodiments, the unknown or control sample is not prepared by nucleic acid amplification. In some embodiments, the sample is purified to substantially consist of reference and target molecules. In some embodiments, the sample is not purified.
  • the quantity or concentration of said reference analytes in said mixed unknown sample is known.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises determining an estimate of the absolute quantity or concentration of said target analytes in said mixed unknown sample using said estimate of the true relative abundance of said target analytes to said reference analytes in said mixed unknown sample and said known quantity or concentration of said reference analytes in said mixed unknown sample.
  • said absolute quantity or concentration of said target analytes can be determined using information derived from multiple nanopores of one or more nanopore devices.
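A minimal sketch of the arithmetic implied here: once a true target:reference ratio has been estimated and the reference concentration is known, the target concentration follows by multiplication. Pooling event counts across pores before computing a tagged fraction is only one possible way to combine data from multiple nanopores and is an assumption of this sketch, as are all names.

```python
def pooled_tagged_fraction(counts_per_pore):
    """Combine (tagged_events, total_events) pairs from several pores.

    Pooling raw counts is an assumed strategy; the text only says that
    information from multiple nanopores can be used.
    """
    tagged = sum(t for t, _ in counts_per_pore)
    total = sum(n for _, n in counts_per_pore)
    if total == 0:
        raise ValueError("no events recorded")
    return tagged / total


def absolute_target_concentration(true_ratio, reference_concentration):
    """Target concentration, in the same units as the reference.

    true_ratio is the estimated true target:reference ratio.
    """
    return true_ratio * reference_concentration
```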
  • the quantity of first event signatures associated with said target analyte and said quantity of second event signatures associated with said reference analyte are identified according to a defined threshold.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises optimizing said threshold to increase accuracy of detection of said reference analytes and/or said target analytes using a Q-test, a support vector machine, or an expectation maximization algorithm.
  • the support vector machine is trained using electronic signatures from control samples comprising known quantities of target analytes and reference analytes.
  • the defined threshold is a function of one or more features of an event signature selected from the group consisting of: an event duration, a maximum δG, a median δG, an average δG, a standard deviation of the event signature, a mean or median of the noise power of the event below 50 Hz, a unique pattern in said event signature, an area of an event, or any combination thereof.
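To make the idea of a feature-based threshold concrete, the sketch below tags an event as a target when two of its features (duration and mean conductance shift) fall inside a rectangular box, then reports the tagged fraction for a sample. The rectangular shape, the choice of features, and every name here are illustrative assumptions rather than the method defined in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    duration_s: float   # event duration in seconds
    mean_dg_ns: float   # mean conductance shift in nanosiemens


@dataclass(frozen=True)
class TaggingBox:
    min_duration_s: float
    max_duration_s: float
    min_dg_ns: float
    max_dg_ns: float

    def contains(self, event):
        """True when both features fall inside the box (inclusive)."""
        return (self.min_duration_s <= event.duration_s <= self.max_duration_s
                and self.min_dg_ns <= event.mean_dg_ns <= self.max_dg_ns)


def tagged_fraction(events, box):
    """Fraction of events tagged as target by the box threshold."""
    if not events:
        raise ValueError("no events to tag")
    return sum(1 for e in events if box.contains(e)) / len(events)
```

Running the same box over a reference-only control, a target-only control, and the unknown sample would yield the three tagged fractions that the correction terms operate on.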
  • the adjustment of said detected relative abundance of said first and second event signatures in said mixed unknown sample to correct for said error in the detected relative abundance is performed using a Q-test, a support vector machine, or an expectation maximization algorithm.
  • the target analyte and said reference analyte each comprise a polynucleotide.
  • the target analyte polynucleotide and said reference analyte polynucleotide are of different lengths. In some embodiments, the lengths are different by at least 10 nucleotides, at least 20 nucleotides, at least 50 nucleotides, at least 100 nucleotides, at least 150 nucleotides or at least 200 nucleotides.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises contacting said control or unknown samples with a first probe bound to a first payload, wherein said first probe is configured to bind specifically to said first analyte.
  • the method of determining an improved estimate of a true relative abundance of a target analyte in a mixed unknown sample using a nanopore device further comprises contacting said control or unknown samples with a second probe bound to a second payload, wherein said second probe is configured to bind specifically to said second analyte.
  • the target analyte is correlated with a genetically modified organism.
  • the target analyte comprises a marker associated with the presence or absence of cancer in a patient.
  • Also provided herein is a method of determining a relative quantity of a target analyte in a sample comprising running separately in a nanopore system each of: a first control sample comprising reference analytes and no target analytes, a second control sample comprising target analytes and no reference analytes, a third control sample comprising a known relative abundance of said target analytes and said reference analytes, and an experimental sample comprising an unknown relative abundance of said target analytes and said reference analytes; detecting a quantity of first event signatures associated with a reference analyte and a quantity of second event signatures associated with a target analyte for each sample; and comparing a relative abundance of said quantity of first and second event signatures from said experimental sample with a relative abundance of said quantity of first and second event signatures from each of said first control sample, said second control sample, and said third control sample to determine an estimate of the true relative abundance of said reference analyte and said target an
  • the event signature comprises an electrical signal induced by translocation of said reference analyte through said nanopore.
  • the target analyte and said reference analyte each comprise a polynucleotide. In some embodiments, the reference analyte and said target analyte are discriminated by length.
  • the reference analyte and said target analyte are each bound to a sequence-specific probe comprising a payload to facilitate discrimination between said reference analyte and said target analyte in said nanopore device.
  • the relative abundance is a fractional amount of said target analyte compared to the total population of the target analyte and the reference analyte in said sample.
  • Also provided herein is a method of determining a relative abundance of a target analyte in an unknown sample, comprising providing an unknown sample comprising a plurality of reference analytes and a plurality of target analytes; loading said unknown sample into a first chamber of a nanopore device comprising a nanopore disposed between said first chamber and a second chamber; applying a voltage across said nanopore to pass said reference analytes and said target analytes through said nanopore from said first chamber to said second chamber; detecting a number of first electrical signals each associated with the translocation of said reference analyte through the nanopore; detecting a number of second electrical signals each associated with the translocation of said target analyte through the nanopore; and converting a relative abundance of the number of detected first electrical signals and the number of detected second electrical signals to an estimate of the true relative abundance of said target analyte in said unknown sample using a reference value that accounts for at least one error associated with said electrical signal relative abundance.
  • the reference value is determined from a fractional abundance of said first electrical signal determined from a mixed control sample comprising a known amount of target analytes and reference analytes.
  • the mixed control sample, said target control sample, or said reference control sample is run in said nanopore device under conditions substantially identical to conditions in said nanopore device during said detection of said first and second electrical signals from said unknown sample.
  • the nanopore device comprises a membrane that separates an interior space of the device into a first chamber and a second chamber, wherein said membrane comprises said nanopore, wherein said first chamber and said second chamber are in fluid communication through said nanopore, and wherein said device comprises an electrode in each chamber for applying a voltage across said nanopore.
  • the electrodes are configured to monitor electrical current through said nanopore. In some embodiments, the electrodes are connected to a power supply.
  • the methods provided herein improve the accuracy of an estimate of fractional abundance of a target analyte in a mixed unknown sample by accounting for false positive or false negative detection errors, or a capture rate constant differential between said target analyte and said reference analyte.
  • a series of controls is run to improve the accuracy of the estimate of fractional abundance, including a reference-only control to account for false positive target analyte detection errors, a target-only control to account for false negative target analyte detection errors, and one or more mixed control samples to account for a capture rate constant differential between the target analyte and the reference analyte.
  • the capture rates between the target analyte and the reference analyte in the mixed unknown sample are relatively consistent, such that the mixed control does not need to be used to improve the estimate of the relative abundance.
  • the relative capture rates between the target analyte and the reference analyte in a mixed sample are known such that a correction term can be applied to data from a mixed unknown sample to compensate for this difference to improve the estimate of fractional abundance without running a mixed control sample.
  • data from a mixed control sample run under substantially identical nanopore conditions using the same target analyte and reference analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the mixed control sample as part of the method.
  • a threshold value is determined such that a false positive value from the mixed unknown sample is negligible, and a reference-only control does not need to be used to improve the estimate of the relative abundance.
  • the false positive value from a mixed sample is known such that a correction term can be applied to data from a mixed unknown sample to compensate for a false positive error to improve the estimate of fractional abundance without running a reference-only control sample.
  • data from a reference-only control sample run under substantially identical nanopore conditions using the same reference analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the reference-only control as part of the method.
  • a threshold value is determined such that a false negative value from the mixed unknown sample is negligible, and a target-only control does not need to be used to improve the estimate of the relative abundance.
  • the false negative value from a mixed sample is known such that a correction term can be applied to data from a mixed unknown sample to compensate for a false negative error to improve the estimate of fractional abundance without running a target-only control sample.
  • data from a target-only control sample run under substantially identical nanopore conditions using the same target analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the target-only control as part of the method.
  • a method of determining an estimate of a relative abundance of a target analyte to a reference analyte in a mixed sample comprising applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor separately for each of: a mixed control sample comprising a known relative abundance of target analytes to reference analytes, and a mixed unknown sample comprising said target analytes and said reference analytes, wherein the relative abundance of said target analytes to said reference analytes is unknown; detecting a quantity of first event signatures associated with said reference analyte and a quantity of second event signatures associated with a target analyte for each sample; and determining an estimate of the true relative abundance of said target analytes to said reference analytes in said mixed unknown sample by adjusting a detected relative abundance of said first and second event signatures from said mixed unknown sample using the detected relative abundance of said first and
  • a method of determining an estimate of a relative abundance of a target analyte to a reference analyte in a mixed sample comprising applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor separately for each of: a target control sample comprising target analytes, but not reference analytes, a reference control sample comprising reference analytes, but not target analytes, and a mixed unknown sample comprising said target analytes and said reference analytes, wherein the relative abundance of said target analytes to said reference analytes is unknown; detecting a quantity of first event signatures associated with said reference analyte and a quantity of second event signatures associated with a target analyte for each sample; and determining an estimate of the true relative abundance of said target analytes to said reference analytes in said mixed unknown sample by adjusting a detected relative abundance of said first and second
  • the target control sample provides a correction term for false negative detection of target analytes from said mixed unknown sample.
  • the reference control sample provides a correction term for false positive detection of target analytes in said mixed unknown sample.
  • the method of determining an estimate of a relative abundance of a target analyte to a reference analyte in a mixed sample further comprises applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor for a target control sample comprising said target analytes, but not said reference analytes.
  • the method of determining an estimate of a relative abundance of a target analyte to a reference analyte in a mixed sample further comprises applying a voltage to a nanopore device to induce translocation of charged analytes through a nanopore sensor for a reference control sample comprising said reference analytes, but not said target analytes.
  • determining said estimate of the true relative abundance of said target analytes to said reference analytes in said mixed unknown sample comprises adjusting said detected relative abundance of said first and second event signatures in said mixed unknown sample using the detected relative abundance of said first and second event signatures in said target control sample, said reference control sample, and said mixed control sample and the true relative abundance of said target analytes to said reference analytes in said mixed control sample.
  • the mixed control sample comprises a relative abundance of said target analytes to said reference analytes that differs by no more than a factor of 1.2, a factor of 1.5, a factor of 2, a factor of 5, or a factor of 10 relative to said mixed unknown sample.
  • the relative abundance comprises the ratio of target analyte: reference analyte.
  • the parameter α is an estimate of the ratio of the reference analyte capture rate divided by the target analyte capture rate.
  • the relative abundance comprises the fraction of said target analyte in a population of said target analytes and said reference analytes.
  • the estimate of the true fraction of said target analyte in a population of said reference analytes and said target analytes in said mixed unknown sample is determined pa
  • the parameter ρ is an estimate for the ratio that can compensate for a false positive detection error, a false negative detection error, or both, and wherein the parameter α can be used to compensate for a capture rate constant differential between said target analyte and said reference analyte.
  • the parameter α is an estimate of the ratio of the reference analyte capture rate divided by the target analyte capture rate.
  • kits comprising a control sample comprising a target analyte and a reference analyte at a known relative abundance; and instructions for use to run said control sample and an unknown sample comprising said reference analyte and said target analyte in a nanopore device to determine a relative abundance of said reference analyte and said target analyte in said unknown sample.
  • kits comprising a first control sample comprising a target analyte, wherein said first control sample does not contain a reference analyte; a second control sample comprising said reference analyte, wherein said second control sample does not contain said target analyte; a third control sample comprising said target analyte and said reference analyte at a known relative abundance; and instructions for use to run said first control sample, said second control sample, said third control sample and an unknown sample comprising said reference analyte and said target analyte separately in a nanopore device to determine a relative abundance of said reference analyte and said target analyte in said unknown sample.
  • a computer-implemented method of determining an estimate of a true fractional abundance of a target analyte in a sample comprising: obtaining data from a nanopore sensor from at least one of a reference analyte control or a target analyte control, wherein said data comprises a plurality of event signatures from target analytes or reference analytes translocating through said nanopore; identifying one or more features of event signatures to differentiate those correlated with target analytes and those correlated with reference analytes; training said support vector machine to identify an optimized threshold to distinguish said first events from said second events and to generate an estimate of the true relative abundance of said reference analytes and said target analytes in a sample, wherein said training comprises use of a control selected from the group consisting of a reference control sample, a target control sample, and a mixed control sample, and wherein training comprises validation using known mixed samples; and using said trained support vector to determine a fractional
  • a computer-implemented method of determining an estimate of a true fractional abundance of a target analyte in a sample comprising: obtaining a set of data from a nanopore device, said data comprising event signatures from at least one control sample and at least one unknown sample; identifying a set of features to use for generating a threshold to discriminate first event signatures correlated with said target analytes from second event signatures correlated with said reference analytes; and estimating a true value of a fractional abundance in said unknown sample using a trained support vector machine.
  • a computer-implemented method of determining an estimate of a fractional abundance of a target analyte in a sample is provided herein.
  • Figure 1A shows a typical electronic signature of a single-molecule event caused by a dsDNA passing through a nanopore, with a characteristic duration of translocation and decrease in current during translocation.
  • Figure 1B shows an all-event scatter plot of max δG versus duration for 5.6 kb dsDNA recorded in a 22 nm diameter nanopore.
  • Figure 2A shows a typical event when a 727 bp DNA goes through a 25 nm diameter solid-state nanopore at 100 mV in 1 M LiCl. The event area is shaded.
  • Figure 2B illustrates an increase in event duration with increased dsDNA length, while event depth is conserved.
  • Figure 2C shows a plot of the distribution of the log10 of the area of all events recorded for dsDNA at each length shown.
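The event features named in these figure descriptions (duration, depth, area) can be computed from a sampled current trace roughly as follows. This is a sketch under simplifying assumptions: the event has already been segmented, the open-pore baseline is a known constant, and sampling is uniform. Units and names are illustrative.

```python
import math


def event_features(current_pa, baseline_pa, dt_s, voltage_mv):
    """Summarize one segmented event.

    current_pa  : current samples inside the event, in picoamps
    baseline_pa : open-pore baseline current, in picoamps
    dt_s        : sampling interval, in seconds
    voltage_mv  : applied voltage, in millivolts
    """
    if not current_pa:
        raise ValueError("event has no samples")
    depths_pa = [baseline_pa - i for i in current_pa]
    area_pa_s = sum(depths_pa) * dt_s          # area between baseline and trace
    return {
        "duration_s": len(current_pa) * dt_s,
        "max_dg_ns": max(depths_pa) / voltage_mv,   # pA / mV equals nS
        "area_pa_s": area_pa_s,
        "log10_area": math.log10(area_pa_s) if area_pa_s > 0 else float("-inf"),
    }
```

Because depth is roughly conserved across dsDNA lengths while duration grows, the area feature tends to separate lengths better than depth alone, which is consistent with the trend the figures describe.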
  • Figure 3A depicts an example of a threshold generated between events from type 1 analytes (squares) and type 2 analytes (circles).
  • Figure 3B shows an example of the results of transformation of input features to a higher dimensional space to increase the accuracy of a linear threshold between events from type 1 analytes (squares) and type 2 analytes (circles).
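The effect of lifting features into a higher-dimensional space can be shown with a toy example. Below, type 1 values sit in the middle of a one-dimensional feature axis and type 2 values sit at both extremes, so no single cut on the raw feature separates them; after mapping x to (x, x²), a straight line does. The data, the feature map, and the hand-picked weights are all invented for illustration and are not a trained support vector machine.

```python
def lift(x):
    """Map a 1-D feature to 2-D so a linear boundary can curve in 1-D."""
    return (x, x * x)


def linear_decision(point, weights, bias):
    """True when the point lies on the positive side of the hyperplane."""
    return sum(w * p for w, p in zip(weights, point)) + bias > 0


# Toy data: type 1 (squares) in the middle, type 2 (circles) on both sides.
type1 = [-0.5, -0.2, 0.1, 0.4]
type2 = [-2.0, -1.5, 1.6, 2.2]

# In lifted space the line x**2 = 1 separates the two types.
WEIGHTS, BIAS = (0.0, 1.0), -1.0

type2_tagged = [linear_decision(lift(x), WEIGHTS, BIAS) for x in type2]
type1_tagged = [linear_decision(lift(x), WEIGHTS, BIAS) for x in type1]
```

A kernel-based support vector machine performs a comparable lifting implicitly and learns the boundary from labelled control events instead of using fixed weights.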
  • Figure 4A shows a probability histogram for all events from a reference analyte sample, a target analyte sample, and a mixed sample according to event area.
  • Figure 4B depicts a graph of the percentage of events that are below an area threshold from reference analyte only (Qref), target analyte only (Qtarg), and a mixed sample of target analytes and reference analytes (Qmix).
  • Figure 4C shows how fractional amount parameter ρ(q) appears graphically at a q value.
  • Figure 6 shows the results of estimate of target analyte abundance (GMO (%)) over a range of thresholds for discriminating target analyte from reference analyte according to area of an event.
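A threshold sweep of the kind this figure describes might be sketched as follows. For each candidate area threshold q, the fraction of events below q is computed for the reference-only control, the target-only control, and the mixed sample, and a target-fraction estimate is formed as (Qmix − Qref)/(Qtarg − Qref). That expression assumes a linear mixing model with equal capture rates; it and all names are assumptions of this sketch.

```python
def fraction_below(values, threshold):
    """Fraction of event areas strictly below the threshold."""
    if not values:
        raise ValueError("no events")
    return sum(1 for v in values if v < threshold) / len(values)


def sweep_estimates(ref_areas, targ_areas, mix_areas, thresholds):
    """Return (threshold, estimated target fraction) pairs.

    Thresholds that do not discriminate the two controls at all
    (Qtarg == Qref) are skipped because the estimate is undefined.
    """
    results = []
    for q in thresholds:
        q_ref = fraction_below(ref_areas, q)
        q_targ = fraction_below(targ_areas, q)
        q_mix = fraction_below(mix_areas, q)
        if q_targ == q_ref:
            continue
        results.append((q, (q_mix - q_ref) / (q_targ - q_ref)))
    return results
```

If the estimate stays flat across a range of thresholds, that range is a reasonable place to operate; large swings suggest the threshold is cutting through an overlapping region of the two populations.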
  • Figure 7 shows a prediction of accuracy across a set of testing data from a trained support vector machine with optimal parameters for discriminating event signatures from target and reference analytes.
  • Figure 8 shows an event plot for two molecule types (94bp target dsDNA bound to a probe/payload and 74bp reference dsDNA bound to a probe/payload) that were run as isolated controls sequentially on the same pore.
  • Figure 9A shows a representative event plot of mean δG vs. duration for the 100% target analyte control sample (closed circles) and the 100% reference analyte control sample (open squares) overlaid.
  • the target analyte is 89bp DNA with G12D-bound probe linked to a 3-branch PEG (denoted G12D-3bPEG).
  • the reference analyte is 89bp DNA with wild-type (c.35G)-bound probe linked to an 8-arm PEG (denoted WT-8armPEG).
  • the thresholds for identifying an event signature as from a target analyte passing through the nanopore create the target tagging box (dashed line).
  • Figure 9B shows the plot from Figure 9A, with data from unknown sample A (triangle) and sample B (star) comprising target analytes and reference analytes overlaid onto the plot.
  • Figure 10 shows a representative event plot of mean δG vs. duration for the 100% target analyte control sample (closed circles) and the 100% reference analyte control sample (open squares) overlaid. Also plotted is the support vector machine-identified decision boundary (i.e., threshold) for discriminating target analytes from reference analytes.
  • Figure 11 shows events from a 50% target / 50% reference mixture sample plotted on an all-event scatter plot of max δG versus duration.
  • the target domain box encompasses events associated with probe-bound mutant targets.
  • Figure 12 shows the results of application of Expectation Maximization Algorithm for Gaussian Mixtures (EMGM) using a 3-Gaussian mixture model to the data from a 50% target / 50% reference mixture sample shown in Figure 11 for identification of target (mutant) and reference (wild-type) populations.
  • Figure 13 shows the results of application of EMGM using a 3-Gaussian mixture model to data from a reference-only control sample to establish a false positive fraction.
  • Figure 14 shows the results of application of EMGM using a 3-Gaussian mixture model to data from a mixed unknown sample to identify a relative abundance of mutant (target) molecules in the unknown sample.
  • Figure 15A depicts a flowchart of a method for using consensus calls from multiple pores to determine fractional abundance of a target analyte in a sample.
  • Figure 15B depicts an embodiment of using consensus calls from four pores to determine fractional abundance.
  • Figure 15C depicts a flowchart of a method implemented by the embodiment of the system of Figure 15B, from arbitrary numbers of nanopores.
  • Figure 15D depicts outputs of a method for determining concentration of a target molecule using information derived from a single nanopore and from multiple nanopores.
  • Figure 16A depicts a flowchart of a method for pre-filtering data from nanopores, in accordance with one or more embodiments.
  • Figure 16B depicts an output of the method shown in Figure 16A, where event data from nanopores is plotted by amplitude of electrical signal vs. dwell time.
  • Figure 16C depicts outputs of the method shown in Figures 16A and 16B where the system uses components (PC1 and PC2) of a PCA operation to generate Gaussian distributions of count vs. PC1 for each sample population.
  • Figure 16D depicts data used to generate a separation score using outputs of the PCA operation shown in Figure 16C.
  • Figure 16E depicts a plot of maximum current amplitude against a logarithmic function of the dwell time in seconds, in order to check calibration ratio of a sample.
  • an electrode includes a plurality of electrodes, including mixtures thereof.
  • the term “comprising” is intended to mean that the devices and methods include the recited components or steps, but not excluding others. “Consisting essentially of,” when used to define devices and methods, shall mean excluding other components or steps of any essential significance to the combination. “Consisting of” shall mean excluding other components or steps. Embodiments defined by each of these transition terms are within the scope of this invention.
  • analyte refers to any molecule, compound, complex, or other entity whose presence can be detected using a nanopore sensor to facilitate
  • for target or reference analytes, the terms “target molecule” and “reference molecule” may be used interchangeably with “target analyte” and “reference analyte.”
  • the term “target analyte” refers to a molecule or complex of interest in a sample.
  • the target analyte comprises a portion of a polynucleotide having a sequence of nucleic acids of interest.
  • the target analyte can be specifically targeted for binding by a probe to facilitate detection of the target analyte in a nanopore sensor, as described herein.
  • the term “reference analyte” refers to a molecule or complex of interest in a sample, whose abundance is used as a relative measure of quantification for the target analyte.
  • the reference analyte comprises a portion of a polynucleotide having a sequence of nucleic acids of interest.
  • the reference analyte can be specifically targeted for binding by a probe to facilitate detection of the target analyte in a nanopore sensor, as described herein.
  • the term “specific binding” or “bind specifically” refers to the targeted binding of a probe to a target analyte or a reference analyte.
  • the term “probe” refers to a molecule that binds specifically to a target analyte or to a fragment thereof.
  • the probe comprises a payload molecule configured to affect the electronic signature generated upon translocation of a complex comprising a target or reference analyte bound to a probe-payload molecule or complex.
  • the probe comprises a payload molecule binding moiety adapted to bind to a payload molecule.
  • the term “payload molecule” refers to a molecule with physical dimensions that facilitate generation of a unique electrical signal when captured in a nanopore within a correlated range of dimensions.
  • a payload molecule may be bound to a target analyte or a reference analyte to facilitate detection of the target analyte or reference analyte in a nanopore device.
  • the payload molecule may also be charged to act as a driver molecule.
  • the payload molecule comprises a probe binding moiety capable of specifically binding a probe molecule, which probe binds specifically to the target analyte or the reference analyte.
  • nanopore refers to a single nano-scale opening in a membrane that separates two volumes.
  • the pore can be a protein channel inserted in a lipid bilayer membrane, for example, or can be engineered by drilling or etching or using a voltage-pulse method through a thin solid-state substrate, such as silicon nitride or silicon dioxide or graphene or layers of combinations of these or other
  • the pore has dimensions no smaller than 0.1 nm in diameter and no bigger than 1 micron in diameter; the length of the pore is governed by the membrane thickness, which can be sub-nanometer thickness, or up to 1 micron or more in thickness.
  • the nanopore may be referred to as a “nano channel.”
  • “nanopore instrument” or “nanopore device” refers to a device that combines one or more nanopores (in parallel or in series) with circuitry for sensing single molecule events.
  • nanopore instruments use a sensitive voltage-clamp amplifier to apply a specified voltage across the pore or pores while measuring the ionic current through the pore(s).
  • a single charged molecule such as a double-stranded DNA (dsDNA)
  • the measured current shifts indicating a capture event (i.e., the translocation of a molecule through the nanopore, or the capture of a molecule in the nanopore)
  • the shift amount in current amplitude
  • duration of the event is used to characterize the molecule captured in the nanopore.
  • distributions of the events are analyzed to characterize the corresponding molecule according to its shift amount (i.e., its current signature).
  • nanopores provide a simple, label-free, purely electrical single molecule method for biomolecular sensing.
  • the term “event” refers to a translocation of a detectable molecule or molecular complex through the nanopore and its associated measurement via an electrical signal, e.g., change in current through the nanopore over time. It can be defined by its current, change in current from baseline open channel, duration, and/or other characteristics of detection of the molecule in the nanopore. A plurality of events with similar characteristics is indicative of a population of molecules or complexes that are identical or have similar characteristics (e.g., bulk, charge).
  • an “area” of an event refers to the absolute value of the duration of an event (i.e., the duration the current deviates from an open channel current signal) multiplied by the average change in current from the open channel over the duration of the event (i.e., pA*ms).
  • the term “relative abundance” refers to an amount of an item relative to the total number of related items in a group. For example, in the context of a target analyte in a sample, a relative abundance of the target analyte refers to an amount of a target analyte present in a sample as compared to a reference analyte.
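The area definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `event_area`, the sampled-trace representation, and the example numbers are assumptions, not part of the described method.

```python
from typing import Sequence


def event_area(current_pa: Sequence[float],
               baseline_pa: float,
               sample_interval_ms: float) -> float:
    """Return the event area in pA*ms.

    current_pa holds the current samples recorded while the signal deviates
    from the open-channel baseline (hypothetical representation).
    """
    if not current_pa:
        raise ValueError("an event needs at least one sample")
    # Duration of the event: number of samples times the sampling interval.
    duration_ms = len(current_pa) * sample_interval_ms
    # Average change in current from the open-channel level.
    mean_shift_pa = sum(i - baseline_pa for i in current_pa) / len(current_pa)
    # Area is the absolute value of duration times mean shift.
    return abs(duration_ms * mean_shift_pa)


# Hypothetical four-sample event on a 1000 pA baseline, sampled every 0.5 ms.
area = event_area([700.0, 650.0, 600.0, 650.0], baseline_pa=1000.0,
                  sample_interval_ms=0.5)
```

With these made-up numbers the mean shift is -350 pA over 2 ms, so the area is 700 pA*ms.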
  • a relative abundance of a group of electronic signatures can refer to an amount of a first electronic signature correlated with a target analyte as compared to an amount of a second electronic signature correlated with a reference analyte.
  • control sample refers to a sample containing a known relative abundance of target analyte to reference analyte.
  • Control samples such as reference control samples, target control samples, and mixed control samples are used herein to improve the accuracy of the estimate of a fractional abundance in an unknown sample.
  • control samples comprise target analytes, reference analytes, or both.
  • the term “unknown sample” or an “unknown mixed sample” or a “mixed unknown sample” refers to a sample containing a relative abundance of reference analyte that is unknown.
  • a relative abundance of a reference analyte is considered to be unknown if the relative abundance is to be determined by the method provided herein, even if some value of an estimate is already known.
  • a quantity or concentration of a reference analyte in the sample is known.
  • the term “known sample” refers to a sample containing a known relative abundance of target analyte to reference analyte, and is used to train, validate, or provide an estimate of the accuracy of a fractional abundance estimation model or a feature of the model, such as a threshold.
  • the invention provided herein is a method for determining an estimate of the true relative abundance (e.g., a fractional amount or a ratio) of a target analyte relative to a reference analyte present in a sample.
  • This method takes advantage of a nanopore single molecule counter (i.e., a nanopore device) to detect and discriminate between target analytes and reference analytes in a sample.
  • the methods provided herein improve the accuracy of an estimate of fractional abundance of a target analyte in a mixed unknown sample by accounting for false positive or false negative detection errors, or a capture rate constant differential between said target analyte and said reference analyte.
  • a series of controls is run to improve the accuracy of the estimate of fractional abundance, including a reference-only control to account for false positive target analyte detection errors, a target-only control to account for false negative target analyte detection errors, and one or more mixed control samples to account for a capture rate constant differential between the target analyte and the reference analyte.
  • the capture rates between the target analyte and the reference analyte in the mixed unknown sample are relatively consistent, such that the mixed control does not need to be used to improve the estimate of the relative abundance.
  • the relative capture rates between the target analyte and the reference analyte in a mixed sample are known such that a correction term can be applied to data from a mixed unknown sample to compensate for this difference to improve the estimate of fractional abundance without running a mixed control sample.
  • data from a mixed control sample run under substantially identical nanopore conditions using the same target analyte and reference analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the mixed control sample as part of the method.
  • a threshold value is determined such that a false positive value from the mixed unknown sample is negligible, and a reference-only control does not need to be used to improve the estimate of the relative abundance.
  • the false positive value from a mixed sample is known such that a correction term can be applied to data from a mixed unknown sample to compensate for a false positive error to improve the estimate of fractional abundance without running a reference-only control sample.
  • data from a reference-only control sample run under substantially identical nanopore conditions using the same reference analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the reference-only control as part of the method.
  • a threshold value is determined such that a false negative value from the mixed unknown sample is negligible, and a target-only control does not need to be used to improve the estimate of the relative abundance.
  • the false negative value from a mixed sample is known such that a correction term can be applied to data from a mixed unknown sample to compensate for a false negative error to improve the estimate of fractional abundance without running a target-only control sample.
  • data from a target-only control sample run under substantially identical nanopore conditions using the same target analyte species as in the mixed unknown sample is used to improve the estimate of the fractional abundance without actually running the target-only control as part of the method.
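One way the three kinds of control could feed into the estimate is sketched below. It assumes a simple linear mixing model that is not stated in this section: `q_ref` is the fraction of events tagged as target in the reference-only control (false positives), `q_targ` is the fraction tagged as target in the target-only control (one minus the false negative fraction), `q_mix` is the tagged fraction in the unknown, and `capture_ratio` is a hypothetical target-to-reference capture rate ratio taken from a mixed control. All names and the formula itself are illustrative assumptions.

```python
def corrected_fraction(q_mix: float, q_ref: float, q_targ: float,
                       capture_ratio: float = 1.0) -> float:
    """Estimate the target fraction from tagged-event fractions.

    Assumed model: q_mix = p * q_targ + (1 - p) * q_ref, where p is the
    fraction of events caused by target analytes.
    """
    if q_targ <= q_ref:
        raise ValueError("target-only control must tag more events than reference-only")
    if capture_ratio <= 0:
        raise ValueError("capture_ratio must be positive")
    # Remove false positive and false negative contributions.
    p = (q_mix - q_ref) / (q_targ - q_ref)
    p = min(1.0, max(0.0, p))  # clamp against sampling noise
    # Convert an event fraction to a molecule fraction when the target is
    # captured capture_ratio times as fast as the reference.
    return p / (p + capture_ratio * (1.0 - p))


estimate = corrected_fraction(q_mix=0.50, q_ref=0.05, q_targ=0.95)
```

Leaving `capture_ratio` at 1.0 corresponds to the case where capture rates are consistent and no mixed control is needed; setting `q_ref` to 0 or `q_targ` to 1 corresponds to a threshold for which the respective error is negligible.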
  • In one example use case, we use the methods herein to determine the fractional amount of a transgenic sequence of a genetically modified organism (GMO) relative to the non-GMO reference sequence within genomic DNA, obtained from a collection of seeds, for example. This determination is important for regulatory and economic reasons. Buyers and sellers of seeds with the desired trait require precise and accurate knowledge of the fraction of seeds comprising the desired trait in order for the pricing and transaction to be fair.
  • the methods provided herein provide %GMO content determination from aggregate seed, grain, flour, and feed presumed to contain between 1-100% GMO content. Seed developers, growers, and regulatory agencies want precise measures and the ability to resolve 10% differences (1.1-fold) in GMO content.
  • %GMO is defined as 100 × (GMO event copy number) / (taxon-specific genome reference copy number).
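The %GMO definition above is a plain ratio; a small Python sketch makes the arithmetic explicit. The function name and the copy numbers in the example are hypothetical.

```python
def percent_gmo(gmo_event_copies: float, reference_copies: float) -> float:
    """Return %GMO = 100 * (GMO event copy number) / (reference copy number)."""
    if reference_copies <= 0:
        raise ValueError("reference copy number must be positive")
    return 100.0 * gmo_event_copies / reference_copies


# Hypothetical counts: 25 GMO event copies per 1000 taxon-specific reference copies.
value = percent_gmo(25, 1000)
```

For these made-up counts the result is 2.5 %GMO.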
  • the relative abundance of mutant to non-mutant sequences can be used to guide diagnoses, therapies, and disease progression monitoring. Although it can take weeks for tumor imaging results to reveal a
  • the methods described herein allow rapid identification of the relative abundance of mutation markers, which permits efficient and frequent testing (e.g., daily) using easily accessible sample types. Critically, such technology could more effectively reveal therapy response by providing more time points of the disease dynamics, while also permitting early detection of relapse.
  • the methods provided herein provide copy number variation (CNV) determination in hereditary cancer screening assays.
  • In copy number variation (CNV) testing for hereditary cancer predisposition, the goal is to detect deletions or duplications of gene regulatory elements at ⁇ 1.5-fold difference from reference. A 10% difference in the copy number (1.1-fold) of the BRCA1 gene, for example, may warrant clinical action.
  • a nanopore is formed in a solid-state silicon based substrate, and single molecule experiments are performed by applying a voltage across the pore in a buffered electrolytic solution.
  • Figure 1A shows a typical single-molecule event caused by a dsDNA passing through a nanopore. Events are quantitated by duration width and maximum conductance depth, max δG. The max δG is the current attenuation δI divided by applied voltage V.
  • the events can display more than one amplitude.
  • Figure 1B is an example of this, with fully folded events displaying larger max δG values and shorter durations, and unfolded events displaying longer durations and shallower max δG values.
  • Partially folded events display both amplitude levels within the event, starting with the deeper level and finishing with the shallower level, and having a total duration width that is in between that of unfolded and fully folded events. While the δG and duration distributions show a mixture of modes for dsDNA that can fold, the event area has a single mode distribution for dsDNA, regardless of whether or not the DNA is long enough to fold when passing through the nanopore.
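The two quantities used to describe events above, duration and maximum conductance depth (the maximum current attenuation divided by applied voltage), can be computed from a sampled trace as in the sketch below. The trace representation, function name, and numbers are assumptions for illustration; with current in pA and voltage in mV the conductance comes out in nS.

```python
from typing import Sequence, Tuple


def duration_and_max_dg(current_pa: Sequence[float],
                        baseline_pa: float,
                        voltage_mv: float,
                        sample_interval_ms: float) -> Tuple[float, float]:
    """Return (duration in ms, max conductance depth in nS) for one event."""
    if not current_pa:
        raise ValueError("an event needs at least one sample")
    if voltage_mv == 0:
        raise ValueError("applied voltage must be non-zero")
    duration_ms = len(current_pa) * sample_interval_ms
    # Current attenuation of each sample relative to the open-channel level.
    attenuation_pa = [abs(baseline_pa - i) for i in current_pa]
    # pA / mV = nS
    max_dg_ns = max(attenuation_pa) / abs(voltage_mv)
    return duration_ms, max_dg_ns


# Hypothetical event: 1000 pA baseline, 100 mV applied, 0.5 ms per sample.
duration, max_dg = duration_and_max_dg([700.0, 650.0, 600.0, 650.0],
                                       baseline_pa=1000.0, voltage_mv=100.0,
                                       sample_interval_ms=0.5)
```

Here the deepest sample is 400 pA below baseline, giving a max conductance depth of 4 nS over a 2 ms event.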
  • Discrimination between target analytes and reference analytes using a nanopore is based on the detection of a sufficiently different event signature upon translocation of each through the nanopore to enable reliable and sensitive detection.
  • the differences in the average event signatures can be based on signature duration, changes in current, features within the signature, or other distinguishable features and combinations thereof.
  • the features used are the basis for the determination of a threshold which acts as a method of identifying event signatures correlated to reference analytes and target analytes to be used for fractional abundance determination described herein.
  • the target and reference fragments are sufficiently different length dsDNA molecules to produce different nanopore event durations.
  • both target and reference analytes are dsDNA, and the feature that creates the distinct event types could be a difference in length of the target and reference analytes.
  • the difference in target and reference event areas, which is created by the difference in length of the target and reference analytes, is used to distinguish the target and reference event signatures (i.e., event profiles).
  • the event area distribution for dsDNA has a single mode. This makes area a useful event feature for classifying events as being the target type or the reference type, when the target and reference analytes are dsDNA of sufficiently different lengths. To generate sufficiently different area distributions, the lengths should be different by at least 100 bp for nanopores larger than 20 nm in diameter.
  • For nanopores 1-20 nm in diameter, e.g., formed by controlled dielectric breakdown (Yanagi, Itaru, Rena Akahori, Toshiyuki Hatano, and Ken-ichi Takeda. “Fabricating Nanopores with Diameters of Sub-1 Nm to 3 Nm Using Multilevel Pulse-Voltage Injection.” Scientific Reports 4 (2014): 5000), the dsDNA for the target and reference should be at least 20 bp different in length.
  • Figure 2A shows a typical event when a 727 bp DNA goes through a 25 nm diameter solid-state nanopore at 100 mV in 1M LiCl.
  • the event area is shown as the shaded region.
  • Figure 2B shows how event area increases with dsDNA length. Primarily, it is event duration that is increasing while event depth remains conserved, and event area (mean depth times duration) captures this length-dependent increase since it is proportional to duration.
  • Figure 2C shows the distribution of the log-base-10 of the area (pA*ms) of all events recorded for each DNA length shown, run sequentially on the same nanopore. The distribution of log-base-10 of event areas is approximately normal (Gaussian). As the DNA increases in length, the mean of the distribution increases.
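Because the log-base-10 of event area is described as approximately Gaussian, with a mean that grows with DNA length, a single area threshold between a shorter and a longer dsDNA population can be sketched as follows. The threshold rule used here (the point with equal z-score distance from both fitted means) is an illustrative assumption, as are the function names and the toy data; it is not the threshold selection procedure of the method.

```python
import math
from statistics import NormalDist
from typing import Sequence


def fit_log_area(areas_pa_ms: Sequence[float]) -> NormalDist:
    """Fit a normal distribution to log10(event area)."""
    return NormalDist.from_samples([math.log10(a) for a in areas_pa_ms])


def equal_z_threshold(short: NormalDist, long: NormalDist) -> float:
    """Return the log10-area value equally many standard deviations from both means."""
    return ((short.mean * long.stdev + long.mean * short.stdev)
            / (short.stdev + long.stdev))


def fraction_below(areas_pa_ms: Sequence[float], log_threshold: float) -> float:
    """Fraction of events whose log10(area) falls below the threshold."""
    below = sum(1 for a in areas_pa_ms if math.log10(a) < log_threshold)
    return below / len(areas_pa_ms)


# Toy areas (pA*ms) for a shorter and a longer dsDNA control; not measured data.
short_fit = fit_log_area([10.0, 100.0])
long_fit = fit_log_area([1000.0, 10000.0])
threshold = equal_z_threshold(short_fit, long_fit)
```

The fraction of events below such a threshold, computed separately for each control and for the mixture, is the kind of quantity plotted as Qref, Qtarg, and Qmix in Figure 4B.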
  • target-sequence comprising dsDNA and reference-sequence comprising dsDNA
  • the two dsDNA lengths at least 300 bp in length, at most 100,000 bp in length.
  • the target and reference dsDNA analytes have a difference in length of at least 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 60 bp, 70 bp, 80 bp, 90 bp, 100 bp, 150 bp, 200 bp, or 300 bp.
  • an increased difference in length between the target and reference dsDNA analytes facilitates a greater sensitivity and specificity of determination of event signatures correlated with the target and reference analytes, when discriminating by size, which improves the estimation of the relative abundance in the sample.
  • specifying the properties of polynucleotide fragments excised from genomic DNA is a portion of the workflow for fractional abundance determination.
  • fragment specifications can include, e.g., their sequences, lengths, and secondary structures.
  • the fragment specifications enhance the capture and detection of specific sequences by the nanopore device.
  • the target and reference fragments are bound to different payload molecules, such that the target/payload and reference/payload molecules produce sufficiently different nanopore event signatures.
  • the different event signatures are a combination of event duration, event maximum depth, event mean depth, and/or other event properties.
  • the target and reference analytes are discriminated by sequence-specific payloads such that, when each molecule or complex type (target-payload, reference-payload) passes through the pore, a unique nanopore event signature is generated.
  • Methods for using probes bound to payloads that bind to each molecule type to facilitate discrimination are described in International Publication No. WO/2015/171169, “Target Detection with a Nanopore,” International Publication No. WO/2014/182634, “A Method of Biological Target Detection Using a Nanopore and a Fusion Protein Binding Agent,” International Publication No.
  • target and/or reference analytes are dsDNA, with unique payload-bound PNAs invading each dsDNA type (target and reference) to create the two macromolecule types to be detected with the nanopore.
  • target and/or reference analytes are single-stranded nucleic acid (ssNA), including DNA or RNA.
  • a payload-bound complementary nucleic acid (e.g., LNA) hybridizes to a region on the ssNA and one or more flanking primers hybridize to the other regions of the ssNA, to create a double-stranded molecule with payload bound, and the payloads are unique for the target and the reference in order to create the unique target and reference event profiles.
  • the fractional abundance framework involves: 1) designing and applying biochemistry methods to convert sample material into the nanopore sensing formats, for both target analyte and reference types; 2) applying a specific nanopore experiment protocol; and 3) applying analytical methods to generate a quantitative estimate for the relative abundance of target to reference analytes. This section is focused on part 1 of the framework.
  • Sample Preparation for Nanopore Detection
  • a molecule comprising the target sequence (termed the “target analyte” or “target molecule”) and a molecule comprising the reference sequence (termed a “reference analyte” or “reference molecule”) may be physically similar: for example, target and reference molecules may be of similar molecular weights or polynucleotide lengths, and may differ by only single nucleotides.
  • the goal of the biochemistry methods is to render target and reference molecules without bias to produce distinct “target” or “reference” event profiles upon translocation through the nanopore.
  • the target:reference mixture measured on the nanopore is representative of the target:reference concentration ratio in the sample.
  • polynucleotide sequence to target, reference, or both molecules to generate distinct event profiles.
  • the majority of DNA fragments obtained from the cell-free circulating DNA fraction of blood or urine are uniformly short, 150-200 bp in length.
  • Adding polynucleotide sequences by common methods including PCR, ligation, and direct oligonucleotide hybridization allows flexibility to maximize nanopore event distinction.
  • hybridization of chemically modified oligonucleotide probes carrying covalently bound polymer payloads is used to alter target or reference analyte charge and molecular weight without affecting polynucleotide length.
  • the goal is distinct event profiles per target and reference molecule groups.
  • the nanopore measurement and subsequent fractional abundance quantitation can be implemented provided the target and reference are sufficiently enriched (>10 pM) compared to background ( ⁇ 1 pM), and provided the target and reference analytes produce electrical event signatures that can be distinguished from one another and from background, where present.
  • target or reference analytes include polynucleotide sequences (including double- and single-stranded DNAs, RNAs, and synthetic polynucleotides) 20 nt-100,000 nt in length.
  • the polynucleotide comprising the target sequence is derived from organismal gDNA including from plants, humans, animals, insects, bacteria or viruses.
  • target polynucleotide sequences are derived from exogenous, non-genomic sequences including double- or single-stranded RNA or DNA from sources including plasmids, BACs, linear sequence-verified gene blocks, and expression cassettes.
  • the methods of detection provided herein include upstream fragmentation of nucleic acid samples, for example, gDNA, to sizes of 20-100,000 nt or base pairs in length
  • the nucleic acid is fragmented sequence-specifically using restriction enzymes, or by using site-directed nucleases including Cas9/sgRNA, TALENs, zinc finger proteins / nucleases, or another fragmentation method known in the art.
  • target or reference analyte enrichment is performed using positive and negative size selection to retain, discard, and elute target fragment sizes.
  • positive and negative size selection, for example, a low ratio of SPRI beads:DNA (0.6) in the presence of PEGs to retain and discard high molecular weight polynucleotide species (for example, >8,000 bp DNA), followed by SPRI beads:DNA (1.5:1) to bind, wash, and elute fragment sizes (2,000-8,000 bp, for example).
  • target or reference nucleic acids can undergo nucleic acid
  • the fractional abundance framework involves: 1) designing and applying biochemistry methods to convert sample material into the nanopore sensing formats, for both target analyte and reference types; 2) applying a specific nanopore experiment protocol; and 3) applying mathematical methods to generate a quantitative estimate for the fractional amount of target to reference (target:reference) analytes. This section is focused on part 2, experiment protocol.
  • Described herein are iterations of samples to be run in a nanopore to provide an improved estimate of the true relative abundance of target analytes in a mixed unknown sample.
  • the target analyte and the reference analyte are prepared to ensure reliable discrimination between each species using a nanopore sensor.
  • the characteristics of a fragment comprising a target sequence (i.e., the “target fragment”) and the characteristics of a fragment comprising the reference sequence (i.e., the “reference fragment”) are chosen such that the two fragments produce nanopore event signatures that can be differentiated by one or more signal properties.
  • one or more control mixtures is used to calibrate the estimate of the fractional amount of target to reference in an unknown mixture.
  • the calibration compensates for difference in nanopore capture efficiency between the target and the reference molecule types.
  • an unknown mixture of target and reference analytes is measured on the nanopore, and the fraction abundance of target to reference is
  • more than one unknown mixture of target and reference molecule types, derived from the same sample is measured sequentially on the same nanopore. In some embodiments, more than one unknown mixture of target and reference molecule types, derived from the same sample, is measured in parallel on different nanopores.
  • one or more controls including 100% target alone, 100% reference alone, and known mixtures of target and reference molecules, are measured on the nanopore, prior to and/or after the unknown mixtures.
  • the experiment protocol involves sequentially running one or more controls on the nanopore, before or after, or before and after, running the unknown mixture on the nanopore.
  • the controls can be made of 100% target analytes, or 100% reference analytes, and these are termed“isolated controls.”
  • the controls can also be any known mixture of target and reference analytes, referred to as“mixture controls” or“control mixtures.”
  • the control mixture could be a 1:1 ratio of target:reference analytes, or any other ratio of target:reference analytes from 0.01:1 to 100:1, or any ratio less than 0.01:1 (e.g., 0.001:1) or any ratio greater than 100:1 (e.g., 1000:1) of target:reference analytes.
  • One or more controls can be run more than once.
  • the controls (isolated and mixtures) and unknown mixture can be run in any order sequentially on the same nanopore.
  • the fluidic channel i.e., chamber
  • no controls are run, and only the unknown mixture is run, and compared to a reference table established by running controls in separate prior experiments, i.e., the controls are not run at the point of use.
  • one or more fluidically isolated channels and nanopore sensors are measuring controls in parallel with one or more fluidically isolated channels and nanopore sensors measuring unknowns. More than one nanopore could have access to each fluidic channel. In parallelized implementations, no flushing may be necessary, since each pore sees only one reagent set, i.e., a control (isolated or mixture) or an unknown (from a set of 1 or more unknowns).
  • the ratio of the reference analyte to the target analyte in the control mixture concentration is near the anticipated ratio of reference analyte to target analyte in the unknown sample, although this may not be known ahead of time.
  • Each recording period should be long enough to detect at least 100 events for each reagent type, and performance improves as more events are recorded, where the
  • the recording period for each reagent set can be the same or different.
  • An adaptive scheme can stop recording dynamically when the target number of molecules is detected.
  • a desired level of confidence e.g., 95%, 98%, 99%, 99.9%, etc.
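The adaptive recording idea above (stop once enough events have been seen) can be sketched as a simple loop. This is an illustration under stated assumptions: events arrive as a time-ordered stream of timestamps in seconds, and `min_events` and `max_seconds` are hypothetical parameters; the 100-event default mirrors the minimum mentioned above.

```python
from typing import Iterable, List


def record_until(event_times_s: Iterable[float],
                 min_events: int = 100,
                 max_seconds: float = 600.0) -> List[float]:
    """Collect event timestamps until min_events are seen or time runs out."""
    recorded: List[float] = []
    for t in event_times_s:
        if t > max_seconds:
            break  # recording period exhausted before reaching the target count
        recorded.append(t)
        if len(recorded) >= min_events:
            break  # enough events detected; stop recording early
    return recorded


# Hypothetical stream with one event per second.
stream = [float(k) for k in range(1, 501)]
events = record_until(stream, min_events=100, max_seconds=600.0)
```

A confidence-based stopping rule would replace the fixed `min_events` check with a test on the uncertainty of the running estimate; that variant is not shown here.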
  • an experiment protocol with a single nanopore is to run 1) 100% target for recording period T, 2) flush nanopore chamber, 3) 100% reference for recording period T, 4) flush nanopore chamber, 5) 50:50 target:reference mixture for recording period T, 6) flush nanopore chamber, 7) unknown mixture for recording period T.
  • Recording period T can be 15 sec, 30 sec, 45 sec, 1 min, 5 min, 10 min, or any duration between 1-15 sec or between 10-60 min.
  • Another common experiment protocol is to run (1)-(7), followed by 8) flush nanopore chamber, 9) repeat 100% target for recording period T, 10) flush nanopore chamber, 11) repeat 100% reference for recording period T, 12) flush nanopore chamber, 13) repeat 50:50 target:reference mixture for recording period T.
  • Another common experiment protocol is to run (1)-(7), followed by 8) flush nanopore chamber, 9) repeat 50:50 target:reference mixture for recording period T, 10) flush nanopore chamber, 11) repeat 100% reference for recording period T, 12) flush nanopore chamber, 13) repeat 100% target for recording period T.
  • Still another common experiment protocol is to run 1) a target:reference control mixture at a ratio suspected to be near the target:reference ratio in the unknown mixture, for recording period T, 2) flush nanopore chamber, 3) unknown mixture for recording period T.
  • Still another common experiment protocol is to run 1) a 1:1 target:reference control mixture for recording period T, 2) flush nanopore chamber, 3) unknown mixture for recording period T.
  • an experiment protocol with a single nanopore is to run 1) 100% target for recording period T, 2) flush nanopore chamber, 3) 100% reference for recording period T, 4) flush nanopore chamber, 5) unknown mixture for recording period T.
  • an experiment protocol with a single nanopore is to run 1) 100% target for recording period T, 2) flush nanopore chamber, 3) unknown mixture for recording period T.
  • an experiment protocol with a single nanopore is to run 1) 100% reference for recording period T, 2) flush nanopore chamber, 3) unknown mixture for recording period T.
  • an experimental protocol with a single nanopore is to run only the unknown mixture for a recording period T, and to use data from a lookup table or previous data which contains error correction information derived from a 100% reference control sample, a 100% target control sample, a known target:reference control mixture, or any combination thereof, each run under substantially similar conditions to the experimental protocol for the unknown mixture, to provide at least one correction term to the data generated from the recording period T to improve an estimate of a fractional abundance of a target analyte in the unknown mixture.
  • the recorded events from the controls (if run) and the recorded events from the unknown(s) are mathematically analyzed to predict the fraction amount of target to reference in the one or more unknowns.
  • the fractional abundance framework involves: 1) designing and applying biochemistry methods to convert sample material into the nanopore sensing formats, for both target analyte and reference types; 2) applying a specific nanopore experiment protocol; and 3) applying mathematical methods to generate a quantitative estimate for the fractional amount of target to reference (target:reference) analytes. This section is focused on part 3 of the framework.
  • the percentage transgene, or GMO% is the ratio R converted to a percentage.
  • the fractional abundance method predicts the relative amount of target to reference, or target to total (sum of target and reference).
  • a calibrant molecule can be added to determine absolute concentration of either the target or the reference molecule.
  • the relative capture rate of a target molecule vs. a calibrant molecule at constant concentration can be correlated with the concentration of the target molecule in a sample, and information derived from multiple nanopores can be used to determine concentration of the target molecule.
  • a single nanopore event feature is compared between target and reference analyte types for calculating the fractional abundance. In some embodiments, more than one nanopore event feature is compared between target and reference analyte types for calculating the fractional abundance.
  • the true ratio of an unknown mixture is denoted Rmix and the true fraction of a mixture is denoted Fmix.
  • the mathematical method generates estimates for Fmix and Rmix, which are denoted F*mix and R*mix.
  • the target and reference molecule constructs are designed and created to give distinct nanopore event signatures.
  • the mathematical method first designs a criterion for binning all recorded events into one of two categories, namely, target positive (equivalently, reference negative) or target negative (equivalently, reference positive).
  • the event criterion uses one or more event features.
  • a single feature is used to create a criterion for binning events. Given the criterion, every event is tagged as being either a target event or a reference event. These are termed “target-tagged” or “reference-tagged.”
  • the fraction of target-tagged events is denoted Q, equal to the number of target-tagged events divided by the total number of events.
  • the fraction of reference-tagged events is 1 - Q.
  • the tagged fraction Q is a function of the concentration fraction F above the
  • Qtarg is close to 1, with 1 - Qtarg representing the false negative fraction.
  • Qref is close to 0, with Qref representing the false positive fraction.
  • the controls satisfy Qtarg > QX:Y > Qref.
  • the mixture satisfies Qtarg > Qmix > Qref.
  • the target-tagged fractions from controls (Qtarg, Qref, QX:Y) are run separately and a lookup table is used to reference the values for any new assay that measures Qmix.
  • the (Qtarg, Qref, QX:Y) are established at the point of use as part of the assay.
  • the (Qtarg, Qref) are run separately and a lookup table is used to reference their values, whereas the (QX:Y) value is established at the point of use as part of the assay that measures Qmix.
  • the target-tagged fractions from controls (Qtarg, Qref, QX:Y) are run more than once at the point of use, and their values are averaged for subsequent use in the formula below.
  • GMO fractional amount of a transgene
  • the parameter ρ is an estimate for the ratio that can compensate for a false positive detection error, a false negative detection error, or both.
  • a value of Qref can be used to compensate for a false positive error. If no compensation for a false positive error is to be used, Qref can be set to 0.
  • a value of Qtarg can be used to compensate for a false negative error. If no compensation for a false negative error is to be used, Qtarg can be set to 1, since 1 - Qtarg represents the false negative fraction.
  • the parameter α is the ratio compensation multiplier.
  • the parameter α is the ratio of two capture rate constants.
  • a capture rate constant is the nanopore event rate divided by concentration for a given molecule type.
  • the parameter α is the reference molecule capture rate constant divided by the target analyte capture rate constant.
  • the multiplier α compensates for differences in nanopore capture and detection between the target and reference molecule types.
  • α is set equal to 1 in equations (1) and (2), to provide the estimates for F*mix and R*mix, respectively.
  • Equations (1) and (2) provide estimates for F*mix and R*mix, respectively.
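
The role of the Q values and the two parameters can be sketched numerically. The listing below is a minimal illustration under stated assumptions, not the method as filed: equations (1) and (2) are not reproduced in this excerpt, so the code assumes the ratio estimate (written ρ here) is (Qmix - Qref)/(Qtarg - Qmix), that R*mix is ρ times the capture-rate multiplier (written α here), and that F*mix is R*mix/(R*mix + 1). These forms follow from treating Qmix as a capture-weighted blend of Qtarg and Qref. All function and argument names are hypothetical.

```python
def ratio_estimate(q_mix, q_ref=0.0, q_targ=1.0):
    """Assumed form of the ratio estimate: target-to-reference event ratio,
    corrected for false positives (q_ref) and false negatives (1 - q_targ)."""
    if not q_ref < q_mix < q_targ:
        raise ValueError("expected q_ref < q_mix < q_targ")
    return (q_mix - q_ref) / (q_targ - q_mix)


def fractional_abundance(q_mix, q_ref=0.0, q_targ=1.0, alpha=1.0):
    """Return (F, R): target/total fraction and target/reference ratio.

    alpha is the reference capture rate constant divided by the target
    capture rate constant; alpha=1.0 means no capture-rate compensation.
    """
    r = alpha * ratio_estimate(q_mix, q_ref, q_targ)
    return r / (r + 1.0), r


# Illustrative (made-up) control and mixture values.
f_est, r_est = fractional_abundance(q_mix=0.4, q_ref=0.1, q_targ=0.9)
print(f"F = {f_est:.3f}, R = {r_est:.3f}")
```

With the defaults (Qref = 0, Qtarg = 1, α = 1) the estimate reduces to the uncorrected tagged fraction, which is the case where no controls are used for compensation.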
  • Uncertainty estimates, or error bars, for F*mix and R*mix can also be computed.
  • std(Q), where N is the total number of events.
  • random samples from each Q distribution can be drawn many times to generate a distribution of values for F*mix and R*mix by applying equations (1) and (2). The distributions for F*mix and R*mix can then be used to compute uncertainty bounds, resulting in F*mix ± F*sd and R*mix ± R*sd.
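
The resampling step can be illustrated as follows. This is a sketch under assumptions that are not stated in this excerpt: each Q is treated as a binomial proportion with standard deviation sqrt(Q(1 - Q)/N), draws use a normal approximation, every Q is measured from the same number of events, and the estimate for the fraction takes the assumed form R/(R + 1) with R = α(Qmix - Qref)/(Qtarg - Qmix). Names and default values are hypothetical.

```python
import math
import random
import statistics


def binomial_std(q, n_events):
    """Standard deviation of a tagged fraction q measured from n_events."""
    return math.sqrt(q * (1.0 - q) / n_events)


def monte_carlo_fraction(q_mix, q_ref, q_targ, n_events, alpha=1.0,
                         n_draws=2000, seed=0):
    """Propagate uncertainty in the Q values to the fraction estimate.

    Returns (mean, standard deviation) of the simulated fraction values.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        qm = rng.gauss(q_mix, binomial_std(q_mix, n_events))
        qr = rng.gauss(q_ref, binomial_std(q_ref, n_events))
        qt = rng.gauss(q_targ, binomial_std(q_targ, n_events))
        if not qr < qm < qt:
            continue  # skip draws that violate Qref < Qmix < Qtarg
        r = alpha * (qm - qr) / (qt - qm)  # assumed form of the ratio estimate
        draws.append(r / (r + 1.0))        # assumed form of the fraction estimate
    return statistics.mean(draws), statistics.stdev(draws)


f_mean, f_sd = monte_carlo_fraction(0.4, 0.1, 0.9, n_events=1000)
print(f"F = {f_mean:.3f} +/- {f_sd:.3f}")
```

Recording more events shrinks each Q distribution, which in turn narrows the reported bounds.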
  • the ratio or fraction of events matching or exceeding an event feature criterion is used to estimate the fractional amount of target to reference in an unknown mixture.
  • the criterion is a threshold.
  • Q = (Σj zj)/N, where zj is 1 if event j meets the criterion and 0 otherwise, and N is the total number of events. The same criterion is applied to all controls, isolated and mixtures, and all unknowns, to compute all Q values utilized in the formulas above (equations (1)-(2)).
  • the criterion involves one or more than one inequality equation, and can be a linear or nonlinear function of one or more event features.
  • Each inequality equation has a threshold or range of thresholds associated with it.
  • a criterion is fully specified by the set of inequalities and the corresponding set of thresholds.
  • the criterion is established for a class of target and reference molecule types, and new assays using types of molecules for that class will utilize the criterion already established.
  • the criterion is identified from the control data gathered for any new assay. That is, the criterion is established at run time as part of the fractional abundance protocol.
  • the set of inequalities for the criterion are established a priori from sets of previous experiments using comparable target and reference molecule types, while the set of thresholds for the one or more criterion inequalities are established at run time using the control data.
  • a single event feature is utilized in establishing the criterion.
  • a threshold, labeled “q,” is the scalar value that divides target-tagged events from non-target-tagged (i.e., reference-tagged) events based on an inequality.
  • q can represent the vector of threshold values used for the set of inequalities.
  • the q-threshold is found as the value that produces a desired false positive for Qref.
  • the q-threshold could be set at the 95th percentile of Qref to produce a false positive of 5%. In that case, 95% of the reference molecule events have an area less than q.
  • the SFT q-threshold is found as the value that produces a desired false negative for Qtarg, i.e., the q-threshold could be set at the 5th percentile of Qtarg to produce a false negative of 5%.
  • the SFT q-threshold is found as the solution to
  • the threshold would be the value that corresponds to the greatest
  • the q-threshold range is computed as the values that produce a desired false positive range for Qref.
  • the q-threshold ranges could span the 95th to the 99th percentiles of Qref.
  • equations (1) and (2) produce ranges of F*mix(q) and R*mix(q) values, and the averages of these ranges are computed and reported as the predicted F*mix and R*mix values.
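
A single-feature threshold of this kind can be sketched as below. The example assumes the event feature is event area, that target events have larger area than reference events, and that the threshold is the 95th percentile of a 100% reference control. The data are synthetic and the helper names are hypothetical.

```python
import math


def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct in 0..100)."""
    xs = sorted(values)
    k = (len(xs) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def tagged_fraction(values, q):
    """Q: fraction of events whose feature value exceeds the threshold q."""
    return sum(v > q for v in values) / len(values)


# Synthetic 100% reference control: event areas 1, 2, ..., 100.
reference_areas = [float(a) for a in range(1, 101)]

q = percentile(reference_areas, 95)             # threshold for ~5% false positives
q_ref = tagged_fraction(reference_areas, q)     # false positive fraction
print(q, q_ref)
```

The same threshold would then be applied unchanged to the target control, any control mixture, and the unknown to obtain the remaining Q values.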
  • the target events create a unique subspace on a 2D event plot of mean δG vs. duration, and events are tagged when duration is greater than a threshold, and when mean δG is above one threshold and below another threshold.
  • the tagging criterion is represented by three linear inequalities and three thresholds, using two event features (mean δG, duration).
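
The two-feature criterion can be written directly as three inequalities. The sketch below is illustrative only: the threshold values, units, and the names `Event`, `is_target`, and `tagged_fraction` are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Event:
    mean_dg: float   # mean conductance shift of the event
    duration: float  # event duration


def is_target(event, min_duration, dg_low, dg_high):
    """Tag an event as target using three linear inequalities on two features."""
    return (event.duration > min_duration
            and event.mean_dg > dg_low
            and event.mean_dg < dg_high)


def tagged_fraction(events, **thresholds):
    """Q: fraction of events tagged as target under the criterion."""
    return sum(is_target(e, **thresholds) for e in events) / len(events)


events = [Event(1.0, 0.5), Event(1.0, 0.05), Event(3.0, 0.5), Event(0.2, 0.5)]
print(tagged_fraction(events, min_duration=0.1, dg_low=0.5, dg_high=2.0))
```

Only the first synthetic event satisfies all three inequalities; the others fail on duration, the upper bound, and the lower bound, respectively.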
  • machine learning is used to identify the set of features and feature criterion for tagging each event as a target analyte event or a reference analyte event.
  • support vector machines are used to classify events as target or reference analytes.
  • developing a support vector machine workflow follows these steps: 1) load nanopore data, 2) select nanopore event features to differentiate events, 3) model training and testing using controls, 4) data calibration using controls, 5) prediction of unknown target:reference mixtures.
  • an already developed and reduced support vector machine workflow is implemented for automated fractional abundance predictions.
  • machine learning tools are applied to automate the selection of the criterion, including selection of the event features, the form of the inequalities (linear and/or nonlinear) and the threshold values q used in the inequalities.
  • SVMs Support Vector Machines
  • a supervised machine learning method that solves
  • SVMs include: Cortes, C. & Vapnik, V. Machine Learning (1995) 20: 273; and Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992).“A training algorithm for optimal margin
  • a second way to deal with linearly non-separable data is the kernel method (Boser, B. E., et al., cited above). It transforms the input feature space to a higher-dimensional space. By doing so, the data can be linearly separable (Figure 3B). Denote the mapping function as φ(x), then the kernel function K can be written as:
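
For orientation, the standard textbook definition of a kernel in terms of a feature map is given below, together with the widely used radial basis function kernel as one example. Whether these match the numbering and notation of the equation omitted from this excerpt is an assumption; the symbol \gamma is a kernel parameter introduced here for illustration.

```latex
K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j),
\qquad
K_{\mathrm{RBF}}(x_i, x_j) = \exp\!\left(-\gamma \,\lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0 .
```

The practical point of the kernel form is that the classifier only ever needs inner products, so the higher-dimensional mapping never has to be computed explicitly.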
  • Equations (3) and (4), together with a hyper-parameter grid search covering kernel types, the soft margin constant, and any parameters that the kernel function may depend on, are solved as part of applying the method.
  • An assay-based generalized model generated from an SVM, including common decision boundaries and a common calibration ratio, can be applied to unknown mixtures without requiring control data sets.
  • Other data mining methods, including decision trees, neural networks, Naive Bayes, logistic regression, k-nearest neighbors, and boosting, are also claimed as applicable methods for nanopore data.
  • clustering methods are applied to create the criteria for tagging target events and reference events. Each event is tagged as a target event or a reference event.
  • the fractional abundance is the proportion of the target events relative to the sum of the target and reference events. Running controls that provide compensatory information allows adjustments that improve the estimate of the fractional abundance.
  • the clustering method is a maximum likelihood method applied to parameterized models of the distributions of one or more event parameters.
  • a log likelihood function is used as the metric for tracking progress in iterations of the algorithm, which recursively updates the membership assignment of each event in control data and improves the fit of the distributions to the data.
  • the data are modeled using mixtures of parameterized Gaussian distributions. Methods that use finite mixture models, including Gaussian mixture models, to characterize numerical data are well characterized in statistics and applied mathematics (Hand, David J., Heikki Mannila, and Padhraic Smyth. Principles of data mining. MIT press, 2001).
  • the method maximizes the likelihood function with respect to the parameters comprising the means and covariance of the components and the mixing coefficients. Since there is no closed-form solution for the log likelihood, the mode parameters and weights for assigning data to modes are iteratively computed using the Expectation Maximization (EM) technique (C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006).
  • The method of applying an EM algorithm to Gaussian mixture (GM) models of nanopore data for the purpose of generating fractional abundance estimates is termed EMGM.
  • the EMGM method uses prior knowledge about one or more nanopore event signatures that can be used to distinguish the target events from the reference events.
  • the target population may be represented by a single distribution, or more than one distribution.
  • the reference population may be represented by a single distribution, or more than one distribution.
  • the target and reference distribution(s) are established by applying the algorithm to one or more isolated controls and one or more control mixtures.
  • an event in an unknown mixture is tagged as a target event if it is associated with the modeled target distribution(s).
  • a total of three Gaussian distributions could fit the entire data set in a 1 : 1 control mixture, with one mode associated with the target type and two modes associated with the reference type.
  • the algorithm requires only one control mixture for application of the EMGM. Subsequently, the resulting model can be applied to unknown mixtures.
  • an additional isolated reference control is used to offset the effects of false positives.
  • application of the EMGM models to a 100% reference control establishes the false positive fraction, which is subtracted from the predicted fraction generated by applying the EMGM models to the unknown mixture. This subtraction can be referred to as false positive compensation (or “FP” compensation).
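
The EMGM idea can be sketched in one dimension with a plain expectation-maximization loop. This is a simplified illustration, not the filed implementation: it uses a single event feature, a fixed number of iterations instead of a log-likelihood stopping rule, hard assignment of each event to its most probable mode, and caller-supplied initial means. All names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, init_means, n_iter=100):
    """Fit a 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    mean_all = sum(xs) / len(xs)
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / len(xs), 1e-9)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each mode for each event.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update mixing coefficients, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
            variances[j] = max(spread / nj, 1e-9)
    return weights, means, variances


def target_fraction(xs, model, target_modes):
    """Fraction of events whose most probable mode is a target mode."""
    weights, means, variances = model
    hits = 0
    for x in xs:
        p = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
        if p.index(max(p)) in target_modes:
            hits += 1
    return hits / len(xs)


def fp_compensated_fraction(unknown, reference_only, model, target_modes):
    """Subtract the false positive fraction measured on a 100% reference control."""
    false_positive = target_fraction(reference_only, model, target_modes)
    return target_fraction(unknown, model, target_modes) - false_positive
```

In use, `fit_gmm_1d` would be run once on a control mixture, and the returned model reused for the unknown and for the reference control. A model with two reference modes and one target mode, as in the 1:1 example above, is expressed by passing three initial means and listing only the target mode's index in `target_modes`.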
  • embodiments of the system(s) described above and below can generate estimates for fractional abundance and/or concentration of target sample
  • outputs from individual pores can be processed in combination to generate improved estimates of fractional abundance and/or concentration.
  • estimates of fractional abundance and/or concentration can be generated from a single or multiple microfluidic nanopore devices (e.g., consumables) processing different aliquots of common samples (i.e., from controls, from unknown mixtures) with results from one or more nanopores contributing to estimate generation.
  • One or more methods can thus include one or more of: applying voltages across a set of nanopores in one or more nanopore devices to generate detectable electronic signatures and to induce translocation of charged analytes through the set of nanopores for portions of a sample comprising units of the target analyte and reference analytes; generating a set of event signatures from translocation of units of the target analyte and reference analytes through the set of nanopores; from the set of event signatures, generating a set of parameters
  • Figure 15A depicts a flowchart of a method for using consensus calls from arbitrary numbers of nanopores to determine fractional abundance of a target analyte in a sample.
  • a computing system associated with the nanopore device receives 1510 a set of pore parameters corresponding to a set of pores of the nanopore device.
  • the set of pore parameters can be derived from electrical signals (e.g., current measurements) output from electrodes in communication with the set of pores, where the electrical signals are useful in determining [t] and [r], as described above, in order to determine relative and fractional abundance estimates; however, in alternative embodiments, the set of pore parameters can additionally or alternatively be derived from other signals.
  • the computing system then evaluates 1520 each of the set of pore parameters according to one or more threshold conditions.
  • the threshold conditions can include threshold conditions that are based on outputs derived from multiple pores. For instance, statistical measures of variability across two or more of the set of pores can be used to design threshold conditions for evaluating whether the output of each pore should be processed in combination with other pore outputs to determine estimates of fractional abundance.
  • Statistical measures of variability can include or be derived from one or more of: a range (e.g., interquartile range) in parameter values across multiple pores, a variance in parameter values across multiple pores, a standard deviation in parameter values across multiple pores, and any other suitable statistical or non-statistical measure.
  • the threshold conditions can be configured in a manner such that each pore is evaluated independently of outputs of other pores.
  • the computing system then, based on evaluation of the set of pore parameters, combines 1530 a subset of the pore parameters that satisfy their respective threshold conditions with a parameter combination operation.
  • the parameter combination operation can output an average parameter value (e.g., mean, median, mode) determined from the subset of pore parameters that satisfy their respective threshold conditions.
  • the average parameter value can be a weighted average, where the weight given to each parameter used to calculate the weighted average can be determined based upon the threshold-based comparisons of step 1520 (e.g., level of satisfaction of the corresponding threshold condition).
  • distances between parameter values and their respective threshold conditions can be used to determine weights. For instance, a parameter value that satisfies the threshold condition to a lesser degree can be given less weight, and a parameter value that satisfies the threshold condition to a higher degree can be given more weight.
  • the computing system then returns 1540 a fractional abundance estimate based upon the output of the parameter combination operation of step 1530, where the fractional abundance estimate describes the percentage of target analyte in a sample compared to the total population of target analytes and reference analytes, as described above.
  • Methods analogous to those depicted in Figure 15A can also be used to determine concentration of a target sample component, as described in more detail in examples related to Figure 15D below.
  • Figure 15B depicts an embodiment of using consensus calls from four pores of a nanopore device to determine fractional abundance of a target analyte in a sample, where Figure 15B is a more specific embodiment of the methods shown in Figure 15A.
  • outputs (e.g., electrical signals or other measurements) from the four pores are used to generate parameter values P1, P2, P3, and P4, which are arranged, by logic of the computing system, in order from least to greatest.
  • Logic of the computing system then defines an interquartile range (IQR) as P3 - P2, and implements a series of threshold conditions to determine which, if any, of the values P1, P2, P3, and P4 should be passed through for further analysis.
  • in a first evaluation 1501, the computing system compares the IQR divided by the mean of parameter values P2 and P3 to a first threshold, and if IQR/mean(P2, P3) is greater than the first threshold, determines that the experiment has failed (and thus, P1, P2, P3, and P4 should be discarded). However, if IQR/mean(P2, P3) is less than or equal to the first threshold, P2 and P3 are passed through for further analysis, and the computing system evaluates P1 and P4 according to their respective threshold conditions.
  • in a second evaluation 1502, the computing system compares P1 to a second threshold defined as a function of P2, IQR, and X, where X is a constant.
  • the second threshold is defined as P2 - X*IQR, where in a specific example, the value of X is set to 1.5. If P1 is less than the second threshold, P1 should be discarded; however, if P1 is greater than or equal to the second threshold, P1 should be passed through for further analysis.
  • in a third evaluation 1503, the computing system compares P4 to a third threshold defined as a function of P3, IQR, and Y, where Y is a constant.
  • the third threshold is defined as P3 + Y*IQR, where in a specific example, the value of Y is set to 1.5.
  • X and Y do not have to be identical to each other, and the second and third thresholds can be defined in another suitable manner. If P4 is greater than the third threshold, P4 should be discarded; however, if P4 is less than or equal to the third threshold, P4 should be passed through for further analysis.
  • values of P1, P2, P3, and P4 that pass their respective thresholds are combined 1504 (e.g., by determining a mean value, by determining a weighted mean value).
  • weighting values w1, w2, w3, and w4 corresponding to P1, P2, P3, and P4, respectively, are used to generate the weighted mean.
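
The four-pore evaluation sequence can be sketched as a short function. The value of the first threshold is not given in this excerpt, so the default of 0.5 below is purely illustrative; the code also assumes positive parameter values and combines the surviving values with an unweighted mean. The function name is hypothetical.

```python
def four_pore_consensus(params, first_threshold=0.5, x=1.5, y=1.5):
    """Combine four per-pore parameter values into one consensus value.

    Returns None when the experiment is judged to have failed.
    Assumes positive parameter values, so mean(P2, P3) is non-zero.
    """
    if len(params) != 4:
        raise ValueError("expected exactly four parameter values")
    p1, p2, p3, p4 = sorted(params)
    iqr = p3 - p2

    # First evaluation: relative spread of the two middle values.
    if iqr / ((p2 + p3) / 2.0) > first_threshold:
        return None
    kept = [p2, p3]

    # Second evaluation: keep P1 unless it lies below P2 - X*IQR.
    if p1 >= p2 - x * iqr:
        kept.append(p1)

    # Third evaluation: keep P4 unless it lies above P3 + Y*IQR.
    if p4 <= p3 + y * iqr:
        kept.append(p4)

    return sum(kept) / len(kept)


print(four_pore_consensus([10.0, 11.0, 12.0, 30.0]))
```

In the example call, the outlying pore value 30.0 exceeds P3 + 1.5*IQR = 13.5 and is dropped, while 10.0 is within the lower bound of 9.5 and is kept.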
  • Figure 15C depicts a flowchart of a method implemented by the embodiment of the system of Figure 15B.
  • Figure 15C represents architecture of logic for evaluating calls for parameter values from multiple nanopores as inputs, and returning outputs (e.g., mean parameter values) with or without confidence values.
  • the architecture can process input arrays of the form [P1, P2, ..., PN], and return outputs of the form (state, result), where state indicates a measure of confidence (e.g., a state of 0 indicates no returned output, a state of 1 indicates a returned output without confidence, and a state of 2 indicates a returned output with confidence), and result is a combined parameter value (e.g., mean parameter value, weighted mean parameter value) determined from multiple nanopores.
  • the mean can be an arithmetic mean, a geometric mean, a weighted mean and/or any other suitable mean or combination function.
  • the logic provides a returned output (0,0).
  • the logic provides a returned output (1, x), where x is the value of the parameter in the input array.
  • the logic compares the difference between the two parameter values to a threshold condition, and if the threshold condition is satisfied, provides a returned output (2, mean(X)), where mean(X) is the mean value of the two parameters in the input array. However, if the threshold condition is not satisfied, the logic provides a returned output (1, mean(X)).
  • the logic determines the IQR of the array, and processes 1514 a subset of the array that falls within a desired range based on the IQR, where the desired range in Figure 15C is determined based on the 25th and 75th percentile parameter values in the array. If the length of the subset is less than half of the length of the original array, the logic provides a returned output (1, mean(X_good)), where X_good is the subset of the array that falls in the desired range.
  • the logic enters a regime 1516 where, if the difference between the maximum and minimum values of the subset of the array is greater than a threshold (TH), and the difference between the maximum and minimum of the values of the array within the IQR is greater than TH, the logic provides a returned output (1, mean(X_in_IQR)), where X_in_IQR represents the values of the input array that fall within the IQR. If the difference between the maximum and minimum values of the subset of the array is greater than TH, but the difference between the maximum and minimum of the values within the IQR is not greater than TH, the logic determines if the number of values within the IQR is greater than one, and if so, provides a returned output (2, mean(X_in_IQR)). However, if the number of values within the IQR is not greater than one, the logic provides a returned output (1, mean(X_in_IQR)).
  • the logic determines if the length of the subset is greater than 1, and if so, provides a returned output (2, mean(X_good)). If the length of the subset is not greater than one, the logic provides a returned output (1, mean(X)).
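
One possible reading of the (state, result) logic is sketched below. Several branch conditions are not spelled out in this excerpt, so the following are assumptions: an empty array returns (0, 0); a one-element array returns (1, x); the two-element test is that the difference is at most TH; the "desired range" is the 25th percentile minus 1.5 IQR to the 75th percentile plus 1.5 IQR; and the arithmetic mean is used throughout. Names are hypothetical.

```python
import math


def _percentile(xs_sorted, pct):
    """Linear-interpolation percentile of a sorted, non-empty list."""
    k = (len(xs_sorted) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return xs_sorted[lo] + (xs_sorted[hi] - xs_sorted[lo]) * (k - lo)


def _mean(xs):
    return sum(xs) / len(xs)


def consensus(values, th, fence=1.5):
    """Return (state, result): 0 = no output, 1 = output without
    confidence, 2 = output with confidence."""
    xs = sorted(values)
    if not xs:
        return (0, 0)
    if len(xs) == 1:
        return (1, xs[0])
    if len(xs) == 2:
        return (2 if xs[1] - xs[0] <= th else 1, _mean(xs))

    q1, q3 = _percentile(xs, 25), _percentile(xs, 75)
    iqr = q3 - q1
    good = [x for x in xs if q1 - fence * iqr <= x <= q3 + fence * iqr]
    in_iqr = [x for x in xs if q1 <= x <= q3]

    if len(good) < len(xs) / 2:
        return (1, _mean(good))
    if good[-1] - good[0] > th:
        # Subset is too spread out: fall back to the values inside the IQR.
        if in_iqr[-1] - in_iqr[0] > th:
            return (1, _mean(in_iqr))
        return (2 if len(in_iqr) > 1 else 1, _mean(in_iqr))
    if len(good) > 1:
        return (2, _mean(good))
    return (1, _mean(xs))


print(consensus([10.0, 10.2, 10.4, 10.6, 50.0], th=1.0))
```

In the example call, the value 50.0 falls outside the assumed range and is excluded, the remaining four values agree to within TH, and the result is returned with state 2.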
  • embodiments of the methods described can be applied to information derived from any suitable number of nanopores, using any suitable threshold conditions based on other measures of variability.
  • Figure 15D depicts outputs of a method for determining concentration of a target molecule using information derived from multiple nanopores.
  • an embodiment of the system determines the relative capture rate of a target sample component (e.g., molecule, other analyte) vs. a specific calibrant molecule at constant concentration, using information derived from multiple nanopores, and correlates the relative capture rate with the concentration of the target sample component in order to determine that concentration.
  • the outputs depicted in Figure 15D show percent error in estimates of different concentrations of the target molecule using 1, 2, or 7 control data points, and using predictions from a single nanopore vs. multiple nanopores.
  • Figure 15D top
  • Figure 15D (middle and bottom)
  • use of data from multiple nanopores generally produced lower percent error in estimates of actual concentration of the target molecule at concentrations of 0.5 nM, 1 nM, 3 nM, 5 nM, 7 nM, 10 nM, and 15 nM in comparison to estimates of actual concentration of the target molecule using data from a single nanopore, especially as the number of control data points increased.
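
A calibration of this kind can be sketched as a fit of relative capture rate against known control concentrations, followed by inversion for the unknown. The excerpt does not state the functional form of the correlation, so the sketch assumes a straight line (through the origin when only one control point is available) and combines per-pore estimates with a median. The control values and names are hypothetical.

```python
import statistics


def fit_line(points):
    """Least-squares (slope, intercept) for (concentration, relative rate) pairs.

    With a single control point, a line through the origin is assumed.
    """
    if len(points) == 1:
        x, y = points[0]
        return y / x, 0.0
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(relative_rates_by_pore, controls):
    """Invert the calibration line per pore, then take the median across pores."""
    slope, intercept = fit_line(controls)
    per_pore = [(r - intercept) / slope for r in relative_rates_by_pore]
    return statistics.median(per_pore)


# Made-up controls: (target concentration in nM, target/calibrant capture-rate ratio).
controls = [(1.0, 0.5), (5.0, 2.5), (10.0, 5.0)]
print(estimate_concentration([1.4, 1.5, 1.6], controls))
```

The median is one simple way to limit the influence of a single aberrant pore; the consensus logic described for fractional abundance could be substituted for it.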
  • some embodiments of the system and associated computing logic can also be configured to omit use of nanopore-derived information (e.g., prior to performing computations using multipore consensus calls), for other reasons that depend upon the quality of the nanopore itself (e.g., low or high frequency noise, summary noise statistics including root mean square noise, pore diameter, rate of growth during the experiment, etc.), which can be algorithmically automated.
  • method(s) can include omitting from consideration data from a nanopore of the set of nanopores based upon an assessment of quality of data from the nanopore.
  • the system can evaluate low frequency noise content (e.g.,
  • the system can additionally or alternatively evaluate high frequency noise content (e.g., mean/median noise power over 0.5-30 kHz range) at a chosen period in the time domain (e.g., every 5 seconds), and omit use of information derived from nanopores associated with high frequency noise content above a threshold level.
  • the system can additionally or alternatively evaluate summary noise content (e.g., RMS of the time domain signal, with < 20 pA at 30 kHz acceptable), at a chosen period in the time domain (e.g., every 5 seconds), and omit use of information derived from nanopores associated with summary noise content above a threshold level.
  • the system can also evaluate pore diameter (or other pore morphological characteristics) and/or rate of change of pore characteristics over a chosen period in the time domain (e.g., every 5 seconds), with < 0.25 nm/min acceptable, and omit use of information derived from nanopores associated with morphological characteristics outside of a threshold range and/or associated with rate of change of pore characteristics outside of a threshold range.
  • total resistance is the inverse of the total conductance
  • the first model matches the conductance vs. nanopore diameter data when d/L ⁇ 3/4.
  • the model is:
  • the increase may be due to evaporation of water and a commensurate relative increase in the ion concentration in the “open” chamber above the nanopore to which reagents are added.
  • the first model G1(d) can be used to estimate diameter. Otherwise, the second model can be used to estimate diameter.
  • the system can also evaluate sample quality content and omit use of information derived from nanopores associated with poor sample quality content (e.g., prior to implementation of the methods shown in Figures 15A and 15B).
  • the system can evaluate sample quality content in terms of throughput, via capture rate or event rate per unit time (above a lower threshold, e.g., 1 event per min, and below an upper threshold, e.g., 10,000 events per min).
  • the system can evaluate sample quality content in terms of amount of separation of populations for samples with more than one species present, including one or more unknowns potentially present and one or more controls present.
  • a sample includes one control or reference component, and one unknown/target sample component present above a minimum fractional amount (e.g., 2%).
  • the system determines values of separation metrics from model estimations (e.g., SVM-based model) that divide the populations within the sample.
  • Values of separation metrics output from the model(s) can include distance values (e.g., shortest distance of the centroid of one population of events to a reference separation boundary/hyperplane) or any other suitable separation metrics.
  • a 50:50 mixture of S2 samples (0%, 100%, described above) was evaluated with a universal model according to the following steps, depicted in Figure 16A.
  • the method 1600 of the universal model implemented by the system first determines 1610 if the sample populations (e.g., populations of the 50:50 mixture) are within a target region, in order to filter or remove from consideration information from nanopores producing poor quality data (e.g., in terms of nanopore size, in terms of abnormal nanopore morphology, in terms of abnormal interactions with sample processing assays, in relation to detected contamination of samples and/or sample processing materials, etc.).
  • Figure 16B depicts an output of step 1620, where event data from nanopores is plotted by amplitude of electrical signal vs. dwell time, in association with different reagents (i.e., 0% S-adenosylmethionine, 100% S-adenosylmethionine) used to process the sample.
  • the system collected event data from all populations, and defined a target region, outside of which events are categorized as noise. The system then determined a measure of percent noise for each nanopore based on the number of actual events vs. noise events defined by the target region, and if the percent noise for a particular nanopore was greater than a threshold, removed from consideration data from the particular nanopore.
  • the system defined a target region boundary separating a subset of noise events from a subset of actual events in the nanopore data, and determined a percent noise based upon the subset of noise events and the subset of actual events.
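The noise filter of step 1610 can be sketched as below. This is only an illustration under stated assumptions: the target region is taken to be a rectangle in (dwell time, amplitude) space, percent noise is taken to be the share of events outside that region, and the 20% cutoff is a hypothetical threshold.

```python
def percent_noise(events, region):
    """Percentage of events falling outside the target region.

    events: list of (dwell_time, amplitude) pairs
    region: (t_min, t_max, a_min, a_max), an assumed rectangular target region
    """
    t_min, t_max, a_min, a_max = region
    inside = sum(
        1 for t, a in events if t_min <= t <= t_max and a_min <= a <= a_max
    )
    return 100.0 * (len(events) - inside) / len(events)


def keep_nanopores(events_by_pore, region, max_noise_pct=20.0):
    """Return the names of nanopores whose percent noise is within the cutoff."""
    return [
        pore
        for pore, events in events_by_pore.items()
        if events and percent_noise(events, region) <= max_noise_pct
    ]


region = (0.0, 1.0, 0.0, 1.0)
data = {
    "A": [(0.5, 0.5), (0.2, 0.3), (0.1, 0.9), (2.0, 0.5)],  # one event outside
    "B": [(0.4, 0.4), (0.6, 0.7), (0.3, 0.2)],               # all inside
}
print(keep_nanopores(data, region))
```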
  • Other embodiments of step 1610 can, however, be implemented in another manner, in relation to evaluating nanopore noise percentages.
  • the system then evaluates 1620 separation in components of the populations of the sample, in order to provide another step for filtering or removing from consideration information from nanopores producing poor quality data (e.g., in terms of abnormal interactions with sample processing assays, in relation to detected contamination of samples and/or sample processing materials, etc.).
  • the system can perform a principal component analysis (PCA) operation on one or more of: dwell time (e.g., dwell time of a sample component relative to a nanopore), median amplitude of electrical signal output from the nanopore, maximum amplitude of electrical signal output from the nanopore, nanopore area, and any other suitable nanopore-associated factor.
  • the PCA operation implements a transformation (i.e., orthogonal linear transformation) that transforms the data from a first coordinate system to a second coordinate system, such that the greatest variance by some projection of the data lies on a first coordinate (i.e., first component) of the second coordinate system.
  • the second greatest variance lies on a second coordinate
  • the third greatest variance lies on a third coordinate.
  • the PCA operation maps the data onto new coordinates associated with different levels of variance in the data.
  • by using PCA components rather than values of single parameters (e.g., dwell time, median amplitude of electrical signal output from the nanopore, maximum amplitude of electrical signal output from the nanopore, nanopore area, etc.), the system can efficiently evaluate separation in data regardless of overlap in values of single parameters.
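As an illustration of the PCA operation described above, the following sketch computes the first principal component for two-feature event data using the closed-form solution for a 2x2 covariance matrix. It assumes only two features per event and does no feature scaling; a real implementation would likely use more features and a numerical library.

```python
import math


def first_principal_component(points):
    """Return (unit direction, scores) along the axis of greatest variance.

    points: list of (x, y) pairs, e.g. (log dwell time, median amplitude).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Project the mean-centered data onto the first component.
    scores = [(x - mx) * direction[0] + (y - my) * direction[1] for x, y in points]
    return direction, scores


direction, scores = first_principal_component([(0, 0), (1, 1), (2, 2), (3, 3)])
print(direction, scores)
```

The returned scores are the coordinates of each event on the first component, which is the quantity a separation check would then operate on.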
  • the system uses a first component of the PCA operation to check separation of sample populations (e.g., in relation to Gaussian distributions representing each sample population).
  • the separation score SS is then evaluated against a threshold to determine if the level of separation is appropriate.
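The definition of the separation score SS is not reproduced in this passage. Purely as an assumed stand-in, the sketch below scores separation as the distance between the population means along the first component, divided by the sum of their standard deviations, and compares it with a hypothetical threshold.

```python
from statistics import mean, pstdev


def separation_score(scores_a, scores_b):
    """Assumed score: |mean_a - mean_b| / (std_a + std_b).

    This is NOT the patent's definition of SS; it is one common way to
    express how far apart two roughly Gaussian populations are.
    """
    spread = pstdev(scores_a) + pstdev(scores_b)
    if spread == 0:
        return float("inf")
    return abs(mean(scores_a) - mean(scores_b)) / spread


def is_separated(scores_a, scores_b, threshold=2.0):
    """Compare the score with a hypothetical threshold."""
    return separation_score(scores_a, scores_b) >= threshold


print(separation_score([0, 2], [9, 11]))  # means 1 and 10, std 1 each -> 4.5
```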
  • Figure 16C depicts outputs of step 1620 (using the same sample and sample populations of Figure 16B), where the system uses components (PC1 and PC2) of a PCA operation to generate Gaussian distributions of count vs. PC1 for each sample population.
  • Other embodiments of step 1620 can, however, be implemented in another manner, in relation to evaluating population separation for a sample.
  • the system determines 1630 if a calibration ratio of sample populations is within a target range by implementing the universal model (e.g., universal model including a pre-built model and clustering methods, which may work independently and/or collaboratively) to generate a prediction of the calibration ratio for the sample.
  • the system uses the universal model to generate an output of maximum amplitude in current from a nanopore in terms of nanoamperes (max amp (nA)) against the base-10 logarithm of the dwell time in seconds, in order to check if the 50:50 calibration ratio of the sample is within a target range.
  • class labels S2 0% and 100% represent the different sample populations, and the universal model outputs %(S2 100%) as 68.91%, which is used to check the 50:50 calibration ratio.
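The calibration-ratio check of step 1630 reduces to comparing the predicted percentage with the expected one. The sketch below assumes a symmetric tolerance around the expected 50%; the passage does not state the target range, so both tolerance values shown are hypothetical.

```python
def calibration_within_range(predicted_pct, expected_pct=50.0, tolerance_pct=25.0):
    """True when the predicted percentage is within +/- tolerance of the expected one.

    tolerance_pct is an assumed parameter; the actual target range is not given.
    """
    return abs(predicted_pct - expected_pct) <= tolerance_pct


# The 68.91% output quoted above, checked against two hypothetical tolerances.
print(calibration_within_range(68.91, tolerance_pct=25.0))
print(calibration_within_range(68.91, tolerance_pct=10.0))
```

Whether a given nanopore passes therefore depends entirely on the range chosen for the assay.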
  • Other embodiments of step 1630 can, however, be implemented in another manner, in relation to checking of sample calibration ratios.
  • a nanopore device includes at least a pore that forms an opening in a structure separating an interior space of the device into two volumes, and at least a sensor configured to identify objects (for example, by detecting changes in parameters indicative of objects) passing through the pore.
  • Nanopore devices used for the methods described herein are also disclosed in PCT Publication WO/2013/012881, incorporated by reference in its entirety.
  • the pore(s) in the nanopore device are of a nano scale or micro scale.
  • each pore has a size that allows a small or large molecule or microorganism to pass.
  • each pore is at least about 1 nm in diameter.
  • each pore is at least about 2 nm, 3 nm, 4 nm, 5 nm, 6 nm, 7 nm, 8 nm, 9 nm, 10 nm, 11 nm, 12 nm, 13 nm, 14 nm, 15 nm, 16 nm, 17 nm, 18 nm, 19 nm, 20 nm, 25 nm, 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, or 100 nm in diameter.
  • the pore is no more than about 100 nm in diameter.
  • the pore is no more than about 95 nm, 90 nm, 85 nm, 80 nm, 75 nm, 70 nm, 65 nm, 60 nm, 55 nm, 50 nm, 45 nm, 40 nm, 35 nm, 30 nm, 25 nm, 20 nm, 15 nm, or 10 nm in diameter.
  • the pore has a diameter that is between about 1 nm and about 100 nm, or alternatively between about 2 nm and about 80 nm, or between about 3 nm and about 70 nm, or between about 4 nm and about 60 nm, or between about 5 nm and about 50 nm, or between about 10 nm and about 40 nm, or between about 15 nm and about 30 nm.
  • the nanopore device further includes means to move a polymer scaffold across the pore and/or means to identify objects that pass through the pore. Further details are provided below, described in the context of a two-pore device.
  • Compared to a single-pore nanopore device, a two-pore device can be more easily configured to provide good control of speed and direction of the movement of the polymer scaffold across the pores.
  • the nanopore device includes a plurality of chambers, each chamber in communication with an adjacent chamber through at least one pore. Among these pores, two pores, namely a first pore and a second pore, are placed so as to allow at least a portion of a target polynucleotide to move out of the first pore and into the second pore. Further, the device includes a sensor at each pore capable of identifying the target polynucleotide during the movement. In one aspect, the identification entails identifying individual components of the target polynucleotide. In another aspect, the identification entails identifying payload molecules bound to the target polynucleotide. When a single sensor is employed, the single sensor may include two electrodes placed at both ends of a pore to measure an ionic current across the pore. In another embodiment, the single sensor comprises a component other than electrodes.
  • the device includes three chambers connected through two pores.
  • Devices with more than three chambers can be readily designed to include one or more additional chambers on either side of a three-chamber device, or between any two of the three chambers.
  • more than two pores can be included in the device to connect the chambers.
  • Such a multi-pore design can enhance throughput of target polynucleotide analysis in the device.
  • one chamber could have one type of target polynucleotide, and another chamber could have another target polynucleotide type.
  • the device further includes means to move a target polynucleotide from one chamber to another.
  • the movement results in loading the target polynucleotide (e.g., the amplification product or amplicon comprising the target sequence) across both the first pore and the second pore at the same time.
  • the means further enables the movement of the target polynucleotide, through both pores, in the same direction.
  • each of the chambers can contain an electrode for connecting to a power supply so that a separate voltage can be applied across each of the pores between the chambers.
  • a device comprising an upper chamber, a middle chamber and a lower chamber, wherein the upper chamber is in communication with the middle chamber through a first pore, and the middle chamber is in communication with the lower chamber through a second pore.
  • Such a device may have any of the dimensions or other characteristics previously disclosed in U.S. Publ. No. 2013-0233709, entitled Dual-Pore Device, which is herein incorporated by reference in its entirety.
  • each pore is at least about 1 nm in diameter.
  • each pore is at least about 2 nm, 3 nm, 4 nm, 5 nm, 6 nm, 7 nm, 8 nm, 9 nm, 10 nm, 11 nm, 12 nm, 13 nm, 14 nm, 15 nm, 16 nm, 17 nm, 18 nm, 19 nm, 20 nm, 25 nm, 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, or 100 nm in diameter.
  • each pore is no more than about 100 nm in diameter.
  • the pore is no more than about 95 nm, 90 nm, 85 nm, 80 nm, 75 nm, 70 nm, 65 nm, 60 nm,
  • the pore has a diameter that is between about 1 nm and about 100 nm, or alternatively between about 2 nm and about 80 nm, or between about 3 nm and about 70 nm, or between about 4 nm and about 60 nm, or between about 5 nm and about 50 nm, or between about 10 nm and about 40 nm, or between about 15 nm and about 30 nm.
  • the pore has a substantially round shape. “Substantially round”, as used here, refers to a shape that is at least about 80 or 90% in the form of a cylinder. In some embodiments, the pore is square, rectangular, triangular, oval, or hexagonal in shape.
  • the pore has a depth that is between about 1 nm and about 10,000 nm, or alternatively, between about 2 nm and about 9,000 nm, or between about 3 nm and about 8,000 nm, etc.
  • the nanopore extends through a membrane.
  • the pore may be a protein channel inserted in a lipid bilayer membrane or it may be engineered by drilling, etching, or otherwise forming the pore through a solid-state substrate such as silicon dioxide, silicon nitride, graphene, or layers formed of combinations of these or other materials. Nanopores are sized to permit passage through the pore of the scaffold fusion:payload, or the product of this molecule following enzyme activity.
  • temporary blockage of the pore may be desirable for discrimination of molecule types.
  • the length or depth of the nanopore is sufficiently large so as to form a channel connecting two otherwise separate volumes.
  • the depth of each pore is greater than 100 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, 700 nm, 800 nm, or 900 nm. In some aspects, the depth of each pore is no more than 2000 nm or 1000 nm.
  • the pores are spaced apart at a distance that is between about 10 nm and about 1000 nm. In some aspects, the distance between the pores is greater than 1000 nm, 2000 nm, 3000 nm, 4000 nm, 5000 nm, 6000 nm, 7000 nm, 8000 nm, or 9000 nm. In some aspects, the pores are spaced no more than 30000 nm, 20000 nm, or 10000 nm apart.
  • the distance is at least about 10 nm, or alternatively, at least about 20 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 250 nm, or 300 nm. In another aspect, the distance is no more than about 1000 nm, 900 nm, 800 nm, 700 nm, 600 nm, 500 nm, 400 nm, 300 nm, 250 nm, 200 nm, 150 nm, or 100 nm.
  • the distance between the pores is between about 20 nm and about 800 nm, between about 30 nm and about 700 nm, between about 40 nm and about 500 nm, or between about 50 nm and about 300 nm.
  • the two pores can be arranged in any position so long as they allow fluid communication between the chambers and have the prescribed size and distance between them.
  • the pores are placed so that there is no direct blockage between them.
  • the pores are substantially coaxial.
  • the device has electrodes in the chambers connected to one or more power supplies.
  • the power supply includes a voltage-clamp or a patch-clamp, which can supply a voltage across each pore and measure the current through each pore independently.
  • the power supply and the electrode configuration can set the middle chamber to a common ground for both power supplies.
  • the power supply or supplies are configured to apply a first voltage V1 between the upper chamber (Chamber A) and the middle chamber (Chamber B), and a second voltage V2 between the middle chamber and the lower chamber (Chamber C).
  • the first voltage V1 and the second voltage V2 are independently adjustable.
  • the middle chamber is adjusted to be a ground relative to the two voltages.
  • the middle chamber comprises a medium for providing conductance between each of the pores and the electrode in the middle chamber.
  • the middle chamber includes a medium for providing a resistance between each of the pores and the electrode in the middle chamber. Keeping such a resistance sufficiently small relative to the nanopore resistances is useful for decoupling the two voltages and currents across the pores, which is helpful for the independent adjustment of the voltages.
  • Adjustment of the voltages can be used to control the movement of charged particles in the chambers.
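Why a small middle-chamber resistance decouples the two pore voltages can be illustrated with a simplified lumped circuit. The model below is an assumption made for illustration: each pore is a single resistor (r1, r2), both electrodes drive a shared middle-chamber node, and that node reaches the grounded middle electrode through one resistor rm.

```python
def pore_voltages(v1, v2, r1, r2, rm):
    """Voltage drops across pore 1 and pore 2 in the assumed lumped model.

    Node equation at the middle chamber (potential vb):
        (v1 - vb)/r1 + (v2 - vb)/r2 = vb/rm
    """
    vb = (v1 / r1 + v2 / r2) / (1.0 / r1 + 1.0 / r2 + 1.0 / rm)
    return v1 - vb, v2 - vb


# Small middle-chamber resistance: each pore sees almost exactly its set voltage.
print(pore_voltages(0.2, 0.1, 1e8, 1e8, 1e4))
# Middle-chamber resistance comparable to the pores: the drops are strongly coupled.
print(pore_voltages(0.2, 0.1, 1e8, 1e8, 1e8))
```

In the second case the middle node floats to about 0.1 V, so the drop across the second pore collapses toward zero even though V2 was set to 0.1 V.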
  • a properly charged particle can be moved from the upper chamber to the middle chamber and to the lower chamber, or the other way around, sequentially.
  • a charged particle can be moved from either the upper or the lower chamber to the middle chamber and kept there.
  • the adjustment of the voltages in the device can be particularly useful for controlling the movement of a large molecule, such as a charged polymer scaffold, that is long enough to cross both pores at the same time.
  • the direction and the speed of the movement of the molecule can be controlled by the relative magnitude and polarity of the voltages as described below.
  • the device can contain materials suitable for holding liquid samples, in particular, biological samples, and/or materials suitable for nanofabrication.
  • materials include dielectric materials such as, but not limited to, silicon, silicon nitride, silicon dioxide, graphene, carbon nanotubes, TiO2, HfO2, Al2O3, or other metallic layers, or any combination of these materials.
  • a single sheet of graphene membrane of about 0.3 nm thick can be used as the pore-bearing membrane.
  • both membranes can be simultaneously drilled by a single beam to form two concentric pores, though using different beams on each side of the membranes is also possible in concert with any suitable alignment technique.
  • the device includes a microfluidic chip (labeled as “Dual-pore chip”) comprised of two parallel membranes connected by spacers. Each membrane contains a pore drilled by a single beam through the center of the membrane. Further, the device preferably has a Teflon® housing or polycarbonate housing for the chip. The housing ensures sealed separation of Chambers A-C and provides minimal access resistance for the electrode to ensure that each voltage is applied principally across each pore.
  • the pore-bearing membranes can be made with transmission electron microscopy (TEM) grids with 5-100 nm thick silicon, silicon nitride, or silicon dioxide windows.
  • Spacers can be used to separate the membranes, using an insulator, such as SU-8, photoresist, PECVD oxide, ALD oxide, ALD alumina, or an evaporated metal material, such as Ag, Au, or Pt, and occupying a small volume within the otherwise aqueous portion of Chamber B between the membranes.
  • a holder is seated in an aqueous bath that comprises the largest volumetric fraction of Chamber B. Chambers A and C are accessible by larger diameter channels (for low access resistance) that lead to the membrane seals.
  • a focused electron or ion beam can be used to drill pores through the membranes, naturally aligning them.
  • the pores can also be sculpted (shrunk) to smaller sizes by applying a correct beam focusing to each layer.
  • Any single nanopore drilling method can also be used to drill the pair of pores in the two membranes, with consideration to the drill depth possible for a given method and the thickness of the membranes. Predrilling a micro-pore to a prescribed depth and then a nanopore through the remainder of the membranes is also possible to further refine the membrane thickness.
  • One example concerns a target polynucleotide, having a length that is longer than the combined distance that includes the depth of both pores plus the distance between the two pores.
  • a 1000 bp dsDNA is about 340 nm in length, and would be substantially longer than the 40 nm spanned by two 10 nm-deep pores separated by 20 nm.
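The length comparison in this example is simple arithmetic and can be checked with a short sketch. It assumes a rise of about 0.34 nm per base pair, which is the value implied by 1000 bp corresponding to about 340 nm; the function names are illustrative.

```python
BP_RISE_NM = 0.34  # approximate length per base pair of dsDNA, in nm


def dsdna_length_nm(n_bp):
    """Approximate contour length of a dsDNA of n_bp base pairs."""
    return n_bp * BP_RISE_NM


def spans_both_pores(n_bp, depth1_nm, depth2_nm, gap_nm):
    """True when the dsDNA is longer than both pore depths plus the gap between them."""
    return dsdna_length_nm(n_bp) > depth1_nm + depth2_nm + gap_nm


print(dsdna_length_nm(1000))               # about 340 nm
print(spans_both_pores(1000, 10, 10, 20))  # 340 nm vs. 40 nm
print(spans_both_pores(100, 10, 10, 20))   # 34 nm vs. 40 nm
```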
  • the polynucleotide is loaded into either the upper or the lower chamber. By virtue of its negative charge under a physiological condition at a pH of about 7.4, the polynucleotide can be moved across a pore on which a voltage is applied. Therefore, in a second step, two voltages, in the same polarity and at the same or similar magnitudes, are applied to the pores to move the polynucleotide across both pores sequentially.
  • one or both of the voltages can be changed. Since the distance between the two pores is selected to be shorter than the length of the polynucleotide, when the polynucleotide reaches the second pore, it is also in the first pore. A prompt change of polarity of the voltage at the first pore, therefore, will generate a force that pulls the polynucleotide away from the second pore.
  • the polynucleotide will continue crossing both pores towards the second pore, but at a lower speed.
  • the speed and direction of the movement of the polynucleotide can be controlled by the polarities and magnitudes of both voltages. As will be further described below, such a fine control of movement has broad applications.
  • the utility of two-pore device implementations is that during controlled delivery and sensing, the target polynucleotide or payload-bound target polynucleotide can be repeatedly measured, to add confidence to the detection result.
  • a method for controlling the movement of a charged polymer scaffold through a nanopore device comprises (a) loading a sample comprising a target polynucleotide (e.g., a target polynucleotide amplicon) in one of the upper chamber, middle chamber or lower chamber of the device of any of the above embodiments, wherein the device is connected to one or more power supplies for providing a first voltage between the upper chamber and the middle chamber, and a second voltage between the middle chamber and the lower chamber; (b) setting an initial first voltage and an initial second voltage so that the target polynucleotide moves between the chambers, thereby locating the polymer scaffold across both the first and second pores; and (c) adjusting the first voltage and the second voltage so that both voltages generate force to pull the charged target polynucleotide away from the middle chamber (voltage-competition mode), wherein the two voltages are different in magnitude, under controlled conditions, so that the target polynucleotide
  • a target polynucleotide
  • the sample containing the target polynucleotide is loaded into the upper chamber and the initial first voltage is set to pull the target polynucleotide from the upper chamber to the middle chamber and the initial second voltage is set to pull the target polynucleotide from the middle chamber to the lower chamber.
  • the sample can be initially loaded into the lower chamber, and the target polynucleotide can be pulled to the middle and the upper chambers.
  • the sample containing the target polynucleotide is loaded into the middle chamber; the initial first voltage is set to pull the charged polymer scaffold from the middle chamber to the upper chamber; and the initial second voltage is set to pull the target polynucleotide from the middle chamber to the lower chamber.
  • real-time or on-line adjustments to the first voltage and the second voltage at step (c) are performed by active control or feedback control using dedicated hardware and software, at clock rates up to hundreds of megahertz.
  • Automated control of the first or second or both voltages is based on feedback of the first or second or both ionic current measurements.
  • the nanopore device further includes one or more sensors to carry out the detection of the target polynucleotide.
  • the sensors used in the device can be any sensor suitable for identifying a target polynucleotide amplicon bound or unbound to a payload molecule.
  • a sensor can be configured to identify the target polynucleotide by measuring a current, a voltage, a pH value, an optical feature, or residence time associated with the polymer.
  • the sensor may be configured to identify one or more individual components of the target polynucleotide or one or more components bound or attached to the target polynucleotide.
  • the sensor may be formed of any component configured to detect a change in a measurable parameter where the change is indicative of the target polynucleotide, a component of the target polynucleotide, or preferably, a component bound or attached to the target
  • the sensor includes a pair of electrodes placed at two sides of a pore to measure an ionic current across the pore when a molecule or other entity, in particular a target polynucleotide, moves through the pore.
  • the ionic current across the pore changes measurably when a target polynucleotide segment passing through the pore is bound to a payload molecule. Such changes in current may vary in predictable, measurable ways corresponding with, for example, the presence, absence, and/or size of the target polynucleotide molecule present.
  • the sensor comprises electrodes that apply voltage and are used to measure current across the nanopore.
  • the conductance G = 1/Z is monitored to signal and quantitate nanopore events.
  • the result when a molecule translocates through a nanopore in an electrical field is a current signature that may be correlated to the molecule passing through the nanopore upon further analysis of the current signal.
  • the size of the component can be correlated to the specific component based on the length of time it takes to pass through the sensing device.
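How a current signature yields per-event quantities such as dwell time and amplitude can be sketched with a simple threshold detector. This is an assumed, simplified approach (fixed baseline, fixed threshold), not the detection method of the disclosed system.

```python
def detect_events(current, baseline, threshold, dt):
    """Return a list of (dwell_time, max_amplitude) for each detected event.

    An event is a run of consecutive samples that deviate from `baseline`
    by more than `threshold`; `dt` is the sampling interval in seconds.
    """
    events = []
    start = None
    peak = 0.0
    for i, value in enumerate(current):
        depth = abs(baseline - value)
        if depth > threshold:
            if start is None:
                start, peak = i, depth       # event begins
            else:
                peak = max(peak, depth)      # event continues
        elif start is not None:
            events.append(((i - start) * dt, peak))  # event ends
            start = None
    if start is not None:                    # trace ended mid-event
        events.append(((len(current) - start) * dt, peak))
    return events


trace = [1.0, 1.0, 0.6, 0.5, 0.6, 1.0, 1.0, 0.7, 1.0]
print(detect_events(trace, baseline=1.0, threshold=0.2, dt=1e-4))
```

The example trace contains two events: one lasting three samples and one lasting a single sample.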
  • a sensor is provided in the nanopore device that measures an optical feature of the polymer, a component (or unit) of the polymer, or a component bound or attached to the polymer.
  • One example of such measurement includes the identification of an absorption band unique to a particular unit by infrared (or ultraviolet) spectroscopy.
  • the sensor is an electric sensor. In some embodiments, the sensor detects a fluorescent signature. A radiation source at the outlet of the pore can be used to detect that signature.
  • the invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process.
  • the invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
  • This example presents results from the application of the fractional abundance (FA) framework to data in which a transgenic (GMO) target sequence is within a 788 bp target dsDNA (i.e., a target analyte) and the reference sequence (lectin housekeeping gene) is within a 466 bp reference dsDNA (i.e., a reference analyte).
  • Quantitation of the fractional amount of transgene target in a sample is achieved below, first by applying the Q-test method with a single feature criterion based on event area and using equations (1) and (2), and second by applying the SVM method and using equations (3) and (4).
  • the 466 bp reference DNA and 788 bp target transgenic DNA fragments were generated by PCR from mixtures of conventional and transgene-containing genomic DNA samples using sequence specific oligonucleotide primers. PCR products were purified and concentrated using standard silica membrane columns. Precise fractional mixtures of the two amplicons were prepared from large volumes of the individually generated amplicons, and aliquots of the fractional mixtures and single amplicons were used as standard reference materials for all assays.
  • the reference control sample containing the 466 bp reference DNA was measured in a nanopore device.
  • the target control sample containing the 788 bp transgenic DNA was prepared and measured in the nanopore device.
  • the length differences between the target analyte (788 bp) and the reference analyte (466 bp) generate a unique event signature upon translocation through the nanopore that can be discriminated based on area of the event signature.
  • Figure 4A shows all event area histograms for two isolated control runs, one for the 466 bp reference DNA and one for the 788 bp target transgenic DNA. Also shown is an area histogram from a 3:10 target:reference control mixture.
  • Figure 4C shows how the fractional amount parameter p(q) appears graphically at a q value.
  • Table 3 reports prediction results from one nanopore assay for four “unknown” mixed samples (S1-S4) using a control mixture of 0.35:1 (35% GMO) for compensation. Unknowns were blinded in each nanopore assay, so the percent error is not reported in the table. The table also reports the total number of events recorded in each 5 minute period.
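Equations (1) and (2) are not reproduced in this passage, so the following is only an assumed illustration of a single-feature criterion: events are tagged when their area exceeds a threshold q, and the tagged fraction of a mixture is placed on a linear scale between the tagged fractions of the two isolated controls. The function names and the linear-mixture form are assumptions.

```python
def tagged_fraction(areas, q):
    """Fraction of events whose area exceeds the threshold q."""
    return sum(1 for a in areas if a > q) / len(areas)


def mixture_fraction(q_mix, q_ref, q_targ):
    """Assumed linear-mixture estimate of the target fraction.

    q_ref and q_targ are the tagged fractions of the 100% reference and
    100% target controls; q_mix is the tagged fraction of the mixture.
    """
    if q_targ == q_ref:
        raise ValueError("controls are not discriminated at this threshold")
    return (q_mix - q_ref) / (q_targ - q_ref)


reference_areas = [1.0, 1.0, 1.0, 3.0]  # illustrative event areas
target_areas = [3.0, 3.0, 3.0, 1.0]
q = 2.0
q_ref = tagged_fraction(reference_areas, q)
q_targ = tagged_fraction(target_areas, q)
print(mixture_fraction(0.5, q_ref, q_targ))
```

This sketch omits the compensation step that the text applies using a control mixture.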
  • Nanopore size range was 25-35 nm in diameter.
  • A total of 11 mixed samples (S1-S11) were assayed.
  • Table 4 reports the combined estimates, ordered from smallest to largest predicted GMO% value. The reported mean GMO% values are computed by averaging the single-nanopore predictions. The uncertainty of each mean estimate is computed from repeated random sampling of the individual estimates' distributions (a Monte Carlo method). Reported is the numerically generated 95th-percentile confidence interval. The number of times each sample was tested and the true GMO% for each sample are also reported.
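The Monte Carlo combination of single-nanopore estimates can be sketched as follows. The sketch assumes each single-nanopore estimate is summarized by a value and a standard error and is approximately normal; the distributions actually used are not specified here, and the numbers in the example are illustrative.

```python
import random


def monte_carlo_mean_ci(estimates, n_draws=10_000, seed=0):
    """Return (mean, lower, upper) for the average of several estimates.

    estimates: list of (value, std_error) pairs, one per nanopore.
    Each draw samples every estimate from an assumed normal distribution,
    averages the samples, and the 2.5th and 97.5th percentiles of the
    resulting means give a 95% interval.
    """
    rng = random.Random(seed)
    means = sorted(
        sum(rng.gauss(mu, sd) for mu, sd in estimates) / len(estimates)
        for _ in range(n_draws)
    )
    lower = means[int(0.025 * (n_draws - 1))]
    upper = means[int(0.975 * (n_draws - 1))]
    point = sum(mu for mu, _ in estimates) / len(estimates)
    return point, lower, upper


# Three hypothetical single-nanopore GMO% predictions with unit standard error.
print(monte_carlo_mean_ci([(10.0, 1.0), (12.0, 1.0), (11.0, 1.0)]))
```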
  • the isolated control sets were first used for initial feature selection.
  • the initial selection aims to remove highly correlated features, which can cause multicollinearity problems for certain classification methods.
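One simple way to remove highly correlated features is a greedy filter on pairwise Pearson correlation, sketched below. The 0.9 cutoff, the greedy order, and the feature names are assumptions; the text does not specify how the correlated features were identified.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "dwell": [1, 2, 3, 4, 5],
    "area": [2, 4, 6, 8, 10.5],   # nearly proportional to dwell
    "max_amp": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))
```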
  • the seven identified features were: (i)
  • This procedure is recursively repeated until the desired number of features is reached.
  • the next step in the method is model training and testing. All events collectively in the isolated controls were randomly sorted into a training dataset and a testing dataset using a 7:3 split.
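The 7:3 random split can be sketched in a few lines. The seed and function name are illustrative choices; any reproducible shuffling scheme would serve.

```python
import random


def split_events(events, train_fraction=0.7, seed=0):
    """Shuffle events and split them into (training, testing) lists."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


train, test = split_events(range(100))
print(len(train), len(test))
```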
  • An SVM was trained on the training dataset, with a hyper-parameter search algorithm used to find the optimal parameters for classification.
  • the hyper-parameters tested in the grid algorithm are: the kernel type (linear, rbf), the regularization parameter (C), and the kernel coefficient (gamma).
  • The area under the ROC curve (roc_auc) was used to evaluate the performance of each hyper-parameter combination.
  • the model having the highest roc_auc score was used for the downstream data processing. For the best parameter combination, the average precision and recall of each class from the testing data were calculated.
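The scoring and selection step can be illustrated without any particular SVM library. The sketch below computes the area under the ROC curve from its rank interpretation and picks the best-scoring hyper-parameter combination; the dictionary of candidate scores is hypothetical and stands in for the output of a grid search.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = target, 0 = reference).

    Computed as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Hypothetical grid-search results: (kernel, C, gamma) -> roc_auc on validation data.
grid_scores = {
    ("linear", 1.0, None): 0.91,
    ("rbf", 1.0, 0.1): 0.97,
    ("rbf", 10.0, 0.01): 0.95,
}
best = max(grid_scores, key=grid_scores.get)
print(best)
```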
  • the model with optimal parameters was then trained on the training dataset and tested on the testing dataset. Prediction accuracy on the testing dataset was generated and is shown in Figure 7. The accuracy across the entire set remained above 97.5%.
  • the next step in the method was data calibration. Calibration can be achieved by applying the model in step 3 to the control mixture data, which generates a correction ratio. The correction ratio is then multiplied by each predicted amount for an unknown mixture. This is equivalent to multiplying by the parameter a in equations (1) and (2).
  • the value for the parameter a is generated by applying the model to the control mixture in the SVM method, whereas equations (1) and (2) involve direct calculation of a from the Q values of the control data sets.
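The calibration step can be sketched as below, assuming the correction ratio is the known target amount of the control mixture divided by the amount the model predicts for that same mixture. This definition and the numbers used are assumptions for illustration; the exact form follows from equations not reproduced in this passage.

```python
def correction_ratio(known_control_amount, predicted_control_amount):
    """Assumed correction ratio: known amount / model-predicted amount for the control mixture."""
    if predicted_control_amount == 0:
        raise ValueError("predicted control amount must be non-zero")
    return known_control_amount / predicted_control_amount


def calibrate(predicted_unknown_amount, ratio):
    """Multiply a predicted amount for an unknown mixture by the correction ratio."""
    return ratio * predicted_unknown_amount


# Illustrative numbers: a 35% control mixture that the model predicts as 40%.
ratio = correction_ratio(35.0, 40.0)
print(ratio, calibrate(20.0, ratio))
```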
  • Table 5 shows a comparison of GMO% predictions between the Q-test method and the SVM-based method.
  • the value of the SVM method is that it can be automated and applied to datasets for which, a priori, no definite criterion may be available; such a criterion is a requirement for the Q-test method.
  • the Q-test method is computationally simpler, and is likely preferred for fractional abundance applications that can utilize a well-characterized criterion in the Q-test format.
  • this example shows that two comparable lengths can be used for the target and reference dsDNA, where discrimination in nanopore event signature is achieved by using two distinct sequence-specific payloads.
  • transgene-targeting probe was linked to a 4-arm 40kDa PEG and the reference-targeting probe was linked to an 8-arm 40kDa PEG.
  • Figure 8 shows an event plot for two molecule types that were run as isolated controls sequentially on the same pore.
  • a sample containing a 96 bp DNA/probe-payload complex was prepared and measured in a nanopore device.
  • the complex is a model for a fragment comprising the target sequence and bound with a probe-payload.
  • the probe-payload was a PNA-PEG with a 4-arm PEG structure.
  • the fragment comprising the reference sequence was designed to generate a unique event signature upon translocation through the nanopore with which fractional abundance calculations could be achieved.
  • the reference molecule is a 74bp DNA with PNA-PEG bound, where the PEG has an 8-arm structure.
  • the key is that the reference/probe-payload molecule generates a unique event subpopulation that is distinct from the target/probe-payload molecule, and both are distinct from any background events when present.
  • Nanopore size range was 25-35 nm in diameter.
  • Spl-Sp6 mixed samples
  • Table 6 reports the combined estimates, ordered from smallest to largest predicted GMO% value. The reported mean GMO% values are computed by averaging the single-nanopore predictions. The uncertainty of each mean estimate is computed and reported as the 95th-percentile confidence interval. The number of times each sample was tested and the true GMO% for each sample are also reported.
  • Table 6. Combined GMO% predictions using distinct payloads to discriminate target/reference
  • Prediction performance with the two payloads appears to be not quite as good as when using dsDNA length discrimination (Examples 1, 2). In any case, accuracy is better than 6% in all cases, and can be further improved by having more nanopores measuring the pool of molecules in parallel, and combining the resulting estimates.
  • Amplicons were generated from the cell-free circulating DNA fraction obtained from blood plasma and subject to hybridization with oligonucleotide probes targeting both wildtype and mutant KRAS alleles and covalently linked to PEG polymer payloads: probes that target the KRAS wt alleles (c.35G) were linked to either 40kDa 8-arm or 80kDa 2-branch PEG polymers and probes targeting the G12D (c.35G->A) allele were linked to a 40kDa 3-branch PEG polymer.
  • Figure 9A shows a representative event plot of mean δG vs. duration for the 100% target analyte control sample (blue closed circles) and the 100% reference molecule control sample (black open squares) overlaid.
  • the target analyte was 89bp DNA with G12D-bound probe linked to a 3-branch PEG (denoted G12D-3bPEG).
  • the reference molecule was 89bp DNA with wild-type (c.35G)-bound probe linked to an 8-arm PEG (denoted WT-8armPEG).
  • the two controls were run sequentially using a 35 nm diameter nanopore at 215 mV (1.0 M LiCl, 10 mM Tris, 1 mM EDTA).
  • the plot suggests a criterion based on three inequalities for tagging target events:
  • sample A shows higher G12D content than sample B, though both are positive compared to the 0.6% false positive rate of the 100% WT control.
  • Table 7 shows the results for samples A and B in rows 1 and 2, together with the results for all other patient samples tested. A total of 5 different patient samples were assayed. Samples C and C2 were subsamples from the same patient sample; likewise for samples D and D2, and for samples E and E2. In all three cases, the subsamples taken from the same patient sample gave results within 2% of one another, even though each nanopore experiment was run by a different person on a different nanopore and, in two cases, on a different day. This suggests a reproducible workflow and a quantitative fractional abundance method.
  • Table 7. Predicted G12D mutant fraction in blood samples using Q-test method
  • the SVM method was applied for comparison. Using one representative experiment (nanopore NP4 in Table 1), the data was processed using the steps described for applying the SVM method. An event scatter plot of median δG vs. log10(duration) is shown in Figure 10 for the 100% reference control and the 100% target control overlaid. Also plotted is the SVM-identified decision boundary. The predicted G12D fraction in sample C2 is reported in Table 8 for both the Q-test and SVM methods. The two methods are within 5% of each other.
  • Table 8. Predicted G12D fraction using the Q-test and SVM to determine an optimized threshold (q).
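The tagging criterion mentioned for Figure 9A, in which an event counts as a target event when it satisfies three inequalities, can be illustrated with a small sketch. The actual inequalities are not reproduced in this text, so the three bounds below (`dg_min`, `dg_max`, `duration_min`) and all event values are hypothetical placeholders. The sketch only shows how a tagged fraction is obtained and compared with the fraction measured on a 100% reference control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    mean_dg: float    # mean conductance shift of the event (arbitrary units here)
    duration: float   # event duration (arbitrary units here)


@dataclass(frozen=True)
class TargetBox:
    # Placeholder thresholds; the source's actual inequalities are not shown.
    dg_min: float
    dg_max: float
    duration_min: float

    def is_target(self, event):
        # An event is tagged only if all three inequalities hold.
        return (event.mean_dg >= self.dg_min
                and event.mean_dg <= self.dg_max
                and event.duration >= self.duration_min)


def tagged_fraction(events, box):
    """Fraction of events that fall inside the target box."""
    return sum(box.is_target(e) for e in events) / len(events)


box = TargetBox(dg_min=1.0, dg_max=2.0, duration_min=1e-4)

# Hypothetical events from a 100% reference control and from a sample.
control = [Event(0.5, 1e-4), Event(0.6, 2e-4), Event(1.5, 5e-5), Event(0.4, 1e-4)]
sample = [Event(1.2, 2e-4), Event(1.8, 3e-4), Event(0.5, 1e-4), Event(2.5, 2e-4)]

false_positive_rate = tagged_fraction(control, box)
sample_fraction = tagged_fraction(sample, box)
print(false_positive_rate, sample_fraction)
```

A sample is read as positive when its tagged fraction exceeds the false positive rate of the reference control, which is the comparison made for samples A and B above.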
  • Example 5: EMGM for FA of KRAS G12D SNP compared to wild-type using short DNA (89 bp) and two unique payloads
  • EMGM: Expectation Maximization Algorithm for Gaussian Mixtures
  • Step 1: The log of dwell time (log(dwell)) and the median amplitude (medAmp) of a 50% target and 50% reference mixture sample were used as input data for the EMGM algorithm (Figure 11).
  • Step 2: Based on the population, a 3-Gaussian mixture model was trained. The model placed the mutant (target) region in one cluster (diamond). The other two clusters (star and square) correspond to wild-type (Figure 12). Some events within the initial target domain box (Figure 11) are associated with the reference modes by the EMGM algorithm. This differs from the Q-test method, where the box itself defines the population of events that are tagged as target rather than reference.
  • Step 3: The model was applied to a 100% wild-type (reference) sample. The ratio of the number of events in the mutant (target) region to the total number of events establishes the false positive fraction (Figure 13), which can be used to improve the fractional abundance estimate.
  • Step 4: The model was used to predict unknown mixtures. The ratio of the number of events in the mutant region to the total number of events was used as a predictor of the percentage of mutant molecules in the unknown mixture (Figure 14).
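Steps 1 to 4 can be sketched with a small, dependency-free expectation-maximization routine. This is an illustration under stated assumptions rather than the implementation used in the example: it assumes diagonal covariances, seeds one component per hand-chosen starting mean (component 0 standing in for the target domain), and runs on synthetic (log10 dwell, medAmp) events whose cluster centers are invented. The names `fit_emgm`, `target_fraction` and `simulate` are hypothetical.

```python
import math
import random


def _log_gauss(x, mean, var):
    # Log density of an axis-aligned (diagonal covariance) Gaussian.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def responsibilities(x, model):
    # Posterior probability of each mixture component for one event.
    weights, means, variances = model
    logs = [math.log(w) + _log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_emgm(events, seed_means, n_iter=50, init_var=0.25, floor=1e-6):
    # EM for a Gaussian mixture; one component per seed mean.
    k, dim, n = len(seed_means), len(events[0]), len(events)
    model = ([1.0 / k] * k, [list(m) for m in seed_means],
             [[init_var] * dim for _ in range(k)])
    for _ in range(n_iter):
        resp = [responsibilities(x, model) for x in events]   # E-step
        weights, means, variances = [], [], []
        for j in range(k):                                    # M-step
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = [sum(r[j] * x[d] for r, x in zip(resp, events)) / nj
                    for d in range(dim)]
            var = [max(floor, sum(r[j] * (x[d] - mean[d]) ** 2
                                  for r, x in zip(resp, events)) / nj)
                   for d in range(dim)]
            weights.append(max(nj / n, 1e-12))
            means.append(mean)
            variances.append(var)
        model = (weights, means, variances)
    return model


def target_fraction(events, model, target_index=0):
    # Fraction of events whose most probable component is the target one.
    hits = 0
    for x in events:
        resp = responsibilities(x, model)
        if resp.index(max(resp)) == target_index:
            hits += 1
    return hits / len(events)


def simulate(rng, n_target, n_reference):
    # Synthetic (log10 dwell, medAmp) events; all centers are made up.
    def draw(center, count):
        return [(rng.gauss(center[0], 0.1), rng.gauss(center[1], 0.1))
                for _ in range(count)]
    half = n_reference // 2
    return (draw((-3.0, 3.0), n_target) + draw((-3.6, 1.0), half)
            + draw((-2.6, 1.6), n_reference - half))


rng = random.Random(0)
seeds = [(-3.0, 2.8), (-3.5, 1.0), (-2.5, 1.5)]   # component 0 = target (assumed)
model = fit_emgm(simulate(rng, 200, 200), seeds)                 # Steps 1-2
false_positive = target_fraction(simulate(rng, 0, 300), model)   # Step 3
predicted = target_fraction(simulate(rng, 60, 240), model)       # Step 4
print(f"false positive fraction: {false_positive:.3f}")
print(f"predicted target fraction: {predicted:.3f}")
```

Assigning each event to its most probable component, rather than to a fixed box, is what allows events inside the initial target domain to be attributed to a reference mode, as noted in Step 2. How the false positive fraction from Step 3 is folded into the final estimate is not specified here, so the sketch reports it separately.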

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Genetics & Genomics (AREA)
  • Hematology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Nanotechnology (AREA)
  • Urology & Nephrology (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analyzing Materials By The Use Of Electric Means (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention relates to methods and compositions for establishing an improved estimate of the fractional abundance of target analytes (e.g., specific polynucleotide sequences) in a sample using a nanopore sensor, for example by correcting errors inherent in identifying and correlating electrical signals with quantities of a target analyte or a reference analyte in a sample.
EP19925663.7A 2019-04-22 2019-04-22 Multipore determination of fractional abundance of polynucleotide sequences in a sample Withdrawn EP3959331A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/028518 WO2020219011A1 (fr) 2019-04-22 2019-04-22 Multipore determination of fractional abundance of polynucleotide sequences in a sample

Publications (1)

Publication Number Publication Date
EP3959331A1 (fr) 2022-03-02

Family

ID=72940971

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19925663.7A Withdrawn EP3959331A1 (fr) 2019-04-22 2019-04-22 Multipore determination of fractional abundance of polynucleotide sequences in a sample

Country Status (5)

Country Link
EP (1) EP3959331A1 (fr)
JP (1) JP2022530016A (fr)
KR (1) KR20220011639A (fr)
CN (1) CN113966403A (fr)
WO (1) WO2020219011A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
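CN114942402B (zh) * 2022-07-20 2022-11-29 Wuhan Gelanruo Intelligent Technology Co., Ltd. Method and system for locating abnormal electric energy meters
CN116127288B (zh) * 2023-04-14 2023-09-15 Nanjing University of Posts and Telecommunications Method and device for removing noise from nanopore sensing signals based on independent component analysis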

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3798317B1 (fr) * 2007-04-04 2024-01-03 The Regents of the University of California Compositions, devices, systems, and methods for using a nanopore
KR101814056B1 (ko) * 2009-12-01 2018-01-02 Oxford Nanopore Technologies Limited Biochemical analysis instrument
CN107250380A (zh) * 2015-02-02 2017-10-13 Two Pore Guys, Inc. Nanopore detection of target polynucleotides from sample background
US10102338B2 (en) * 2015-09-24 2018-10-16 Genia Technologies, Inc. Adaptive compression and modification of nanopore measurement data
EP3800469A1 (fr) * 2016-10-24 2021-04-07 Ontera Inc. Fractional abundance of polynucleotide sequences in a sample

Also Published As

Publication number Publication date
KR20220011639A (ko) 2022-01-28
WO2020219011A1 (fr) 2020-10-29
JP2022530016A (ja) 2022-06-27
CN113966403A (zh) 2022-01-21

Similar Documents

Publication Publication Date Title
AU2017348009B2 (en) Fractional abundance of polynucleotide sequences in a sample
Lang et al. Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae
US11486873B2 (en) Multipore determination of fractional abundance of polynucleotide sequences in a sample
CN109923613B (zh) Method for detecting a target analyte in a sample using a signal change-amount data set
US10978173B2 (en) Method for reducing noise level of data set for a target analyte
Ståhlberg et al. The added value of single-cell gene expression profiling
KR102165933B1 (ko) Detection of abnormal signals using two or more data sets
WO2020219011A1 (fr) Multipore determination of fractional abundance of polynucleotide sequences in a sample
Duffy et al. Evidentiary evaluation of single cells renders highly informative forensic comparisons across multifarious admixtures
JP2019507863A (ja) Detection of analytes during flash and fluorescence reactions
Hao et al. RareVar: a framework for detecting low-frequency single-nucleotide variants
Navascués et al. Power and limits of selection genome scans on temporal data from a selfing population
Kelley et al. Correcting for gene-specific dye bias in DNA microarrays using the method of maximum likelihood
Lottaz et al. High-Dimensional Profiling for Computational Diagnosis
Karkar et al. Statistical modeling of short-tandem repeat capillary electrophoresis profiles
Xu et al. Automated single-nucleotide polymorphism analysis using fluorescence excitation–emission spectroscopy and one-class classifiers
Saleem Analysis of heat-induced DNA damage during PCR and verification, validation and comparative analysis of two PCR megaplexes

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20211119

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20221101