WO2022061307A1

WO2022061307A1 - Methods for accurate and reproducible detection and quantification of target polynucleotides using a nanopore device

Info

Publication number: WO2022061307A1
Application number: PCT/US2021/051366
Authority: WO
Inventors: Andrew Martin SMITH; Denise Ann MCGRATH; Stacy HARVEY; Yanan Zhao
Original assignee: Ontera Inc.
Priority date: 2020-09-21
Filing date: 2021-09-21
Publication date: 2022-03-24

Abstract

Aspects of the present disclosure include methods of developing a calibration model to determine an estimate of the size and/or concentration of one or more target analytes in an unknown sample using a nanopore device, methods of determining an estimate of the size and/or concentration of one or more target analytes in a mixed unknown sample using a nanopore device, and methods of detecting the presence or absence of one or more target analytes suspected to be present in a mixed unknown sample using a nanopore device. The control mixtures and analysis algorithms characterize the translocation behavior for each nanopore sensor prior to detecting the analyte(s) of interest, enable means to calibrate each sensor based on the measured translocation event signatures in the absence of an available quantitative model describing translocation dynamics, and enhance the accuracy and precision of target analyte detection.

Description

METHODS FOR ACCURATE AND REPRODUCIBLE DETECTION AND QUANTIFICATION OF TARGET POLYNUCLEOTIDES USING A NANOPORE DEVICE

CROSS-REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application No.: 63/081,303, filed September 21, 2020, and U.S. Provisional Application No.: 63/220,405, filed July 9, 2021, which are hereby incorporated by reference in their entirety.

SEQUENCE LISTING

[0002] The instant application contains a Sequence Listing which has been submitted via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on September 21, 2020, is named “49703_WO_Sequence_Listing_ST25.txt”, and is 2 kilobytes in size.

BACKGROUND

[0003] Nanopores show great potential as single-molecule tools for bioanalytical sensing and sequencing, due to their exceptional sensing capabilities, high-throughput, and low cost. Solid state nanopores are particularly promising for commercial applications because the pores can be mass produced cheaply using existing semiconductor processing technologies. The detection relies on detecting small differences in the ionic current as biomolecules traverse the nanopore. While passing through the pore, the translocating molecule transiently blocks the ionic current, thereby inducing a small dip in the current signal, which is detectable by appropriate electronics. The amplitude and duration of the dip (i.e., a translocation event) in ionic current depends on the diameter and length of the nanopore. For instance, a larger pore diameter tends to produce a smaller ionic current attenuation and a longer pore a longer duration. Additionally, other dimensional variations of the nanopore device can also affect the capture rates of molecules of different sizes.

[0004] In commercializing nanopore sensors, it is highly important to consider the effects of dimensional variations from nanopore to nanopore on the robustness of the sensing results. There is a need to develop methods to minimize the deleterious effects of nanopore variations in order to enhance the accuracy and precision of nanopore sensing of target analytes. Additionally, alternative methods are needed that are robust to nanopore signal drift, and to accommodate assays where more than one target can be quantified. The proposed method overcomes these limitations.

BRIEF DESCRIPTION OF THE FIGURES

[0005] FIG. 1 shows an example of translocation current event signal fingerprints of the nanopore device of the present disclosure. For example, the nanopore is loaded with one or more control mixtures, and each molecule within the control mixture is translocated through one or more nanopores within the nanopore device, and a current event is generated as the molecule translocates through the nanopore. The current event can provide information such as, but not limited to, area, amplitude, dwell time, etc. The current event can include current depth.
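The event features named above (amplitude, dwell time, area) can be illustrated with a minimal sketch. It assumes a sampled current trace, a known open-pore baseline, and a fixed detection threshold; the names `Event`, `extract_events`, `baseline_pa`, `threshold_pa`, and `dt_s` are hypothetical and are not taken from the disclosure, which does not specify how events are detected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    amplitude_pa: float  # maximum current drop below baseline, in pA
    dwell_s: float       # duration of the blockade, in seconds
    area_pa_s: float     # integrated current drop, in pA * s


def extract_events(current_pa: List[float], baseline_pa: float,
                   threshold_pa: float, dt_s: float) -> List[Event]:
    """Find contiguous runs where the current drops more than threshold_pa below baseline.

    ASSUMPTION: simple fixed-threshold detection, chosen only for illustration.
    """
    events = []
    run = []  # current drops (baseline - sample) for the event in progress
    for sample in list(current_pa) + [baseline_pa]:  # sentinel closes a trailing event
        drop = baseline_pa - sample
        if drop > threshold_pa:
            run.append(drop)
        elif run:
            events.append(Event(max(run), len(run) * dt_s, sum(run) * dt_s))
            run = []
    return events
```

Each returned `Event` corresponds to one dot in plots such as FIG. 2, where maximum amplitude is plotted against the logarithm of dwell time.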

[0006] FIG. 2 provides an example of raw data of the current event signatures for the control mixture containing 3 DNA molecule populations (e.g., first molecule, second molecule, reference molecule), showing maximum amplitude (pA; y axis) vs dwell time (log(dwell (sec)); x axis). Each current event signature is depicted as a “dot” in the plot from the control sample mixture. Clusters, not clearly defined, of each DNA population (80 bp, 197 bp, 356 bp) are shown in circles with the dotted lines.

[0007] FIG. 3 shows the current event signatures after applying a clustering algorithm that clusters and labels each of the current event signatures from the respective molecules in the control mixture. The various circles depict the different lengths of the control molecules within the mixture. For each event (each dot), the tag of the event (80bp, 197bp or 356bp) and the feature of each event (FIG. 1) is known.
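The disclosure does not name a specific clustering algorithm. As one possible illustration, the sketch below groups events by a single feature (for example log10 of event area) using one-dimensional k-means. The choice of k-means, the quantile seeding, and the name `kmeans_1d` are assumptions made for this example only.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Assign each value to one of k clusters; label 0 is the cluster with the smallest centre.

    ASSUMPTION: k-means stands in for the unspecified clustering algorithm.
    """
    ordered = sorted(values)
    # Seed centres at evenly spaced quantiles so the result is deterministic.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centre for every value.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels
```

With a three-component control mixture, the labels 0, 1, and 2 could then be tagged as 80 bp, 197 bp, and 356 bp, under the further assumption that the clustered feature increases with DNA length.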

[0008] FIG. 4 provides an example of a DNA linear length model created by extracting the relevant features of each cluster of FIG. 3. A linear correlation is applied between base pairs (bp) and the median area of each cluster. From the data of FIG. 3, the median amplitude of each cluster (80bp, 197bp and 356bp) can be calculated, and 3 single data points are generated (80bp, 80bp_median_amplitude), (197bp, 197bp_median_amplitude) and (356bp, 356bp_median_amplitude). These 3 data points are plotted to generate a linear model using the cluster base pairs (bp) and median area values after taking the log10 of both the bp and median amplitude and after fitting the 3 data points with a linear correlation. Thus, a linear correlation between the amplitude of event and base pair for an event is obtained to build a DNA length curve model (e.g., used interchangeably herein as “calibration curve”). In this example, the “reference molecule” has a DNA length of 80 bp, and the control molecules have a DNA length of 197 bp and 356 bp. This calibration curve can then be applied to an “unknown sample” mixture to determine the length and/or concentration of one or more target molecules within the unknown sample.

[0009] FIG. 5A-5B shows an example of the population shifting correction algorithm as described in the present disclosure applied to the density plots. Before applying the population shifting correction, the reference molecule peak is not at 1000 bp (Panel A, density plot), and after the population shifting correction, the reference molecule peak is at 1000 bp (Panel B, density plot). As shown in FIG. 5A, the shifting correction algorithm can be applied to align the base pairs when the event populations have shifting between different runs for the same nanopore. As shown in the density plot, the reference peak of Panel A of FIG. 5A shows the reference molecule (e.g., used interchangeably herein as “calibrant”). It is supposed to have a spike peak at 1000 base pairs based on the known size of the reference molecule. But due to nanopore quality issues (e.g., nanopore size change or nanopore membrane surface chemistry change), the peak shifts to the left of 1000 bp. Thus, the prediction on the unknown molecule (the right peak on Panel A) would be inaccurate. Therefore, the population shifting correction algorithm aligns the peak of the calibrant/reference molecule back to 1000 bp, which then corrects the peak of the target molecule for accurate size and/or concentration quantification (see Panel B). FIG. 5B provides step by step instructions of how to perform the population shifting correction. All the events in the unknown sample including the calibrant/reference molecule will be corrected using the formula in step 3.

[0010] FIG. 6 provides plots before and after application of data filtering. Panel A shows a plot with the original raw event data generated when the molecules translocate through the nanopore. As can be seen as dots within the solid lines of the clusters, the separation of DNA populations (clusters of the molecules of different lengths, as shown within the dotted lines) between 80 bp and 197 bp, and between 197 bp and 356 bp, is not very clear, and overlap (dots within the solid rectangular box) is seen between the molecules of different lengths. This overlap can cause a 197 bp event to be misclassified as a 356 bp event, or vice versa. Therefore, the data filter was applied to the events that lie in the region of potential overlap to obtain a clear separation between the molecules of differing lengths. Panel B shows a plot where the data filter has been applied, showing clear distinction and separation between the molecules of differing lengths.

[0011] FIG. 7 provides analysis of a sample that was found, using the method of the present disclosure, to be positive for SARS-CoV-2 (target analyte in this example). Raw translocation event data is provided in Panel A for the control mixture and Panel B for the unknown sample suspected to contain the target analyte (SARS-CoV-2). Note that a reference molecule (80 bp DNA population) is added to the unknown sample mixture and serves as an internal control for the control mixture and the unknown sample mixture. Raw translocation data of control mixture and unknown sample mixture are first grouped by the clustering algorithm and filtered to increase statistical confidence (clustered and filtered data shown in Panel C for the control mixture and Panel D for the unknown sample mixture). A calibration curve (DNA length model) is created for the control mixture to show amplitude vs size (not shown), and the unknown sample mixture data, (after clustering and data filtering) is compared to the calibration curve based on the control mixtures to determine the presence or absence of Covid events. In this example, the reference molecule is found in the unknown sample mixture as the internal control as expected, and the target analyte was detected, since the DNA length of 197 bp for SARS-CoV-2 was detected (e.g. 197 bp events were present in unknown sample). Note that it is a coincidence that the control mixture has the same target lengths of human SARS-CoV-2. The control mixtures do not have to include the length and/or sequence regions of SARS-CoV-2, since the lengths of DNA detected in the unknown sample are compared to the calibration curve generated for the control mixture which can be random DNA sequences.

[0012] FIG. 8 provides analysis of a sample that was found, using the method of the present disclosure, to be negative for SARS-CoV-2 (target analyte in this example). Raw translocation event data is provided in Panel A for the control mixture and in Panel B for the unknown sample suspected to contain the target analyte (SARS-CoV-2). Note that a reference molecule (80 bp DNA population) is added to the unknown sample mixture and serves as an internal control for the control mixture and the unknown sample mixture. Raw translocation data of control mixture and unknown sample mixture are separately first grouped by the clustering algorithm and filtered to increase statistical confidence (clustered and filtered data shown in Panel C for the control mixture and Panel D for the unknown sample mixture). A calibration curve (DNA length model) is created to show amplitude vs size (not shown), and the unknown sample mixture data (after clustering and data filtering) is compared to the calibration curve created based on the control mixture to determine the presence or absence of SARS-CoV-2 events. In this example, the reference molecule is found in the unknown sample mixture as the internal control as expected, but the target analyte (SARS-CoV-2) was not detected, since the 197 bp events for SARS-CoV-2 were absent from the unknown sample. Note that it is a coincidence that the control mixture has the same target length as the target analyte, human SARS-CoV-2. The control mixtures do not have to include the length and/or sequence regions of the target analyte, since the clusters of DNA populations detected in the unknown sample are compared by size and/or concentration to the calibration curve generated for the control mixture, which can be random DNA sequences.

SUMMARY

[0013] Aspects of the present disclosure provide methods for characterizing the translocation behavior for nanopore sensors in a nanopore device prior to detecting the analyte(s) of interest using control mixtures containing calibrant and control molecules and novel analysis algorithms. In some embodiments, the control mixtures and their associated algorithms provide a method to improve the accuracy of target identification in a multiplexed assay. In other embodiments, the control mixtures and their associated algorithms provide a means to determine the relative abundance of one or more target analytes in a mixed unknown sample.

[0014] The physics of transport of molecules, such as DNA, through nanopores has been an area of great scientific interest. Despite recent progress, however, the dynamics of the translocation process, in particular the trajectory profiles of DNA passing through solid-state nanopores, are still lacking. Currently, there is no deterministic model describing how different nanopore geometries can affect translocation dynamics. The algorithms described herein enable means to calibrate each sensor based on the measured translocation event signatures in the absence of an available quantitative model describing translocation dynamics. For example, the present methods use a control mixture with known sizes with sequences which can be unrelated to the unknown sample mixture to calibrate size and quantity of the target analytes. Thus, the control mixture of the present methods is nonspecific to any one assay and can be applied to many different assays using a calibration curve.

[0015] Disclosed herein are methods and compositions for determining quantitation of the size and/or concentration of target analytes (e.g., specific polynucleotide sequences) in a sample using a nanopore sensor, e.g., by correcting errors inherent to identifying and correlating electrical signals to amounts of one or more target analytes or reference analyte in a sample. As used herein, the term “size” refers to the length of a molecule, such as, e.g., nucleotide sequence length, base-pair length, and the like.

[0016] The present disclosure provides controls (e.g., control mixtures) that are first introduced to the nanopores for detection. The controls are then removed, and the target analytes (samples) are introduced to the same set of nanopores for detection. The “controls” comprise a mixture of 2 or more molecules (e.g., known or random DNA molecules) of known sizes that allow each nanopore sensor to be characterized based on how the different sizes of DNA translocate through the pore. A “clustering” algorithm, and optionally a data filter is then used to enhance the statistical confidence level of identifying each size population based on event detection by the nanopore. Then, a “calibration curve” (used interchangeably herein as “DNA length model” or “linear calibration curve”) of translocation parameters, such as duration (or dwell time), ionic current attenuation (or amplitude), and area (or amplitude “x” dwell time) versus DNA size can be generated for each pore. This calibration curve generated from the controls is then used to compare to the same event detection parameters measured for the target analytes, which have known specific DNA size and/or concentration, to determine the presence or absence of the analytes. In some embodiments, the quantity (or concentration) of analytes can be determined from the known concentration of the controls.

[0017] The algorithm of the present methods enables a precise calibration of each nanopore sensor based on the measured translocation event signatures from controls to enhance the accuracy and precision of target analyte detection without the need for a quantitative model describing translocation dynamics. In addition, the control mix is generic, and is not specific to any one assay.

[0018] Compared to previous methods of fractional abundance of target analyzed, e.g., as described in PCT Publication Nos.: WO2018081178 and WO2017173392, which are hereby incorporated by reference in their entirety, the methods of the present disclosure accommodate assays where more than one target can be quantified, even if such targets are not present in the control mixtures used for calibration during the assay run. Additionally, the present disclosure applies a clustering algorithm of the translocation events of a molecule (e.g., DNA molecule), a data filter, and DNA population shift correction to increase accuracy and precision of the sizing and quantitation determination of the molecule of interest.

[0019] Aspects of the present disclosure include a method of developing a calibration model to determine an estimate of the size and/or concentration of one or more target analytes in an unknown sample using a nanopore device, comprising: (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of: (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and/or length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: translocate each molecule of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size; and (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control sample mixture.
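Step (e) can be sketched as an ordinary least-squares fit in log-log space, following the log10 transformation described for FIG. 4. This is a minimal illustration, not the disclosed implementation: the function names `fit_length_model` and `predict_bp` are hypothetical, and the input is assumed to be event areas already grouped by known control length.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def fit_length_model(clusters: Dict[int, List[float]]) -> Tuple[float, float]:
    """Fit log10(median area) = slope * log10(bp) + intercept by least squares.

    `clusters` maps each known control length (bp) to the event areas in its cluster.
    """
    xs = [math.log10(bp) for bp in clusters]
    ys = [math.log10(median(areas)) for areas in clusters.values()]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict_bp(area: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate the length (bp) of an event from its area."""
    return 10 ** ((math.log10(area) - intercept) / slope)
```

Once fitted on the control mixture, `predict_bp` can be applied to each event in an unknown sample to place it on the same base-pair axis as the controls.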

[0020] In some embodiments, the first control molecule further comprises a first control molecule concentration. In some embodiments, the second control molecule further comprises a second control molecule concentration. In some embodiments, the reference molecule further comprises a reference molecule concentration.

[0021] In some embodiments, the method further comprises developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules in the unknown sample mixture translocating through the nanopore device.

[0022] In some embodiments, the control mixture further comprises a fourth control molecule comprising a polynucleotide with a fourth molecule concentration and/or length. In some embodiments, the control mixture further comprises a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length. In some embodiments, the control mixture further comprises a: fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length; sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; and eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length.

[0023] In some embodiments, the control mixture further comprises a: ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length; tenth control molecule comprising a polynucleotide with a tenth molecule concentration and/or length; eleventh control molecule comprising a polynucleotide with an eleventh molecule concentration and/or length; and twelfth control molecule comprising a polynucleotide with a twelfth molecule concentration and/or length. In some embodiments, the control mixture further comprises a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length; sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length, and ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length. In some embodiments, the first, second, and third control molecule comprise the same molecule, but a different concentration and/or length.

[0024] In some embodiments, the fourth, fifth, and sixth control molecule comprise the same molecule, but a different concentration and/or length.

[0025] In some embodiments, the seventh, eighth, and ninth control molecule comprise the same molecule, but a different concentration and/or length.

[0026] In some embodiments, the size and/or concentration of the first reference molecule is known. In some embodiments, the size and/or concentration of the first control molecule is not known. In some embodiments, the size and/or concentration of the second control molecule is known.

[0027] In some embodiments, the control sample mixture does not comprise the target analyte molecule. In some embodiments, each molecule in the control sample mixture is prepared by nucleic acid amplification.

[0028] In some embodiments, each molecule is DNA or RNA. In some embodiments, one or more target analytes comprises one or more amplicon products. In some embodiments, the amplicon product is a DNA amplicon product or an RNA amplicon product.

[0029] In some embodiments, the size is a base-pair length of each molecule.

[0030] In some embodiments, the electronic detectable signature is a current event signature.

[0031] In another aspect, the present disclosure provides a method of determining an estimate of the size and/or concentration of one or more target analytes in a mixed unknown sample using a nanopore device, comprising (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (ii) an unknown sample mixture comprising: (iia) one or more target molecules, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene; (iib) the first reference molecule; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: induce translocation of each molecule, of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration; (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample; (f) developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture; (g) repeating step (b) through (e) for the unknown sample mixture; and (h) applying the model in step (f) to quantify the size of the one or more target molecules in the unknown sample mixture.

[0032] In some embodiments, the first control molecule further comprises a first molecule concentration. In some embodiments, the second control molecule further comprises a second molecule concentration. In some embodiments, the reference control molecule further comprises a reference molecule concentration. In some embodiments, the method further comprises applying the model in step (f) to quantify the concentration of the one or more target molecules in the unknown sample mixture. In some embodiments, each molecule is DNA or RNA.

[0033] In some embodiments, one or more target molecules comprises one or more amplicon products. In some embodiments, the amplicon product is a DNA amplicon product or an RNA amplicon product.

[0034] In some embodiments, each of the first control molecule, second control molecule, and first reference molecule comprises a size ranging from 5 base pairs to 1000 base pairs.

[0035] In some embodiments, the concentration of each control molecule or reference molecule ranges from 0.01 nM to 10 nM. In some embodiments, the one or more target molecules comprises one or more nucleic acids encoding one or more regions of a transgene associated with a disease or condition.

[0036] In some embodiments, the disease or condition is a virus. In some embodiments, the virus is an influenza virus selected from the group consisting of: parainfluenza virus 1, parainfluenza virus 2, influenza A virus, and influenza B virus.

[0037] In some embodiments, the virus is a coronavirus selected from the group consisting of: coronavirus OC43, coronavirus 229E, coronavirus NL63, coronavirus HKU1, middle east respiratory syndrome beta coronavirus (MERS-CoV), severe acute respiratory syndrome beta coronavirus (SARS-CoV), and SARS-CoV-2.

[0038] In some embodiments, the virus is SARS-CoV-2. In some embodiments, the transgene is a SARS-CoV-2 nucleocapsid (N) gene.

[0039] In some embodiments, the one or more regions of the N gene is selected from: the N1, N2, and N3 region. In some embodiments, the virus is the influenza virus.

[0040] In some embodiments, the influenza A virus is selected from a swine-origin influenza A virus (H1N1), influenza A virus subtype H2N2, and influenza A virus subtype H3N2.

[0041] In some embodiments, the transgene is a matrix (M1) gene. In some embodiments, the transgene is a nonstructural 2 (NS2) gene. In some embodiments, the size is a base-pair length of each molecule. In some embodiments, the electronic detectable signature is a current event signature.

[0042] In some embodiments, the method further comprises applying a population shifting correction algorithm to the reference molecule and the one or more target molecules in the unknown sample to correct for an error in the size and/or concentration of the one or more target molecules, thereby determining an improved estimate of the size and/or concentration of one or more target molecules in said mixed unknown sample.

[0043] In some embodiments, applying the population shifting correction algorithm occurs before applying the model in step (f).

[0044] In some embodiments, the population shifting correction algorithm is carried out with a computer readable medium, comprising instructions, that cause a processor to: find a median log (area) of the first reference molecule in the unknown sample mixture; derive the first reference molecule’s expected log(area) using the model in step (f); calculate a correction factor comprising the equation:

multiply the measured log(area) of all of the detectable electronic signatures for each molecule in the unknown sample mixture and the control sample mixture by the correction factor; and calculate the base pairs of the one or more target molecules by applying the model in step (g) and the corrected log(area).
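The correction-factor equation is not reproduced in this text. The sketch below therefore assumes one form that is consistent with the surrounding steps: the factor is taken as the ratio of the reference molecule's expected log(area), from the calibration model, to its measured median log(area), and every event's log(area) is multiplied by that factor. The ratio form and the names `shift_correct` and `expected_reference_log_area` are assumptions, not the disclosed formula.

```python
from statistics import median
from typing import List


def shift_correct(log_areas: List[float], reference_log_areas: List[float],
                  expected_reference_log_area: float) -> List[float]:
    """Rescale log(area) values so the reference population's median matches its expected value."""
    measured = median(reference_log_areas)
    # ASSUMED form of the correction factor: expected / measured.
    factor = expected_reference_log_area / measured
    return [factor * value for value in log_areas]
```

Under this assumption, applying the function to the reference events themselves moves their median exactly onto the expected value, which matches the behavior shown in FIG. 5A, where the calibrant peak is realigned to its known size.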

[0045] In some embodiments, the method further comprises, after step (d), applying a data filtering algorithm to separate out overlapping electrically detectable signature events. In some embodiments, the method further comprises, applying the population shifting correction algorithm and applying a data filtering algorithm to separate out overlapping electrically detectable signature events. In some embodiments, the data filtering algorithm defines a nucleotide base-pair window for each molecule. In some embodiments, the data filtering algorithm is carried out using a computer readable medium, comprising instructions, that cause a processor to apply the data filtering algorithm to the electronically detectable signature events to separate out overlapping electrically detectable signature events from control molecules, one or more target molecules, and the reference molecule.
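A base-pair window filter of the kind described above might look like the following sketch. It assumes each event already has an estimated length in base pairs and that an event falling in no window, or in more than one window, is discarded as ambiguous. The window bounds in the usage note and the name `assign_by_window` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Window = Tuple[float, float]  # (low_bp, high_bp), inclusive bounds


def assign_by_window(estimated_bp: List[float],
                     windows: Dict[str, Window]) -> List[Optional[str]]:
    """Label each event with the single window that contains it.

    Events matching zero windows or several overlapping windows get None,
    which filters them out of downstream counting.
    """
    labels: List[Optional[str]] = []
    for bp in estimated_bp:
        hits = [name for name, (low, high) in windows.items() if low <= bp <= high]
        labels.append(hits[0] if len(hits) == 1 else None)
    return labels
```

For example, with illustrative windows `{"80bp": (60, 110), "197bp": (150, 250), "356bp": (240, 450)}`, an event estimated at 245 bp lies in both the 197 bp and 356 bp windows and is dropped, mirroring the overlap region removed in FIG. 6.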

[0046] In some embodiments, each molecule in the control mixture is prepared by nucleic acid amplification. In some embodiments, each molecule in the unknown sample mixture is prepared by nucleic acid amplification.

[0047] In some embodiments, the method further comprises identifying a concentration of electrically detectable signature events associated with each control molecule and a concentration of electrically detectable signature events associated with said reference molecule. In some embodiments, the method further comprises, identifying a concentration of electrically detectable signature events associated with each target molecule in the unknown sample mixture. In some embodiments, the concentration of electrically detectable signature events associated with each control molecule is identified according to a defined threshold. In some embodiments, the concentration of electrically detectable signature events associated with said reference molecule is identified according to a defined threshold. In some embodiments, the concentration of electrically detectable signature events associated with said target molecule is identified according to a defined threshold.

[0048] In some embodiments, the method further comprises optimizing said threshold to increase accuracy of detection of the control molecules in the control sample mixture, one or more target molecules in the unknown sample mixture, and/or the reference molecule in the control sample mixture and the unknown sample mixture, using a Q-test, a support vector machine, or an expectation maximization algorithm. In some embodiments, the concentration is the absolute concentration of the target analyte in the unknown sample mixture.

[0049] An aspect of the present disclosure provides a method of detecting the presence or absence of one or more target analytes suspected to be present in a mixed unknown sample using a nanopore device, comprising: (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (ii) an unknown sample mixture comprising: (iia) one or more target molecules suspected to be present in the unknown sample mixture, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene; (iib) the first reference molecule; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: induce translocation of each molecule, of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration; (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample; (f) developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture; (g) repeating step (b) through (e) for the unknown sample mixture; and (h) applying the model in step (f) to determine the presence or absence of one or more target analytes in the unknown sample mixture by determining the size and/or concentration of the one or more target molecules in the unknown sample mixture.

DETAILED DESCRIPTION

Calibration Model

[0050] Embodiments described here use control mixtures and analysis algorithms to characterize the translocation behavior for each nanopore sensor prior to detecting the target analyte(s) of interest. In some embodiments, the control mixtures and their associated algorithms provide a method to improve the accuracy of target identification in a multiplexed assay. In other embodiments, the control mixtures and their associated algorithms provide a means to determine the relative abundance of one or more target analytes in a mixed unknown sample.

[0051] The algorithms described in the present disclosure provide a means to calibrate each sensor based on the measured translocation event signatures in the absence of an available quantitative model describing translocation dynamics. A clustering algorithm is applied to the translocation events, along with a data filter, to increase the accuracy and precision of the sizing and quantitation of target analytes in a sample.

[0052] Aspects of the present disclosure include methods of developing a calibration model to determine an estimate of the size and/or concentration of one or more target analytes in a mixed unknown sample using a nanopore device. The methods of the present disclosure enable a precise calibration of each sensor on a nanopore device based on measured translocation event signatures from control mixtures to enhance the accuracy and precision of target analyte detection without the need for a quantitative model describing translocation dynamics. Additionally, the control mixture may be generic, e.g., containing random or known DNA sequences of different lengths and/or concentrations, which do not have to be specific to any one assay.

[0053] In one aspect, the calibration method of the present disclosure includes (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of: (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and/or length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: translocate each molecule of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size; and (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control sample mixture.

[0054] In another aspect, the control mixture comprises at least a first control molecule and a first reference molecule (e.g., 2 or more, 3 or more, 4 or more, 5 or more, or 6 or more molecules of differing sizes and/or concentrations).

[0055] A calibration curve (e.g., DNA length model) is created in step (e) based on the linear correlation. The control sample mixture may then be removed, and an unknown sample can be translocated through the nanopore for determination of the size and/or concentration of the target analytes in the unknown sample mixture based on the calibration curve.

[0056] Aspects of the present disclosure further include a method of determining the size and/or concentration of one or more target analytes in a mixed unknown sample using a nanopore device. The method includes (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of: (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and/or length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (ii) an unknown sample mixture comprising: (iia) one or more target molecules, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene; (iib) the first reference molecule; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: induce translocation of each molecule, of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration; (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample; (f) developing a model (e.g., calibration curve model, DNA length model) based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture; (g) repeating step (b) through (e) for the unknown sample mixture; and (h) applying the model in step (f) to quantify the size of the one or more target molecules in the unknown sample mixture.

[0057] Aspects of the present disclosure include a method of detecting the presence or absence of one or more target analytes suspected to be present in a mixed unknown sample using a nanopore device, comprising: (a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of: (i) a control sample mixture comprising: (ia) a first control molecule comprising a polynucleotide with a first molecule length, (ib) a second control molecule comprising a polynucleotide with a second molecule concentration and length, and (ic) a first reference molecule comprising a polynucleotide with a first reference molecule length; (ii) an unknown sample mixture comprising: (iia) one or more target molecules suspected to be present in the unknown sample mixture, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene; (iib) the first reference molecule; (b) loading the control sample mixture into a chamber of a nanopore device; (c) applying a voltage across a nanopore in the nanopore device to: induce translocation of each molecule, of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture; (d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration; (e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample; (f) developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture; (g) repeating step (b) through (e) for the unknown sample mixture; and (h) applying the model in step (f) to determine the presence or absence of one or more target analytes in the unknown sample mixture by determining the size and/or concentration of the one or more target molecules in the unknown sample mixture.

[0058] In some embodiments, if the determined size of the one or more target molecules using the methods described herein corresponds to (e.g., is the same size as) the known target analyte size associated with a disease or condition, then the one or more target analytes are present in the unknown sample mixture. In contrast, if the determined size of the one or more target molecules using the methods described herein does not correspond to (e.g., is not the same size as) the known target analyte size associated with a disease or condition, then the one or more target analytes are not present in the unknown sample mixture. Non-limiting examples of detection of target analytes are described in Examples 1 and 2 of the present disclosure.

[0059] Additionally, in some embodiments, if the determined concentration of the one or more target molecules using the methods described herein is above a defined threshold (e.g., an amount of target analytes that should be present if the sample contained a target analyte associated with a disease or condition, e.g., based on the volume of the sample collected), then the one or more target analytes are present in the unknown sample mixture. In contrast, if the determined concentration of the one or more target molecules using the methods described herein is below a defined threshold (e.g., an amount of target analytes that should be present if the sample contained a target analyte associated with a disease or condition, e.g., based on the volume of the sample collected), then the one or more target analytes are not present in the unknown sample mixture.
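By way of a non-limiting illustration, the size and concentration criteria of the two preceding paragraphs can be sketched as a simple decision function. The function name `target_present`, the base-pair tolerance, the example threshold, and the choice to require both criteria together are assumptions made only for this sketch; the disclosure does not prescribe them.

```python
def target_present(measured_bp, expected_bp, measured_conc_nM,
                   conc_threshold_nM, bp_tolerance=10):
    """Return True when the determined size corresponds to the known target
    size and the determined concentration is at or above the threshold.

    bp_tolerance and the AND-combination of both criteria are illustrative
    assumptions, not values taken from the disclosure.
    """
    size_matches = abs(measured_bp - expected_bp) <= bp_tolerance
    concentration_ok = measured_conc_nM >= conc_threshold_nM
    return size_matches and concentration_ok


# Hypothetical readings for a 197 bp target with a 0.1 nM threshold.
call_match = target_present(195, 197, 0.5, 0.1)       # size and amount agree
call_wrong_size = target_present(356, 197, 0.5, 0.1)  # size does not correspond
call_too_dilute = target_present(197, 197, 0.05, 0.1) # below the threshold
```

In an embodiment that uses only the size criterion or only the concentration criterion, the corresponding half of the test would be applied on its own.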

[0060] In some embodiments, the event signature (e.g., translocation event signature, current event signature) can include an event signature selected from an event duration and current event signature, a maximum δG, a median δG, an average δG, a standard deviation of the event signature, a mean or median of the noise power of the event below 50 Hz, a unique pattern in said event signature, an area of an event, a median amplitude, or any combination thereof.

Clustering Algorithm

[0061] The methods of the present disclosure include providing a clustering algorithm to cluster the detectable electronic signatures (e.g. translocation current event signatures) for each molecule in the control mixture and the unknown sample mixture.

[0062] Raw translocation data of controls and/or unknown sample are first grouped by a clustering algorithm and filtered to increase statistical confidence. The clustering algorithm is applied to the raw translocation event data outputted after each molecule is translocated through the nanopore device (raw electronic signatures; FIG. 2, and top and bottom left-hand plots of FIGs. 7-8). The clustering algorithm clusters and labels each of the raw current event signatures from the respective molecules in the control mixture and/or the unknown sample mixture, as shown in FIG. 3 and the top and bottom right-hand plots of FIGs. 7-8.

[0063] For example, as shown in FIG. 2 “Control Sample”, the method includes clustering the raw translocation current events of a first control molecule, a second control molecule, and a reference molecule, and labeling the clusters (FIG. 3), where the first control molecule is a DNA molecule with a length of 197 bp, the second control molecule is a DNA molecule with a length of 356 bp, and the first reference molecule is a DNA molecule with a length of 80 bp. In this example, there are 3 different DNAs in the mixture (80bp, 197bp, 356bp). Thus, the clustering algorithm is used to cluster the events in FIG. 2 and the results of the clustering are shown in FIG. 3.

[0064] Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, a clustering algorithm can be used to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields. In nanopore data analysis of the present methods, the method includes applying a clustering analysis based on the translocation event features, such as duration, current amplitude attenuation, and area (e.g. duration x amplitude), in order to group the data points.

[0065] Clustering algorithms that may be used in the present methods are described in PCT Application Publication No.: WO2018081178 A1, which is hereby incorporated by reference in its entirety. In some embodiments, the clustering algorithm is an EMGM (Expectation Maximization Algorithm for Gaussian Mixtures) algorithm.

[0066] In some embodiments, clustering methods are applied to separate the DNA lengths of the control mixture, and optionally the unknown sample mixture when the unknown sample mixture is translocated through a nanopore. Each event is tagged as a target event, a reference molecule event, or control molecule event.

[0067] In some embodiments, the clustering method is a maximum likelihood method applied to parameterized models of the distributions of one or more event parameters. Iterative application of maximum likelihood estimation to control sets results in fitted model parameters, with one set of distributions associated with the target analyte type, another set of distributions associated with the reference analyte type, and another set of distributions associated with the control molecule type.

[0068] In some embodiments, a log likelihood function is used as the metric for tracking progress in iterations of the algorithm, which recursively updates the membership assignment of each event in control data and improves the fit of the distributions to the data. In some embodiments, the data are modeled using mixtures of parameterized Gaussian distributions. Methods that use finite mixture models, including Gaussian mixture models, to characterize numerical data are well characterized in statistics and applied mathematics (Hand, David J., Heikki Mannila, and Padhraic Smyth. Principles of data mining. MIT press, 2001).

[0069] In some embodiments, given a Gaussian Mixture (GM) model, the method maximizes the likelihood function with respect to the parameters comprising the means and covariance of the components and the mixing coefficients. Since there is no closed-form solution for the log likelihood, the mode parameters and weights for assigning data to modes are iteratively computed using the Expectation Maximization (EM) technique (C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006).
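By way of a non-limiting illustration, the iterative EM computation for a Gaussian mixture can be sketched in one dimension as follows. The sketch assumes a single event feature (a hypothetical list of log-area values), a simple initialisation from evenly spaced order statistics, and a small variance floor; the function name `fit_gmm_1d`, these choices, and the synthetic data are assumptions for illustration only and are not the implementation of the disclosure.

```python
import math


def fit_gmm_1d(values, k, max_iter=200, tol=1e-9):
    """Fit a k-component (k >= 2) 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, labels), where labels[i] is the
    index of the most probable component for values[i].
    """
    xs = sorted(values)
    n = len(xs)
    # Assumed initialisation: evenly spaced order statistics, shared variance.
    means = [xs[i * (n - 1) // (k - 1)] for i in range(k)]
    spread = (xs[-1] - xs[0]) / (2 * k)
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities of each component for each event.
        resp = []
        ll = 0.0
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: update mixing coefficients, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, values)) / nj + 1e-9
        # The log likelihood is the metric used to track progress.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic log-area values forming three well-separated populations.
log_area = [0.9, 1.0, 1.1, 1.9, 2.0, 2.1, 2.9, 3.0, 3.1]
weights, means, variances, labels = fit_gmm_1d(log_area, k=3)
```

A practical embodiment would typically fit several event features jointly with full covariance matrices; the one-dimensional case is used here only to keep the E-step and M-step visible.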

[0070] The method of applying an EM algorithm to GM models of nanopore data for the purpose of determining size and/or concentration is termed EMGM. Like the Q-test method, the EMGM method uses prior knowledge about one or more nanopore event signatures that can be used to distinguish the target events from the reference events.

[0071] As stated, the target population may be represented by a single distribution, or more than one distribution. Likewise, the reference population may be represented by a single distribution, or more than one distribution. The target and reference distribution(s) are established by applying the algorithm to one or more isolated controls and one or more control mixtures.

[0072] Subsequently, after the target distribution(s) are established, an event in an unknown mixture is tagged as a target event if it is associated with the modeled target distribution(s).

[0073] By way of example, a total of three Gaussian distributions could fit the entire data set in a 1:1 control mixture, with one mode associated with the target type and two modes associated with the reference type.

[0074] The algorithm requires only one control mixture for application of the EMGM. Subsequently, the resulting model can be applied to unknown mixtures.

The support vector machine (SVM) Method

[0075] In some embodiments, machine learning is used to identify the set of features and feature criterion for tagging each event as a target analyte event or a reference analyte event. In some embodiments, support vector machines are used to classify events as target or reference analytes. Machine learning methods used for clustering of population events in a nanopore device are described in PCT Application Publication No.: WO2018081178 A1, which is hereby incorporated by reference in its entirety.

[0076] In some embodiments, developing a support vector machine workflow follows these steps: 1) load nanopore data, 2) select nanopore event features to differentiate events, 3) model training and testing using controls, 4) data calibration using control mixture, 5) prediction of the size and/or concentration of unknown target analytes in an unknown sample. In some embodiments, an already developed and reduced support vector machine workflow is implemented for the calibration models for each assay.

[0077] In some embodiments, machine learning tools are applied to automate the selection of the criterion, including selection of the event features, the form of the inequalities (linear and/or nonlinear) and the threshold values q used in the inequalities. In some embodiments, Support Vector Machines (SVMs), a supervised machine learning method that solves classification problems, are implemented to generate the tagging criterion. References on SVMs include: Cortes, C. & Vapnik, V. Machine Learning (1995) 20: 273; and Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). “A training algorithm for optimal margin classifiers,” Proceedings of the fifth annual workshop on Computational learning theory, each of which is incorporated by reference in its entirety.
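By way of a non-limiting illustration, the training step of such a workflow can be sketched as a linear soft-margin SVM fitted by batch subgradient descent on the regularized hinge loss. The two standardized event features, the four labeled control events, the learning rate, the regularization constant, and the function names `train_linear_svm` and `tag_event` are all hypothetical choices for this sketch; a production workflow would more likely rely on an established SVM library and may use nonlinear kernels.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (target event) or -1 (reference event).
    Returns the weight vector w and bias b of the decision boundary.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # event lies inside the margin or is misclassified
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def tag_event(w, b, x):
    """Tag an event by the sign of the linear decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "target" if score >= 0 else "reference"


# Hypothetical standardized features (e.g., duration and amplitude) from controls.
control_events = [(-1.0, -1.0), (-1.2, -0.8), (1.0, 1.0), (0.8, 1.2)]
control_labels = [-1, -1, 1, 1]
w, b = train_linear_svm(control_events, control_labels)

tag_a = tag_event(w, b, (0.9, 1.1))
tag_b = tag_event(w, b, (-1.1, -0.9))
```

The learned weights and bias play the role of the tagging criterion: the inequality `score >= 0` is the linear decision boundary applied to each event of an unknown mixture.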

[0078] An assay-based generalized model generated from SVM, including common decision boundaries and common calibration curves, can be applied to unknown mixtures based on the calibration curve generated from the calibration mixtures.

[0079] Other data mining methods, including decision trees, neural networks, Naive Bayes, logistic regression, K-nearest neighbor, and boosting, are also claimed as applicable methods for nanopore data.

Data Filtering

[0080] In some embodiments, the method includes applying a data filter to improve accuracy and precision of sizing and quantitation of the target analytes. For example, when populations are not well separated after clustering, e.g., when there is some overlap among the different event populations of molecules (e.g., first control molecule, second control molecule, first reference molecule, etc.), there is low statistical confidence as to which event population group the overlap data points belong to. Since the base pair predictions are known in the unknown sample and the target analyte DNA lengths within the unknown sample mixture are known, a base pair filter (e.g., data filter) can be applied to differentiate the overlapped regions and accurately separate the clusters of population events.

[0081] For example, as shown in FIG. 6, the “unknown sample” plot on the left shows the original, raw translocation event data. The separation between 197bp and 356bp, or between 80 bp and 197 bp, is not clear, as shown by the solid line boxes (overlap region between DNA length populations). Therefore, an inaccurate prediction of a 197bp event as a 356bp event may occur for those regions. The present disclosure provides a data filtering algorithm to filter out the events that lie in the region where the DNA molecules may have overlaps and to provide clear separation among the various molecules of differing sizes and/or concentrations, as shown in the plot on the right after application of the data filter.

[0082] Data filtering can be applied after clustering or at the same time of clustering the translocation signatures of each of the molecules within the control mixture. In some embodiments, the data filtering algorithm defines a nucleotide base-pair window for each molecule. In some embodiments, the data filtering algorithm is carried out using a computer readable medium, comprising instructions, that cause a processor to apply the data filtering algorithm to the clustered electronically detectable signature events to separate out overlapping electrically detectable signature events from control molecules, one or more target molecules, and the reference molecule.
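By way of a non-limiting illustration, a base-pair window filter of the kind described above can be sketched as follows. The window bounds, the event values, and the function name `apply_bp_window_filter` are hypothetical and chosen only to show how events falling in an overlap region between populations are discarded.

```python
def apply_bp_window_filter(events, windows):
    """Keep only events whose predicted length lies inside its cluster window.

    events:  list of (predicted_bp, label) pairs produced by clustering.
    windows: dict mapping each label to an inclusive (low_bp, high_bp) window.
    """
    kept = []
    for predicted_bp, label in events:
        low, high = windows[label]
        if low <= predicted_bp <= high:
            kept.append((predicted_bp, label))
    return kept


# Hypothetical windows around the three known lengths of the example mixture.
bp_windows = {"80bp": (60, 100), "197bp": (170, 225), "356bp": (320, 390)}

clustered_events = [
    (82, "80bp"),
    (130, "80bp"),    # falls between the 80 bp and 197 bp populations
    (195, "197bp"),
    (280, "197bp"),   # falls between the 197 bp and 356 bp populations
    (350, "356bp"),
]
filtered_events = apply_bp_window_filter(clustered_events, bp_windows)
```

Narrower windows remove more ambiguous events at the cost of fewer retained events per population, so the window width is a trade-off between statistical confidence and event count.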

Calibration Model

[0083] In some embodiments, the method uses a polynucleotide (e.g., DNA)-length or concentration model (e.g., used interchangeably herein as “calibration curve model” or “calibration curve” or “calibration model”) for developing a calibration curve that can be used to determine the DNA length and/or concentration of one or more target analytes in an unknown sample mixture. The calibration curve is generated based on the clustered translocation event data of the control mixtures.

[0084] For example, the DNA-length model is based on a DNA marker ladder, DNA length, and/or concentration of molecules (e.g., DNA). In some embodiments, a DNA ladder comprises one or more pooled polynucleotides (e.g., DNA) of various sizes (e.g., length of DNA molecules in the control sample mixtures), and optionally with various concentrations. In some embodiments, the DNA length and/or DNA concentration model (calibration curves) generated from the control sample mixtures are then applied on the unknown sample to perform a DNA size and concentration analysis.

[0085] In some embodiments, a DNA length-based model is built based on the ‘fingerprint’ of the polynucleotides (e.g., dwell time, amplitude, area, and the like).

[0086] In a non-limiting example, as shown in FIG. 2 “Control Sample”, the method includes clustering the raw translocation current events and labeling the clusters (FIG. 3). In this example, since there are 3 different DNAs in the mixture (80bp, 197bp, 356bp), the clustering algorithm is used to cluster the events in FIG. 2, and the results of the clustering are shown in FIG. 3. The terms ‘target event’ and ‘reference event’ are determined by the DNA length; e.g., in this example, the reference molecule is 80bp, the target event for COVID-19 is 197bp, and a flu event is 356bp. For each event (each dot as shown in FIGs. 2-3), the tag of the event (80bp, 197bp or 356bp) and the features of the event (FIG. 1) are known; therefore, the median amplitude of each cluster (80bp, 197bp and 356bp) can be calculated. The median event area is the product of median amplitude and duration (or dwell time). The resulting calculation of the median amplitude of each cluster yields 3 data points: (80bp, 80bp_median_amplitude), (197bp, 197bp_median_amplitude) and (356bp, 356bp_median_amplitude). These 3 median amplitude data points are plotted after taking the log10 of both the length/size (bp) and the median amplitude, and the 3 data points are then fitted with a linear correlation (FIG. 4, see data points surrounding dashed circle). Therefore, a linear correlation can be determined between the amplitude of an event and the base-pair length/size of that event.
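By way of a non-limiting illustration, the log-log linear fit described above can be sketched with an ordinary least-squares line through the three cluster medians. The sketch uses event area as the feature, consistent with step (e) of the method; the area values are synthetic (generated from an arbitrary power law, not measured data), and the function names `fit_length_model` and `predict_bp` are hypothetical.

```python
import math
from statistics import median


def fit_length_model(cluster_areas):
    """Fit log10(median area) = slope * log10(bp) + intercept.

    cluster_areas maps each known length in bp to the list of event areas
    assigned to that cluster. Returns (slope, intercept).
    """
    xs = [math.log10(bp) for bp in cluster_areas]
    ys = [math.log10(median(areas)) for areas in cluster_areas.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_bp(area, slope, intercept):
    """Invert the calibration line to estimate length from an event area."""
    return 10 ** ((math.log10(area) - intercept) / slope)


def synthetic_area(bp):
    # Arbitrary power law used only to generate illustrative data.
    return 2.0 * bp ** 1.5


# Three clusters (80, 197, 356 bp) with some spread around each median.
clusters = {bp: [0.9 * synthetic_area(bp), synthetic_area(bp), 1.1 * synthetic_area(bp)]
            for bp in (80, 197, 356)}
slope, intercept = fit_length_model(clusters)
estimated_bp = predict_bp(synthetic_area(197), slope, intercept)
```

The same fit can be repeated per nanopore, so that each sensor carries its own slope and intercept.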

[0087] In a non-limiting example where the calibration curve model is used in an assay, the control mixture comprising the control molecules and reference molecules is first introduced to the nanopores for detection. The control mixture is then removed, and the target analytes and reference molecule (unknown sample mixture) are introduced to the same set of nanopores for detection. The control mixture comprises a mixture of 2 or more DNA molecules of known sizes (or lengths of random DNA sequences) that allow each nanopore sensor to be characterized based on how the different sizes of DNA translocate through the pore. The “clustering” algorithm, and optionally a data filter, are used to enhance the statistical confidence level of identifying each size population based on event detection by the nanopore. Then, the “calibration curve” of translocation parameters, such as duration (or dwell time), ionic current attenuation (or amplitude), and area (or amplitude x dwell time) versus DNA size can be generated for each pore. The median event area is the product of median amplitude and duration (or dwell time). This calibration curve generated from the control mixture is then compared to the same event detection parameters measured for the target analytes, which have known specific DNA sizes, to determine the presence or absence of the analytes. In some embodiments, the quantity (or concentration) of analytes can be determined from the known concentration of the control mixture.

Population Shift Correction

[0088] In some embodiments, the method further comprises aligning the size of the polynucleotide by shifting the density peaks, using the known calibrant/reference molecule present in an unknown sample.

[0089] The present disclosure includes a population shifting correction algorithm using the feature of the reference molecule in the sample, as described in FIGs. 5A-5B. In some embodiments, the population shifts between different runs for the same nanopore. This is likely caused by a change in nanopore size or in nanopore membrane surface chemistry. Since there are reference analytes in the unknown sample, the method, in some embodiments, includes applying a shifting correction algorithm to align the base pairs, which is shown in FIGs. 5A-5B. In some embodiments, the population shift correction method comprises: identifying the calibrant’s (e.g., reference analyte’s) median log(area) in the unknown sample; deriving the calibrant’s (reference molecule’s) expected log(area) using a DNA length model by deriving the area value from the linear fit calibration curve of FIG. 4; calculating a correction factor; multiplying the measured log(area) of all the events by the correction factor; and calculating the base pairs of the unknown sample using the length model and corrected log(area). The DNA length model is applied on the unknown sample, which has calibrant (reference) molecules. The predicted DNA length of the reference molecule/calibrant is then aligned to its actual DNA length. This step corrects the population shifting when the density peaks are incorrectly shifted or offset to the left or right; thus, by applying the population shift correction algorithm, the size of the target analytes and reference analytes can be accurately determined.

[0090] By applying the population shift correction algorithm to the reference molecule and the target molecules in the unknown sample, the accuracy of the quantification of the size and/or concentration of the target molecules is improved, since the population shift algorithm corrects for an error in the size and/or concentration of the one or more target molecules, thereby determining an improved estimate of the size and/or concentration of one or more target molecules in said unknown sample.
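By way of a non-limiting illustration, the population shift correction steps can be sketched as follows. The sketch assumes that the correction factor is the ratio of the reference molecule's expected log(area) to its measured median log(area), which is the form implied by multiplying every measured log(area) by the factor; this form, the model coefficients, the simulated drift, and the function name `correct_population_shift` are assumptions for illustration only.

```python
import math
from statistics import median


def correct_population_shift(log_areas, reference_log_areas, reference_bp,
                             slope, intercept):
    """Align an unknown-sample run to the length model using the reference.

    log_areas:           measured log10(area) of every event in the run.
    reference_log_areas: measured log10(area) of events tagged as reference.
    slope, intercept:    length model log10(area) = slope*log10(bp) + intercept.
    Returns (correction_factor, corrected_sizes_bp).
    """
    measured = median(reference_log_areas)
    expected = slope * math.log10(reference_bp) + intercept
    # Assumed form of the correction factor: expected / measured.
    factor = expected / measured
    corrected = [la * factor for la in log_areas]
    sizes = [10 ** ((la - intercept) / slope) for la in corrected]
    return factor, sizes


# Hypothetical length model and a run whose log(area) values drifted by 2 %.
model_slope, model_intercept = 1.5, 0.3
true_log_area = {bp: model_slope * math.log10(bp) + model_intercept
                 for bp in (80, 197)}
drift = 1.02
run_log_areas = [true_log_area[80] / drift, true_log_area[197] / drift]
reference_events = [true_log_area[80] / drift] * 3

factor, sizes = correct_population_shift(
    run_log_areas, reference_events, 80, model_slope, model_intercept)
```

Because the factor is derived from the reference molecule alone, the reference peak is aligned to its actual length by construction, and target events are corrected by the same factor.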

[0091] In some embodiments, the population shift correction algorithm is applied before or after applying the model in step (f). In some embodiments, the population shift correction algorithm is carried out with a computer readable medium, comprising instructions, that cause a processor to: find a median log (area) of the first reference molecule in the unknown sample mixture; derive the first reference molecule’s expected log(area) using the model in step (f); calculate a correction factor comprising the equation:

Control Mixtures

[0092] In some aspects, the method includes loading a nanopore device with one or more control mixtures. A combination of control mixtures (e.g., mixtures of molecules of different sizes at known concentrations) and, optionally, a data filter is used to enhance the statistical confidence level in creating a high-fidelity calibration curve for improving the accuracy and precision of target analyte detection (presence or absence of targets, and determination of target size and concentration). Also, the control mixture is generic and can be applied to many different assays.

[0093] In some embodiments, one or more control mixtures comprise two or more molecules (e.g., polynucleotides) with different base pairs. In some embodiments, the control mixtures comprise two or more polynucleotides (e.g., DNA) with known lengths and at various concentrations. For example, as shown in slide 3 of Appendix A, three DNA lengths are shown in the control sample: 80 base pairs, 197 base pairs, and 356 base pairs.

[0094] In some embodiments, the control mixtures are chosen for multiplexing purposes. For example, 1 control molecule and 1 reference molecule allow calibration of a 2-plex assay, whereas 2 control molecules and 1 reference molecule allow calibration of a 3-plex assay.

[0095] In some embodiments, the control mixture comprises at least a first control molecule with a known size (e.g., base-pair length) and/or concentration; and at least a first reference molecule comprising a known size and/or concentration. In some embodiments, the control mixture comprises a first control molecule comprising a known size and/or concentration, a second control molecule comprising a known size and/or concentration, and a first reference molecule comprising a known size and/or concentration.

[0096] In some embodiments, the control mixture contains a multitude of sizes of DNA (e.g., 3 in the examples presented herein). The control molecules or the reference molecules do not have to be the same sizes as the target analytes within the unknown sample. In some embodiments, the control molecule or reference molecule can span the size range of target analytes of interest. In some embodiments, each molecule within the control mixture comprises a different length.

[0097] In some embodiments, the control mixture further comprises a fourth control molecule comprising a polynucleotide with at least a fourth molecule concentration and/or length. In some embodiments, the control mixture further comprises at least a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length. In some embodiments, the control mixture further comprises at least a sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; at least a seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; and at least an eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length. In some embodiments, the control mixture further comprises at least a ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length; at least a tenth control molecule comprising a polynucleotide with a tenth molecule concentration and/or length; at least an eleventh control molecule comprising a polynucleotide with an eleventh molecule concentration and/or length; and at least a twelfth control molecule comprising a polynucleotide with a twelfth molecule concentration and/or length.

[0098] In some embodiments, the control mixture further comprises a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length; sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length, and ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length.

[0099] In some embodiments, the first, second, and third control molecule comprise the same molecule, but a different concentration and/or length. In some embodiments, the fourth, fifth, and sixth control molecule comprise the same molecule, but a different concentration and/or length. In some embodiments, the seventh, eighth, and ninth control molecule comprise the same molecule, but a different concentration and/or length.

[0100] In some embodiments, the size and/or concentration of the first reference molecule is known. In some embodiments, the size and/or concentration of the first control molecule is not known. In some embodiments, the size and/or concentration of the first control molecule and/or second control molecule is known. In some embodiments, the control sample mixture does not comprise the target analyte molecule.

[0101] In some embodiments, each molecule in the control sample mixture is prepared by nucleic acid amplification.

[0102] In some embodiments, each molecule within the control sample mixture is DNA or RNA. In some embodiments, each molecule within the control sample mixture is an amplicon product.

[0103] In some embodiments, the concentration of each control molecule or reference molecule ranges from 0.01 nM to 10 nM. In some embodiments, the concentration of at least a first control molecule is the same as or is different from the concentration of at least a second control molecule. In some embodiments, the concentration of at least a first control molecule is different from the concentration of at least a reference molecule. In some embodiments, the concentration of at least a first control molecule and at least a second control molecule is the same as or different from the concentration of at least a first reference molecule.

[0104] In some embodiments, each of the control molecules and reference molecules within the control sample mixture comprises a known size (e.g., length, such as base-pair length) ranging from 5 bp to 1000 bp (such as 5 bp to 10 bp, 5 bp to 500 bp, 10 bp to 400 bp, 40 bp to 500 bp, 80 bp to 400 bp, 5 bp to 500 bp, 60 bp to 500 bp, and the like). In some embodiments, the molecules within the control sample mixture differ in size/length by at least 10 nucleotides, at least 20 nucleotides, at least 50 nucleotides, at least 100 nucleotides, at least 150 nucleotides or at least 200 nucleotides.

[0105] In some embodiments, the control mixture comprises polynucleotides of known sizes. In certain embodiments, each molecule within the control mixture comprises sequences unrelated to the unknown target analyte, to calibrate size and quantity. In some embodiments, the control mixture is non-specific to any one assay and can apply to many different assays using the calibration curve of the present methods.

[0106] For example, for detection of an unknown sample suspected to contain target analytes associated with SARS-CoV-2, a control mixture can include a DNA sample with various concentrations and lengths. In one non-limiting example, the control mixture includes 3nM 80bp DNA + 1nM 197bp DNA + 1.5nM 356bp DNA.

[0107] In another example where next generation sequencing (e.g., NGS long read) may be used in conjunction with the methods of the present disclosure, the control mixture can include: control 1: 0.0313nM 1000bp DNA + 0.0313nM 2821bp DNA + 0.0313nM 11002bp DNA + 0.0313nM 34579bp DNA; control 2: 0.0313nM 1000bp DNA + 0.0625nM 2821bp DNA + 0.0625nM 11002bp DNA + 0.0625nM 34579bp DNA; and control 3: 0.0313nM 1000bp DNA + 0.125nM 2821bp DNA + 0.125nM 11002bp DNA + 0.125nM 34579bp DNA.

[0108] In another example where next generation sequencing (e.g., NGS short read) may be used in conjunction with the methods of the present disclosure, the control mixture can include: control 1: 0.5nM 125bp DNA + 2nM 300bp DNA + 2nM 1239bp DNA; control 2: 1.25nM 125bp DNA + 1.25nM 300bp DNA + 2nM 1239bp DNA; and control 3: 2.0nM 125bp DNA + 0.5nM 300bp DNA + 2nM 1239bp DNA.
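As a minimal sketch of how such control mixtures might be recorded for downstream calibration, the following Python snippet encodes the non-limiting mixtures of paragraphs [0106] and [0108] and checks them against the 0.01 nM to 10 nM concentration range of paragraph [0103] and a minimum length spacing. The names `Component`, `min_length_gap`, and `within_concentration_range` are illustrative assumptions and are not part of the disclosed methods.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Component:
    length_bp: int           # polynucleotide length in base pairs
    concentration_nm: float  # concentration in nM


# Mixture from paragraph [0106]; the 197 bp component is read as 1 nM.
SARS_COV_2_CONTROL = [Component(80, 3.0), Component(197, 1.0), Component(356, 1.5)]

# Short-read controls from paragraph [0108].
SHORT_READ_CONTROLS = {
    "control 1": [Component(125, 0.5), Component(300, 2.0), Component(1239, 2.0)],
    "control 2": [Component(125, 1.25), Component(300, 1.25), Component(1239, 2.0)],
    "control 3": [Component(125, 2.0), Component(300, 0.5), Component(1239, 2.0)],
}


def min_length_gap(mixture):
    """Smallest pairwise length difference (bp) among the components."""
    return min(abs(a.length_bp - b.length_bp) for a, b in combinations(mixture, 2))


def within_concentration_range(mixture, low_nm=0.01, high_nm=10.0):
    """True if every component lies within [low_nm, high_nm]."""
    return all(low_nm <= c.concentration_nm <= high_nm for c in mixture)
```

For instance, the components of the mixture in paragraph [0106] differ in length by at least 117 bp, which satisfies the "at least 100 nucleotides" spacing mentioned in paragraph [0104].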

Unknown Sample Mixtures

[0109] The unknown sample mixtures of the present methods include one or more target analytes suspected to be in a sample, and at least one reference molecule. As described above, the reference molecule in the unknown samples can include a calibrant molecule for size alignment for correcting translocation shift and duration.

[0110] In some embodiments, the size and/or concentration of the reference molecule is known. In some embodiments, the reference molecule in the unknown sample mixture is prepared by nucleic acid amplification.

[0111] In some embodiments, each molecule (e.g., one or more target analytes and/or one or more reference molecules) within the unknown sample mixture is DNA or RNA. In some embodiments, each molecule within the unknown sample mixture is an amplicon product. In some embodiments, the reference molecule is DNA or RNA. In some embodiments, the reference molecule is an amplicon product. In some embodiments, the reference molecule in the unknown sample mixture is prepared by nucleic acid amplification. In some embodiments, the target analyte molecule is DNA or RNA. In some embodiments, the target analyte molecule is an amplicon product. In some embodiments, the target analyte molecule in the unknown sample mixture is prepared by nucleic acid amplification.

[0112] In some embodiments, said reference analyte and said target analyte are discriminated by length.

[0113] In some embodiments, the concentration of the reference molecule ranges from 0.01 nM to 10 nM. In some embodiments, the concentration of at least a first reference molecule is different from the concentration of one or more target analytes.

[0114] In some embodiments, the reference molecule within the unknown sample mixture comprises a known size (e.g., length, such as base-pair length) ranging from 5 bp to 1000 bp (such as 5 bp to 10 bp, 5 bp to 500 bp, 10 bp to 400 bp, 40 bp to 500 bp, 80 bp to 400 bp, 5 bp to 500 bp, 60 bp to 500 bp, and the like).

[0115] In some embodiments, the unknown sample mixture comprises polynucleotides of known sizes. For example, the reference molecule comprises a known size and/or concentration. In some embodiments, the target analytes suspected to be in the unknown sample mixture comprise a known size and/or concentration that is associated with a disease or condition, and if the methods of the present disclosure determine that same size for the target analytes, the unknown sample mixture is confirmed positive for containing the target analyte associated with the disease or condition. For example, if the size and/or concentration determined by the method of the present disclosure is the same as the expected size and/or concentration of the target analytes associated with a disease or condition, then the sample is positive for that disease or condition.

Nanopore Devices

[0116] A nanopore device, as provided, includes at least a pore that forms an opening in a structure separating an interior space of the device into two volumes, and at least a sensor configured to identify objects (for example, by detecting changes in parameters indicative of objects) passing through the pore. Nanopore devices used for the methods described herein are also disclosed in PCT Publication No. WO/2013/012881, and U.S. Patent Nos.: 9,983,191 and 10,670,590, which are hereby incorporated by reference in their entirety.

[0117] In some aspects, the method includes loading a nanopore device with one or more control mixtures. In some embodiments, the one or more control mixtures comprise two or more molecules (e.g., polynucleotides) with different base-pair lengths. In some embodiments, the control mixtures comprise two or more polynucleotides (e.g., DNA) with known lengths and at various concentrations. For example, as shown in FIG. 3, three DNA lengths are shown in the control sample: 80 base pairs, 197 base pairs, and 356 base pairs. In some embodiments, the nanopore detects the amplified DNA products of the transgene of interest.

[0118] In one aspect, the present methods include detecting one or more diseases or conditions using a nanopore device. In some embodiments, said event signature comprises an electrical signal induced by translocation of said reference analyte through said nanopore.

[0119] The pore(s) in the nanopore device are of a nano scale or micro scale. In one aspect, each pore has a size that allows a small or large molecule or microorganism to pass. In one aspect, each pore is at least about 1 nm in diameter. Alternatively, each pore is at least about 2 nm, 3 nm, 4 nm, 5 nm, 6 nm, 7 nm, 8 nm, 9 nm, 10 nm, 11 nm, 12 nm, 13 nm, 14 nm, 15 nm, 16 nm, 17 nm, 18 nm, 19 nm, 20 nm, 25 nm, 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, or 100 nm in diameter.

[0120] In one aspect, the pore is no more than about 100 nm in diameter. Alternatively, the pore is no more than about 95 nm, 90 nm, 85 nm, 80 nm, 75 nm, 70 nm, 65 nm, 60 nm, 55 nm, 50 nm, 45 nm, 40 nm, 35 nm, 30 nm, 25 nm, 20 nm, 15 nm, or 10 nm in diameter.

[0121] In one aspect, the pore has a diameter that is between about 1 nm and about 100 nm, or alternatively between about 2 nm and about 80 nm, or between about 3 nm and about 70 nm, or between about 4 nm and about 60 nm, or between about 5 nm and about 50 nm, or between about 10 nm and about 40 nm, or between about 15 nm and about 30 nm.

[0122] In some aspects, the nanopore device further includes means to move a polynucleotide molecule across the pore and/or means to identify objects that pass through the pore. Further details are provided below, described in the context of a two-pore device.

[0123] In some embodiments, each nanopore (one or more, two or more, three or more nanopores) has a depth (i.e., a length of the pore extending between two adjacent volumes). In one aspect, each pore has a depth that is at least about 0.3 nm. Alternatively, each pore has a depth that is at least about 0.6 nm, 1 nm, 2 nm, 3 nm, 4 nm, 5 nm, 6 nm, 7 nm, 8 nm, 9 nm, 10 nm, 11 nm, 12 nm, 13 nm, 14 nm, 15 nm, 16 nm, 17 nm, 18 nm, 19 nm, 20 nm, 25 nm, 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 60 nm, 70 nm, 80 nm, or 90 nm.

[0124] In one aspect, each pore has a depth that is no more than about 100 nm. Alternatively, the depth is no more than about 95 nm, 90 nm, 85 nm, 80 nm, 75 nm, 70 nm, 65 nm, 60 nm, 55 nm, 50 nm, 45 nm, 40 nm, 35 nm, 30 nm, 25 nm, 20 nm, 15 nm, or 10 nm.

[0125] In one aspect, the pore has a depth that is between about 1 nm and about 100 nm, or alternatively, between about 2 nm and about 80 nm, or between about 3 nm and about 70 nm, or between about 4 nm and about 60 nm, or between about 5 nm and about 50 nm, or between about 10 nm and about 40 nm, or between about 15 nm and about 30 nm.

[0126] In some embodiments, the nanopore extends through a membrane. In one example, the pore may be a protein channel inserted in a lipid bilayer membrane. Alternatively, it may be engineered by drilling, etching, or otherwise forming the pore through a solid-state substrate such as silicon dioxide, silicon nitride, graphene, or layers formed of combinations of these or other materials. The length of the nanopore is sufficiently large so as to form a channel connecting two otherwise separate volumes.

[0127] In some such aspects, the length of each pore is greater than 100 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, 700 nm, 800 nm, or 900 nm. In some aspects, the length of each pore is no more than 2000 nm or 1000 nm. For lengths greater than 200 nm, the nanopore is commonly referred to as a “nanochannel”, though it may also still be referred to as a “nanopore”.

[0128] In one aspect, when the nanopore device comprises two pores in two-pore devices, the pores are spaced apart at a distance that is between about 10 nm and about 1000 nm. In some aspects, the distance between the pores is greater than 1000 nm, 2000 nm, 3000 nm, 4000 nm, 5000 nm, 6000 nm, 7000 nm, 8000 nm, or 9000 nm. In some aspects, the pores are spaced no more than 30000 nm, 20000 nm, or 10000 nm apart. In one aspect, the distance is at least about 10 nm, or alternatively, at least about 20 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 250 nm, or 300 nm. In another aspect, the distance is no more than about 1000 nm, 900 nm, 800 nm, 700 nm, 600 nm, 500 nm, 400 nm, 300 nm, 250 nm, 200 nm, 150 nm, or 100 nm.

[0129] In yet another aspect, the distance between the pores is between about 20 nm and about 800 nm, between about 30 nm and about 700 nm, between about 40 nm and about 500 nm, or between about 50 nm and about 300 nm.

[0130] The two pores can be arranged in any position so long as they allow fluid communication between the chambers and have the prescribed size and distance between them. In one aspect, the pores are placed so that there is no direct blockage between them. Still, in one aspect, the pores are substantially coaxial.

[0131] Compared to a single-pore nanopore device, a two-pore device can be configured to provide control of the speed and direction of the movement of the molecules within the control sample mixture or unknown sample mixture across the pores.

[0132] In one embodiment, the nanopore device includes a plurality of chambers, each chamber in communication with an adjacent chamber through at least one pore. Among these pores, two pores, namely a first pore and a second pore, are placed so as to allow at least a portion of a target polynucleotide to move out of the first pore and into the second pore. Further, the device includes a sensor at each pore capable of identifying the target polynucleotide during the movement. In one aspect, the identification entails identifying individual components of the target polynucleotide. In another aspect, the identification entails identifying payload molecules bound to the target polynucleotide. When a single sensor is employed, the single sensor may include two electrodes placed at both ends of a pore to measure an ionic current across the pore. In another embodiment, the single sensor comprises a component other than electrodes.

[0133] In one aspect, the device includes three chambers connected through two pores. Devices with more than three chambers can be readily designed to include one or more additional chambers on either side of a three-chamber device, or between any two of the three chambers. Likewise, more than two pores can be included in the device to connect the chambers.

[0134] In one aspect, there can be two or more pores between two adjacent chambers, to allow multiple molecules to move from one chamber to the next simultaneously. Such a multi-pore design can enhance throughput of target polynucleotide analysis in the device. For multiplexing, one chamber could have one type of target polynucleotide, and another chamber could have another target polynucleotide type.

[0135] In some aspects, the device further includes means to move a target polynucleotide from one chamber to another. In one aspect, the movement results in loading the target polynucleotide (e.g., the amplification product or amplicon comprising the target sequence) across both the first pore and the second pore at the same time. In another aspect, the means further enables the movement of the target polynucleotide, through both pores, in the same direction.

[0136] For instance, in a three-chamber two-pore device (a “two-pore” device), each of the chambers can contain an electrode for connecting to a power supply so that a separate voltage can be applied across each of the pores between the chambers.

[0137] In accordance with one embodiment of the present disclosure, provided is a device comprising an upper chamber, a middle chamber and a lower chamber, wherein the upper chamber is in communication with the middle chamber through a first pore, and the middle chamber is in communication with the lower chamber through a second pore. Such a device may have any of the dimensions or other characteristics previously disclosed in U.S. Publ. No. 2013-0233709, entitled Dual-Pore Device, which is herein incorporated by reference in its entirety.

[0138] In one aspect, each pore is at least about 1 nm in diameter. Alternatively, each pore is at least about 2 nm, 3 nm, 4 nm, 5 nm, 6 nm, 7 nm, 8 nm, 9 nm, 10 nm, 11 nm, 12 nm, 13 nm, 14 nm, 15 nm, 16 nm, 17 nm, 18 nm, 19 nm, 20 nm, 25 nm, 30 nm, 35 nm, 40 nm, 45 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, or 100 nm in diameter.

[0139] In one aspect, each pore is no more than about 100 nm in diameter. Alternatively, the pore is no more than about 95 nm, 90 nm, 85 nm, 80 nm, 75 nm, 70 nm, 65 nm, 60 nm, 55 nm, 50 nm, 45 nm, 40 nm, 35 nm, 30 nm, 25 nm, 20 nm, 15 nm, or 10 nm in diameter.

[0140] In one aspect, the pore has a diameter that is between about 1 nm and about 100 nm, or alternatively between about 2 nm and about 80 nm, or between about 3 nm and about 70 nm, or between about 4 nm and about 60 nm, or between about 5 nm and about 50 nm, or between about 10 nm and about 40 nm, or between about 15 nm and about 30 nm.

[0141] In some aspects, the pore has a substantially round shape. “Substantially round”, as used here, refers to a shape that is at least about 80 or 90% in the form of a cylinder. In some embodiments, the pore is square, rectangular, triangular, oval, or hexagonal in shape.

[0142] In one aspect, the pore has a depth that is between about 1 nm and about 10,000 nm, or alternatively, between about 2 nm and about 9,000 nm, or between about 3 nm and about 8,000 nm, etc.

[0143] In some aspects, the nanopore extends through a membrane. For example, the pore may be a protein channel inserted in a lipid bilayer membrane or it may be engineered by drilling, etching, or otherwise forming the pore through a solid-state substrate such as silicon dioxide, silicon nitride, graphene, or layers formed of combinations of these or other materials. Nanopores are sized to permit passage through the pore of the scaffold:fusion:payload, or the product of this molecule following enzyme activity. In other embodiments, temporary blockage of the pore may be desirable for discrimination of molecule types.

[0144] In some aspects, the length or depth of the nanopore is sufficiently large so as to form a channel connecting two otherwise separate volumes. In some such aspects, the depth of each pore is greater than 100 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, 700 nm, 800 nm, or 900 nm. In some aspects, the depth of each pore is no more than 2000 nm or 1000 nm.

[0145] In one aspect, the pores are spaced apart at a distance that is between about 10 nm and about 1000 nm. In some aspects, the distance between the pores is greater than 1000 nm, 2000 nm, 3000 nm, 4000 nm, 5000 nm, 6000 nm, 7000 nm, 8000 nm, or 9000 nm. In some aspects, the pores are spaced no more than 30000 nm, 20000 nm, or 10000 nm apart. In one aspect, the distance is at least about 10 nm, or alternatively, at least about 20 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 250 nm, or 300 nm. In another aspect, the distance is no more than about 1000 nm, 900 nm, 800 nm, 700 nm, 600 nm, 500 nm, 400 nm, 300 nm, 250 nm, 200 nm, 150 nm, or 100 nm.

[0146] In yet another aspect, the distance between the pores is between about 20 nm and about 800 nm, between about 30 nm and about 700 nm, between about 40 nm and about 500 nm, or between about 50 nm and about 300 nm.

[0147] The two pores can be arranged in any position so long as they allow fluid communication between the chambers and have the prescribed size and distance between them. In one aspect, the pores are placed so that there is no direct blockage between them. Still, in one aspect, the pores are substantially coaxial.

[0148] In one aspect, the device has electrodes in the chambers connected to one or more power supplies. In some aspects, the power supply includes a voltage-clamp or a patch-clamp, which can supply a voltage across each pore and measure the current through each pore independently. In this respect, the power supply and the electrode configuration can set the middle chamber to a common ground for both power supplies. In one aspect, the power supply or supplies are configured to apply a first voltage V1 between the upper chamber (Chamber A) and the middle chamber (Chamber B), and a second voltage V2 between the middle chamber and the lower chamber (Chamber C).

[0149] In some aspects, the first voltage V1 and the second voltage V2 are independently adjustable. In one aspect, the middle chamber is adjusted to be a ground relative to the two voltages. In one aspect, the middle chamber comprises a medium for providing conductance between each of the pores and the electrode in the middle chamber. In one aspect, the middle chamber includes a medium for providing a resistance between each of the pores and the electrode in the middle chamber. Keeping such a resistance sufficiently small relative to the nanopore resistances is useful for decoupling the two voltages and currents across the pores, which is helpful for the independent adjustment of the voltages.

[0150] Adjustment of the voltages can be used to control the movement of charged particles in the chambers. For instance, when both voltages are set in the same polarity, a properly charged particle can be moved from the upper chamber to the middle chamber and to the lower chamber, or the other way around, sequentially. In some aspects, when the two voltages are set to opposite polarity, a charged particle can be moved from either the upper or the lower chamber to the middle chamber and kept there.

[0151] The adjustment of the voltages in the device can be particularly useful for controlling the movement of a large molecule, such as a charged molecule, that is long enough to cross both pores at the same time. In such an aspect, the direction and the speed of the movement of the molecule can be controlled by the relative magnitude and polarity of the voltages as described below.

[0152] The device can contain materials suitable for holding liquid samples, in particular, biological samples, and/or materials suitable for nanofabrication. In one aspect, such materials include dielectric materials such as, but not limited to, silicon, silicon nitride, silicon dioxide, graphene, carbon nanotubes, TiO2, HfO2, Al2O3, or other metallic layers, or any combination of these materials. In some aspects, for example, a single sheet of graphene membrane of about 0.3 nm thick can be used as the pore-bearing membrane.

[0153] Devices that are microfluidic and that house two-pore microfluidic chip implementations can be made by a variety of means and methods. For a microfluidic chip comprised of two parallel membranes, both membranes can be simultaneously drilled by a single beam to form two concentric pores, though using different beams on each side of the membranes is also possible in concert with any suitable alignment technique. In general terms, the housing ensures sealed separation of Chambers A-C.

[0154] In one aspect, the device includes a microfluidic chip (labeled as “Dual-pore chip”) that is comprised of two parallel membranes connected by spacers. Each membrane contains a pore drilled by a single beam through the center of the membrane. Further, the device preferably has a Teflon® housing or polycarbonate housing for the chip. The housing ensures sealed separation of Chambers A-C and provides minimal access resistance for the electrode to ensure that each voltage is applied principally across each pore.

[0155] More specifically, the pore-bearing membranes can be made with transmission electron microscopy (TEM) grids with 5-100 nm thick silicon, silicon nitride, or silicon dioxide windows. Spacers can be used to separate the membranes, using an insulator, such as SU-8, photoresist, PECVD oxide, ALD oxide, ALD alumina, or an evaporated metal material, such as Ag, Au, or Pt, and occupying a small volume within the otherwise aqueous portion of Chamber B between the membranes. A holder is seated in an aqueous bath that is comprised of the largest volumetric fraction of Chamber B. Chambers A and C are accessible by larger diameter channels (for low access resistance) that lead to the membrane seals.

[0156] A focused electron or ion beam can be used to drill pores through the membranes, naturally aligning them. The pores can also be sculpted (shrunk) to smaller sizes by applying a correct beam focusing to each layer. Any single nanopore drilling method can also be used to drill the pair of pores in the two membranes, with consideration to the drill depth possible for a given method and the thickness of the membranes. Predrilling a micro-pore to a prescribed depth and then a nanopore through the remainder of the membranes is also possible to further refine the membrane thickness.

[0157] By virtue of the voltages present at the pores of the device, charged molecules can be moved through the pores between chambers. Speed and direction of the movement can be controlled by the magnitude and polarity of the voltages. Further, because each of the two voltages can be independently adjusted, the direction and speed of the movement of a charged molecule can be finely controlled in each chamber.

[0158] One example concerns a target polynucleotide having a length that is longer than the combined distance that includes the depth of both pores plus the distance between the two pores. For example, a 1000 bp dsDNA is about 340 nm in length, and would be substantially longer than the 40 nm spanned by two 10 nm-deep pores separated by 20 nm. In a first step, the polynucleotide is loaded into either the upper or the lower chamber. By virtue of its negative charge under a physiological condition at a pH of about 7.4, the polynucleotide can be moved across a pore on which a voltage is applied. Therefore, in a second step, two voltages, in the same polarity and at the same or similar magnitudes, are applied to the pores to move the polynucleotide across both pores sequentially.
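The length comparison in this example can be checked with a few lines of arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair for B-form dsDNA, which is consistent with the 340 nm figure given for a 1000 bp molecule; the function names are illustrative only.

```python
RISE_NM_PER_BP = 0.34  # assumed rise per base pair for B-form dsDNA


def contour_length_nm(length_bp):
    """Approximate contour length of a dsDNA molecule in nanometres."""
    return length_bp * RISE_NM_PER_BP


def two_pore_span_nm(depth1_nm, depth2_nm, separation_nm):
    """Combined distance: depth of both pores plus the gap between them."""
    return depth1_nm + depth2_nm + separation_nm


def can_span_both_pores(length_bp, depth1_nm, depth2_nm, separation_nm):
    """True if the molecule is longer than the two-pore span."""
    span = two_pore_span_nm(depth1_nm, depth2_nm, separation_nm)
    return contour_length_nm(length_bp) > span
```

Under the same assumption, a 100 bp dsDNA (about 34 nm) would be too short to occupy both pores of this geometry at once.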

[0159] At about the time when the polynucleotide reaches the second pore, one or both of the voltages can be changed. Since the distance between the two pores is selected to be shorter than the length of the polynucleotide, when the polynucleotide reaches the second pore, it is also in the first pore. A prompt change of polarity of the voltage at the first pore, therefore, will generate a force that pulls the polynucleotide away from the second pore.

[0160] Assuming that the two pores have identical voltage-force influence and |V1| = |V2| + δV, the value δV > 0 (or < 0) can be adjusted for tunable motion in the V1 (or V2) direction. In practice, although the voltage-induced force at each pore will not be identical with V1 = V2, calibration experiments can identify the appropriate bias voltage that will result in equal pulling forces for a given two-pore chip; and variations around that bias voltage can then be used for directional control.
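The directional rule in this paragraph can be illustrated with a short sketch. It assumes the idealized case in which both pores have identical voltage-force influence, and it models the calibrated bias voltage as a single offset, `balance_bias`, which is a hypothetical parameter introduced here for illustration rather than a quantity defined in the disclosure.

```python
def delta_v(v1, v2):
    """Offset between the voltage magnitudes: |V1| = |V2| + delta_v."""
    return abs(v1) - abs(v2)


def net_direction(v1, v2, balance_bias=0.0, tol=1e-12):
    """Direction of net motion in voltage-competition mode.

    balance_bias is the (hypothetical) calibrated offset at which the two
    pulling forces are equal for a given chip. It is 0 in the idealized
    case of identical voltage-force influence at both pores.
    """
    offset = delta_v(v1, v2) - balance_bias
    if offset > tol:
        return "V1"        # net motion in the V1 direction
    if offset < -tol:
        return "V2"        # net motion in the V2 direction
    return "balanced"      # pulling forces cancel
```

With a nonzero `balance_bias`, the same pair of voltages can produce the opposite direction of motion, which is why per-chip calibration of the bias matters.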

[0161] If, at this point, the magnitude of the voltage-induced force at the first pore is less than that of the voltage-induced force at the second pore, then the polynucleotide will continue crossing both pores towards the second pore, but at a lower speed. In this respect, it is readily appreciated that the speed and direction of the movement of the polynucleotide can be controlled by the polarities and magnitudes of both voltages. As will be further described below, such a fine control of movement has broad applications. For quantitating target polynucleotides, the utility of two-pore device implementations is that during controlled delivery and sensing, the target polynucleotide or payload-bound target polynucleotide can be repeatedly measured, to add confidence to the detection result.

[0162] Accordingly, in one aspect, provided is a method for controlling the movement of a charged molecule through a nanopore device. The method comprises (a) loading a sample comprising a target polynucleotide (e.g., a target polynucleotide amplicon) in one of the upper chamber, middle chamber or lower chamber of the device of any of the above embodiments, wherein the device is connected to one or more power supplies for providing a first voltage between the upper chamber and the middle chamber, and a second voltage between the middle chamber and the lower chamber; (b) setting an initial first voltage and an initial second voltage so that the target polynucleotide moves between the chambers, thereby locating the molecule across both the first and second pores; and (c) adjusting the first voltage and the second voltage so that both voltages generate force to pull the charged target polynucleotide away from the middle chamber (voltage-competition mode), wherein the two voltages are different in magnitude, under controlled conditions, so that the target polynucleotide scaffold moves across both pores in either direction and in a controlled manner.

[0163] In one aspect, the sample containing the target polynucleotide is loaded into the upper chamber and the initial first voltage is set to pull the target polynucleotide from the upper chamber to the middle chamber and the initial second voltage is set to pull the target polynucleotide from the middle chamber to the lower chamber. Likewise, the sample can be initially loaded into the lower chamber, and the target polynucleotide can be pulled to the middle and the upper chambers.

[0164] In another aspect, the sample containing the target polynucleotide is loaded into the middle chamber; the initial first voltage is set to pull the charged molecule from the middle chamber to the upper chamber; and the initial second voltage is set to pull the target polynucleotide from the middle chamber to the lower chamber.

[0165] In one aspect, real-time or on-line adjustments to the first voltage and the second voltage at step (c) are performed by active control or feedback control using dedicated hardware and software, at clock rates up to hundreds of megahertz. Automated control of the first or second or both voltages is based on feedback of the first or second or both ionic current measurements.

Sensors

[0166] As discussed above, in various aspects, the nanopore device further includes one or more sensors to carry out the detection of the target polynucleotide. Embodiments described here use control mixtures containing control and calibrant molecules and novel analysis algorithms to characterize the translocation behavior for each nanopore sensor prior to detecting the analyte(s) of interest.

[0167] The sensors used in the device can be any sensor suitable for identifying a target polynucleotide amplicon bound or unbound to a payload molecule. For instance, a sensor can be configured to identify the target polynucleotide by measuring a current, a voltage, a pH value, an optical feature, or residence time associated with the molecule. In other aspects, the sensor may be configured to identify one or more individual components of the target polynucleotide or one or more components bound or attached to the target polynucleotide. The sensor may be formed of any component configured to detect a change in a measurable parameter where the change is indicative of the target polynucleotide, a component of the target polynucleotide, or preferably, a component bound or attached to the target polynucleotide. In one aspect, the sensor includes a pair of electrodes placed at two sides of a pore to measure an ionic current across the pore when a molecule or other entity, in particular a target polynucleotide, moves through the pore. In certain aspects, the ionic current across the pore changes measurably when a target polynucleotide segment passing through the pore is bound to a payload molecule. Such changes in current may vary in predictable, measurable ways corresponding with, for example, the presence, absence, and/or size of the target polynucleotide molecule present.

[0168] In a preferred embodiment, the sensor comprises electrodes that apply voltage and are used to measure current across the nanopore. Translocations of molecules through the nanopore provide electrical impedance (Z), which affects current through the nanopore according to Ohm’s Law, V = IZ, where V is voltage applied, I is current through the nanopore, and Z is impedance. Inversely, the conductance G = 1/Z is monitored to signal and quantitate nanopore events. The result when a molecule translocates through a nanopore in an electrical field (e.g., under an applied voltage) is a current signature that may be correlated to the molecule passing through the nanopore upon further analysis of the current signal.
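As a worked illustration of these relations, the sketch below computes impedance and conductance from an applied voltage and a measured current, and the conductance shift between an open-pore baseline and a translocation event. The function names and the numeric values in the usage note are hypothetical and are not measurements from the disclosure.

```python
def impedance_ohm(voltage_v, current_a):
    """Z = V / I, from Ohm's law V = IZ."""
    return voltage_v / current_a


def conductance_s(voltage_v, current_a):
    """G = 1 / Z = I / V, in siemens."""
    return current_a / voltage_v


def conductance_shift_s(voltage_v, baseline_current_a, event_current_a):
    """Conductance during an event minus the open-pore baseline conductance."""
    event_g = conductance_s(voltage_v, event_current_a)
    baseline_g = conductance_s(voltage_v, baseline_current_a)
    return event_g - baseline_g
```

For example, at an assumed 0.1 V with a 1 nA baseline, the open-pore conductance is 10 nS; a drop to 0.8 nA during an event corresponds to a shift of about -2 nS.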

[0169] When residence time measurements from the current signature are used, the size of the component can be correlated to the specific component based on the length of time it takes to pass through the sensing device.
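As a hedged illustration of how residence time and related features might be read from a current signature, the sketch below thresholds a sampled current trace and reports, for each event, its duration, maximum depth, and area. The thresholding approach, the function name `extract_events`, and all parameters are assumptions made for this example; the disclosure does not specify an event-detection procedure at this point.

```python
import numpy as np


def extract_events(current, sample_rate_hz, baseline, threshold):
    """Return a list of (dwell_time_s, max_depth, area) tuples, one per event.

    An event is taken to be a contiguous run of samples whose current lies
    more than `threshold` below `baseline` (hypothetical definition).
    """
    depth = baseline - np.asarray(current, dtype=float)  # positive during a blockade
    in_event = depth > threshold
    # Pad with False so runs touching either end still have both edges.
    padded = np.concatenate(([False], in_event, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first sample of each run
    ends = np.flatnonzero(edges == -1)    # one past the last sample of each run
    dt = 1.0 / sample_rate_hz
    events = []
    for s, e in zip(starts, ends):
        segment = depth[s:e]
        dwell_time = (e - s) * dt          # residence time in seconds
        area = float(segment.sum() * dt)   # depth integrated over time
        events.append((dwell_time, float(segment.max()), area))
    return events
```

The event area computed here is one plausible reading of the "event area" used later in the claims; a real analysis would also need baseline tracking and noise handling, which are omitted.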

[0170] In one embodiment, a sensor is provided in the nanopore device that measures an optical feature of the molecule, a component (or unit) of the molecule, or a component bound or attached to the molecule. One example of such measurement includes the identification of an absorption band unique to a particular unit by infrared (or ultraviolet) spectroscopy.

[0171] In some embodiments, the sensor is an electric sensor. In some embodiments, the sensor detects a fluorescent signature. A radiation source at the outlet of the pore can be used to detect that signature.

Methods for detecting target analytes in an unknown sample mixture in a nanopore device

[0172] In some aspects, the methods of the present disclosure can be used for next generation sequencing. In some embodiments, the methods of the present disclosure can be used for diagnosing and/or detecting a virus, such as an influenza virus or a coronavirus, based on the detection of the size and/or concentration of one or more target analytes associated with the disease or condition. For example, the unknown sample mixture may be suspected to include target analytes associated with the specific disease or condition. Thus, if, after applying the methods of the present disclosure, an amount (e.g., amount or concentration) of the target analyte is above a threshold and the size of the target analyte matches the size expected for the target analyte associated with the disease or condition, the unknown sample is positive for that specific disease or condition. In contrast, if, after applying the methods of the present disclosure, an amount (e.g., amount or concentration) of the target analyte is below a threshold, and the size of the suspected target analyte is not present or is not the same as the size expected for the target analyte associated with the disease or condition, the unknown sample is negative for that specific disease or condition. See, e.g., Examples 1 and 2 disclosed herein.

[0173] In some embodiments, the methods of the present disclosure can be used for diagnosing and/or detecting an infection caused by coronavirus, influenza virus, rhinovirus, respiratory syncytial virus, metapneumovirus, adenovirus, and bocavirus.
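The positive/negative decision described in paragraph [0172] can be sketched as a simple rule. This is an illustration only: the function name, the base-pair tolerance, and the "indeterminate" outcome for cases the paragraph does not address (for example, a size match at a sub-threshold amount) are assumptions, not part of the disclosure.

```python
def classify_sample(measured_bp, measured_amount, expected_bp,
                    amount_threshold, bp_tolerance):
    """Classify an unknown sample for one target analyte.

    measured_bp     : estimated size of the candidate population, or None
                      if no candidate population was found
    measured_amount : estimated amount or concentration of that population
    expected_bp     : size expected for the disease-associated target
    bp_tolerance    : hypothetical window within which sizes "match"
    """
    size_matches = (measured_bp is not None
                    and abs(measured_bp - expected_bp) <= bp_tolerance)
    above_threshold = measured_amount > amount_threshold

    if size_matches and above_threshold:
        return "positive"
    if not size_matches and not above_threshold:
        return "negative"
    # Mixed evidence is not covered by the text; flagged here by assumption.
    return "indeterminate"
```

For instance, with a hypothetical expected size of 197 bp, a tolerance of 10 bp, and a threshold of 0.1 (arbitrary units), a population estimated at 195 bp with an amount of 0.5 would be reported as positive.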

[0174] In some embodiments, the virus is an influenza virus selected from the group consisting of: parainfluenza virus 1, parainfluenza virus 2, influenza A virus, and influenza B virus.

[0175] In some embodiments, the virus is a coronavirus selected from the group consisting of: coronavirus OC43, coronavirus 229E, coronavirus NL63, coronavirus HKU1, middle east respiratory syndrome beta coronavirus (MERS-CoV), severe acute respiratory syndrome beta coronavirus (SARS-CoV), and SARS-CoV-2. In some embodiments, the coronavirus is SARS-CoV-2. In some embodiments, the coronavirus is a variant of SARS-CoV or SARS-CoV-2.

[0176] In some embodiments, the one or more target analytes comprises one or more nucleic acids encoding one or more regions of a transgene associated with a disease or condition. In some embodiments, the one or more target analytes are amplified prior to loading the unknown sample mixture containing the one or more target analytes in the nanopore device. In some embodiments, the method comprises amplifying one or more target analytes in the unknown sample with a forward primer comprising a nucleotide sequence of: cctcgaggacaaggcgttccaatta (SEQ ID NO: 1), and a reverse primer comprising a nucleotide sequence of: catatgatgccgtctttgttagcaccat (SEQ ID NO: 2). In some embodiments, the target analyte comprises a target region comprising a nucleotide sequence of: CCTCGAGGACAAGGCGTTCCAATTAACACCAATAGCAGTCCAGATGACCAAATT GGCTACTACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACGGTAAAATGAAA GATCTCAGTCCAAGATGGTATTTCTACTACCTAGGAACTGGGCCAGAAGCTGGAC TTCCCTATGGTGCTAACAAAGACGGCATCATATG (SEQ ID NO: 3).
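As a quick consistency check on the sequences in paragraph [0176], the sketch below tests whether the target region begins with the forward primer and ends with the reverse complement of the reverse primer, and reports the amplicon length. The helper names are hypothetical, and the spaces that appear inside SEQ ID NO: 3 in the text are treated as line-wrap artifacts and omitted.

```python
FORWARD_PRIMER = "cctcgaggacaaggcgttccaatta"      # SEQ ID NO: 1
REVERSE_PRIMER = "catatgatgccgtctttgttagcaccat"   # SEQ ID NO: 2
TARGET_REGION = (                                 # SEQ ID NO: 3
    "CCTCGAGGACAAGGCGTTCCAATTAACACCAATAGCAGTCCAGATGACCAAATT"
    "GGCTACTACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACGGTAAAATGAAA"
    "GATCTCAGTCCAAGATGGTATTTCTACTACCTAGGAACTGGGCCAGAAGCTGGAC"
    "TTCCCTATGGTGCTAACAAAGACGGCATCATATG"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_flank(target, forward, reverse):
    """True if `target` starts with `forward` and ends with revcomp(`reverse`)."""
    target = target.upper()
    return (target.startswith(forward.upper())
            and target.endswith(reverse_complement(reverse)))


if __name__ == "__main__":
    print(primers_flank(TARGET_REGION, FORWARD_PRIMER, REVERSE_PRIMER))
    print(len(TARGET_REGION))
```

With the sequences as transcribed above, the primers flank the target region and the amplicon is 197 nucleotides long, which agrees with the 197 bp SARS-CoV-2 population discussed in the Examples.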

[0177] In some embodiments, the disease or condition is caused by a virus. In some embodiments, the one or more target analytes comprises one or more nucleic acids encoding one or more regions of a virus. In some embodiments, the virus is an influenza virus selected from the group consisting of: parainfluenza virus 1, parainfluenza virus 2, influenza A virus, and influenza B virus. In some embodiments, the virus is a coronavirus selected from the group consisting of: coronavirus OC43, coronavirus 229E, coronavirus NL63, coronavirus HKU1, middle east respiratory syndrome beta coronavirus (MERS-CoV), severe acute respiratory syndrome beta coronavirus (SARS-CoV), SARS-CoV-2, or variants thereof. In some embodiments, the virus is SARS-CoV-2 or a variant of SARS-CoV-2. In some embodiments, the transgene is a SARS-CoV-2 nucleocapsid (N) gene. In some embodiments, the one or more regions of the N gene is selected from: the N1, N2, and N3 region. In some embodiments, the virus is the influenza virus. In some embodiments, the influenza A virus is selected from a swine-origin influenza A virus (H1N1), swine-origin influenza A virus (H1N1), influenza A virus subtype H2N2, and influenza A virus subtype H3N2. In some embodiments, the transgene is a matrix (M1) gene. In some embodiments, the transgene is a nonstructural 2 (NS2) gene.

EXAMPLES

Example 1: Analysis of Covid positive sample.

[0178] (1) Raw translocation data of controls and unknown sample are first grouped by a clustering algorithm and filtered to increase statistical confidence; (2) a calibration curve of amplitude vs size is created; and (3) the unknown data (after clustering and filtering) is compared to the calibration curve to determine the presence or absence of Covid events.

[0179] As shown in FIG. 7, this example provides analysis of a sample that was found, using the method of the present disclosure, to be positive for SARS-CoV-2 (target analyte in this example). Raw translocation event data is provided in Panel A for the control mixture and Panel B for the unknown sample suspected to contain the target analyte (SARS-CoV-2). Note that a reference molecule (80 bp DNA population) is added to the unknown sample mixture and serves as an internal control for the control mixture and the unknown sample mixture. Raw translocation data of the control mixture and the unknown sample mixture are first grouped by the clustering algorithm and filtered to increase statistical confidence (clustered and filtered data shown in Panel C for the control mixture and Panel D for the unknown sample mixture). A calibration curve (DNA length model) is created for the control mixture to show amplitude vs size (not shown), and the unknown sample mixture data (after clustering and data filtering) is compared to the calibration curve based on the control mixtures to determine the presence or absence of Covid events. In this example, the reference molecule is found in the unknown sample mixture as the internal control, as expected, and the target analyte was detected, since the DNA length of 197 bp for SARS-CoV-2 was detected (i.e., 197 bp events were present in the unknown sample). Note that it is a coincidence that the control mixture includes the same length as the SARS-CoV-2 target. The control mixtures do not have to include the length and/or sequence regions of SARS-CoV-2, since the lengths of DNA detected in the unknown sample are compared to the calibration curve generated for the control mixture, which can be random DNA sequences.

Example 2: Analysis of Covid negative sample.

[0180] (1) Raw translocation data of controls and unknown sample are first grouped by a clustering algorithm and filtered to increase statistical confidence; (2) a calibration curve of amplitude vs size is created; and (3) the unknown data (after clustering and filtering) is compared to the calibration curve to determine the presence or absence of Covid events.

[0181] As shown in FIG. 8, this example provides analysis of a sample that was found, using the method of the present disclosure, to be negative for SARS-CoV-2 (target analyte in this example). Raw translocation event data is provided in Panel A for the control mixture and in Panel B for the unknown sample suspected to contain the target analyte (SARS-CoV-2). Note that a reference molecule (80 bp DNA population) is added to the unknown sample mixture and serves as an internal control for the control mixture and the unknown sample mixture. Raw translocation data of the control mixture and the unknown sample mixture are first grouped, separately, by the clustering algorithm and filtered to increase statistical confidence (clustered and filtered data shown in Panel C for the control mixture and Panel D for the unknown sample mixture). A calibration curve (DNA length model) is created to show amplitude vs size (not shown), and the unknown sample mixture data (after clustering and data filtering) is compared to the calibration curve created based on the control mixture to determine the presence or absence of SARS-CoV-2 events. In this example, the reference molecule is found in the unknown sample mixture as the internal control, as expected, but the target analyte (SARS-CoV-2) was not detected, since the 197 bp events for SARS-CoV-2 were absent from the unknown sample. Note that it is a coincidence that the control mixture includes the same length as the target analyte, SARS-CoV-2. The control mixtures do not have to include the length and/or sequence regions of the target analyte, since the clusters of DNA populations detected in the unknown sample are compared by size and/or concentration to the calibration curve generated for the control mixture, which can be random DNA sequences.
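The three-step workflow used in Examples 1 and 2 (cluster, build a calibration curve from the control mixture, compare the unknown against it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: events are assumed to be already clustered and filtered, the calibration is assumed to be a straight-line fit of the median log10(event area) of each control cluster against its known length, and the function names and the detection window are hypothetical.

```python
import numpy as np


def fit_length_model(control_clusters):
    """Fit median log10(area) = slope * length_bp + intercept.

    control_clusters : dict mapping known length in bp -> array of event
                       areas for that control population
    """
    lengths = np.array(sorted(control_clusters), dtype=float)
    medians = np.array([np.median(np.log10(control_clusters[bp]))
                        for bp in sorted(control_clusters)])
    slope, intercept = np.polyfit(lengths, medians, 1)
    return slope, intercept


def estimate_length(event_areas, model):
    """Invert the calibration line to estimate a cluster's length in bp."""
    slope, intercept = model
    return (np.median(np.log10(event_areas)) - intercept) / slope


def target_detected(cluster_lengths_bp, target_bp, window_bp):
    """True if any cluster in the unknown sample falls inside the target window."""
    return any(abs(bp - target_bp) <= window_bp for bp in cluster_lengths_bp)
```

In use, each cluster found in the unknown sample would be passed through `estimate_length`, and the resulting lengths checked with `target_detected` for the 80 bp reference (internal control) and for the 197 bp target. Because the comparison is made against the calibration line rather than against a matching control, the control lengths need not include the target length, as the Examples note.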

EQUIVALENTS AND SCOPE

[0182] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments in accordance with the invention described herein. The scope of the present invention is not intended to be limited to the above Description, but rather is as set forth in the appended claims.

[0183] In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

[0184] It is also noted that the term “comprising” is intended to be open and permits but does not require the inclusion of additional elements or steps. When the term “comprising” is used herein, the term “consisting of” is thus also encompassed and disclosed.

[0185] Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

[0186] All cited sources, for example, references, publications, databases, database entries, and art cited herein, are incorporated into this application by reference, even if not expressly stated in the citation. In case of conflicting statements of a cited source and the instant application, the statement in the instant application shall control.

[0187] Section and table headings are not intended to be limiting.

Claims

What is claimed is:

1. A method of developing a calibration model to determine an estimate of the size and/or concentration of one or more target analytes in an unknown sample using a nanopore device, comprising:

(a) applying a voltage across a nanopore in a nanopore device to generate a detectable electronic signature and to induce translocation of charged analytes through said nanopore, separately, for each of:

(i) a control sample mixture comprising:

(ia) a first control molecule comprising a polynucleotide with a first molecule length,

(ib) a second control molecule comprising a polynucleotide with a second molecule concentration and/or length, and

(ic) a first reference molecule comprising a polynucleotide with a first reference molecule length;

(b) loading the control sample mixture into a chamber of a nanopore device;

(c) applying a voltage across a nanopore in the nanopore device to: translocate each molecule of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture;

(d) clustering the detectable electronic signature for each molecule in the control sample mixture by size; and

(e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control sample mixture.

2. The method of claim 1, wherein the first control molecule further comprises a first control molecule concentration.

3. The method of any one of claims 1-2, wherein the second control molecule further comprises a second control molecule concentration.

4. The method of any one of claims 1-3, wherein the reference molecule further comprises a reference molecule concentration.

5. The method of any one of claims 1-4, wherein the method further comprises developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules in the unknown sample mixture translocating through the nanopore device.

6. The method of any one of claims 1-5, wherein the control mixture further comprises a fourth control molecule comprising a polynucleotide with a fourth molecule concentration and/or length.

7. The method of any one of claims 1-6, wherein the control mixture further comprises a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length.

8. The method of any one of claims 1-7, wherein the control mixture further comprises a: fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length; sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; and eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length.

9. The method of any one of claims 1-8, wherein the control mixture further comprises a: ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length; tenth control molecule comprising a polynucleotide with a tenth molecule concentration and/or length; eleventh control molecule comprising a polynucleotide with an eleventh molecule concentration and/or length; and twelfth control molecule comprising a polynucleotide with a twelfth molecule concentration and/or length.

10. The method of claim 5, wherein the control mixture further comprises a fifth control molecule comprising a polynucleotide with a fifth molecule concentration and/or length; sixth control molecule comprising a polynucleotide with a sixth molecule concentration and/or length; seventh control molecule comprising a polynucleotide with a seventh molecule concentration and/or length; eighth control molecule comprising a polynucleotide with an eighth molecule concentration and/or length, and ninth control molecule comprising a polynucleotide with a ninth molecule concentration and/or length.

11. The method of claim 10, wherein the first, second, and third control molecule comprise the same molecule, but a different concentration and/or length.

12. The method of claim 10, wherein the fourth, fifth, and sixth control molecule comprise the same molecule, but a different concentration and/or length.

13. The method of claim 10, wherein the seventh, eighth, and ninth control molecule comprise the same molecule, but a different concentration and/or length.

14. The method of any one of claims 1-10, wherein the size and/or concentration of the first reference molecule is known.

15. The method of any one of claims 1-11, wherein the size and/or concentration of the first control molecule is not known.

16. The method of any one of claims 1-15, wherein the size and/or concentration of the second control molecule is known.

17. The method of any one of claims 1-15, wherein the control sample mixture does not comprise the target analyte molecule.

18. The method of any one of claims 1-15, wherein each molecule in the control sample mixture is prepared by nucleic acid amplification.

19. The method of any one of claims 1-18, wherein each molecule is DNA or RNA.

20. The method of any one of claims 1-19, wherein one or more target analytes comprises one or more amplicon products.

21. The method of claim 20, wherein the amplicon product is a DNA amplicon product or an RNA amplicon product.

22. The method of any one of claims 1-21, wherein the size is a base-pair length of each molecule.

23. The method of any one of claims 1-22, wherein the electronic detectable signature is a current event signature.

24. A method of determining an estimate of the size and/or concentration of one or more target analytes in a mixed unknown sample using a nanopore device, comprising:

(i) a control sample mixture comprising:

(ib) a second control molecule comprising a polynucleotide with a second molecule concentration and length, and

(ii) an unknown sample mixture comprising:

(iia) one or more target molecules, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene;

(iib) the first reference molecule;

(b) loading the control sample mixture into a chamber of a nanopore device;

(c) applying a voltage across a nanopore in the nanopore device to: induce translocation of each molecule, of the control sample mixture, separately, through said nanopore, and generate a detectable electronic signature for each molecule in the control mixture;

(d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration;

(e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample;

(f) developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture;

(g) repeating step (b) through (e) for the unknown sample mixture; and

(h) applying the model in step (f) to quantify the size of the one or more target molecules in the unknown sample mixture.

25. The method of claim 1, wherein the first control molecule further comprises a first molecule concentration.

26. The method of any one of claims 24-25, wherein the second control molecule further comprises a second molecule concentration.

27. The method of any one of claims 24-26, wherein the reference control molecule further comprises a reference molecule concentration.

28. The method of any one of claims 24-27, wherein the method further comprises applying the model in step (f) to quantify the concentration of the one or more target molecules in the unknown sample mixture.

29. The method of any one of claims 24-28, wherein each molecule is DNA or RNA.

30. The method of any one of claims 24-29, wherein one or more target molecules comprises one or more amplicon products.

31. The method of claim 30, wherein the amplicon product is a DNA amplicon product or an RNA amplicon product.

32. The method of any one of claims 24-31, wherein each of the first control molecule, second control molecule, and first reference molecule comprises a size ranging from 5 base pairs to 1000 base pairs.

33. The method of any one of claims 24-32, wherein the concentration of each control molecule or reference molecule ranges from 0.01 nM to 10 nM.

34. The method of any one of claims 24-33, wherein the one or more target molecules comprises one or more nucleic acids encoding one or more regions of a transgene associated with a disease or condition.

35. The method of claim 34, wherein the disease or condition is a virus.

36. The method of claim 35, wherein the virus is an influenza virus selected from the group consisting of: parainfluenza virus 1, parainfluenza virus 2, influenza A virus, and influenza B virus.
37. The method of claim 35, wherein the virus is a coronavirus selected from the group consisting of: coronavirus OC43, coronavirus 229E, coronavirus NL63, coronavirus HKU1, middle east respiratory syndrome beta coronavirus (MERS-CoV), severe acute respiratory syndrome beta coronavirus (SARS-CoV), and SARS-CoV-2.

38. The method of claim 37, wherein the virus is SARS-CoV-2.

39. The method of claim 38, wherein the transgene is a SARS-CoV-2 nucleocapsid (N) gene.

40. The method of claim 39, wherein the one or more regions of the N gene is selected from: the N1, N2, and N3 region.

41. The method of claim 35, wherein the virus is the influenza virus.

42. The method of claim 35, wherein the influenza A virus is selected from a swine-origin influenza A virus (H1N1), swine-origin influenza A virus (H1N1), influenza A virus subtype H2N2, and influenza A virus subtype H3N2.

43. The method of claim 36, wherein the transgene is a matrix (M1) gene.

44. The method of claim 36, wherein the transgene is a nonstructural 2 (NS2) gene.

45. The method of any one of claims 24-45, wherein the size is a base-pair length of each molecule.

46. The method of any one of claims 24-45, wherein the electronic detectable signature is a current event signature.

47. The method of any one of claims 24-46, wherein the method further comprises applying a population shifting correction algorithm to the reference molecule and the one or more target molecules in the unknown sample to correct for an error in the size and/or concentration of the one or more target molecules, thereby determining an improved estimate of the size and/or concentration of one or more target molecules in said mixed unknown sample.

48. The method of claim 47, wherein said applying the population shifting correction algorithm occurs before applying the model in step (f).
49. The method of claim 47, wherein the population shifting correction algorithm is carried out with a computer readable medium, comprising instructions, that cause a processor to: find a median log(area) of the first reference molecule in the unknown sample mixture; derive the first reference molecule’s expected log(area) using the model in step (f); calculate a correction factor comprising the equation:

multiply the measured log(area) of all of the detectable electronic signatures for each molecule in the unknown sample mixture and the control sample mixture; and calculate the base pairs of the one or more target molecules by applying the model in step (g) and the corrected log(area).

50. The method of any one of claims 24-49, wherein the method further comprises, after step (d), applying a data filtering algorithm to separate out overlapping electrically detectable signature events.

51. The method of any one of claims 48-49, wherein the method further comprises applying the population shifting correction algorithm and applying a data filtering algorithm to separate out overlapping electrically detectable signature events.

52. The method of any one of claims 50-51, wherein the data filtering algorithm defines a nucleotide base-pair window for each molecule.

53. The method of any one of claims 50-52, wherein the data filtering algorithm is carried out using a computer readable medium, comprising instructions, that cause a processor to apply the data filtering algorithm to the electronically detectable signature events to separate out overlapping electrically detectable signature events from control molecules, one or more target molecules, and the reference molecule.

54. The method of any one of claims 24-53, wherein each molecule in the control mixture is prepared by nucleic acid amplification.

55. The method of any one of claims 24-53, wherein each molecule in the unknown sample mixture is prepared by nucleic acid amplification.

56. The method of any one of claims 24-55, wherein the method further comprises identifying a concentration of electrically detectable signature events associated with each control molecule and a concentration of electrically detectable signature events associated with said reference molecule.
57. The method of any one of claims 24-56, wherein the method further comprises identifying a concentration of electrically detectable signature events associated with each target molecule in the unknown sample mixture.

58. The method of any one of claims 24-57, wherein the concentration of electrically detectable signature events associated with each control molecule is identified according to a defined threshold.

59. The method of any one of claims 24-58, wherein the concentration of electrically detectable signature events associated with said reference molecule is identified according to a defined threshold.

60. The method of any one of claims 24-58, wherein the concentration of electrically detectable signature events associated with said target molecule is identified according to a defined threshold.

61. The method of any one of claims 24-60, wherein the method further comprises optimizing said threshold to increase accuracy of detection of the control molecules in the control sample mixture, one or more target molecules in the unknown sample mixture, and/or the reference molecule in the control sample mixture and the unknown sample mixture, using a Q-test, a support vector machine, or an expectation maximization algorithm.

62. The method of claim 40, wherein the concentration is the absolute concentration of the target analyte in the unknown sample mixture.

63. A method of detecting the presence or absence of one or more target analytes suspected to be present in a mixed unknown sample using a nanopore device, comprising:

(i) a control sample mixture comprising:

(ii) an unknown sample mixture comprising:

(iia) one or more target molecules suspected to be present in the unknown sample mixture, wherein each target molecule comprises a nucleic acid encoding a region of a target transgene;

(iib) the first reference molecule;

(b) loading the control sample mixture into a chamber of a nanopore device;

(d) clustering the detectable electronic signature for each molecule in the control sample mixture by size and/or concentration;

(e) calculating a median event area for each molecule cluster in the control sample mixture and determining a linear correlation between a median area of the detectable electronic signature and size for each molecule in the control mixture sample;

(f) developing a model based on step (d) and (e) to quantify the size and/or concentration of the one or more target molecules and the first reference molecule in the unknown sample mixture;

(g) repeating step (b) through (e) for the unknown sample mixture; and

(h) applying the model in step (f) to determine the presence or absence of one or more target analytes in the unknown sample mixture by determining the size and/or concentration of the one or more target molecules in the unknown sample mixture.
