WO2023102502A2 - Enriched peptide detection by single molecule sequencing - Google Patents

Info

Publication number
WO2023102502A2
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
target
sample
standard
peptides
Prior art date
Application number
PCT/US2022/080781
Other languages
French (fr)
Other versions
WO2023102502A3 (en)
WO2023102502A4 (en)
Inventor
Norman Leigh Anderson
Morteza RAZAVI
Original Assignee
Siscapa Assay Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siscapa Assay Technologies, Inc.
Publication of WO2023102502A2
Publication of WO2023102502A3
Publication of WO2023102502A4

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/34: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
    • C12Q1/37: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase involving peptidase or proteinase
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins

Definitions

  • the inventions herein relate to quantitative measurement of proteins, and provide significant improvements in the sensitivity, accuracy, throughput and cost of measuring clinically important proteins in biological samples such as blood. More than 100 different proteins are currently measured by clinical diagnostic tests in blood (1), each requiring a separate test and a separate aliquot of sample. Such tests are typically immunoassays, and make use of indirect detection of protein targets by antibodies, opening the door to a variety of interferences and associated clinical errors (2).
  • the cost and complexity of this paradigm for clinical laboratory testing severely constrain the health benefits obtainable from measurement of clinical biomarker proteins, and effectively preclude emerging applications such as high frequency longitudinal testing to establish personal biomarker baselines and health models.
  • the inventions herein also relate to peptide library preparation for quantitative single molecule analysis.
  • nucleic acid versions of such methods typically aim to provide enormous throughput (e.g., gigabases per run) of sequence (i.e., digital) data - a feature required to address whole genome, whole exome, or RNASeq sequencing requirements - but are not focused on precise quantitative (i.e., analog) measurements of the amount of a particular type of molecule.
  • Peptides and proteins are made of 20 common amino acids, as opposed to only 4 common bases in either RNA or DNA, requiring a much greater degree of analytical discrimination to sequence peptides as opposed to DNA.
  • PCR polymerase chain reaction
  • the present invention provides a general approach to the preparation of peptide libraries for quantitative single molecule analysis, and specific implementations appropriate for use with several alternative single molecule detectors (nanopores, optical imaging systems, and single molecule stepwise sequencing systems).
  • a key obstacle in formulating the invention has been the multi-dimensional nature of the problem, encompassing as it does the areas of protein and peptide chemistry, oligonucleotide chemistry and sequence design, antibody selection, single molecule detection by optical, chemical and electrical technologies, and requirements of specific clinical diagnostic assays.
  • Key aspects of the invention involve adaptation of technologies from each of these areas in a novel combination.
  • the invention preserves the fundamental benefits of direct detection of analyte molecules (a strength of mass spectrometry in comparison with indirect detection methods such as immunoassays), while offering the potential for improved test sensitivity, sequence specificity, and lower cost - all of which improve commercial competitiveness against legacy immunoassay technologies and enable expanded use of protein biomarkers in medicine and pharmaceutical R&D.
  • using the reagents and methods of the present invention, a substantial improvement in the throughput, cost and sensitivity of protein analysis can be achieved.
  • taking nanopore sequencing as an example of sequence-sensitive single molecule detection, we can, in principle, estimate the performance of a system optimized for peptide quantitation.
  • assuming that 1) peptides can be delivered as oligonucleotide constructs of about 50 bases in length; 2) nanopore sequencers process (read) oligonucleotides at a rate of approximately 400 bases/sec; 3) accurate measurement of the amount of a peptide requires detection and counting of fewer than 5,000 molecules; and 4) a commercially available nanopore cartridge contains 3,000 simultaneously readable nanopores, it would be theoretically possible to identify and precisely measure 30 peptide targets (representing 30 distinct clinically-relevant proteins) in a single sample in approximately 6 seconds.
  • 96 samples could be analyzed in 10 minutes using a single cartridge, compared to approximately 10 hr using liquid chromatography - mass spectrometry (LC-MS): an advantage of 60-fold in speed with less than 1/10th the equipment cost.
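The throughput estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming every nanopore is continuously occupied and there is no dead time between reads (an idealization that the source does not state); the function and parameter names are illustrative only.

```python
def seconds_per_sample(n_targets=30, molecules_per_target=5_000,
                       construct_bases=50, bases_per_sec=400.0, n_pores=3_000):
    """Idealized read time for one sample, using the figures quoted in the text."""
    total_reads = n_targets * molecules_per_target   # constructs to be counted
    read_time = construct_bases / bases_per_sec      # seconds per construct
    # Assumption: the work is spread evenly over all pores with no idle time.
    return total_reads * read_time / n_pores


per_sample = seconds_per_sample()        # about 6 seconds per sample
plate_minutes = 96 * per_sample / 60     # time for 96 samples, in minutes
speedup = (10 * 60) / plate_minutes      # versus roughly 10 hr by LC-MS
print(per_sample, plate_minutes, speedup)
```

With the stated inputs the idealized figures come out at 6.25 seconds per sample, 10 minutes for 96 samples, and a 60-fold speed advantage, consistent with the estimates in the text.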
  • LC-MS liquid chromatography - mass spectrometry
  • the ability to recognize and count individual analyte molecules can, at least in theory, offer the maximum assay sensitivity possible by any method, approximately 1,000 times as sensitive as mass spectrometry (~5,000 vs 6,000,000 molecules required for quantitative measurement, respectively). Such an improvement in sensitivity would enable precise measurement of almost all the 100+ clinically-established blood protein biomarkers in much less than 1 microliter (1/20th of a drop) of blood.
  • the present inventions address several major challenges in protein analysis, making use of a number of methods well-known in the art, in novel combinations and in combination with entirely novel concepts disclosed herein. 3.1 PROTEIN QUANTITATION CHALLENGES.
  • Blood represents the largest and deepest version of the human proteome present in any sample: in addition to the classical “plasma proteins” and cellular proteins of red cells, white cells and platelets, it contains all tissue proteins (as leakage markers) plus very numerous distinct immunoglobulin sequences (8).
  • proteins in plasma exhibit an extraordinary dynamic range in abundance: more than 10 orders of magnitude in concentration separate albumin and the rarest proteins now measured clinically.
  • a single peptide as a quantitative surrogate for the parent protein, provided that there is one copy (or some other known number of copies) of the peptide per protein molecule; i.e., that the peptide molar amount (or number of molecules) is equal to (or some known multiple of) the protein’s molar amount (or number of molecules).
  • TARGET(s) proteolytically-derived peptide segments within it as "target peptides"
  • a good target peptide for quantitation purposes is one that is a) proteotypic for the protein (i.e., occurs in no other protein of the species from which the sample is derived); b) occurs a known number of times (usually once) in the protein sequence, allowing the peptide to be used as a surrogate measure of the molar amount of the protein; c) is efficiently detected by a chosen detector; and d) behaves reliably in a practical sample preparation workflow appropriate to the assay objectives (which may include, for example, specific binding and enrichment compared to other un-selected peptides).
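Criteria a) and b) above lend themselves to a simple computational check. The sketch below is illustrative only: the toy "proteome" and its sequences are invented, the function names are hypothetical, and criteria c) and d) (detectability and workflow behavior) are experimental properties that it does not address.

```python
def count_occurrences(sequence, peptide):
    """Count (possibly overlapping) occurrences of peptide in sequence."""
    count, start = 0, sequence.find(peptide)
    while start != -1:
        count += 1
        start = sequence.find(peptide, start + 1)
    return count


def assess_target(peptide, proteome):
    """Return (is_proteotypic, copies_in_parent_protein).

    proteome maps protein names to amino acid sequences. A peptide is
    treated as proteotypic if it occurs in exactly one protein.
    """
    hits = {name: count_occurrences(seq, peptide) for name, seq in proteome.items()}
    hits = {name: n for name, n in hits.items() if n > 0}
    proteotypic = len(hits) == 1
    copies = next(iter(hits.values())) if proteotypic else None
    return proteotypic, copies


# Hypothetical sequences, for illustration only.
toy_proteome = {
    "protA": "MKTAYIAKQRGLSPEVEK",
    "protB": "MGLSPEVEKAAYIAR",
}
print(assess_target("TAYIAK", toy_proteome))    # unique to protA, one copy
print(assess_target("GLSPEVEK", toy_proteome))  # shared by both proteins
```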
  • Methods for selection of TARGET peptides from a wide range of target proteins for conventional mass spectrometric detection are well-known in the art, but not directly relevant to selection of optimal peptides for single molecule detection.
  • Digestion of proteins to peptides serves to “simplify” the structure of a protein sample, by eliminating complicated protein shapes (and their associated unique physical properties and protein:protein interactions), at the expense of increasing the numbers of molecules present.
  • the immense variety of folded protein structures present in a biological sample is transformed by digestion into a larger set of essentially unstructured short, linear peptides.
  • Proteins exhibit a very wide range of physical properties, ranging from soluble to insoluble, compact to extended, positively to negatively charged, with half-lives of seconds to months, and thus each protein represents an individual challenge in terms of handling and measurement.
  • proteolytic digestion of a given protein to peptides generally yields a mixture of peptide molecules from which an example can almost always be chosen that is unique to a given target protein (and thus can serve as a quantitative surrogate for it) and has properties compatible with a selected measurement method (encapsulated by the aspirational phrase “in every bad protein there is at least one good peptide”). For this reason, peptide-level detection is less susceptible to interferences, and more compatible with universal sample preparation methods, than protein-level detection.
  • a typical human protein yields about 50 peptides upon digestion with trypsin, and thus a sample containing, for example, 5,000 proteins is likely to yield a tryptic digest containing 250,000 different peptides.
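An in-silico digest illustrates how a protein sequence maps to a set of tryptic peptides. This is a minimal sketch that assumes the commonly used simplified trypsin rule (cleavage on the C-terminal side of lysine or arginine, except when the next residue is proline), which the source does not spell out; it ignores missed cleavages. The default length window reflects the 5 to 25 residue range typical of tryptic peptides, and the example sequence is invented.

```python
def tryptic_digest(sequence, min_len=5, max_len=25):
    """Split a protein sequence into tryptic peptides (simplified rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        # Cleave after K or R unless the following residue is P.
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep only peptides inside the requested length window.
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical sequence, for illustration only.
print(tryptic_digest("GASVLKTTTTTRPEEEEKDD"))
```

Note that the R followed by P is not cleaved, and the two-residue C-terminal fragment is dropped by the length filter.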
  • Peptides of the length of typical tryptic peptides (5 to 25 amino acids in a typical tryptic digest) do not generally exhibit stable folded structures and thus do not generally interact with one another to form stable multi-peptide structures. This overall absence of stable interactions between digest peptides overcomes the major source of interference and error in technologies such as conventional immunoassays.
  • Proteolytic digestion is widely used in proteomics to fragment proteins for analysis by mass spectrometry (10) and other analytical methods.
  • Digestion of a sample such as plasma is typically carried out by first denaturing the sample proteins (e.g., with detergents such as deoxycholate, organic solvents, urea or guanidine HCl), reducing the disulfide bonds in the proteins (e.g., with tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol or mercaptoethanol), alkylating the cysteines to prevent re-formation of disulfides (e.g., by addition of iodoacetamide which reacts with the free -SH group of cysteine), quenching excess iodoacetamide by addition of more dithiothreitol or mercaptoethanol, and finally (after removal or dilution of the denaturant) addition of the selected proteolytic enzyme (e.g., trypsin, Lys-C, etc.), followed by incubation to allow digestion.
  • the action of trypsin is terminated, either by addition of a chemical inhibitor (e.g., TLCK) or by denaturation (through heat or addition of denaturants, or both) or removal (if the trypsin is on a solid support) of the trypsin.
  • Digestion destroys protein:protein interactions and thus generally eliminates interferences that occur in conventional immunoassays.
  • proteolytic digestion protocols have been developed, and some have been shown to exhibit extremely high quantitative reproducibility when implemented on automated platforms (4).
  • Most such protocols involve use of a single proteolytic step with a single enzyme (typically trypsin), while in a few cases two enzymes are used together (e.g., Lys-C and trypsin) in order to improve efficiency: Lys-C is smaller than trypsin and more stable at elevated temperature and in the presence of denaturants, and therefore able to cleave proteins that are otherwise relatively resistant to trypsin attack.
  • this approach makes use of sequential digestion by Lys-C followed by trypsin, with these two steps carried out at different temperatures or at different denaturant concentrations.
  • the sequential use of Lys-C and trypsin to improve digestion efficiency does not allow oriented construction of peptide-polymer constructs as disclosed in the present invention.
  • In a plasma digest, a transferrin peptide is expected to outnumber sTfR peptides by almost 1,000 to 1; to outnumber hepcidin peptides by almost 5,000 to 1, and to outnumber Tg peptides by 28,000,000 to 1.
  • These proteins are measured today in separate assays, each optimized for a different abundance level, and typically offering an assay dynamic range of ~1,000 (i.e., a range in which most clinical specimen concentrations of that protein are expected to fall).
  • amplification-based assay systems sacrifice certainty regarding analyte identity, since the actual molecular targets are not themselves observed or measured by the ultimate detectors used in such assays (which detect binding reagents such as antibodies instead), with the result that unexpected interfering molecules can be measured and genuine analyte molecules can fail to generate signal.
  • Confidence that the intended analyte, and only this analyte, is being measured requires direct analyte detection by a detector capable of discriminating the correct analyte from all others, as exemplified by sequence sensitive single molecule detectors used in the present invention.
  • the SISCAPA method described in previous disclosures (US7632686) and publications (3, 9, 13-15) is a general approach for protein quantitation involving digesting proteins (e.g., with trypsin) into peptides that can be enriched by specific affinity capture and further fragmented in a mass spectrometer (e.g., by LC-MS/MS) to generate a sequence-based identification and a measure of amount by comparison to an internal standard.
  • This approach combines the advantages of classical immunoassays (sensitivity, throughput) with those of mass spectrometry (specificity, multiplexability, wide linear dynamic range), while overcoming the limitations of each.
  • a major improvement would result from the replacement of mass spectrometry by a sequence-sensitive single molecule detector, coupled with sample preparation technology capable of delivering to the detector the small numbers (e.g., 100-10,000) of purified analyte molecules that, when counted by the detector, generate the required assay precision (in this case determined by counting statistics).
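The link between the number of molecules counted and assay precision can be made concrete. The sketch below assumes ideal Poisson counting statistics, in which the coefficient of variation (CV) of a count N is 1/sqrt(N); real assays carry additional sources of variance that are not modeled here, and the function names are illustrative.

```python
import math


def counting_cv(n_molecules):
    """Relative standard deviation of a Poisson count of n_molecules."""
    return 1.0 / math.sqrt(n_molecules)


def ratio_cv(n_target, n_standard):
    """Approximate CV of the ratio of two independent Poisson counts."""
    return math.sqrt(1.0 / n_target + 1.0 / n_standard)


def molecules_for_cv(cv):
    """Smallest count whose Poisson CV does not exceed cv."""
    return math.ceil(1.0 / cv ** 2)


for n in (100, 5_000, 10_000):
    print(n, f"{100 * counting_cv(n):.2f}% CV")
```

Under this assumption, counting 100 molecules gives a CV of 10% and counting 10,000 molecules gives 1%, which is why the 100-10,000 range quoted above spans typical precision requirements.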
  • a peptide is chemically modified, for example to create a linkage to another molecule during assembly of a novel multi-part construct.
  • Site specific linkage chemistries are known for amino groups (e.g., the n-terminal amino group, and the epsilon amino group of lysine); for carboxyl groups (e.g., the c-terminal carboxyl, and sidechain carboxyls of aspartic and glutamic acids); sulfhydryl groups of cysteine residues; and a variety of other less frequently used chemistries.
  • Examples of effective click reagent pairs useful in creation of constructs according to the invention include i) reaction of an azide with an alkyne functionality (some requiring Cu(I) catalysis, which is less preferred in some embodiments); ii) reaction of an azide with a cyclooctyne such as DBCO (dibenzocyclooctyne, also called DIBO), Aza-dibenzocyclooctyne (ADIBO) or BCN (bicyclo[6.1.0]non-4-yne) by means of a strain-promoted azide-alkyne cycloaddition (SPAAC) reaction without the need for a Cu catalyst; and iii) reaction of a tetrazine (Tz, such as methyltetrazine) with a trans-cyclooctene (TCO), also without the need for a Cu catalyst.
  • DBCO dibenzocyclooctyne
  • ADIBO Aza-dibenzocyclooctyne
  • an internal standard in an analytical assay is highly desirable as it provides a stable reference against which the desired analyte can be measured.
  • in mass spectrometric detection of peptides, a synthetic stable isotope labeled version of a target peptide can easily be made and used as an internal standard (the well-known method of “isotope dilution mass spectrometry”). The approach works well because the labeled and unlabeled peptides are chemically and structurally identical, and thus behave the same through any sample preparation protocol, yet can be distinguished reliably by measuring their masses in the final mass spectrometer detection step.
  • the ratio between the amounts of the natural and isotopically labeled forms detected by the final MS analysis allows the concentration of the natural peptide in the sample mixture to be calculated.
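The ratio calculation described above is simple enough to write out. This is a minimal sketch, assuming the labeled and unlabeled forms give equal detector response per molecule (as the text argues for chemically identical isotopologues) and that a known amount of standard was spiked into the sample; the names and the example numbers are hypothetical.

```python
def target_amount(target_signal, standard_signal, standard_amount):
    """Amount of natural peptide, in the same units as standard_amount.

    target_signal and standard_signal are the measured responses (peak areas,
    or molecule counts) of the natural and labeled forms respectively.
    """
    if standard_signal <= 0:
        raise ValueError("internal standard was not detected")
    return target_signal / standard_signal * standard_amount


# Illustrative numbers: 100 fmol of standard spiked, target signal half as large.
print(target_amount(1_500, 3_000, 100.0))
```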
  • the approach can be multiplexed to cover multiple peptides measured in parallel, and can be automated through computer control to afford a general system for protein measurement (13).
  • Single molecule detectors are unable to measure peptide mass accurately enough (or in most cases at all) to use stable isotope versions as internal standards in this manner.
  • an alternative peptide labeling strategy to create single molecule internal standards capable of a) behaving like the targeted peptide analyte during the steps of sample preparation, while b) being clearly distinguishable from the target by the chosen single molecule detection technology.
  • Use of the term “standard” in this specific sense is distinct from other forms of “standards” that can be introduced into workflows for quality control of separations, monitoring of chemical reaction yields, etc., rather than improving quantitation of a single specific analyte.
  • a variety of types of biologically derived antibodies e.g., polyclonal, monoclonal and oligoclonal antibodies derived from mice, rabbits, humans, camelids and other species
  • molecules derived from antibodies by molecular biology techniques e.g., antibodies selected from libraries using phage display and other techniques
  • aptamers based on DNA, or RNA, and including a variety of modified bases and backbones
  • BINDERs a variety of modified bases and backbones
  • a TARGET peptide can be coupled to a carrier protein (e.g., keyhole limpet hemocyanin: KLH) and used to immunize an animal (such as a rabbit, mouse, chicken, goat, camelid or sheep) by one of the known protocols that efficiently generate anti-peptide antibodies.
  • KLH keyhole limpet hemocyanin
  • SISCAPA technology (3, 9, 13-15)
  • antibodies, preferably monoclonal antibodies, can be developed that bind and capture a specific low abundance tryptic peptide from the digest of a very complex sample such as human blood plasma (which may contain 250,000 distinct peptides, some at very high abundance), and thereby enrich the peptide substantially (e.g., more than 10,000-fold).
  • BINDERS e.g., antibodies
  • Discovery of such BINDERS requires use of very specific screening processes to find reagents that do not bind non-TARGET peptides and retain the TARGET peptide long enough to wash non-binding peptides away (typically 10-15 minutes in many automated protocols).
  • the screening process does not assess equivalence of binding TARGET and STANDARD (stable isotope-labeled peptide) since it is known there will be no difference (at least for 15N and 13C isotopic labels).
  • when a peptide TARGET and its cognate internal STANDARD molecule are not chemically identical (as they are in the case of stable isotope labeled standards), the very specificity of effective BINDERS creates a major problem: if a BINDER binds a STANDARD more or less tightly (or with different kinetics) than its cognate TARGET peptide, then binding will impact the ratio of TARGET to STANDARD molecules and lead to an incorrect assay result.
  • the selection of TARGETS, STANDARDS and BINDERS that successfully preserve the quantitative ratio is therefore critical for enablement of quantitative internally-standardized single molecule detection.
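The effect of unequal binding on the measured ratio can be illustrated numerically. The sketch below uses a deliberately simple model, assumed here for illustration and not taken from the source: the BINDER recovers a fixed fraction of TARGET molecules and a (possibly different) fixed fraction of STANDARD molecules. All names and values are hypothetical.

```python
import math


def observed_ratio(true_ratio, target_recovery, standard_recovery):
    """TARGET:STANDARD ratio after enrichment with the given recoveries."""
    if not (0 < target_recovery <= 1 and 0 < standard_recovery <= 1):
        raise ValueError("recoveries must be in the interval (0, 1]")
    return true_ratio * target_recovery / standard_recovery


# Equal recoveries preserve the ratio; unequal recoveries bias it.
print(observed_ratio(2.0, 0.5, 0.5))
print(observed_ratio(2.0, 0.8, 0.4))
```

In this model the ratio is preserved whenever the two recoveries are equal, regardless of their absolute value, which is the property that the selection of TARGETS, STANDARDS and BINDERS is intended to secure.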
  • sequence information can identify the analyte unambiguously, thus enabling direct analyte detection.
  • Methods that make use of antibodies to recognize intact protein analytes (e.g., immunoassays) cannot provide this level of certainty, and are classified as indirect detection methods.
  • sequence information (18, 19) or structural information closely tied to sequence.
  • sequence information is the primary deliverable.
  • the present invention provides reagents, methods, and kits for the preparation of peptide libraries suitable for quantitative analysis by a variety of sequence-sensitive single molecule detection technologies, including, but not limited to, the following examples: 3.9.1 Single molecule sequencing using nanopores.
  • Biopolymers can pass through nanopores (both biological and inorganic) in suitable membranes.
  • Signals e.g., through-pore ion current, or cross-pore tunneling current
  • Nanopore methods of DNA and RNA sequencing have been developed and successfully commercialized (20, 21).
  • Nanopore analysis of peptides and proteins is advancing rapidly (19, 22-25), but discrimination of 20 different amino acids presents a far greater challenge than discrimination of 4 nucleic acid bases.
  • the most mature approach for nanopore analysis of peptides is one involving in-line linkage of peptides and nucleic acids into a hybrid polymer, allowing use of some features of a successful commercially-available DNA sequencing platform to be applied to peptides (e.g., international publication WO 2021/111125 A1). Similar methods are likely to work with a variety of alternative platforms including, but not limited to, alternative biological nanopores (26, 27), inorganic nanopores (28, 29), DNA-origami nanopores (1, 2) and the like.
  • Single protein molecules can also be arrayed in a regular pattern on a planar surface and probed by a succession of “promiscuous” binding agents to build up a pattern of epitope occurrences in each molecule (30-32).
  • Machine learning approaches can be used to interpret these epitope occurrence patterns to identify most proteins produced in a given organism despite the stochastic nature of individual binding events. In the context of short peptides, rather than whole proteins, this approach does not deliver direct peptide sequence information.
  • a limited but sequence-specific fingerprint of a peptide can be accomplished by detecting the order of fluorophores coupled to specific amino acids on a single TARGET peptide molecule (33).
  • a technology has been developed by functionalizing a peptide with one type of fluorophore (Cy3) at the N-terminal site and a second type of fluorophore (Cy5) on an internal cysteine residue.
  • the method then monitored the order in which the two fluorophores passed through Alexa488-labeled ClpP14 protease, as detected using the separation-dependent Förster resonance energy transfer effect (“FRET”).
  • FRET separation-dependent Förster resonance energy transfer effect
  • the instrument platforms associated with these technologies typically provide for simultaneous “sequencing” of millions to billions of peptide molecules, with successive amino acids decoded in successive cycles of reagent recognition and terminal amino acid removal. Thus any of them can be used to generate sequence data (or “approximate” sequence data) from the peptide libraries prepared using the invention.
  • a DNA sequence can be progressively generated that encodes all or part of the peptide amino acid sequence (the peptide being destroyed in the process: i.e., the reading process is degradative).
  • the DNA molecule can subsequently be read using any of the established DNA sequencing methodologies.
  • the resulting “approximate” sequence information may nevertheless be sufficient to recognize one peptide sequence among a limited set of expected alternatives.
  • a variety of methods can be used to immobilize millions of individual peptide molecules and adjacent DNA molecules so as to produce a DNA library encoding sequence information from the original peptide library.
  • peptides are typically linked to the solid support via the c-terminal carboxyl group, leaving the n-terminus free.
  • fluorescent labeling (6-38)
  • specific amino acids e.g., cysteine SH, lysine NH2, etc.
  • chemical methods and records the disappearance of these fluorescent signals when a labeled amino acid is cleaved off during a sequence of degradative (e.g., Edman) steps.
  • DNA sequences particularly to synthesize, sequence, and splice together DNA sequences of various lengths
  • barcodes designed, recognizable sequence tags
  • NGS next generation sequencing
  • a variety of DNA barcode systems have been developed with the object of reliable identification of the original source sample in NGS applications. Note: this use of the term barcode (meaning a designed tag used for labeling) is distinct from an alternative usage applied to endogenous DNA sequences found to be characteristic of biological species and used to identify presence of a species in a sample comprised of multiple organisms.
  • DNA barcodes are also used in other applications where the barcodes are “read” by hybridization of a complementary probe that can be detected by optical or other detection means (44) without sequencing (e.g., using a fluorescently labeled complementary-sequence probe reagent detected by single molecule microscopic imaging).
  • Such methods have been successfully applied for single molecule fluorescence detection of up to 1,000 different mRNA sequences in single cell images using 16 different 30-mer readout probes in a 16-bit modified Hamming distance 4 code (45).
  • Such coding methods enable efficient sample barcoding and demultiplexing in single molecule imaging platforms.
  • ECC error-correcting codes
  • error detection and correction can be critical when single molecules are being detected and counted to determine a quantitative result.
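A small example shows how a code with minimum Hamming distance 4 supports error handling when reading barcodes. The sketch below uses an invented 8-bit, four-word codebook (not the 16-bit code cited above) and a nearest-codeword decoder; with minimum distance 4, a single-bit error can be corrected and a two-bit error can be detected and rejected. All names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy codebook: every pair of codewords differs in at least 4 bits.
CODEBOOK = {
    "11110000": "sample_A",
    "00001111": "sample_B",
    "11001100": "sample_C",
    "00110011": "sample_D",
}


def decode(read, codebook=CODEBOOK, max_errors=1):
    """Return the label of the nearest codeword, or None if it is too far."""
    best = min(codebook, key=lambda word: hamming(read, word))
    return codebook[best] if hamming(read, best) <= max_errors else None


print(decode("11110000"))  # exact match
print(decode("11110001"))  # one bit flipped: corrected
print(decode("11110011"))  # two bits flipped: rejected
```

Rejecting ambiguous reads rather than guessing trades a small loss of counts for protection against assigning a molecule to the wrong sample or target.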
  • Machine learning methods have been successfully developed that allow the identities and/or sequences of individual molecules to be deduced from complex signal patterns.
  • Nucleic acid sequences can be derived from current traces measured as DNA or RNA molecules pass through nanopores using highly-trained neural networks to recognize and interpret conductivity transitions (49).
  • Proteins can be recognized by machine learning based on optically-detected stochastic binding of multiple promiscuous affinity reagents to single molecules (31).
  • machine learning approaches make it possible to improve the recognition of molecules by all the above single molecule technologies by building mathematical models based on large numbers of reference examples, and incorporating more data for each example than is practical in human-designed programs.
  • the current dominant methods for direct detection of peptide molecules by MS have significant limitations. These include A) sensitivity limited by the performance of available mass spectrometers (currently limited to 10-100 amol of peptide, equivalent to 6 million to 60 million molecules of a peptide); B) low throughput (largely due to the limited speed of typical liquid chromatography systems employed); C) lack of robustness of the liquid chromatography systems used to separate peptides and introduce them into the MS; D) level of expertise required to operate LC-MS systems; E) high cost of LC-MS systems and the consequent limited adoption in clinical laboratories; and F) impracticality of use in low-technology environments.
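The conversion between attomoles and molecule counts in limitation A) follows directly from the Avogadro constant. The sketch below is a simple check of those figures; the function name is illustrative, and the 5,000-molecule comparison value is the single molecule counting requirement quoted earlier in the text.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def amol_to_molecules(amol):
    """Convert an amount in attomoles (1e-18 mol) to a number of molecules."""
    return amol * 1e-18 * AVOGADRO


low, high = amol_to_molecules(10), amol_to_molecules(100)
print(f"{low:.3g} to {high:.3g} molecules")
print(f"ratio to 5,000 counted molecules: {low / 5_000:.0f}-fold")
```

Ten attomoles corresponds to about 6 million molecules, roughly 1,200 times the approximately 5,000 molecules needed for single molecule counting, in line with the ">1,000-fold" sensitivity figure.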
  • MS typically resolves and identifies analytes based on one or a few parameters that are derived from the peptide sequence (typically its mass and the masses of one to three of its specific fragments), but it does not typically determine the entire peptide sequence and is therefore susceptible to various forms of identification error.
  • the invention provides significant improvements in assay sensitivity by making use of single molecule counting technologies instead of mass spectrometry detection, with the potential to make quantitative measurements at the level of hundreds to thousands of analyte molecules (i.e., >1,000-fold improvement compared to MS methods, including SISCAPA-MS).
  • the invention provides sequence-based assay specificity through direct detection and counting of analyte molecules without the use of liquid-chromatography or expensive mass spectrometer instruments.
  • the invention makes use of certain technologies and platforms that have been extensively developed for nucleic acid applications (e.g., DNA and RNA sequencing), some of which have been implemented commercially as small, inexpensive instruments capable of generating accurate results in low-technology environments.
  • a further object of the invention is to significantly lower the cost of making precise measurements of protein biomarkers, drugs and targets, and thereby to enable expanded use of quantitative protein tests in diagnostics and in longitudinal health monitoring.
  • the invention provides methods for improved protein quantitation by adapting a novel specific affinity enrichment strategy to allow detection of enriched peptides by technologies other than mass spectrometry - specifically technologies that enable counting individual peptide molecules in a sequence-specific manner.
  • amino acid in the context of the present disclosure is used in its broadest sense and is meant to include organic compounds containing amine (NH2) and carboxyl (COOH) functional groups, along with a side chain (e.g., an R group) specific to each amino acid.
  • NH2 amine
  • COOH carboxyl
  • side chain e.g., an R group
  • amino acids refer to naturally occurring L amino acids or residues.
  • amino acid further includes D-amino acids, retro-inverso amino acids as well as chemically modified amino acids such as amino acid analogues, naturally occurring amino acids that are not usually incorporated into proteins such as norleucine, and chemically synthesized compounds having properties known in the art to be characteristic of an amino acid.
  • analogues or mimetics of phenylalanine or proline which allow the same conformational restriction of the peptide compounds as do natural Phe or Pro, are included within the definition of amino acid.
  • Such analogues and mimetics are referred to herein as "functional equivalents" of the respective amino acid.
  • analyte may refer to any of a variety of different molecules, or components, pieces, fragments or sections of different molecules that one desires to measure or quantitate in a sample.
  • anti-peptide antibody (a class of specific binding agent, or BINDER) as used herein means a macromolecule capable of non-covalently and reversibly binding to a peptide in a manner that is specific to all or a portion of the peptide’s sequence.
  • the term includes a variety of types of macromolecules as indicated in the definition of “antibody” above, and is not limited to the proteins conventionally considered antibodies.
  • barcode includes any distinguishing physical, chemical or sequence characteristic of a peptide construct capable of having multiple values that can be determined by a single molecule detection method. Nucleic acid sequences can be used as barcodes, for example by providing a set of distinguishable sequences, a different one of which can be linked to the peptides of each sample, identifying (“decoding”) the source of these peptides after the peptides from multiple samples are pooled for efficient processing in a single molecule detection system.
  • molecular barcodes can be used as well, including sets of glycan structures (which can be decoded using various specific lectins, for example), peptides (when these can be linked to and sequenced with the TARGET peptides); non-biological polymers distinguishable by length or content of alternative polymer units; and small molecules including colored or fluorescent dyes.
  • binding includes any physical attachment or close association, which may be permanent or temporary. Generally, reversible binding includes aspects of charge interactions, hydrogen bonding, hydrophobic forces, van der Waals forces, etc., that facilitate physical attachment between the molecule of interest and the analyte being measured.
  • the “binding” interaction may be brief as in the situation where binding causes a chemical reaction to occur. Reactions resulting from contact between the binding agent and the analyte are also within the definition of binding for the purposes of the present invention, provided they can be later reversed.
  • the terms “BINDER”, “antibody”, “anti-peptide affinity reagent”, “specific affinity reagent”, “affinity capture reagent” and “anti-peptide antibody” as used herein mean a reagent having the ability to reversibly bind to a specific TARGET peptide (and its cognate STANDARD) in a manner that is specific to all or a portion of the peptide’s sequence.
  • BINDER will typically bind a TARGET peptide with greater affinity, greater kinetic on-rate or lower kinetic off-rate than a majority of the other peptides present in samples, sample digests, or other sources of contamination.
  • the terms include antibodies and fragments thereof as well as non-naturally occurring or synthetic antigen binding molecules.
  • IgG antibodies (polyclonal, monoclonal, oligoclonal, etc.)
  • other antibody isotypes and fragments thereof, such as Fab fragments; murine, chimeric, and other non-human or not fully human antibodies and fragments thereof
  • synthetic (non-naturally occurring) antigen binding formats such as single chain antibodies and bispecific antibodies, as well as aptamers (including DNA, RNA and other polymeric aptamers) and binding proteins built from non-antibody structures (e.g., nanobodies).
  • BINDER ID means a molecular barcode identifying the BINDER to which a molecule bound in an enrichment step.
  • biological means a drug produced by a biological mechanism, such as a protein; i.e., a protein therapeutic, or protein drug.
  • biomolecules refers to any molecule present in a biological system, and includes proteins, nucleic acids (specifically DNA and RNA in its various forms, both intracellular and extracellular), complex sugars (glycans and the like), lipids, and a variety of metabolites.
  • denaturant includes a range of chaotropic and other chemical agents that act to disrupt or loosen the 3-D structure of proteins without breaking covalent bonds, thereby rendering them more susceptible to proteolytic treatment. Examples include urea, guanidine hydrochloride, ammonium thiocyanate, trifluoroethanol and deoxycholate, as well as solvents such as acetonitrile, methanol and the like.
  • the concept of denaturant includes non-material influences capable of causing perturbation to protein structures, such as heat, microwave irradiation, ultrasound, and pressure fluctuations.
  • click chemistry means the use of pairs of chemical groups that react with each other but not with other chemical groups commonly found in biomolecules: i.e., they are bio-orthogonal coupling mechanisms.
  • Commonly used click chemical pairs include, but are not limited to, a 3’ trans-cyclooctene (TCO) group reacting bio-orthogonally with a tetrazine group (e.g., methyltetrazine (Me-TZ)), and bicyclononyne (BCN) reacting bio-orthogonally with an azide group.
  • in some instances, copper (Cu) ions serve as catalysts for a click reaction, and in other instances, typically involving a strained cyclic alkyne, a catalyst is not required.
  • clonotypic means uniquely characteristic of a clonal product, typically referring to a peptide sequence unique to a specific monoclonal antibody.
  • cognate means a relationship between molecules in which either 1) the molecules each contain a region that has the same structure as the other, or 2) the molecules can bind together by a specific interaction.
  • cognate peptides can share a region of identical sequence, which may be from 2 amino acids up to the full length.
  • the difference between cognate peptides can be a difference in sequence, or a difference due to attachment or removal of some atom(s) or groups (including one or more entire amino acids), or the addition to the peptide of a chemical group of any size (including oligonucleotides, peptides, “handles” such as biotin, and reactive groups able to subsequently bond to other molecules).
  • cognate BINDER or “cognate affinity capture reagent” means a specific affinity reagent (e.g., a specific binding reagent, BINDER) that is capable of specifically binding a cognate TARGET peptide and/or cognate STANDARD, in the sense that the cognate affinity capture reagent is designed, generated or selected to have a specific affinity for an epitope comprising part or all of its cognate peptide sequence.
  • degradation sequencing technology means a technology in which peptide molecules are disassembled one amino acid at a time (or in some cases two amino acids at a time), typically from one end, and the terminal amino acid identified, e.g., as one of the 20 common amino acids found in proteins, or as one of a subset of amino acids.
  • the identification can be obtained directly by optical or electrical readout, and in some cases the amino acid identity is translated into another molecular form (e.g., DNA) for later readout using a different technology.
  • drug and “therapeutic” mean a type of molecule that may, under appropriate circumstances of dosing and timing, interact with components of a subject’s body to modify biological processes, including disease processes, normal processes, aging and the like.
  • a drug may be a small molecule such as aspirin, or a macromolecule such as a protein (e.g., insulin), a nucleic acid (such as an anti-sense drug), or carbohydrate (such as heparin).
  • Drugs that comprise or are derived from monoclonal antibodies represent a growing class of therapeutic agents with particular advantages in terms of extreme specificity for endogenous protein and other targets involved in disease processes.
  • ESI means electrospray ionization.
  • flag is used herein as equivalent to Barcode, and may be any type of distinguishing molecular feature including, but not limited to, a polymer of dissimilar subunits encoding an identification relevant to sample analysis.
  • FRET means Förster resonance energy transfer.
  • in FRET, a donor chromophore, initially in its electronic excited state, may transfer energy to an acceptor chromophore through nonradiative dipole-dipole coupling.
  • the efficiency of this energy transfer is inversely proportional to the sixth power of the distance between donor and acceptor, making FRET extremely sensitive to small changes in distance, generally on scales of 1 to 10 nm.
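The distance dependence can be illustrated with the standard Förster relation E = 1 / (1 + (r / R0)^6), which falls off as the inverse sixth power of distance once r exceeds R0. The following is a minimal sketch only: the Förster radius `r0_nm` (the distance at which efficiency is 50%) and its example value of 5 nm are assumptions for illustration and are not specified in this disclosure.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Transfer efficiency from the Förster relation E = 1 / (1 + (r / R0)**6).

    r_nm  : donor-acceptor distance in nm
    r0_nm : Förster radius in nm (assumed example value, dye-pair dependent)
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and Förster radius must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Halving or doubling the distance around R0 swings the efficiency
# from nearly complete transfer to almost none.
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

With the assumed 5 nm radius, the efficiency moves from about 0.985 at 2.5 nm to 0.5 at 5 nm and about 0.015 at 10 nm, which is why the 1 to 10 nm range is the useful working window.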
  • immobilized enzyme means any form of enzyme that is fixed to the matrix of a support by covalent or non-covalent interaction such that the majority of the enzyme remains attached to the support of the membrane.
  • ligation means the joining of an end of a polymer chain (such as a nucleic acid) to an end of another polymer chain to form a combined linear polymer.
  • the term includes joining by enzymatic means (such as that of a DNA ligase, splicing means such as CRISPR, and other well-known molecular biology techniques for joining and splicing nucleic acid sequences) and chemical means (such as the use of click chemistry).
  • Linkage means a connection between originally separate molecules, and includes common covalent connections between units found in biopolymers and man-made polymers, as well as connections made using chemistries such as the well-known “click” chemistries, reactions such as those between amino groups and NHS esters, and formation of sugar-phosphate bonds when oligonucleotides are ligated together, as well as strong but non-covalent connections such as the interaction between biotin and streptavidin.
  • Linker means a segment of a molecule comprising an atomic configuration capable of, or arising from, formation of a linkage between two or more initially separate molecules.
  • MALDI means Matrix Assisted Laser Desorption Ionization and related techniques such as SELDI, and includes any technique that generates charged analyte ions from a solid analyte-containing material on a solid support under the influence of a laser or other means of imparting a short energy pulse.
  • Mass spectrometer means an instrument capable of separating molecules on the basis of their mass m, or m/z where z is molecular charge, and then detecting them. In one embodiment, mass spectrometers detect molecules quantitatively.
  • An MS may use one, two, or more stages of mass selection. In the case of multistage selection, some means of fragmenting the molecules is typically used between stages, so that later stages resolve fragments of molecules selected in earlier stages. Use of multiple stages typically affords improved overall specificity compared to a single stage device.
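As a small worked illustration of the m/z quantity on which a mass spectrometer separates molecules, the sketch below computes m/z for a positively charged ion. It assumes that charge is carried by added protons, as is typical in positive-mode electrospray; that assumption, the function name `mz`, and the example mass of 1000 Da are illustrative and do not come from this disclosure.

```python
PROTON_MASS_DA = 1.007276  # approximate mass of a proton in daltons


def mz(neutral_mass_da: float, charge: int) -> float:
    """m/z of a molecule carrying `charge` protons (assumed charge carrier)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# The same hypothetical 1000 Da peptide appears at different m/z values
# depending on how many protons it carries.
for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mz(1000.0, z):.4f}")
```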
  • MRM means multiple reaction monitoring mass spectrometry, in which measured molecules are selected first by their intact mass and secondly, after fragmentation, by the mass of a specific expected molecular fragment.
  • a variety of MS configurations may be used to analyze the molecules described.
  • Possible configurations include, but are not limited to, MALDI instruments including MALDI-TOF, MALDI-TOF/TOF, and MALDI-TQMS, and electrospray instruments including ESI-TQMS and ESI-QTOF, in which TOF means time of flight, TQMS means triple quadrupole MS, and QTOF means quadrupole TOF.
  • the terms “molecular tag”, “molecular flag”, or “molecular feature” mean a structural component of a molecular construct that can be detected by a single molecule detector and assigned a significance in the interpretation of counted molecules (e.g., distinction between TARGET and STANDARD tags, barcodes identifying samples, barcodes identifying BINDERS, etc.)
  • particle or “bead” mean any kind of particle in the size range between 10 nm and 1 cm, and includes magnetic particles and beads.
  • peptide library preparation means a method used to convert the proteins in a biological sample into a collection of peptides modified so as to be detectable, identifiable and countable by a sequence sensitive single molecule detector.
  • proteolytic treatment or “proteolytic enzyme” may refer to any of a large number of different enzymes, including trypsin, chymotrypsin, LysC, ArgC, AspN, GluC, v8 and the like, as well as chemicals, such as cyanogen bromide, that, in the context of the methods described herein, act to cleave peptide bonds in a protein or peptide in a sequence-specific manner, generating a collection of shorter peptides (a digest).
  • proteotypic peptide means a peptide whose sequence is unique to a specific protein in an organism, and therefore may be used as a stoichiometric surrogate for the protein, or at least for one or more forms of the protein in the case of a protein with splice variants.
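The idea of a proteotypic peptide can be sketched with an in-silico digest. The example below assumes complete tryptic cleavage under the commonly used simplified rule (cleave after K or R unless the next residue is P) and uses two made-up toy sequences; the names `tryptic_digest`, `proteotypic_peptides`, `PROT_A` and `PROT_B` are hypothetical and not part of this disclosure.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence: str) -> list[str]:
    """Split after K or R, except when followed by P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def proteotypic_peptides(proteome: dict[str, str]) -> dict[str, str]:
    """Map each peptide found in exactly one protein to that protein."""
    owners = defaultdict(set)
    for name, sequence in proteome.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(name)
    return {pep: next(iter(names)) for pep, names in owners.items() if len(names) == 1}


# Toy two-protein "proteome" (made-up sequences for illustration only).
toy_proteome = {
    "PROT_A": "MKTAYIAKQRLGSPEK",
    "PROT_B": "MKVLAAGIRLGSPEK",
}
unique = proteotypic_peptides(toy_proteome)
print(unique)  # shared peptides such as MK and LGSPEK are excluded
```

A real selection would also screen peptide length, missed cleavages and splice variants, none of which are modeled here.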
  • sample means any complex biologically-generated sample derived from humans, other animals, plants or microorganisms, or any combinations of these sources.
  • Complex digest means a proteolytic digest of any of these samples resulting from use of a proteolytic treatment.
  • SAMPLE ID means a molecular barcode identifying the sample from which a molecule was obtained, i.e., its sample of origin.
  • a sample barcode present in a construct identifies the sample of origin and allows this identity to be recovered after constructs from multiple samples have been pooled and analyzed together in a single molecule detector.
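Recovering the sample of origin after pooling amounts to a lookup on the decoded barcode. The following is a minimal sketch, assuming each detection event has already been reduced to a (barcode, peptide tag) pair; the barcode sequences, the `SAMPLE_BARCODES` table and the `demultiplex` function are hypothetical.

```python
from collections import Counter

# Hypothetical SAMPLE ID barcodes; actual barcode sets are not specified here.
SAMPLE_BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def demultiplex(reads):
    """Count detection events per (sample, peptide tag), skipping unknown barcodes."""
    counts = Counter()
    for barcode, peptide_tag in reads:
        sample = SAMPLE_BARCODES.get(barcode)
        if sample is not None:
            counts[(sample, peptide_tag)] += 1
    return counts


# Pooled events from two samples plus one read with an unrecognized barcode.
pooled_reads = [
    ("ACGTAC", "TARGET"), ("ACGTAC", "STANDARD"), ("TGCATG", "TARGET"),
    ("ACGTAC", "TARGET"), ("NNNNNN", "TARGET"),
]
counts = demultiplex(pooled_reads)
for key, n in sorted(counts.items()):
    print(key, n)
```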
  • ratchet mechanism means a molecular-scale device capable of pulling, pushing, unzipping (in the case of complementary strands of nucleic acids), or otherwise regulating the motion of linear molecules in discrete steps.
  • sequence-sensitive single molecule detection means detection and counting of individual molecules using a method capable of differentiating between different linear biopolymer sequences occurring in the molecules.
  • a “sequence-sensitive single molecule peptide detector” means a detector, instrument, technology, chemistry, or multi-component system that is able to achieve sequence-sensitive single molecule detection of peptides.
  • Such a detector need not achieve 100% accuracy to accomplish the objectives of the invention, since the number of different peptide sequences that must be distinguished from one another and counted in the invention is a small number (e.g., 1, 1-5, 1-10, 5-20, 10-50, 25-100, 50-200, or more peptides) compared to the number of peptides present in a digest of a complex biological sample (typically hundreds of thousands of peptides in the digest of a sample such as blood plasma).
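Why imperfect reads can suffice for a small panel can be sketched as nearest-match classification: a noisy read is assigned to the closest expected sequence, and rejected if it is too far from all of them or tied between two. This is only an illustration of the reasoning; the panel sequences, the error threshold and the use of edit distance as the similarity measure are assumptions, not the detector's actual decoding method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def assign(read: str, panel: list[str], max_errors: int = 2):
    """Return the unique closest panel peptide, or None if too distant or ambiguous."""
    scored = sorted((edit_distance(read, p), p) for p in panel)
    best_distance, best_peptide = scored[0]
    if best_distance > max_errors:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None
    return best_peptide


panel = ["LGSPEK", "VLAAGIR", "TAYIAK"]  # hypothetical enriched panel
print(assign("VLAAGLR", panel))  # one miscalled residue, still assigned
print(assign("QQQQQQ", panel))   # unrelated read, rejected
```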
  • the term includes nanopore-based sequencing of nucleic acids, proteins and peptides; fluorescence-based methods such as fluorosequencing (36-38), including Edman methods; “reverse-translation” of peptide sequences into DNA sequences followed by DNA sequencing (the “Proteocode” technology developed by Encodia: https://www.encodia.com/technology); “FRET” fingerprinting of peptides (36, 51); single molecule imaging methods (31); and other related methods.
  • sampling nanopore and “nanopore” as used herein refer to ion-conductive pores capable of functioning in an ion-impermeable membrane or vessel wall, and through which linear polymers can pass.
  • Typical nanopores are of biological origin (e.g., MspA), comprising one or more protein molecules, or created by engineering (e.g., versions of biological nanopores modified by mutation, rearrangement or combination of proteins; very small holes etched or drilled in thin metallic or ceramic substrates; or DNA assemblies).
  • a recording of the current flowing through a nanopore over time is referred to as a “trace” or “squiggle”.
  • sequential degradation refers to a process in which amino acid residues are removed, in sequence order, from one terminus of a peptide.
  • sequential degradation can be employed in a process in which a peptide’s terminal amino acid is “recognized” (e.g., by binding of one of a series of affinity reagents specific for the various amino acids presented at the terminus) and its identity determined or recorded for later evaluation, after which the terminal amino acid can be cleaved off (e.g., using enzymes such as exoproteases, classical Edman chemistry, or other chemistries capable of removing a terminal amino acid) and the process repeated to determine a sequence of amino acids from the peptide’s terminus.
  • a process can employ recognition reagents that report information on two or more terminal amino acids at a time, and a cleavage process can be employed that removes two or more terminal amino acids per cycle.
  • the process need not sequence all amino acids in a peptide to generate TARGET peptide or STANDARD identifications and single molecule counts that are useful in the invention.
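The point that partial sequencing can be enough is easy to check for a given panel: one can ask how many terminal residues must be read before every expected peptide is distinguishable. The sketch below assumes reads proceed from one terminus and are error-free; the panel sequences and the function name are hypothetical.

```python
def min_distinguishing_prefix(peptides: list[str]):
    """Smallest number of terminal residues that separates all panel peptides.

    Returns None only if the panel contains identical sequences.
    """
    longest = max(len(p) for p in peptides)
    for n in range(1, longest + 1):
        prefixes = {p[:n] for p in peptides}
        if len(prefixes) == len(peptides):
            return n
    return None


panel = ["LGSPEK", "LGAPEK", "VLAAGIR"]  # hypothetical panel
print(min_distinguishing_prefix(panel))  # three residues separate all members
```

In practice a TARGET peptide and its STANDARD must also be told apart, either within the residues read or by a separate tag.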
  • SISCAPA means the method described in US Patent No. 7,632,686, and in “Mass Spectrometric Quantitation of Peptides and Proteins Using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA)” (Journal of Proteome Research 3: 235-44 (2004)).
  • small molecule or “metabolite” means a multi-atom molecule other than proteins, peptides and DNA; the term can include but is not limited to amino acids, steroid and other small hormones, metabolic intermediate compounds, drugs, drug metabolites, toxicants and their metabolites, and fragments of larger biomolecules.
  • stable isotope means an isotope of an element naturally occurring or capable of substitution in proteins or peptides that is stable (does not decay by radioactive mechanisms) over a period of a day or more.
  • the primary examples of interest in the context of the methods described herein are C, N, H, and O, of which the most commonly used are 13C and 15N.
  • tissue sample means a liquid sample generated from a sample of a solid biological tissue (e.g., liver, brain, skin, etc.) by a method that results in a solution containing tissue molecules.
  • a solubilized tissue sample may contain one, a few, many or almost all tissue molecules in solution.
  • Tissue solubilization can be achieved by a variety of methods including grinding, pulverization, ultrasonication, homogenization, and similar mechanical methods, as well as exposure to liquid solutions including detergents, solvents, protease inhibitors, salts, buffers, and the like.
  • STANDARD may be any altered version of the respective TARGET fragment or TARGET peptide that is 1) bound by the appropriate BINDER with an affinity and kinetics very similar to that with which the cognate TARGET fragment or TARGET peptide is bound, and 2) differs from it in a manner that can be distinguished from the cognate TARGET peptide by a sequence-sensitive single molecule peptide detector (e.g., by means of some sequence difference, amino acid modification, inclusion of a non-natural chemical group), or a mass spectrometer (either through direct measurement of molecular mass or through mass measurement of fragments, e.g., through MS/MS analysis), or by another equivalent means.
  • a suitable TARGET peptide and its STANDARD would produce distinguishable ion current signatures while passing through the nanopore.
  • STANDARD tag or “STANDARD flag” means a molecular tag or feature within or attached to a STANDARD peptide enabling a single molecule detector to distinguish the STANDARD tag from a TARGET tag.
  • the STANDARD tag may be the absence of any TARGET tag.
  • a STANDARD tag may consist of the absence of a feature present in a cognate TARGET tag, the presence in the STANDARD tag of a feature absent in the cognate TARGET tag, or the presence of different features in the STANDARD tag and TARGET tag. Multiple different STANDARD tags may be used, provided that the STANDARD tags are distinguishable from any TARGET tags.
  • standardized sample digest or “standardized sample” means a protein or peptide sample to which one or more STANDARD version(s) of one or more TARGET peptide or protein analytes have been added in an amount that is a) known (in terms of concentration, mass, moles or other physical units) or b) consistent between samples (allowing quantitative comparison of TARGET peptide amounts between samples even if the absolute amount of the STANDARD added is not known).
  • the ratio between TARGET peptide and STANDARD represents and preserves information concerning the amount of the TARGET peptide in the sample, allowing this information to be recovered by later quantitative analysis even if a variable amount of the TARGET peptide and STANDARD pair is recovered during a suitable enrichment process (i.e., a process that does not distinguish between the TARGET peptide and STANDARD peptides) prior to analysis.
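The way the TARGET:STANDARD ratio carries quantitative information through a lossy enrichment can be sketched as follows. The function name, the femtomole units and the example counts are assumptions for illustration; the only premise taken from the text is that the enrichment does not distinguish TARGET from STANDARD.

```python
def target_amount(target_count: int, standard_count: int, standard_added: float) -> float:
    """Estimate TARGET amount from single molecule counts and the known STANDARD spike.

    The result is in the same units as `standard_added` (fmol in this example).
    """
    if standard_count <= 0:
        raise ValueError("no STANDARD molecules counted; the ratio is undefined")
    return target_count / standard_count * standard_added


# The same standardized digest (100 fmol STANDARD spiked in) processed twice,
# once with high recovery and once with low recovery. Counts are hypothetical.
high_recovery = target_amount(45_000, 90_000, standard_added=100.0)
low_recovery = target_amount(15_000, 30_000, standard_added=100.0)
print(high_recovery, low_recovery)  # both estimates agree despite different recovery
```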
  • stoichiometric refers to relationships between quantities of different molecules.
  • the word stoichiometry refers to presence of different elements or compounds in simple integral ratios, as prescribed by an equation or formula.
  • a TARGET peptide sequence that occurs once in the sequence of a parent protein target has a 1:1 stoichiometric relationship with the target, and can therefore be used as a quantitative surrogate to measure amounts of the protein.
  • stoichiometry means a ratio relationship between molecules (or elements) that may have any numerical value, including non-integer values.
  • two different proteins in blood or in a cell can have a relative stoichiometry extending over a very broad range, in principle from one molecule (the lower limit) of a low abundance protein to hundreds of billions of molecules (or more) for a high abundance protein in the same sample.
  • stoichiometric flattening refers to processes by which different molecules (e.g., peptides) that are present in a sample (e.g., a biological sample digest) at different concentrations (or in different amounts in mass or molar terms) are brought closer to equal concentrations or amounts.
  • An example of such a process is an affinity enrichment method in which a larger relative fraction of a low abundance molecule is captured while a smaller relative fraction of a higher abundance molecule is captured (e.g., by adjusting the amounts of the corresponding affinity reagents, such as antibodies, used to accomplish this capture), the captured molecules being then separated from the sample and released from capture, resulting in a more nearly equivalent amount of the molecules in the processed sample.
  • internal standard versions of the molecules (e.g., STANDARD versions of TARGET peptides) are added before this enrichment step, and both TARGET peptide and STANDARD are measured in the resulting enriched sample.
  • the ratio of TARGET peptide to STANDARD can be used to calculate the TARGET peptide abundance in the original sample.
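Stoichiometric flattening by limiting the amount of affinity reagent can be sketched with a simple saturable-capture model: each BINDER captures at most a fixed number of molecules, and captures TARGET and STANDARD in proportion to their presence. The capacity model, the function name and all numbers below are assumptions for illustration.

```python
def enrich(target: float, standard: float, binder_capacity: float):
    """Capture up to `binder_capacity` molecules without distinguishing TARGET from STANDARD."""
    total = target + standard
    captured_fraction = min(total, binder_capacity) / total
    return target * captured_fraction, standard * captured_fraction


# A high abundance and a low abundance peptide, each spiked 1:1 with STANDARD,
# enriched with the same limited amount of their respective BINDERs.
high_t, high_s = enrich(1e9, 1e9, binder_capacity=1e4)
low_t, low_s = enrich(1e3, 1e3, binder_capacity=1e4)

print("abundance ratio before:", 1e9 / 1e3)        # six orders of magnitude apart
print("abundance ratio after: ", high_t / low_t)   # within one order of magnitude
print("TARGET/STANDARD ratios:", high_t / high_s, low_t / low_s)
```

Because each captured TARGET:STANDARD ratio is unchanged, the original abundance of each peptide remains recoverable even though the enriched amounts are nearly equal.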
  • Subject or “Patient” means a biological individual such as an individual human being or an animal.
  • TARGET or “TARGET peptide” means a peptide chosen as a TARGET fragment of a protein or peptide.
  • the TARGET may be any piece of a protein or peptide which can be produced by a reproducible fragmentation process (e.g., digestion using a proteolytic enzyme, or without a fragmentation if the TARGET fragment is the whole analyte) and whose abundance or concentration can be used as a surrogate for the abundance or concentration of the analyte.
  • TARGET tag or “TARGET flag” means a molecular tag or feature within or attached to a TARGET peptide enabling a single molecule detector to distinguish the TARGET tag from a STANDARD tag.
  • the TARGET tag may be the absence of any STANDARD tag.
  • a TARGET tag may consist of the absence of a feature present in a cognate STANDARD tag, the presence in the TARGET tag of a feature absent in the cognate STANDARD tag, or the presence of different features in the TARGET tag and STANDARD tag. Multiple different TARGET tags may be used, provided that the TARGET tags are distinguishable from any STANDARD tags.
  • tag is used herein as equivalent to Barcode, and may be any type of distinguishing molecular feature including, but not limited to, a polymer of dissimilar subunits encoding an identification relevant to sample analysis.
  • T/S tag or “T/S flag” means either a TARGET or a STANDARD tag, or a set of mixed TARGET and STANDARD tags, as will generally be present in a standardized sample.
  • VEHICLE means a molecule (for example a polymer such as an oligonucleotide or a polyethylene glycol, a linker comprising chemically reactive sites such as NHS or click chemistry groups, or a macromolecular carrier such as a bead or a “SNAP” particle), to which TARGET and STANDARD peptides (together with their associated distinguishing tags) can be linked in order to facilitate single molecule detection.
  • a VEHICLE can include barcodes identifying a sample of origin and barcodes identifying a BINDER used to enrich specific cognate TARGET and STANDARD peptides.
  • VEHICLES can also include one or more additional molecular structures that facilitate the transport of TARGET and STANDARD peptides to a single molecule detector, their presentation to such a detector, their transport through a detector (such as a nanopore), or their immobilization to a site or in a region observed by such a detector.
  • the inventions herein provide improved quantitative measurement of proteins, and peptides derived from them, through improvements to previous methods including the replacement of mass spectrometric detection by other detection techniques capable of identifying and counting individual molecules.
  • Figure 1 Examples of abundances, TARGET peptide and STANDARDS for proteins in human plasma
  • Figure 12 Assembly of a Peptide:Oligo Construct for Nanopore Detection
  • Figure 14 Scheme for double ligation of tryptic peptides with C-terminal lysine (i.e., two amino groups) using “Click” chemistry
  • Figure 16 Peptide Loop Insertion with Enzymatic Cut: Insertion of TARGET Peptide Into Oligo VEHICLE as a Loop, followed by Sequence-Specific Enzymatic Oligo Cleavage
  • Figure 30 Ligation of Double-Tag Rope-Tow Constructs
  • Figure 31 Analysis of Detection Events in Affinity Imaging Detection
  • Figure 35 Multiplex Test Panel for SARS-CoV-2: A multiplex combination of molecules detectable using nanopore sequencing, including a) peptides from the SARS-CoV-2 NCAP protein; b) SARS-CoV-2 Spike and NCAP protein linear epitopes whose binding by patient antibodies indicates vaccination or exposure to the virus; c) proteotypic peptides of three proteins used as plasma biomarkers of inflammation; and d) the RNA genome of SARS-CoV-2.
  • Figure 39 Assembly of a Peptide:Oligo Construct for Reverse Translation Detection
  • a molecular construct and vehicle comprising:
  • the molecular tag is a target tag that identifies the peptide as a peptide created by proteolytic digestion of a biological sample.
  • the molecular construct and vehicle of paragraph 1 wherein the peptide comprises a synthetic peptide and the molecular tag is a standard tag that identifies the synthetic peptide as an internal standard.
  • the molecular construct and vehicle of paragraph 2 wherein more than 90% of the target molecules present in said sample digest are linked to target tags.
  • the molecular construct and vehicle of paragraph 2 further comprising a SAMPLE barcode identifying the sample of origin.
  • the molecular construct and vehicle of paragraph 1 further comprising a BINDER barcode identifying a binder to which the construct has been bound.
  • the molecular construct and vehicle of any of the preceding paragraphs wherein the barcode or the tag is an oligonucleotide.
  • the molecular construct and vehicle of any of the preceding paragraphs wherein the sequence-sensitive single molecule detector comprises a nanopore, a single molecule imaging system, or a single molecule degradative peptide sequencer.
  • a plurality of reagents comprising: the molecular construct and vehicle of paragraph 3, a tag reagent capable of reacting with a target peptide in a proteolytic digest of a biological sample to create the molecular construct of paragraph 2, and a binder that binds to said molecular constructs of paragraph 2 and paragraph 3 with similar affinity and kinetics.
  • the plurality of reagents of paragraph 10 wherein the binder contacting a standardized sample digest comprising the molecular construct and vehicle of paragraph 2 and paragraph 3 binds the molecular construct of paragraph 2 and paragraph 3 in a ratio equal within 2%, 5%, 10% or 20% to the ratio in which they are present in said standardized sample digest.
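The ratio-preservation condition in the preceding paragraph can be expressed as a simple check. This is a sketch under the assumption that "equal within X%" means the relative deviation of the bound ratio from the input ratio; the function name and example counts are hypothetical.

```python
def ratio_preserved(input_target: float, input_standard: float,
                    bound_target: float, bound_standard: float,
                    tolerance: float = 0.05) -> bool:
    """True if the bound TARGET:STANDARD ratio is within `tolerance` of the input ratio."""
    input_ratio = input_target / input_standard
    bound_ratio = bound_target / bound_standard
    return abs(bound_ratio - input_ratio) / input_ratio <= tolerance


# Input ratio 0.50, bound ratio 0.48: a 4% relative deviation.
print(ratio_preserved(1000, 2000, 480, 1000, tolerance=0.05))  # within 5%
print(ratio_preserved(1000, 2000, 480, 1000, tolerance=0.02))  # not within 2%
```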
  • the plurality of reagents of paragraph 10 further comprising: one or more reagents capable of proteolytic fragmentation of sample proteins, and one or more solid supports for binders, including magnetic beads, non-magnetic beads, porous supports usable in packed columns, or chemical reagents capable of introducing reactive groups into peptides.
  • a calibrator sample for peptide quantitation by a sequence-sensitive single molecule detector comprising an amount of the molecular construct and vehicle of paragraph 2 and paragraph 3 in a known ratio.
  • the calibrator sample of paragraph 13 wherein at least one of the constructs is present in known amount or concentration.
  • a standardized sample digest derived from a proteolytic digest of a biological sample comprising: an amount of a molecular construct comprising a target tag and a target peptide, said construct being a target peptide construct and an amount of a molecular construct comprising a standard tag and a peptide whose sequence is the same or similar to the sequence of said target peptide, said construct being a standard peptide construct, wherein the target peptide is generated by proteolytic digestion of a target protein in said biological sample, wherein said target and standard tags can be distinguished by a single molecule detector and comprise chemical or structural groups covalently joined to peptides in their respective constructs, wherein said target tag is covalently attached to a plurality of the peptides present in said sample digest, wherein said target peptide construct comprises more than 90% of the target peptide molecules present in said sample digest and wherein said standard peptide construct is prepared separately and added to said digest in a known amount, or in a consistent relative amount across a multiplicity
  • the standardized sample digest of paragraph 7 wherein the number of molecules of the standard peptide construct added to the sample digest differs by no more than a factor of 100 from the number of molecules of the target peptide construct in said sample digest.
  • the standardized sample digest of any one of paragraphs 15-16 further comprising one or more additional standard peptide constructs having a different standard tag from each other and with each construct at a different relative abundance.
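One possible use of several differently tagged standards at different abundances is as an internal multi-point calibration. The sketch below assumes a linear, zero-intercept detector response (counts proportional to amount) and fits the proportionality constant by least squares; that response model, the function names and the example numbers are assumptions, not something this disclosure specifies.

```python
def fit_response(amounts: list[float], counts: list[float]) -> float:
    """Least-squares slope k for counts = k * amount (line through the origin)."""
    numerator = sum(a * c for a, c in zip(amounts, counts))
    denominator = sum(a * a for a in amounts)
    return numerator / denominator


def estimate_amount(target_count: float, k: float) -> float:
    """Convert a TARGET count into an amount using the fitted response."""
    return target_count / k


# Three standard constructs with distinct tags spiked at 1, 10 and 100 fmol
# (hypothetical amounts and counts).
k = fit_response([1.0, 10.0, 100.0], [20.0, 200.0, 2000.0])
print(k, estimate_amount(500.0, k))
```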
  • a stoichiometrically-flattened standardized sample comprising a plurality of pairs of cognate standard and target peptide constructs enriched from a standardized proteolytic digest of a biological sample by binding to their respective cognate binders, wherein a pre-enrichment ratio calculated by dividing the number of molecules of a first target peptide construct that is the most numerous of said target peptide constructs in the standardized sample digest by the number of molecules of a second target peptide construct that is the least numerous of said target peptide constructs in the standardized sample digest is more than 10 times larger than a post-enrichment ratio calculated by dividing the number of molecules of said first target peptide construct by the number of molecules of said second target peptide construct in said enriched sample.
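  • The flattening criterion above can be illustrated with a minimal sketch. The function name, construct names and molecule counts are hypothetical and are not taken from the disclosure; the only rule encoded is that the pre-enrichment ratio must be more than 10 times the post-enrichment ratio.

```python
def flattening_ratios(pre_counts, post_counts):
    """Return (pre_ratio, post_ratio) for the most and least numerous
    target constructs, as identified in the pre-enrichment digest.

    pre_counts / post_counts: dict mapping construct name -> molecule count.
    """
    first = max(pre_counts, key=pre_counts.get)   # most numerous before enrichment
    second = min(pre_counts, key=pre_counts.get)  # least numerous before enrichment
    pre_ratio = pre_counts[first] / pre_counts[second]
    post_ratio = post_counts[first] / post_counts[second]
    return pre_ratio, post_ratio


# Hypothetical molecule counts, for illustration only.
pre = {"construct_A": 1_000_000_000, "construct_B": 1_000}
post = {"construct_A": 50_000, "construct_B": 1_000}

pre_ratio, post_ratio = flattening_ratios(pre, post)
is_flattened = pre_ratio > 10 * post_ratio  # criterion stated in the text
```

  • In this illustration the abundance ratio falls from 1,000,000 to 50, so the sample would meet the stated criterion.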
  • a method for measuring the amount of a selected target protein in a biological sample comprising: proteolytically digesting said sample, modifying a plurality of peptides in the digested sample by adding a target tag to form a plurality of constructs comprising a selected target peptide derived from, and proteotypic of, said target protein, said plurality of constructs being target construct molecules, adding an amount that is known and/or consistent between a set of samples of a prepared standard peptide construct that is a cognate of said selected target peptide construct and comprises a standard tag, forming a standardized digest, enriching said cognate target and standard peptide constructs by contacting said standardized digest with a cognate binder, forming bound constructs, separating said bound constructs from unbound constructs to form enriched constructs, releasing said enriched constructs from said binder, linking said enriched constructs to a vehicle capable of presenting said enriched constructs to a sequence-sensitive single molecule detector, counting said enriched target construct molecules and said
  • the method of claim 22, wherein the calculating is performed by multiplying the amount of standard construct added by the ratio of the number of target construct molecules counted to the number of standard construct molecules counted by said detector.
  • the method of claim 22 or 23, wherein, independently or in any combination:
  • said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides are covalently modified between two such steps (or wherein said first sequential step cleaves at lysine residues),
  • tags are added to said peptides by reaction with peptide amino groups,
  • tags are added to said peptides by chemical reaction at a single site in an n-terminal amino acid,
  • tags are added to said peptides by chemical reaction at a c-terminal lysine residue,
  • said constructs comprise a non-peptidic component attached to an n-terminal amino acid and a different non-peptidic component attached to a c-terminal lysine residue,
  • said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides retain an unmodified n-terminal amino group when presented to said detector,
  • a sample barcode is linked to said constructs encoding the identity, or relative position within a sample set, of said standardized samples; a plurality of said standardized samples is pooled; said sample barcodes associated with construct molecules are read using a sequence-sensitive single molecule detector; and the counts of target and standard construct molecules for each sample are separated based on said sample ID barcode identifying the sample from which they were enriched, and wherein said barcode may be an oligonucleotide,
  • a binder barcode is linked to said constructs identifying the binder by which they were enriched, and wherein said barcode may be an oligonucleotide,
  • said detector determines all or part of the amino acid sequence of the peptide components of said construct molecules by a stepwise degradative process (or wherein the sequence of said target peptide is encoded in a nucleic acid component linked to target tags, standard tags, sample tags, binder tags and vehicles, and is read, and counted, by conventional DNA sequencing),
  • said detector recognizes and decodes the target and standard constructs comprising target peptides, tags and optional additional barcodes using time-dependent variations in an electrical or optical parameter measured while the construct molecules transit a nanopore (or wherein said nanopore is biological, e.g., the common protein nanopores occurring in nature or derivatives thereof, or nucleic acid constructs, or a hole in a solid state inorganic material, e.g., Si3N4, SiO2, graphene, or MoS2) (or wherein said target and standard constructs comprise non-peptide polymers including nucleic acids that engage with a molecular motor, e.g., a polymerase or helicase, to regulate the speed at which the constructs move through a nanopore) (or wherein the nanopore detection is continued, and construct counts accumulated, until reaching pre-determined threshold numbers of counts, which may be based on counts required for each peptide sequence, e.g., to provide a pre-determined precision according to counting statistics, or counts required to achieve a pre
  • said constructs are located on a support and detected by sequential binding of a plurality of binders comprising a detectable label, wherein independently or in any combination:
    o peptides in constructs are identified by using cognate binders labeled with optically detectable moieties including fluorescent dyes or proteins
    o multiple binders recognize distinct epitopes within a target peptide
    o kinetic binding analysis of binder-construct interactions is used to improve the specificity of detection
    o detection of binding or lack of binding by binders is interspersed with sequential removal of n-terminal amino acids or peptide segments
    o one or more of the said target tags, standard tags, sample tags, binder tags, or vehicles is an oligonucleotide and is detected by hybridization of an optically-labeled complementary oligonucleotide.
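  • The sample-barcode option listed above (pooling standardized samples and separating counts by sample barcode) can be sketched as follows. This is an illustration under stated assumptions: each detector read is assumed to have already been decoded into a (sample barcode, tag) pair, and the barcode names and reads are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex(reads):
    """Separate TARGET and STANDARD counts by sample barcode.

    reads: iterable of (sample_barcode, tag) pairs, where tag is
    "TARGET" or "STANDARD" (hypothetical pre-decoded detector output).
    """
    counts = defaultdict(Counter)
    for sample_barcode, tag in reads:
        counts[sample_barcode][tag] += 1
    return counts


# Hypothetical pooled reads from two samples.
reads = [
    ("S1", "TARGET"), ("S1", "STANDARD"), ("S2", "TARGET"),
    ("S1", "TARGET"), ("S2", "STANDARD"), ("S2", "STANDARD"),
]
counts = demultiplex(reads)
```

  • Each sample's TARGET and STANDARD counts can then be compared separately, even though the constructs were detected in a single pooled run.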
  • a method for measuring amounts of a plurality of selected target proteins in a biological sample comprising independently or in any combination: proteolytically digesting proteins in said sample to yield peptides, modifying said peptides by covalent chemical addition of a target tag to form a plurality of constructs, including target construct molecules comprising selected proteotypic target peptides derived from said target proteins, adding prepared standard constructs that are cognates of said target constructs and comprise a standard tag in amounts that are known and/or consistent between a set of samples, forming a standardized digest, enriching said cognate target and standard construct pairs by contacting said standardized digest with cognate binders, forming bound constructs, separating said bound constructs from unbound constructs to form enriched constructs, releasing said enriched constructs from said binder, linking said enriched constructs to a vehicle capable of presenting said enriched constructs to a sequence-sensitive single molecule detector, recognizing and counting said enriched target construct molecules and said enriched standard construct molecules using a sequence-sensitive single
  • the inventions herein provide improved quantitative measurements of the amounts of proteins, in a way that is highly specific, extremely sensitive, multiplexable with wide dynamic range, capable of very high throughput with low cost per measurement, and amenable to implementation on compact, inexpensive equipment.
  • the invention combines a series of known and new processes in a novel combination, and provides novel advantages over existing protein measurement methods.
  • the invention comprises proteolytic fragmentation of target protein(s), addition of internal standard versions of one or more peptides in known amount (these standard peptides being detectably different from the sample digest peptides based on incorporation of molecular tags into either sample digest peptides, added standard peptides, or both), enrichment of selected sample peptides (TARGETs) and cognate internal standards (STANDARDS) by specific affinity selection on BINDERS, and single molecule identification and counting of the resulting enriched peptides.
  • the invention provides the maximum sensitivity attainable by direct analyte detection (i.e., detection without amplification).
  • Selecting TARGET peptides from among the candidate peptides produced by digestion of a target protein, based on theoretical (e.g., in silico) and/or experimentally determined features and performance as a quantitative surrogate of the protein analyte.
  • Coupling TARGET molecular tags to TARGET peptides to form TARGET constructs, coupling STANDARD molecular tags to STANDARD peptides to form STANDARD constructs, or both.
  • Adding sample-identifying barcodes to TARGET and STANDARD peptide constructs in a sample digest when samples are to be pooled prior to single molecule detection.
  • Sample barcode addition can be carried out either before or after specific enrichment by BINDERS.
  • the present disclosure provides methods for preparation and analysis of protein-containing biological samples compatible with any of the sequence-sensitive single molecule detection methods.
  • the invention provides a means to measure the amount of a peptide molecule (termed a “TARGET” peptide), typically a proteolytic fragment of a sample protein resulting from proteolytic digestion of a biological sample.
  • sample proteins are proteolytically digested and standardized by addition of an internal standard peptide or peptide construct (STANDARD) to create a standardized sample, from which the TARGET peptide and STANDARD are enriched and individually counted using sequence-sensitive single molecule detection (e.g., nanopore sequencing).
  • TARGET peptides (the analytes to be measured)
  • STANDARD (internal standard)
  • BINDERS (specific affinity reagents)
  • barcodes used to distinguish TARGET and STANDARD constructs and/or to distinguish peptides on constructs derived from different samples.
  • these components are selected, prepared or optimized in ways surprisingly distinct from earlier work in which mass spectrometry has been used for peptide detection.
  • one or more peptide segments within a target protein are selected as “TARGET” peptides. Selection can be accomplished using an “in silico” approach (e.g., by “digesting” the sequence of a target protein, known for example from the genome sequence of the relevant species, using a computer to cut the sequence at sites predicted based on the known cleavage specificities of a selected protease or chemical fragmentation method), an experimental approach (e.g., from a list of peptide fragments actually observed in a digest of the protein or a sample containing it), or both.
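  • A minimal sketch of the “in silico” digestion described above, assuming trypsin as the selected protease. The rule of cutting c-terminal to lysine (K) and arginine (R) follows the cleavage specificity given in this disclosure; the optional rule of not cutting before proline (P) is a commonly used convention added here as an assumption, and the protein sequence is hypothetical apart from the embedded mesothelin peptide LLGPHVEGLK.

```python
def tryptic_digest(sequence, skip_before_proline=True):
    """Cut a protein sequence after each K or R (one-letter codes).

    skip_before_proline: if True, do not cut when the next residue is P
    (a common convention; an assumption, not stated in the disclosure).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
            if skip_before_proline and next_residue == "P":
                continue
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # c-terminal remainder
    return peptides


# Hypothetical protein sequence containing the peptide LLGPHVEGLK.
protein = "MAGKLLGPHVEGLKRPSTW"
candidates = tryptic_digest(protein)
```

  • The resulting candidate list can then be filtered against the selection criteria that follow (proteotypic sequence, amino groups, charge, and so on).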
  • a preferred TARGET peptide can be defined in some embodiments by criteria selected from a set intended to identify peptides that optimize the performance of the assay. In some embodiments it is preferred that a TARGET peptide has one or more characteristics that improve its performance in an assay according to the invention, including, but not limited to, peptides that are or have:
  • Efficient digestion Produced rapidly and in high (ideally >90%) yield by digestion of the target protein through an efficient, inexpensive proteolytic treatment with an enzyme (e.g., a protease such as trypsin) or a high-yield chemical treatment (e.g., CNBr cleavage);
  • Proteotypic sequence A sequence that is unique to the target protein (unless measurement of a family of proteins is an object of the assay), i.e., that it is "proteotypic" for the protein and appears in no other proteins likely to be found in the intended sample (or more preferably, that it occurs in no other natural protein coded for by the genome of the species of interest), and that it occurs in the target protein sequence in a known number of locations (typically one location, but potentially more than one if the peptide sequence is repeated in the protein);
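  • The proteotypic criterion can be checked with a simple substring search, sketched below. The proteome is represented as a hypothetical dictionary of protein identifiers and sequences; a real check would run against the full set of proteins coded for by the genome of the species of interest, and this sketch does not verify that the peptide is actually released by the chosen cleavage method.

```python
def count_occurrences(peptide, protein_sequence):
    """Count occurrences of peptide in a sequence, including overlapping ones."""
    count, start = 0, protein_sequence.find(peptide)
    while start != -1:
        count += 1
        start = protein_sequence.find(peptide, start + 1)
    return count


def is_proteotypic(peptide, target_id, proteome):
    """Return (unique_to_target, occurrences_in_target).

    proteome: dict mapping protein identifier -> sequence (hypothetical).
    """
    hits = {pid: count_occurrences(peptide, seq) for pid, seq in proteome.items()}
    in_target = hits.get(target_id, 0) > 0
    elsewhere = any(n > 0 for pid, n in hits.items() if pid != target_id)
    return in_target and not elsewhere, hits.get(target_id, 0)


# Hypothetical two-protein "proteome" for illustration.
proteome = {"P1": "MAGKLLGPHVEGLKRPSTW", "P2": "MKTAYIAKQR"}
result = is_proteotypic("LLGPHVEGLK", "P1", proteome)
```

  • The second returned value gives the number of locations at which the peptide occurs in the target protein, which the text requires to be known (typically one).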
  • a sequence containing structural features (e.g., “immunogenic” epitopes in the case of antibody affinity reagents) that facilitate development of specific affinity reagents (e.g., monoclonal antibodies, aptamers, etc.) capable of binding the peptide with high affinity, and specifically a slow off-rate.
  • Solubility Favorable physico-chemical properties, including solubility in aqueous solutions, little or no binding to materials used in sample preparation and analysis vessels and devices (e.g., nanopores), and little or no tendency to aggregate.
  • Stability Low rate of spontaneous chemical degradation (e.g., by methionine or tryptophan oxidation, asparagine or glutamine de-amidation, etc.).
  • Recognizable sequence A sequence that has features making it easily distinguishable from other sequences using the chosen sequence-sensitive single molecule detection means. If, for example, the detection means is nanopore sequencing, then peptide sequences that produce distinctive current traces as molecules pass through a nanopore, thus allowing them to be distinguished from other peptides, are preferred. Typical nanopores produce current signals that are reflective of a stretch of 3-6 contiguous amino acids (a “kmer”) inside the pore. Amino acids have a multiplicity of different side-chain volumes, and while these volumes do not always directly determine the nanopore “blockade currents”, sequences with more variable patterns of side chain volume are preferred.
  • If the detection means involves recognition and recording of a terminal amino acid and its removal in a cyclical process to expose a new terminus (e.g., using degradative sequencing technology), then sequences that include amino acids for which the terminal recognition is most accurate, and/or least confusing, are likely to be preferred. If the detection means involves binding of recognition molecules to peptide epitopes, then peptides with multiple distinct epitopes are preferred.
  • cysteine is avoided due to its potential to form bridges between peptides, or alternatively a step is included in the sample preparation to block cysteines (e.g., via alkylation by iodoacetamide).
  • one or more cysteine residues present in a peptide are used as reactive sites for introduction of linkages to other molecules, or labels that can assist in peptide recognition by a sequence-sensitive single molecule detector.
  • Amino groups Specific numbers and sites of amino groups (lysine side chain and n-terminal). As the most preferred sites for chemical linkage to other molecules, the number and position of amino groups is an important factor in the design of some constructs required for efficient presentation of peptides to a sequence-sensitive single molecule detector. For example, a tryptic peptide with c-terminal arginine and no internal lysine residues has a single amino group located at its n-terminus (i.e., it is “single amino”), and therefore has a unique site at which certain amino-reactive chemistries can establish a covalent linkage between the peptide and another molecule.
  • a tryptic peptide having a c-terminal lysine and no internal lysines has two amino groups: one at the n-terminus and one at the epsilon amino group of the lysine side chain, thus facilitating methods in which a peptide is linked into a longer polymer by coupling at both ends.
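  • The amino-group criterion can be sketched as a simple classifier. It assumes a free, unmodified n-terminus and counts one amino group for the n-terminus plus one per lysine (K); the category labels follow the “single amino” wording above, the label “double amino” for the two-group case is used here for symmetry, and the example peptides other than LLGPHVEGLK are hypothetical.

```python
def amino_group_class(peptide):
    """Classify a peptide by its number of primary amino groups.

    Counts the n-terminal amino group plus one epsilon-amino group per
    lysine (K). Assumes an unmodified n-terminus.
    """
    n_amino = 1 + peptide.count("K")
    if n_amino == 1:
        return "single amino"   # e.g., c-terminal arginine, no lysine
    if n_amino == 2 and peptide.endswith("K"):
        return "double amino"   # n-terminus plus c-terminal lysine
    return "other"              # internal or multiple lysines


classes = {p: amino_group_class(p) for p in ("AGLSVTR", "LLGPHVEGLK", "AKGLK")}
```

  • Peptides falling in the “other” class have additional reactive sites and would need separate consideration when designing constructs.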
  • Carboxyl groups Specific numbers and sites of carboxyl groups (glutamic and aspartic acid side chains and at the c-terminus). As an alternative preferred site for chemical linkage to other molecules, the number and position of carboxyl groups is an important factor in the design of some constructs required for efficient presentation of peptides to a sequence-sensitive single molecule detector.
  • Electric charges Specific numbers of charged amino acids (e.g., lysine, arginine, histidine, and the n-terminus with positive charges; and glutamic and aspartic acids and the c-terminus with negative charges) and the sum of these (i.e., the net charge of the peptide at the working pH of a sequence-sensitive single molecule detector).
  • the total charge of a peptide can significantly affect its movement through a nanopore under the influence of an electric potential between cis and trans compartments, between which the nanopore serves as a conduit: in some embodiments a net negative peptide charge is preferred, such that the peptide is pulled through the pore from cis to trans (i.e., in the same direction a negatively charged oligonucleotide would be). In some embodiments a net positive peptide charge is preferred, such that a peptide is dragged through the pore by another molecule to which it is attached (and which has a net negative charge).
  • positive charge(s) be localized towards the end of the peptide that is last to enter the pore (i.e., the trailing end of the peptide) so as to help maintain the peptide in an extended, linearized form as it passes into and through a nanopore.
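  • The net charge referred to above can be approximated by counting charged groups, as in the sketch below. This is a whole-charge approximation: each listed group is treated as fully charged, whereas the actual charge depends on the working pH and the pKa of each group (histidine in particular is only partly protonated near neutral pH, so counting it is left optional as an assumption).

```python
def net_charge(peptide, count_histidine=True):
    """Approximate net charge by counting charged groups.

    Positive: lysine (K), arginine (R), optionally histidine (H), n-terminus.
    Negative: aspartic acid (D), glutamic acid (E), c-terminus.
    """
    positive = peptide.count("K") + peptide.count("R") + 1  # +1 for n-terminus
    if count_histidine:
        positive += peptide.count("H")
    negative = peptide.count("D") + peptide.count("E") + 1  # +1 for c-terminus
    return positive - negative


charge_with_his = net_charge("LLGPHVEGLK")
charge_without_his = net_charge("LLGPHVEGLK", count_histidine=False)
```

  • A sign check on this value is enough to sort candidate peptides into the net-negative and net-positive cases discussed above; a pKa-based calculation would be needed for anything finer.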
  • TARGET peptides are generated through cleavage of proteins by trypsin (an inexpensive and well-understood protease that cuts polypeptide chains preferentially c-terminal to lysine and arginine residues). Tryptic TARGET peptides can be selected to contain either 1 (“Single amino”) or 2 (“Double amino”) amino groups.
  • other proteases (e.g., Lys-C, Arg-C, pepsin, papain, chymotrypsin, etc.) and chemical cleavage reactions (cyanogen bromide (CNBr) cleaving at methionine (Met) residues; BNPS-skatole cleaving at tryptophan (Trp) residues; formic acid cleaving at aspartic acid-proline (Asp-Pro) peptide bonds; hydroxylamine cleaving at asparagine-glycine (Asn-Gly) peptide bonds, and 2-nitro-5-thiocyanobenzoic acid (NTCB) cleaving at cysteine (Cys) residues) can also generate peptides with characteristics allowing their use as TARGET peptides.
  • peptides with a single amino group are preferred because this provides a unique and chemically convenient site that can be covalently coupled to other molecules used in enrichment or detection of peptides.
  • Such peptides are used in some embodiments to create “rope-tow” constructs for nanopore sequencing through combination with oligonucleotides, as described below.
  • the TARGET peptide has either no net charge or a net positive charge in order to facilitate peptide movement within a nanopore.
  • Peptides of the “Single amino” group may be selected to contain no aspartic or glutamic acids so as to minimize the contribution of negative charges on the peptide (i.e., the peptide preferably has zero or net positive charge that, in some embodiments, helps resist being pulled through a nanopore by its attachment to a negatively charged polymer construct), and no cysteine (so as to avoid the necessity for a method step to block these reactive groups).
  • Single amino peptides with no aspartic or glutamic acid residues also have a single carboxyl group, which is useful in some embodiments that rely on anchoring a peptide to a support via its c-terminus, leaving the unmodified n-terminus available for binding of affinity reagents or sequential degradation (e.g., by Edman chemistry).
  • other TARGET peptides (e.g., “Double amino” peptides) have a single lysine residue present at or near the c-terminus whose epsilon-amino group provides a second reactive amino group, in addition to the n-terminal amino group at the opposite end of the peptide molecule.
  • Linkage through these two amino groups allows a peptide to be coupled “in-line” with preceding and succeeding polymers via amine-reactive chemistry to form a continuous thread.
  • Peptides with an internal lysine, or multiple lysines are less preferred in such embodiments due to the potential for multiple non-linear constructs.
  • TARGET peptides are selected for other configurations of reactive sites.
  • it is preferred that a peptide have a single carboxyl group, and this criterion can be met by peptides with no aspartic or glutamic acid residues while possessing a free c-terminal carboxyl.
  • other specific amino acids are desired so as to facilitate labeling of peptide molecules with amino acid-specific detection reagents.
  • aspartic or glutamic acid carboxyl groups in the peptide can be converted to positive charged sites by a chemical modification, e.g., activation by a carbodiimide and reaction with a reagent having an amino group (that couples to the carboxyl) and a second positively charged group.
  • peptides having post-translational modifications are selected.
  • the n-terminal amino group of tryptic peptide VHLTEEPK from the beta chain of human hemoglobin (Hb) is modified by glycation in a fraction of molecules in the blood as a result of slow reaction with blood glucose (the modified Hb is referred to as HbA1c, and is used clinically as a measure of average blood glucose over time in a test for diabetes).
  • the unmodified form of this peptide is thus “Double amino”, while the modified form is “Single amino”.
  • the n-terminal amino group of unmodified peptide is first blocked by reaction under conditions favoring reaction with the lower pK n-terminal amino group, and subsequently both modified and unmodified forms of the peptide are coupled to other molecules by the single remaining amino group of the c-terminal lysine.
  • an internal standard is typically a synthetic, same-sequence version of a TARGET peptide including one or more amino acids comprising stable isotope labels (typically referred to as a Stable Isotope Standard or SIS) that allow it to be distinguished from the sample-derived TARGET peptide by mass measurement in the MS instrument (i.e., the well-known method of “isotope dilution”).
  • MS mass spectrometry
  • a key feature of this approach to internal standardization is that the same method can be used to create the standard in all cases: for example, with tryptic peptides an effective standard can be made by synthesizing the TARGET sequence with a c-terminal amino acid (typically either lysine or arginine) containing stable isotopes (e.g., all 12C replaced by 13C and all 14N replaced by 15N). Therefore no advanced design or experimental testing and selection is required in a particular case: one approach works in all cases. This is not the case in the present invention, in which some embodiments require an involved selection and/or manufacturing strategy to identify and produce cognate TARGETS, STANDARDS and BINDERS that function properly together in the invention.
6.4.2 Limited use of internal standards in nucleic acid-based technologies
  • nucleic acid quantitation is important (e.g., in detection of one or more specific sequences such as SARS-CoV-2 sequences)
  • PCR and related technologies are typically employed that rely on amplification, and the result is expressed in terms of the number of amplification cycles required to achieve a certain detection threshold (e.g., a “Ct value”).
  • a single molecule detection technology is not expected to be able to reliably detect small mass differences (a few atomic mass units) between the otherwise identical chemical structures of a TARGET peptide and an isotopically-labeled STANDARD with the same sequence (i.e., same chemical structure), and therefore other differences in molecular character besides isotopic mass must be employed.
  • a STANDARD is identified and prepared for each TARGET peptide and added to the sample in known or constant amount before, during or after digestion, but before enrichment, to act as a quantitative reference at the detection step.
  • a sample digest to which STANDARDS corresponding to cognate TARGETS have been added in known or constant amount is referred to herein as a “standardized sample digest”.
  • a sample digest may be standardized with respect to a single TARGET, or with respect to multiple TARGETS.
  • the amount of a TARGET peptide can be compared with the amount of added STANDARD, and thereby measured, by multiplying the amount of STANDARD by the observed ratio of TARGET peptide to STANDARD in a sample.
  • the STANDARD is very similar to the TARGET peptide, i.e., as close as possible to being indistinguishable from it during steps of the workflow before the detection step, while being clearly distinguished from it at the detection step - in other words a cognate sequence peptide standard herein referred to as a STANDARD.
  • the STANDARD serves as an internal standard against which the TARGET peptide amount is compared, for example by comparing the number of TARGET peptide molecules to the number of STANDARD molecules, providing a ratio measurement.
  • when a known amount of STANDARD (e.g., a known mass, or molar amount, or known number of molecules) is added, multiplication of the ratio by this amount yields the amount of TARGET peptide (or mass or the number of TARGET peptide molecules) in the sample digest.
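  • The ratio calculation described above can be written out as a short sketch. The amounts and counts are hypothetical. The second function adds an estimate of counting precision that assumes independent Poisson counting of TARGET and STANDARD molecules; that statistical model is an assumption made for illustration and is not taken from the disclosure.

```python
import math


def target_amount(standard_amount, target_count, standard_count):
    """Amount of TARGET = amount of STANDARD x (TARGET count / STANDARD count)."""
    if standard_count <= 0:
        raise ValueError("at least one STANDARD molecule must be counted")
    return standard_amount * target_count / standard_count


def ratio_cv(target_count, standard_count):
    """Relative standard deviation of the count ratio, assuming Poisson counting."""
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


# Hypothetical example: 10 fmol of STANDARD added; 1500 TARGET and
# 500 STANDARD molecules counted by the detector.
amount_fmol = target_amount(10.0, 1500, 500)
cv = ratio_cv(1500, 500)
```

  • The result carries the same unit as the STANDARD amount supplied (mass, moles, or number of molecules).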
  • it is advantageous for a STANDARD construct and the STANDARD peptide it comprises to be as similar as possible to the respective TARGET construct and the TARGET peptide it comprises, since this similarity minimizes the probability that the ratio between them (which encodes the desired quantitative result of the analytical process) will be skewed or altered by some physical or chemical process in any step of an analytical workflow prior to detection, including enrichment by a cognate BINDER.
  • BINDERS for TARGET and STANDARD constructs (or the respective peptides) must be highly specific in order to bind these peptides and not the enormous variety of other peptides present in a digest of a biological sample
  • some embodiments of the invention make use of TARGET and STANDARD peptides that are identical (i.e., perfect cognates).
  • Alternative approaches, in which limited modifications of peptide sequence or structure distinguish TARGET and STANDARD peptide components are less ideal and less general, but in some cases may be practically useful.
  • the non-peptidic components of the TARGET and STANDARD constructs should also be cognates, though with relaxed similarity constraints. It is therefore advantageous for the non-peptidic components of the TARGET and STANDARD constructs to have similar physical properties such as mass, physical dimensions, shape, charge, hydrophobicity, solubility, etc.
  • oligonucleotide TARGET and STANDARD tags wherein the tags have the same length and may have the same base composition (implying the same molecular mass), but different sequences, allowing them to be distinguished by DNA sequencing or by specific hybridization to complementary probes.
  • Linkage of such oligo tags to TARGET or STANDARD peptides can be accomplished using bifunctional linkers (for example including flexible polymer components such as polyethylene glycol between the oligo and peptide attachment sites) that reduce any steric hindrance the oligo may exert on the peptide that could affect binding to a BINDER.
  • Such a level of similarity reduces the probability of skewing of the TARGET to STANDARD construct ratio because of differences in the diffusion, charge repulsion, epitope- masking, or solubility of the two constructs.
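  • The tag similarity described above (same length and base composition, different sequence) can be checked as in the sketch below. The two oligonucleotide sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def are_cognate_tags(tag_a, tag_b):
    """True if two oligo tags have equal length and base composition
    (hence the same nominal mass) but different sequences."""
    return (
        len(tag_a) == len(tag_b)
        and Counter(tag_a) == Counter(tag_b)
        and tag_a != tag_b
    )


# Hypothetical TARGET and STANDARD tag sequences.
target_tag = "ACGTTGCA"
standard_tag = "TGCAACGT"
tags_ok = are_cognate_tags(target_tag, standard_tag)
```

  • Tags passing this check remain distinguishable by sequencing or hybridization while contributing the same mass to their constructs.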
  • TARGET and STANDARD constructs form a cognate construct pair.
  • a STANDARD construct, the cognate BINDER, a TARGET tag and any linker required to link the TARGET tag to TARGET peptide molecules in a sample digest form a cognate reagent set useful for specific measurement of the TARGET peptide and its parent target protein in a sample (i.e., they can serve as a kit for measuring the protein).
  • the STANDARD is created by replacement or alteration (e.g., by chemical modification) of one or more amino acids in the TARGET peptide sequence, or by addition of amino acids or other chemical structures.
  • the replacement, addition or alteration a) does not result in any significant difference in binding of the TARGET peptide and the cognate STANDARD to the cognate BINDER, and b) results in an easily detected change in the result from a sequence-sensitive single molecule detector (e.g., a different ion trace during transit of a nanopore compared to the TARGET peptide, or a different amino acid sequence detected by a degradative sequencing process, or a difference in the set of epitope-specific binders detected by a single molecule imaging platform).
  • one or more amino acids or other chemical groups can be added to either the n-terminal or c-terminal end of the TARGET peptide to create a STANDARD, with the same constraints (e.g., an easily detected change in the result of a sequence-sensitive single molecule detector, but no significant difference in binding of the TARGET peptide and STANDARD to the cognate specific BINDER).
  • these replacements and/or modifications are made to residues outside the peptide epitope to which a selected BINDER binds - such epitopes are typically linear contiguous regions of 4-8 amino acids in the case of IgG antibody BINDERS, leaving numerous potential modification sites available outside this region in a TARGET peptide 8-25 amino acids long.
  • a single serine (S) residue may be added to either the n-terminus or c-terminus of the sequence of a TARGET peptide to create a cognate STANDARD.
  • Any other amino acid, or sequence of amino acids, that is clearly recognized by a sequence-sensitive detector can in theory be used instead of serine, the choice of added amino acid(s) being free, constrained only by the requirements of STANDARDS generally (i.e., BINDER binding equivalent to that of the cognate TARGET peptide, etc.) and any sequence constraints arising from any chemistry required to present peptides for detection.
  • addition of a residue after the lysine provides a STANDARD that is chemically identical to the TARGET peptide along the entire chain of connected atoms between the peptide’s two amino groups (the n-terminal amino and lysine epsilon amino group) while comprising an appended serine residue “side chain”.
  • an amino acid such as serine can be added to the n-terminus.
  • any amino acid(s) or chemically linkable group of atoms can be added to one or the other terminus, to an internal amino acid, or to both termini, to create a STANDARD version of a TARGET peptide sequence.
  • Figure 2 illustrates the challenge in practice of designing a simple addition of amino acids to the n-term or c-term of a peptide TARGET to create a cognate STANDARD while preserving equivalent binding to a BINDER.
  • LLGPHVEGLK proteotypic for human mesothelin
  • Each variant was mixed with a similar amount of the unmodified TARGET (LLGPHVEGLK), and the ratio of variant candidate STANDARD and TARGET signals measured by mass spectrometry before and after enrichment by a rabbit monoclonal antibody with specific affinity for this peptide. All n-terminal additions result in a dramatic decrease in binding of candidate STANDARDS compared to the TARGET: none are enriched to more than 4% of the level of the TARGET. The epitope recognized by this antibody thus probably includes the n-terminus and cannot accommodate an added amino acid.
  • C-terminal additions are successfully enriched, with recoveries compared to TARGET of 27% (-PC) to 5386% (-PW); i.e., widely varying depending on the specific amino acid added after the proline.
  • the antibody binding therefore appears to be affected by c-terminal additions, and in some cases (e.g., -PW) these c-terminal variants bind in preference to the original TARGET against which the antibody was made.
  • Only 2 of the 39 variants examined are enriched at near-equivalence with the TARGET: c-terminal -PP and -PQ additions are recovered at approximately 99% and 102%, respectively, relative to the TARGET sequence.
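A minimal sketch of how the near-equivalence screen described for Figure 2 could be expressed. The four recovery values are the ones quoted in the text; the definition used in `recovery_percent` (candidate STANDARD/TARGET signal ratio after enrichment divided by the same ratio before enrichment) and the 5% tolerance are assumptions made for illustration only.

```python
# Recoveries (% relative to TARGET) quoted in the text for four c-terminal variants.
reported_recovery = {"-PC": 27.0, "-PW": 5386.0, "-PP": 99.0, "-PQ": 102.0}


def recovery_percent(ratio_before, ratio_after):
    """Assumed definition: STANDARD/TARGET ratio after enrichment vs. before, in percent."""
    return 100.0 * ratio_after / ratio_before


def near_equivalent(recoveries, tolerance_percent=5.0):
    """Return variants recovered within an (assumed) tolerance of 100%."""
    return sorted(
        variant
        for variant, recovery in recoveries.items()
        if abs(recovery - 100.0) <= tolerance_percent
    )


print(near_equivalent(reported_recovery))  # ['-PP', '-PQ']
```

A tighter or looser tolerance would change which candidates pass; the appropriate threshold depends on the precision required of the assay.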
  • a STANDARD with an added residue at the c-terminus (the end closest to a DNA motor on the cis-side of the pore) so as to ensure that the STANDARD variation is read, even if the peptide is longer than the nanopore’s read depth and as a result some of the peptide’s n-terminal residues are not read during a period of controlled movement of the peptide through the nanopore.
  • Sets of STANDARDS generated by addition of a constant c-terminal residue or residue pair to form the STANDARDS will in general require accurately reading a minimal subsequence of 2 amino acids more than the minimum required to distinguish the TARGETS themselves.
  • longer reads of 5, 6, 7, 8, or 10 amino acids, or the entirety of the peptide’s sequence may be required to identify the TARGET peptide and STANDARD molecules with sufficient accuracy (e.g., 99.5%, 99%, 98%, or 95% accuracy) to enable use of the ratio of TARGET-to-STANDARD molecule counts to calculate a precise estimate of TARGET peptide amount.
  • an added residue indicating STANDARD status is preferred at the n-terminus so as to ensure that the distinction between TARGET and STANDARD peptides is read at the beginning, and does not require sequencing to the end of the entire peptide.
  • Accurately reading a minimal subsequence of 3 amino acids starting from the n-terminus is often sufficient to distinguish among a small set (e.g., 20) of TARGET peptides and their respective STANDARDS.
  • Given the likelihood of imperfect reads, and the potential contamination with other, un-selected peptides, longer reads may be required to confidently identify the TARGET peptide and STANDARD molecules.
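The read-length statements above can be illustrated with a short sketch that finds the smallest number of terminal residues that uniquely identifies every member of a panel. The two peptide sequences are taken from the text, but treating them as one panel, adding serine at the n-terminus, and adding -PP at the c-terminus of both are assumptions for illustration. The sketch also assumes error-free reads, so it gives a lower bound only.

```python
def min_read_length(sequences, from_c_terminus=False):
    """Smallest k such that the k terminal residues differ for every sequence.

    Returns None if no k separates all sequences (e.g., exact duplicates).
    """
    longest = max(len(s) for s in sequences)
    for k in range(1, longest + 1):
        ends = [s[-k:] if from_c_terminus else s[:k] for s in sequences]
        if len(set(ends)) == len(sequences):
            return k
    return None


targets = ["LLGPHVEGLK", "GFVEPDHYVVVGAQR"]   # sequences quoted in the text
n_standards = ["S" + t for t in targets]      # assumed n-terminal serine addition
c_standards = [t + "PP" for t in targets]     # assumed constant c-terminal pair

print(min_read_length(targets))                                      # 1
print(min_read_length(targets + n_standards))                        # 2
print(min_read_length(targets, from_c_terminus=True))                # 1
print(min_read_length(targets + c_standards, from_c_terminus=True))  # 3
```

In this toy panel the constant c-terminal pair raises the required read from 1 to 3 residues, i.e., 2 more than is needed for the TARGETS alone, which matches the general statement about constant c-terminal additions.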
  • a sequential enzymatic (e.g., exoprotease) or chemical (e.g., Edman) process to remove single amino acid residues from one terminus of a peptide
  • the advantage of rapid definitive identification of TARGET and STANDARD sequences based on just a few terminal residues is substantial, since it could allow early termination of the cyclical read process, thus leading to a significant decrease in the number of cycles required and thus in analysis time, with associated decreases in cost and increased throughput.
  • larger numbers of TARGET peptides and STANDARDS are used and need to be discriminated: for example, 25, or 50, or 100, or 200, or 400, or 600, or 800, or 1,000 TARGET peptides and their cognate STANDARDS; in such cases, based on an analysis of the uniqueness of the sequences, it may be desirable or required that a detector determine more of the peptide sequence, up to a complete sequence of some or all of the peptides.
  • initial studies can be undertaken in which the TARGET peptides are sequenced beyond 3 or 4 residues, up to complete sequences, in order to detect the presence of any interfering peptides (i.e., peptides that share short n-terminal or c-terminal sequences with TARGET or STANDARD sequences, or otherwise generate output that can be confused with the pre-selected TARGET and STANDARD sequences) likely to be present in a given sample type.
  • TARGET or STANDARD sequences i.e., sequencing up to or beyond the amino acid residue where the interfering peptide is no longer identical to a TARGET or STANDARD sequence
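As a sketch of the idea of sequencing up to or beyond the residue where an interfering peptide stops matching, the function below returns how many residues must be read from the n-terminus before two sequences diverge. The interfering sequence is hypothetical (invented for illustration, not reported in the text), and reads are assumed to be error-free.

```python
def residues_to_disambiguate(expected, interfering):
    """Number of n-terminal residues to read before the two sequences differ.

    If one sequence is a prefix of the other, reading one residue past the
    shorter sequence is needed. Returns None for identical sequences.
    """
    for position, (a, b) in enumerate(zip(expected, interfering), start=1):
        if a != b:
            return position
    if len(expected) == len(interfering):
        return None
    return min(len(expected), len(interfering)) + 1


target = "LLGPHVEGLK"        # TARGET sequence quoted in the text
interferer = "LLGPQVEGLK"    # hypothetical interfering peptide

print(residues_to_disambiguate(target, interferer))  # 5
```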
  • STANDARDS generated by modification of a TARGET amino acid sequence face several challenges that motivate exploration of alternative approaches. These include rarity (the low probability of finding a modified sequence that binds equivalently to a cognate BINDER); lack of generality (the fact that each TARGET and cognate BINDER represent a separate case that must be individually optimized); and the fact that only some single molecule detection technologies are likely to be able to detect such a sequence difference reliably.
  • one or more amino acid residues of a TARGET peptide may be modified to generate a STANDARD.
  • a large number of non-canonical amino acids that are known in the biochemical literature can be substituted for residues of the TARGET peptide or added to its sequence.
  • a large number of naturally occurring chemical modifications of amino acids are known and can be introduced into residues of the TARGET peptide during or after synthesis to form a STANDARD.
  • a large number of artificial chemical modifications can be made to amino acids of the TARGET peptide to form a STANDARD.
  • Two examples of small but significant modifications are terminal blockages: 1) acetylation of an n-terminal amino group or 2) amidation of a c-terminal carboxyl group, both of which can be carried out easily during synthesis of a STANDARD peptide having the same sequence as a cognate TARGET, and both of which represent small alterations in the peptide structure. These small alterations can be “read” at a later stage of a single molecule workflow by reaction of peptides with a chemical reagent capable of efficiently combining with exposed amino or carboxyl groups (respectively).
  • blockage of a STANDARD’S c-term carboxyl can prevent reaction with a reagent that nevertheless reacts with a TARGET’S c-terminus: if the result of reaction with the reagent (which may for example add polymer or other structures to the TARGET’S structure) is detectable by a single molecule detector such as a nanopore, then the distinction between TARGET and STANDARD required by the invention can be provided. Any of these modifications may be used to create a STANDARD, provided that it meets the criteria described above (equivalent binding to a specific enrichment reagent, and equivalent reactivity in any required chemical reactions involved in sample preparation).
  • TARGET and STANDARD molecules may be “read” completely during passage through a nanopore, reducing the potential for confusion between expected TARGET and STANDARD sequences, or with potentially interfering sequences.
  • nanopore sequencing embodiments capable of halting the reading of a peptide after reading a small number of amino acids and ejecting the peptide from the nanopore based on confidently identifying it as a specific TARGET or STANDARD sequence, the uniqueness of n-terminal or c-terminal sequences remains important and provides an opportunity to reduce time spent on unproductive sequence reading and therefore increase throughput of molecule counting.
  • STANDARDS and cognate TARGETS share an identical amino acid sequence but differ in an attached chemical group.
  • STANDARDS can comprise a peptide sequence linked to one member of a pair of “click” chemistry groups (e.g., TCO, capable of reacting bio-orthogonally with molecules comprising a tetrazine group, the other member of the click pair, or vice versa), while cognate TARGETS comprise the same (or very similar) peptide sequence linked to one member of a different pair of “click” chemistry groups (e.g., BCN, capable of reacting bio-orthogonally with molecules comprising an azide group, the other member of that click pair, or vice versa).
  • click-activated TARGETS and STANDARDS are generally inert until they encounter a molecule comprising the opposite pair member, at which time they spontaneously react forming a covalent linkage.
  • Such click-activated TARGETS and STANDARDS are therefore each capable of reacting specifically with different additional molecules (e.g., oligonucleotides comprising the appropriate different click groups) at a later stage of a sample preparation workflow.
  • TARGETS and STANDARDS comprise different chemical linkage groups (e.g., selected from the above-mentioned click pairs) connected to the peptide by similar or identical spacers (e.g., polyethylene glycol of length 1, 2, 3, 4, 5 or more polymer units) thus reducing any potential impact of the difference in chemical linker structures (e.g., TCO and BCN as mentioned above) on the relative binding of TARGETS and STANDARDS to a cognate BINDER.
  • a peptide is attached to another molecule to label it as a TARGET vs a STANDARD, to barcode it (e.g., to identify the sample from which it came), to facilitate or regulate its passage through a nanopore, or a variety of other purposes useful in a single molecule detection workflow.
  • the distinction between TARGETS and STANDARDS is encoded in an attached, non-peptidic “tag” component rather than in the peptides’ structures themselves or in chemical linkage groups (e.g., click groups) they comprise.
  • this is accomplished by preparing the STANDARD prior to its addition to a sample digest in a form that is already attached to a detectable tag (e.g., a nucleic acid sequence tag) that specifically indicates its status as a STANDARD.
  • an oligonucleotide VEHICLE comprises a 5’ phosphate 52 (to facilitate ligation with other nucleic acid chains), a preceding sequence 29, a residue 33 (indicated by X) capable of forming a linkage 34 with a terminal residue of peptide 52 (in this case a STANDARD peptide having the same sequence as a cognate TARGET), an abasic stretch 36 running alongside the peptide (forming a rope-tow construct as described herein), and a following sequence 30 comprising a tag sequence 54 (indicated by a box) that identifies the construct as containing a STANDARD peptide.
  • the peptide GFVEPDHYVVVGAQR is a member of the class of “single amino” peptides, and thus comprises only a single amino group which is located at its n-terminus.
  • Cognate TARGET peptide 53 (example shown in Figure 3B) in such an embodiment is attached to a VEHICLE of similar overall structure as the STANDARD construct, but comprising a different nucleic acid sequence tag 55 that indicates its status as TARGET.
  • the VEHICLE nucleic acid sequences can be read by a nanopore and their location in relation to a peptide (e.g., preceding or following with pre-determined proximity) can be used to identify each peptide molecule as a TARGET or STANDARD molecule.
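A minimal sketch of tag-based assignment, assuming the nucleic acid portion of a read has already been base-called into a string. Both tag sequences and the flanking bases are hypothetical. Exact substring matching is a simplification, since real reads contain errors and would likely need tolerant matching.

```python
STANDARD_TAG = "AACGTGCT"  # hypothetical tag sequence
TARGET_TAG = "ACTGGATC"    # hypothetical tag sequence


def classify_read(read):
    """Label a base-called read by which tag it contains (exact match only)."""
    has_standard = STANDARD_TAG in read
    has_target = TARGET_TAG in read
    if has_standard and not has_target:
        return "STANDARD"
    if has_target and not has_standard:
        return "TARGET"
    return "UNASSIGNED"  # neither tag, or an ambiguous read carrying both


print(classify_read("GGAT" + STANDARD_TAG + "CCTA"))  # STANDARD
print(classify_read("GGAT" + TARGET_TAG + "CCTA"))    # TARGET
print(classify_read("GGATCCTA"))                      # UNASSIGNED
```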
  • the overall similarity of the VEHICLES attached to the pre-prepared STANDARD (Figure 3A) and sample digest-derived TARGET peptides (Figure 3B) minimizes any potential difference in binding of the peptide portions of the constructs (TARGET and STANDARD peptides 52 and 53 being structurally identical) to cognate peptide sequence-specific BINDERS.
  • the sequence tags distinguishing the TARGET and STANDARD VEHICLES are optimized for high sequence accuracy in a given sequence-specific detection system (e.g., a nanopore reading system, or an affinity reagent imaging system).
  • The primary function of STANDARD tags and TARGET tags is to distinguish peptide constructs added to a sample as internal standards (STANDARD constructs) from peptide constructs that incorporate peptides created by proteolytic digestion of the sample proteins (TARGET constructs).
  • the TARGET tag may be the absence of any STANDARD tag.
  • the STANDARD tag may be the absence of any TARGET tag.
  • a STANDARD tag may consist of the absence of a feature present in a cognate TARGET tag, the presence in the STANDARD tag of a feature absent in the cognate TARGET tag, or the presence of different features in the STANDARD tag and TARGET tag.
  • such presence/absence features may include differences in the sequence of oligonucleotide tags.
  • the importance of maintaining unbiased (unskewed) ratio relationships between TARGET and STANDARD constructs (i.e., preserving their cognate character, specifically in regard to interaction with a cognate BINDER) argues against large structural differences between the TARGET and STANDARD tags (e.g., presence vs absence of a sizable chemical group).
  • Multiple different STANDARD tags may be used, provided that the STANDARD tags are distinguishable from any TARGET tags.
  • Multiple different TARGET tags may be used, provided that the TARGET tags are distinguishable from any STANDARD tags.
  • each different peptide STANDARD is prepared attached to a respective cognate VEHICLE that comprises a nucleic acid sequence tag that specifically identifies that STANDARD amino acid sequence and distinguishes it from a plurality of other STANDARDS that may be used in the same workflow.
  • the oligo tag sequences used to identify and distinguish cognate STANDARDS and TARGETS in a cognate group are selected so as to be chemically very similar (e.g., same length and base composition) while being reliably distinguishable (e.g., different base sequence).
  • the tags are unlikely to have any differential effect on the binding of cognate TARGET and STANDARD molecules to the cognate BINDER, thus preserving the ratio of TARGET to STANDARD in the standardized digest.
  • the TARGET and cognate STANDARD molecules can be identified and counted reliably, thus providing an accurate value for the ratio of TARGET to STANDARD.
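The selection criterion for cognate tag pairs (same length and base composition, different base sequence) can be checked mechanically. The sketch below uses two hypothetical 8-base tags; the minimum mismatch count is an assumed design threshold, not a value from the text.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tags_well_matched(tag_a, tag_b, min_mismatches=3):
    """True if tags share length and base composition yet differ at enough positions."""
    return (
        len(tag_a) == len(tag_b)
        and Counter(tag_a) == Counter(tag_b)
        and hamming(tag_a, tag_b) >= min_mismatches
    )


standard_tag = "AACGTGCT"  # hypothetical
target_tag = "ACTGGATC"    # hypothetical

print(hamming(standard_tag, target_tag))            # 6
print(tags_well_matched(standard_tag, target_tag))  # True
```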
  • STANDARD-VEHICLE constructs are prepared and added to the sample digest after sample digest peptides have been incorporated into similarly-structured TARGET constructs (e.g., Figure 3B).
  • the structure of the STANDARD peptide molecule may be identical to the TARGET peptide structure (e.g., it can be a synthetic version of a known cognate TARGET peptide sequence), while their respective STANDARD and TARGET VEHICLES comprise distinct nucleic acid sequence tags (54 and 55 in Figure 3), thus ensuring that the cognate BINDER will bind the attached peptides equivalently, and thereby accurately preserve the TARGET-to-STANDARD ratio present in a standardized sample digest.
  • Figure 4 shows a method in which a short identifying oligo (a STANDARD tag 62) is attached to a STANDARD peptide 52, in this case by an amine-reactive N-hydroxysuccinimide (NHS) group 61 attached by linker 34 to a suitable DNA nucleotide of the oligo (for example an amino-modified C6 dT base to which NHS functionality has been added during manufacture).
  • NHS: N-hydroxysuccinimide
  • the 16-base-long oligo tag 62 has a molecular weight of about 5,000 daltons, substantially less than the VEHICLES described in the embodiment shown in Figure 3, and is therefore less expensive and also able to diffuse and bind BINDERS more rapidly in solution.
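The figure of about 5,000 daltons for a 16-base oligo is consistent with a commonly used approximation for the molecular weight of an unmodified single-stranded DNA oligo (sum of average nucleotide monophosphate masses minus 61.96). The sketch below applies that approximation to a hypothetical 16-mer; the sequence is invented, and the linker and NHS modification described in the text are not included.

```python
# Approximate average masses (Da) of nucleotide residues in ssDNA.
NUCLEOTIDE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ssdna_molecular_weight(sequence):
    """Approximate MW (Da) of an unmodified ssDNA oligo with 5' and 3' hydroxyls."""
    return sum(NUCLEOTIDE_MASS[base] for base in sequence.upper()) - 61.96


hypothetical_tag = "AACGTGCTACTGGATC"  # invented 16-mer, 4 of each base
print(round(ssdna_molecular_weight(hypothetical_tag), 2))  # 4881.24
```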
  • oligonucleotide tag sequences of reduced (or longer) length capable of specifically hybridizing with complementary sequences as required in the steps of Figure 4 C and D.
  • TARGET and STANDARD oligonucleotide tags (e.g., 62 or 63) may be provided of lengths ranging from 4 to 30 bases, more preferably 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16-30 bases.
  • Oligo tag sequences are designed according to well-known principles to maximize specific binding to a complementary sequence (e.g., 64 or 68) and dissociate at a reasonable (melting) temperature, while minimizing the potential to hybridize with other oligo sequences used in the workflow, or to form self-associations (e.g., intra-molecule hairpins or inter-molecular hybrids).
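A rough sketch of the kinds of checks such design principles imply: a crude melting-temperature estimate and a search for complementary stretches that could support unwanted hybridization. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a coarse approximation for short oligos, and the tag sequence is hypothetical; real designs would normally rely on nearest-neighbor thermodynamic models.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(sequence):
    """Crude melting temperature estimate (deg C) by the Wallace rule."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def complementary_run(tag, other):
    """Longest stretch of `tag` that is perfectly complementary to part of `other`."""
    return longest_common_substring(tag.upper(), reverse_complement(other))


tag = "AACGTGCT"  # hypothetical tag
print(wallace_tm(tag))              # 24
print(complementary_run(tag, tag))  # 4 (self-complementary stretch ACGT)
```

A design screen could reject candidate tags whose `complementary_run` against themselves or against any other oligo in the workflow exceeds a chosen limit.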
  • The product of the reaction of the oligo tag’s NHS and the peptide’s NH2 groups of Figure 4A is shown in Figure 4B, while the equivalent reaction for TARGET peptide 53 with TARGET tag 66 is shown in Figure 4 E and F.
  • the STANDARD construct of Figure 4B is added to a sample digest (in this example a tryptic digest) whose sample peptides (including the TARGET) are prepared as TARGET-tag constructs (as in Figure 4F), thus forming a standardized sample digest.
  • TARGET and STANDARD peptides are linked to, and thus identified by, oligo sequence tags of the respective TARGET and STANDARD tags (oligos 62 and 66 in Figure 4). Enrichment of the TARGET and STANDARD peptides using the cognate BINDER isolates these two peptides and their attached respective oligo tags (the peptides having identical structures but derived from different sources), preserving the TARGET-to-STANDARD ratio in the standardized sample.
  • the enriched bound TARGET and STANDARD constructs can be “completed” by hybridization and ligation to respective “secondary VEHICLES” as needed for various single molecule detection methods.
  • Completed constructs shown in Figures 4 C and D (for the STANDARD peptide) and Figure 4 G and H (for the TARGET peptide) are particularly useful in the case of nanopore sequencing.
  • short oligo tag 62 (which identifies peptide 52 as a STANDARD molecule) hybridizes with a complementary sequence 64 bringing the 3’ residue of the oligo 62 into proximity with the phosphorylated 5’ end of an oligo comprising an abasic region 36 (abasic backbone links are symbolized by “o”) and a following sequence 63.
  • the 5’ terminus of the secondary VEHICLE comprises one or more bases before the abasic region (e.g., AA at the 5’ end of secondary VEHICLE oligo 36 in Figure 4) that are capable of hybridizing with complementary bases in oligo 70 (TT in oligo segment 70 of Figure 4D), and these bases are different in STANDARD and TARGET secondary VEHICLES so as to minimize potential hybridization of a secondary STANDARD oligo (e.g., 36 + 63) with a TARGET complementary sequence 70, and vice versa.
  • the number of such hybridizing bases at the 5’ end of segments 36 may be selected to have an extended length that is less than the total length of linker 34 in order to avoid overlap of the peptide with these hybridizing bases when the complete construct passes through a nanopore.
  • the length of abasic region 36 is chosen so as to avoid overlap with the peptide 52 as it transits a nanopore, and oligo sequence 63 is designed so as to engage with a DNA motor and regulate pore transit of the peptide, allowing measurement of its current trace (“squiggle”).
  • oligo sequence 63 may be a unique sequence also indicating status as a STANDARD molecule (i.e., different from the sequence of oligo 67 in the TARGET VEHICLE construct) resulting in a “double-tag” (i.e., redundantly tagged) construct.
  • an advantage of the double-tag approach is that each peptide is identified as STANDARD or TARGET by sequence information both before and after the peptide.
  • alternative linkage chemistries, including various click chemistry linkages as described elsewhere herein, are used to create a linkage between a peptide and a nucleic acid tag.
  • Figure 5 shows an example in which a tag oligo prepared with a 3’ trans-cyclooctene (TCO) group is reacted with a peptide derivatized at its n-terminus with methyltetrazine (Me-TZ) to yield an “in-line” construct (one in which the oligo and peptide sequences form a single continuous polymer).
  • Figure 5A shows preparation of a TARGET construct using a tag sequence identifying the construct as a TARGET (labeled OLIGO-TARGET tag).
  • this modification (labeling peptides with the OLIGO-TARGET tag) is applied to all peptides in a sample digest (for example by derivatizing all digest peptide amino groups with NHS-tetrazine and reacting these with an excess of the reactive oligo comprising 3’ TCO), thereby ensuring that all molecules of the TARGET are taken into account in subsequent steps of an analytical workflow.
  • Figure 5B shows preparation of a STANDARD construct using a tag sequence identifying the construct as a STANDARD (i.e., the OLIGO-STANDARD tag).
  • the STANDARD construct is prepared separately, for example using synthetic STANDARD peptide having the same (or generally similar) sequence as the TARGET, and is added to sample digests to serve as an internal quantitative standard (i.e., to standardize the sample digests).
  • the physical properties of the TARGET and STANDARD oligo tags e.g., length, base composition, and melting temperature
  • Figure 5C shows TARGET and STANDARD constructs bound to cognate antibody BINDER molecules immobilized on a magnetic bead.
  • the peptide components can be chemically identical (same sequence) and thus capable of identical interaction with the BINDER.
  • the peptide is linked to the oligo via its c-terminus (e.g., via a c-terminal carboxyl and the epsilon amino group of a c-term lysine) rather than the n-term as shown.
  • the oligo is linked to the peptide via the 5’ end rather than the 3’ as shown.
  • STANDARD constructs can be prepared in a stepwise process comprising 2 discrete attachment steps, thereby allowing different molecular components to be added at (or near) the peptide N-terminus and C-terminus, and establishing a consistent orientation of the peptide with respect to the overall construct (thereby avoiding the need to recognize a peptide in two different polarities).
  • the 2-step digestion process described herein for preparing sample peptide libraries is used to generate STANDARD constructs.
  • a synthetic peptide comprising the cognate TARGET sequence ending in Lys (i.e., a 2-amino peptide), and comprising an added n-terminal sequence ending in Arg (e.g., GSGR in the case of trypsin second cleavage, or any suitable peptide ending in the amino acid at whose c-terminal position a second proteolytic enzyme cleaves), can be processed in a series of steps similar to those used to process sample peptides as shown in Figure 6.
  • a synthetic STANDARD precursor can be generated with the sequence XXXRXXXXXK.
  • This peptide has both an n-terminal and a lysine epsilon amino group, both of which can be reacted with a suitable linkage reagent such as NHS-BCN, thus adding a click linker to both ends of the peptide, after which excess NHS-BCN can be removed.
  • a suitably activated flag group e.g., an oligonucleotide
  • an appropriate click partner e.g., azide to react with BCN
  • Cleavage with a second enzyme e.g., trypsin
  • a second linkage reagent such as NHS-TCO
  • Use of different mutually orthogonal click groups at the two termini offers the ability to add distinct oligos to the two ends via the distinct click pair reactions, and to postpone one or both oligo additions until after BINDER enrichment of the TARGET and cognate STANDARD peptides.
  • one or both of the oligos may be coupled to the peptide prior to enrichment on a BINDER, provided that the same process is applied to the peptides of the digest (including TARGET peptides) prior to enrichment by a BINDER.
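The precursor design described above (a TARGET sequence ending in Lys, preceded by an added sequence ending in Arg) can be illustrated with an in-silico cleavage. Combining the GSGR extension with LLGPHVEGLK is an assumption for illustration. The function cleaves only after Arg, on the assumption that the derivatized lysine is not a cleavage site in the second digestion; it ignores other protease specificity rules such as proline effects.

```python
def cleave_after(sequence, residues=("R",)):
    """Split a peptide after each occurrence of the given residues.

    No empty fragment is produced when the sequence ends in a cleavage residue.
    """
    fragments = []
    start = 0
    for index, amino_acid in enumerate(sequence):
        if amino_acid in residues and index < len(sequence) - 1:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    fragments.append(sequence[start:])
    return fragments


precursor = "GSGR" + "LLGPHVEGLK"  # assumed precursor: added sequence + TARGET
print(cleave_after(precursor))     # ['GSGR', 'LLGPHVEGLK']
```

The second fragment has a newly exposed n-terminal amino group, which is the site available to a second linkage reagent in the stepwise scheme.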
  • a TARGET peptide, its cognate STANDARD and the cognate specific affinity BINDER reagent(s) may be developed together (co-evolved or co-optimized) to achieve optimal performance; i.e., through an iterative process comparing assay performance of various combinations of versions of the reagents, through a full-matrix comparison of all available variants of each, or by molecular engineering guided by chemical knowledge and experimental results.
  • multiple distinguishable STANDARDS are provided for a single TARGET, and added in different amounts so as to establish a standard curve against which the TARGET can be quantitated.
  • the multiple STANDARDS may be distinguished by connection to distinct oligo sequence tags, or they may contain different and distinguishable structural modifications of the TARGET peptide (e.g., different amino acids added to its sequence).
  • the different STANDARDS may be added to a sample digest in different amounts to standardize it: for example, three STANDARD versions (A, B, C) of a given TARGET peptide may be added to a digest in 0.1 : 1.0 : 10 relative amounts, thus generating a 3-point calibration curve.
  • Multi-level STANDARDS increase the likelihood that at least one STANDARD will be present in an amount (and thus number of molecules) close to the amount of a TARGET in an unknown sample.
  • the pre-established ratio between different STANDARD versions provides an internal check on the quantitative precision and linearity of a single molecule detection system.
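A sketch of how a 3-point calibration from STANDARD versions A, B and C (added at 0.1 : 1.0 : 10 relative amounts) might be used. The molecule counts are hypothetical, and fitting a straight line in log-log space is an assumed choice of calibration model. A fitted slope near 1 serves as the linearity check mentioned above.

```python
import math


def fit_loglog(amounts, counts):
    """Least-squares line log10(count) = intercept + slope * log10(amount)."""
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(count, slope, intercept):
    """Invert the calibration line to estimate an amount from a molecule count."""
    return 10 ** ((math.log10(count) - intercept) / slope)


standard_amounts = [0.1, 1.0, 10.0]  # relative amounts of versions A, B, C (from the text)
standard_counts = [100, 1000, 10000]  # hypothetical molecule counts
target_count = 2500                   # hypothetical molecule count

slope, intercept = fit_loglog(standard_amounts, standard_counts)
print(round(slope, 6))                                            # 1.0
print(round(estimate_amount(target_count, slope, intercept), 6))  # 2.5
```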
  • One or more specific affinity reagents capable of binding the peptide TARGET and STANDARD specifically (i.e., while not binding a potentially vast number of other peptides that can be present in a sample digest) are used in some embodiments of the invention to capture the TARGET peptide and STANDARD prior to single molecule detection.
  • a reagent generically as a BINDER, and include within that term not only canonical antibodies such as IgG, but also numerous types of proteins and other macromolecules (e.g., aptamers) known in the art to be able to bind to particular peptide sequences with specific affinity.
  • antibodies preferably monoclonal antibodies
  • a specific low abundance tryptic peptide from the digest of a very complex sample such as human blood plasma (which may contain 250,000 distinct peptides, some at very high abundance), and thereby enrich the peptide substantially (e.g., more than 10,000-fold).
  • a variety of types of biologically derived antibodies e.g., polyclonal, monoclonal and oligoclonal antibodies derived from mice, rabbits, humans, camelids and other species
  • molecules derived from antibodies by molecular biology techniques e.g., antibodies selected from libraries using phage display and other techniques
  • aptamers, and other molecular constructs can be created to achieve the purpose of specific peptide binding - all of these are included within the term BINDER as used herein.
  • a synthetic peptide having the TARGET peptide sequence can be coupled to a carrier protein (e.g., keyhole limpet hemocyanin: KLH) and used to immunize an animal (such as a rabbit, mouse, chicken, goat, camelid or sheep) by one of the known protocols that efficiently generate anti-peptide antibodies.
  • the peptide used for immunization and antibody purification may contain additional c-terminal or n-terminal residues (e.g., cysteine) added to the TARGET peptide sequence.
  • the resulting extended TARGET peptide can be conveniently coupled to carrier KLH that has been previously reacted with a heterobifunctional reagent such that multiple SH-reactive groups are attached to the carrier.
  • a polyclonal antiserum can be produced containing antibodies directed to the peptide, to the carrier, and to other non-specific epitopes.
  • Specific polyclonal anti-peptide antibodies can be prepared from an immunized animal’s serum by affinity purification on a column containing tightly-bound peptide.
  • a column can be easily prepared by reacting an aliquot of synthetic TARGET peptide made with a cysteine residue added to one end with a thiol-reactive solid support. Crude antiserum can be applied to this column, which is then washed and finally exposed to 10% acetic acid (or other elution buffer of low pH, high pH, or high chaotrope concentration) to specifically elute anti-peptide antibodies.
  • one of a variety of methods known in the art is used to generate candidate clonal antibody proteins, genes or gene sequences (from the cells, genes, or proteins of an immunized animal, or from natural or artificial protein binder libraries such as phage display libraries) that can be screened to select a monoclonal antibody (or other BINDER molecule) with the ability to enrich the TARGET peptide and cognate STANDARD from a complex peptide mixture (e.g., a sample digest) under a specified set of solution conditions.
  • Such screening can be carried out using the method of the invention (i.e., screening in the assay of ultimate use), or by alternative methods such as the SISCAPA method with MS detection.
  • Monoclonal antibodies, particularly those produced by recombinant methods, have the advantages of homogeneity, superior performance, scalable production and longevity compared to polyclonal mixtures.
  • a preferred method of selecting a monoclonal (homogeneous) anti-peptide antibody BINDER for use in the invention includes testing whether a candidate antibody (e.g., product of a clone) binds the TARGET peptide and STANDARD equally, and selecting for use the antibody product of a clone or clones that bind these peptides most equally (i.e., with least bias towards one or the other, and thus capable of capturing both from a mixture without changing the ratio between them).
  • Preserving the ratio of TARGET to STANDARD unchanged during capture and enrichment by the BINDER is desirable since the invention involves measuring this ratio (by counting TARGET peptide and STANDARD molecules) to calculate the amount of TARGET peptide (and hence target protein) in the original sample.
  • some embodiments make use of BINDERS that exhibit differential, but reproducible, binding, thus allowing correction of the ratio as described above.
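The ratio-based calculation can be sketched as follows. The counts and spiked amount are hypothetical, and the form of the correction (dividing the observed ratio by a reproducible TARGET/STANDARD binding bias) is an assumption about how differential binding could be corrected. The precision estimate assumes simple Poisson counting statistics, which is also an assumption rather than something stated in the text.

```python
import math


def target_amount(target_count, standard_count, standard_amount, binding_bias=1.0):
    """Estimate TARGET amount from molecule counts and the known STANDARD amount.

    binding_bias is the assumed, reproducible ratio of TARGET recovery to
    STANDARD recovery on the BINDER (1.0 means equal binding).
    """
    if standard_count <= 0 or binding_bias <= 0:
        raise ValueError("standard_count and binding_bias must be positive")
    return (target_count / standard_count) / binding_bias * standard_amount


def ratio_cv(target_count, standard_count):
    """Relative standard deviation of the count ratio under Poisson counting."""
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


# Hypothetical example: 400 TARGET and 1600 STANDARD molecules counted,
# with 10 fmol of STANDARD added to the digest.
print(target_amount(400, 1600, 10.0))                    # 2.5
print(target_amount(400, 1600, 10.0, binding_bias=2.0))  # 1.25
print(round(ratio_cv(400, 1600), 4))                     # 0.0559
```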
  • multiple BINDERS are used to enrich a single peptide.
  • TARGET peptides are sufficiently long to include multiple epitopes: a typical antibody linear epitope is 4-6 amino acids long, while TARGET peptides may be 6-30 amino acids in length - those of length 12-30 amino acids have a high likelihood of comprising 2 or more non-overlapping epitopes.
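The arithmetic behind the statement about multiple epitopes is simple integer division: an upper bound on the number of non-overlapping linear epitopes is the peptide length divided by the epitope length. The sketch below only restates that bound; whether each window is actually immunogenic is not addressed.

```python
def max_nonoverlapping_epitopes(peptide_length, epitope_length):
    """Upper bound on non-overlapping linear epitopes that fit in a peptide."""
    return peptide_length // epitope_length


for length in (6, 12, 30):
    print(
        length,
        max_nonoverlapping_epitopes(length, 6),  # longest typical epitope
        max_nonoverlapping_epitopes(length, 4),  # shortest typical epitope
    )
# 6 1 1
# 12 2 3
# 30 5 7
```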
  • multiple BINDERS targeting linear, non-overlapping epitopes in a peptide can be generated and used to increase enrichment specificity.
  • monoclonal antibodies to multiple linear, non-overlapping epitopes of a peptide are made using conventional hybridoma or other cloning techniques to select among clones created by immunization of an animal with peptide and/or derivatives of it.
  • antibodies to multiple linear, non-overlapping epitopes of a peptide are selected from libraries such as naive or immunized phage display libraries of single-chain antibodies.
  • aptamers to multiple linear, non-overlapping epitopes of a peptide are selected from libraries or evolved using iterative selection approaches well-known in the art.
  • multiple BINDERS of different types (e.g., polyclonal or monoclonal antibodies, aptamers, etc.) targeting linear, non-overlapping epitopes are used together for increased affinity and/or specificity.
  • Some embodiments therefore make use of polyclonal antibodies purified from sera of immunized animals, or, alternatively, oligoclonal mixtures of antibody-like molecules extracted from large libraries (e.g., naive or immunized phage display libraries).
  • methods well known in the art are used for affinity purification of multiple distinct antibody specificities from such polyclonal antisera using multiple affinity media, each comprising a linear peptide subsequence of a TARGET peptide sequence.
  • multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are simultaneously affinity purified by binding to synthetic TARGET peptide.
  • a TARGET peptide with multiple epitopes binds multiple BINDERS from a mixture until each epitope is saturated with BINDER, thereby establishing a balanced mixture of BINDERS to the epitopes.
  • Figure 8B shows 3 BINDERS (I, II and III) bound to different linear epitopes (1, 2, 3) on a single peptide molecule (the peptide shown in Fig 8A).
  • this is achieved by affinity purification of BINDERS from a polyclonal antiserum (or pool of antisera) generated in response to immunization with the peptide, or multiple fragments of it. In some embodiments, this is achieved by capture of a mixture of BINDERS from a variety of sources selected to bind to the peptide, or fragments of it. By saturating the TARGET peptide epitopes with BINDERS competing to bind to various epitopes, a population of BINDERS is produced that covers the peptide (i.e., has a BINDER bound to all, or at least a substantial fraction, of the epitopes present on each peptide molecule).
  • the BINDERS are expected to have varying affinities and specificities, some or most of which may be individually insufficient to effectively capture the peptide from a sample digest.
  • a combination of these BINDERS can be linked together covalently (e.g., using bi-functional chemical crosslinkers or click connectors) or non-covalently (e.g., by reaction of biotin-labeled BINDERS with multivalent streptavidin) into a multivalent (e.g., bi-specific, tri-specific, or quadra-specific) BINDER (Fig 8C), that can be eluted from the peptide and subsequently used for affinity enrichment of TARGET and cognate STANDARD from standardized sample digests.
  • a multi-epitope peptide as a “scaffold” or “template” on which a plurality of BINDERS to short epitopes are assembled and then linked to form a larger multi-epitope BINDER.
  • a multi-epitope (or multi-valent) BINDER is immobilized (e.g., on magnetic beads, a column, a surface, etc.) and used to capture and enrich TARGET and STANDARD peptides and peptide constructs according to the invention.
  • Such a multi-valent BINDER can bind the peptide with much higher affinity than any of the individual BINDERS.
  • This effect is well known in the art as “avidity”: the increased binding efficacy of a multivalent BINDER compared to a monovalent BINDER.
  • Natural antibodies exploit this effect, comprising 2 binding sites in IgG and 5 binding sites in IgM.
  • this avidity effect is exploited by crosslinking the individual epitope BINDERS while in proximity to one another (in situ in Figure 8C) to create a multi-BINDER construct as shown in Figure 8D.
  • a similar approach is carried out using individual monoclonal BINDERS, either crosslinked together or expressed in a combined recombinant product similar to well-known “bi-specific” or “tri-specific” therapeutic antibody constructs.
  • multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are separately affinity purified by binding one after another to a series of immobilized synthetic TARGET peptides.
  • multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are used to sequentially affinity purify TARGETS and cognate STANDARDS from a standardized digest: TARGETS and cognate STANDARDS that bind to (and are subsequently eluted from) each successive BINDER specific to one of multiple TARGET epitopes are rendered purer than the set of TARGETS and cognate STANDARDS that bind to and are eluted from a BINDER to a single epitope.
  • the higher peptide sequence specificity obtainable with multi-epitope BINDERS provides an important addition to the overall specificity of a range of single molecule detection technologies.
  • potential assay interferences are reduced for all single molecule detection methods.
  • Multi-epitope BINDERS are described in more detail in U.S. Provisional Patent Application No. 63/381,722, incorporated herein by reference in its entirety.
  • the invention makes use of a specific enrichment step to enrich both TARGET peptide and cognate STANDARD from a sample digest, thereby creating an “enriched standardized digest sample” (or “enriched standardized sample digest”).
  • the enrichment step is carried out by a cognate affinity capture reagent (BINDER) to which the two peptides bind equivalently, such that the ratio of their amounts after enrichment is the same or nearly the same as before enrichment.
  • the TARGET and STANDARD peptides bind to the cognate affinity capture reagent with identical affinity and kinetics, preserving the ratio between them exactly.
  • the TARGET to STANDARD ratio after enrichment is within 2%, within 5%, within 10%, within 20%, or within 30% of the ratio before enrichment.
  • enrichment using a BINDER results in a change in the TARGET to STANDARD ratio, which change is consistent across a range of samples and assay replicates.
  • prior knowledge of the factor by which the ratio is changed by enrichment (established by measurements of TARGET to STANDARD ratios in a sample or digest before and after enrichment) allows correction of the ratio observed after enrichment to yield the correct relative amounts of TARGET and STANDARD in the sample digest.
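  • The ratio-based calculation and its optional correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `target_amount`, the parameter `ratio_shift` (the previously measured factor by which enrichment multiplies the TARGET to STANDARD ratio), and the example counts are all hypothetical and do not come from the specification.

```python
def target_amount(target_count, standard_count, standard_amount, ratio_shift=1.0):
    """Estimate the TARGET amount from single-molecule counts.

    standard_amount: known amount of STANDARD added to the digest (any unit).
    ratio_shift: assumed, previously measured factor by which enrichment
        multiplies the TARGET:STANDARD ratio (1.0 = equivalent binding).
    """
    if standard_count <= 0:
        raise ValueError("no STANDARD molecules were counted")
    if ratio_shift <= 0:
        raise ValueError("ratio_shift must be positive")
    observed_ratio = target_count / standard_count
    corrected_ratio = observed_ratio / ratio_shift  # undo the known shift
    return corrected_ratio * standard_amount


# Hypothetical example: 1200 TARGET and 800 STANDARD molecules counted,
# with 10 fmol of STANDARD added to the digest.
print(target_amount(1200, 800, 10.0))                   # equivalent binding
print(target_amount(1200, 800, 10.0, ratio_shift=1.2))  # BINDER favors TARGET by 20%
```

  • With `ratio_shift` left at 1.0 the result is simply the observed ratio multiplied by the known STANDARD amount; a reproducible shift is divided out before that multiplication.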
  • a homogeneous cognate affinity capture reagent (BINDER, described below) is selected from a plurality of alternative BINDERS for its ability to bind the TARGET and STANDARD peptides equivalently (or with a consistent ratio shift).
  • TARGET and STANDARD peptides are selected from a plurality of alternatives to bind to a cognate BINDER equivalently.
  • TARGET peptide, STANDARD peptide, and cognate BINDER are each selected from a plurality of alternatives so as to maximize the property of equivalent (or consistent) TARGET and STANDARD peptide binding.
  • the enrichment process might capture 100%, or 90% or 10% or 1% or 0.1% or 0.01% or 0.001% of the TARGET and STANDARD peptides present in a sample digest, but in each case the ratio of TARGET peptide to STANDARD molecules would remain the same, and thus a measurement of the TARGET peptide to STANDARD ratio would provide the same answer, which would be equal to (or correctable to) the ratio present in the original standardized sample.
  • the proportion of a peptide captured by the enrichment step is therefore an adjustable feature of the invention, which can be used to capture more of one TARGET peptide & its cognate STANDARD than another.
  • Such adjustments make it possible to use the enrichment step to capture most or all of a low abundance TARGET peptide, while capturing only a small fraction of the molecules of a high abundance TARGET peptide - with the result that the difference in absolute molar amounts of the two TARGET peptides can be significantly reduced by this differential enrichment.
  • This use of the enrichment step to bring multiple TARGET peptides to similar abundances is referred to as “stoichiometric flattening” or “equalization”, and provides a means by which amounts of molecules with high and low abundances in the original sample can be measured using a measurement technology with limited dynamic range (e.g., single molecule counting methods).
  • the flattening approach (differential enrichment of different TARGET/STANDARD cognate pairs) generates a “flattened enriched standardized sample digest sample”.
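  • The flattening idea can be illustrated numerically. The sketch below assumes a simplified model in which each TARGET/STANDARD pair is captured at a single chosen fraction; the peptide names, abundances and the `capture_target` parameter are hypothetical values chosen for illustration only.

```python
def capture_fractions(abundances, capture_target):
    """For each TARGET, choose the fraction to capture so that roughly
    capture_target molecules are enriched (never more than 100%)."""
    return {name: min(1.0, capture_target / amount)
            for name, amount in abundances.items()}


def dynamic_range(amounts):
    """Ratio of the largest to the smallest amount."""
    amounts = list(amounts)
    return max(amounts) / min(amounts)


# Hypothetical molecule counts in a standardized digest.
abundances = {"high_abundance_peptide": 1e12, "low_abundance_peptide": 1e6}

fractions = capture_fractions(abundances, capture_target=1e6)
captured = {name: abundances[name] * fractions[name] for name in abundances}

print(dynamic_range(abundances.values()))  # spread before enrichment
print(dynamic_range(captured.values()))    # spread after differential enrichment
```

  • Because the same capture fraction applies to a TARGET and its cognate STANDARD, the ratio within each pair is unaffected in this model even though the spread between different pairs collapses.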
  • the affinity of the BINDERS most useful for enrichment is in the range of 0.01 to 10 nanomolar, more particularly with a preferred half-off-time (reciprocal of the off-rate) of at least several minutes, or more preferably 10-15 minutes.
  • Off-rate is particularly important since it governs the length of time that unbound materials, including non-TARGET peptides, can be washed away (e.g., using conventional manual and automated workflow steps such as magnetic bead manipulation in 96-well plates) while retaining the TARGET and STANDARD peptides on the BINDER.
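  • The link between off-rate and allowable wash time can be made concrete with a simple model. The sketch below assumes first-order dissociation with no rebinding and follows the definition above of the off-time as the reciprocal of the off-rate; the function name and the example times are hypothetical.

```python
import math


def fraction_retained(wash_minutes, off_time_minutes):
    """Fraction of bound peptide still on the BINDER after a wash.

    Assumes simple first-order dissociation with no rebinding, with
    off_time_minutes taken as 1 / k_off (the reciprocal of the off-rate).
    """
    if off_time_minutes <= 0:
        raise ValueError("off_time_minutes must be positive")
    k_off = 1.0 / off_time_minutes  # per minute
    return math.exp(-k_off * wash_minutes)


# Hypothetical comparison: a 5 minute wash with two different BINDERS.
for off_time in (2.0, 15.0):
    print(off_time, round(fraction_retained(5.0, off_time), 3))
```

  • Under this model a BINDER with a longer off-time retains a larger share of its TARGET and STANDARD cargo through the same wash, which is why off-rate limits how long unbound material can be washed away.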
  • Higher affinity BINDERS are typically required to enrich lower abundance TARGETS, i.e., to capture peptides present in a digest at low concentration.
  • specific solution conditions or changes in solution conditions, are employed in a sample preparation workflow to preferentially dissociate less-tightly-bound peptides while retaining the correct BINDER cognate TARGETS and STANDARDS.
  • bound peptides are exposed to increasing chaotrope or denaturant concentrations, increasingly acidic or basic solution pH, increasing salt concentrations, and/or increasing temperature to dissociate less-tightly-bound peptides prior to final elution of the enriched TARGETS and STANDARDS.
  • BINDER specificity may be more important than affinity, and BINDER selection methods correspondingly adapted.
  • a preferred property of a BINDER is the ability to release bound peptides rapidly at a desired point in a workflow, as a result of a change in solution conditions - for example as a result of a change in pH (e.g., pH 3.0 or pH 9.5), addition of a chaotrope (e.g., ammonium thiocyanate or KCl) or organic solvent (e.g., 50% acetonitrile in water), increase in temperature, or the application of an electrical field (as in electrophoresis).
  • BINDERS are selected that tightly and specifically bind the cognate TARGET and STANDARD peptides from a digest, and then release these peptides only when in close proximity to a site at which sequence-sensitive single molecule detection can occur (e.g., a nanopore, or an immobilization site on a support), thus maintaining the peptides in concentrated form and reducing losses due to diffusion.
  • Nanopore sequencing (53) typically relies on the presence of a concentrated salt solution (e.g., 0.4 M KCl) to provide sufficient charge carriers to create a measurable open channel current through the nanopore (typically 20-200 pA).
  • BINDER reagents are selected that release their TARGET and STANDARD peptide cargo when exposed to such salt conditions, i.e., when the antibodies are placed in the solution present on the cis side of a nanopore sequencing device, or when exposed to a high concentration of salt in a salt gradient in which salt concentration increases closer to a nanopore.
  • with a BINDER selected in this way (i.e., one releasing bound peptides in a high-salt environment), peptides can be retained on a physically manipulatable solid support (e.g., magnetic beads) until they are in a sequencing chamber itself (or in a region of that chamber nearest to a nanopore), thereby minimizing potential losses and dilution that could occur if peptides were eluted elsewhere and later transported into the sequencing chamber.
  • chaotropic anions such as SCN⁻ are incorporated into the solutions of a nanopore cis compartment (or both compartments) in addition to or in place of the Cl⁻ anions conventionally used, in order to facilitate release of peptides from a BINDER. It will be evident to those skilled in the art that a range of chaotropic anions or cations can be used to effect peptide release from BINDER over a range of concentrations suitable for optimization in particular device configurations.
  • enriched TARGET and STANDARD constructs are to be immobilized on a support
  • the constructs are delivered into proximity with the support while they are bound to the cognate BINDER (e.g., on easily manipulatable magnetic beads).
  • BINDER-bound constructs are present at a very high effective local concentration, and may be conveniently moved from one environment to another without loss. This feature is an important advantage of the BINDER enrichment step when applied to low abundance peptides and their detection by single molecule counting.
  • a BINDER with specific affinity for the TARGET peptide and STANDARD may be immobilized on a solid support in order to facilitate separation of the antibody and its bound peptide and/or peptide construct cargo from a complex sample digest, to wash away unbound molecules, to concentrate bound peptides, and to deliver bound peptides to a site where they are available for sequencing.
  • Typical solid supports used for this purpose include magnetic beads (allowing collection of beads from a liquid suspension by magnetic force) or a porous column (e.g., an affinity column) through which liquids may be pumped.
  • the BINDER is immobilized on commercially available protein G-derivatized magnetic beads (Dynabeads G; Thermo Fisher) and optionally crosslinked covalently with dimethyl pimelimidate (DMP) according to the manufacturer’s instructions.
  • the antibody is immobilized on tosyl-activated Dynabead magnetic beads.
  • the anti-peptide antibody can be immobilized on solid phase chromatography media (e.g., POROS G resin) packed in a column and crosslinked using DMP. Such a column can bind the TARGET peptide specifically from a peptide mixture (e.g., a tryptic digest of serum or plasma) and, following a wash step, release the TARGET peptide under elution conditions.
  • a homogeneous cognate affinity capture reagent e.g., a monoclonal antibody BINDER, wherein all or nearly all molecules have the same sequence
  • Inhomogeneous affinity capture reagents are difficult to characterize in detail, and can contain variants that bind one or the other of the TARGET and STANDARD peptides more strongly.
  • BINDERS typically preferred: e.g., monoclonal antibodies or sequence-defined aptamers.
  • chemical or enzymatic reactions for the purpose of modifying a TARGET (or STANDARD) peptide are carried out in solution, and in some embodiments one or more reactions are carried out while a peptide or peptide construct is bound to a BINDER, which may or may not itself be bound to a solid support.
  • one or more reactions are carried out while a peptide is bound to a BINDER linked to a solid support, thus allowing the peptide to be contacted with reagents, and removed from contact, by physical movement of the support between liquids (e.g., by removal of magnetic beads carrying BINDER and bound peptides from liquid in one vessel and deposition of the beads in a different vessel where they are exposed to a different reagent), or equivalently by movement of liquids in contact with the support (e.g., by pumping one reagent and then a second reagent over a porous column, or magnetic bead mass, to which BINDER and its peptide cargo are bound).
  • manipulation of peptides on a support allows the peptide to be washed free of a reagent by exposure to a wash solvent prior to contact with a subsequent reagent. Movement of peptides between liquids by movement of a BINDER or support to which they are bound reduces or eliminates the need for purification or concentration of intermediate peptide forms created during a sequence of one or more chemical reactions.
  • peptides are bound to a solid support by means other than interaction with a specific BINDER, e.g., by binding of peptides to a generic support such as a reversed phase support (e.g., C18) or an ion exchange support.
  • other molecules e.g., oligo and other polymers that, with peptides, form constructs amenable to sequence-sensitive single molecule detection
  • the invention provides for the optional elimination of some or all of these BINDER amino groups by chemical blockage (e.g., by reductive methylation, by PEGylation using commercially available NHS-PEG or other reagents, conversion of lysine residues to homoarginine by treatment with O-methylisourea, or other chemical modifications known in the art), by protein engineering (e.g., by replacing some or all lysines in a recombinant antibody sequence with arginines or other amino acids), or various other means.
  • Specific affinity reagents to be used in such embodiments may be selected so as not to contain any lysine residues in the TARGET peptide binding site, since these residues would likely be blocked along with other lysines, potentially leading to a loss of binding activity.
  • Non-protein BINDERS, such as DNA and RNA aptamers and other similar molecules, may contain no amino groups to begin with, eliminating the need to block these prior to processes aimed at modifying peptide amino groups.
  • Blockage of BINDER amino groups that could participate in side reactions has the effect of avoiding waste of expensive reagents used in amino group modifications of TARGET and STANDARD peptides, including use in creating concatenated constructs of these.
  • blockage e.g., by PEGylation
  • any other proteins present on the capturing support e.g., Protein A or Protein G used to guide antibody immobilization on solid supports such as Dynabeads G magnetic beads
  • Nanopore sequencing devices are typically operated with a negative electrode in the cis compartment (where the input molecules to be sequenced are added) and a positive electrode in the trans compartment: this polarity induces an oligo, which is strongly negatively charged on account of its sugar-phosphate backbone, to migrate towards and through the pore to initiate sequencing. In some embodiments this polarity also serves to move a negatively charged bead towards the pore, contributing towards the goal of delivering peptide-oligo constructs in close proximity to the pore.
  • the methods used for single molecule detection have the capability to detect very large numbers of molecules (e.g., 10^10 in (54)), far exceeding the requirements for quantitative measurement of a modest number of peptides in one sample.
  • some embodiments connect sample-specific labels (“barcodes”) to the TARGET and STANDARD peptides present in a sample digest, or enriched from a sample digest by a BINDER, allowing the TARGET and STANDARD peptides from multiple samples to be combined prior to peptide detection (i.e., multiplexed), and afterwards de-multiplexed to associate them with the correct original samples.
  • DNA provides an ideal medium for implementation of such barcodes since, as essentially a digital medium, it is easy to synthesize, cut, ligate, copy, and detect by both sequencing and hybridization.
  • Alternative barcode polymers can be employed, such as peptides and synthetic chemical polymers, although these may be significantly more difficult to generate, manipulate and detect than oligonucleotides.
  • sample barcoding systems have been developed using sets of distinct DNA barcodes to identify nucleic acid molecules derived from different samples prior to sequencing, or to facilitate optical readout of individual nucleic acid molecules in imaging systems.
  • sample barcodes with identical or very similar base composition but distinguishable sequences are preferred in order to minimize differences in physical properties between constructs on account of barcode properties.
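  • A quick computational check of this preference is sketched below: it confirms that a candidate barcode set shares one base composition while remaining distinguishable by sequence. The three 8-base sequences are toy examples invented for illustration and have not been vetted for hybridization or synthesis behavior; the function names are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def same_composition(barcodes):
    """True if every barcode contains the same number of each base."""
    compositions = {tuple(sorted(Counter(b).items())) for b in barcodes}
    return len(compositions) == 1


def min_mismatches(barcodes):
    """Smallest number of differing positions between any two
    equal-length barcodes (a simple measure of distinguishability)."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(barcodes, 2))


# Toy barcodes: each contains two A, two C, two G and two T.
toy_barcodes = ["ACGTACGT", "TGCATGCA", "CATGCATG"]

print(same_composition(toy_barcodes))
print(min_mismatches(toy_barcodes))
```

  • A practical design would add further screens (for example for cross-hybridization or secondary structure), which are outside the scope of this sketch.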
  • sample barcodes are appended or linked to TARGET and STANDARD constructs prior to enrichment by cognate BINDERS.
  • sample barcodes are appended or linked to TARGET and STANDARD constructs after enrichment by cognate BINDERS, in which case smaller amounts of the DNA barcodes are required.
  • FIGS 9 and 10 illustrate schematically a 2-level encoding scheme used in some embodiments.
  • a specific peptide here labeled Peptide-A
  • a DNA sequence tag here labeled OLIGO-TARGET
  • the cognate internal standard formed by linkage of a synthetic version of Peptide-A with a distinct DNA sequence tag is added to the digest, creating a standardized digest (standardized with respect to Peptide-A).
  • sample barcodes comprising a plurality of modules (Codes) are linked to these constructs using conventional methods that may include ligation to the TARGET/STANDARD tag, chemical linkage (e.g., using click chemistry), non-covalent means (e.g., biotin on one oligo and streptavidin on the other), or a variety of other linkage means known in the art.
  • the sample barcodes can be linked to a site on the peptide different from the site at which the TARGET/STANDARD tag is connected.
  • the scheme for sample barcoding shown in Figure 9 provides a construct compatible with a variety of single molecule detection methods, as described below.
  • barcode modules at positions 1, 2, 3 and 4 are used to encode “bits” in a 10-bit binary sample code.
  • each DNA base is 1 of 4 alternatives (2 bits of information)
  • 10 bits of information could theoretically be encoded in a short sequence of 5 bases.
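  • As a minimal illustration of this theoretical limit, the sketch below packs a 10-bit sample number into 5 bases at 2 bits per base. The mapping A=00, C=01, G=10, T=11 and the function names are arbitrary assumptions made for the example, not part of the described scheme.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode_sample_id(sample_id, n_bases=5):
    """Write a sample number as n_bases DNA bases, 2 bits per base."""
    if not 0 <= sample_id < 4 ** n_bases:
        raise ValueError("sample_id does not fit in the requested length")
    digits = []
    for _ in range(n_bases):
        digits.append(BASES[sample_id % 4])  # least significant base first
        sample_id //= 4
    return "".join(reversed(digits))


def decode_sample_id(sequence):
    """Inverse of encode_sample_id."""
    value = 0
    for base in sequence:
        value = value * 4 + BASES.index(base)
    return value


print(encode_sample_id(0))     # AAAAA
print(encode_sample_id(1023))  # TTTTT
print(decode_sample_id(encode_sample_id(612)))  # 612
```

  • Such a dense code has no redundancy: a single misread base silently changes the decoded sample number.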
  • all the methods envisioned for reading the sample barcode in a single molecule detection system are subject to error, and avoiding sample misassignment errors is a high priority in many applications (e.g., clinical). A preferred approach is therefore to add redundancy to the sample code.
  • this is done by providing a unique sequence module comprising multiple bases (e.g., 4 to 30 bases depending on the preferred readout method) corresponding to each of the bits in the desired sample code space (number of samples to be identified).
  • the error detection and correction methods of Hamming can be used, and in the case of a 10-bit code, Hamming extended parity error detection involves the addition of 4 parity bits to the 10-bit code, resulting in a total of 14 bits of information.
  • the example of Figure 9 simplifies this coding scheme to use only those code values having 3 or 4 bits set to a value of 1, which reduces the sample coding space to 105 samples that can be identified with very high accuracy, but reduces the total number of DNA modules that need to be in any one sample code.
  • 4 modules are included in any sample code, selected from among 14 different DNA sequences selected using computational and experimental methods well known in the art for minimal likelihood of confusion during readout. A mistaken read on any one of these modules can be corrected by the coding scheme, and mistakes in 2 modules can be detected (but not corrected).
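  • The figure of 105 usable sample codes can be reproduced by enumeration under one plausible reading of the scheme. The sketch below assumes a shortened Hamming code in which module i (1 to 14) carries the 4-bit check value i, and a set of modules is a valid code when the XOR of its check values is zero; this layout is an assumption for illustration and is not taken from Figure 9.

```python
from itertools import combinations

# Assumed layout: 14 modules, module i carries the 4-bit check value i.
# A set of modules is a valid code when the XOR of its values is zero.
N_MODULES = 14


def syndrome(modules):
    """XOR of the check values of the modules present."""
    s = 0
    for m in modules:
        s ^= m
    return s


def valid_codes(weights=(3, 4)):
    """All valid codes that use exactly 3 or 4 modules."""
    return [frozenset(combo)
            for w in weights
            for combo in combinations(range(1, N_MODULES + 1), w)
            if syndrome(combo) == 0]


def min_pairwise_distance(codes):
    """Smallest number of modules by which any two codes differ."""
    return min(len(a ^ b) for a, b in combinations(codes, 2))


codes = valid_codes()
print(len(codes))                    # 105
print(min_pairwise_distance(codes))  # 3
```

  • Under this assumed layout any two codes differ in at least 3 modules, which is the property that lets a single misread module be traced back to the intended code.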
  • Those skilled in the art will recognize that many alternate coding schemes exist, with greater or lesser numbers of bits, of larger or smaller numbers of identifiable samples, and of greater or lesser numbers of bases in each DNA module.
  • peptide single molecule detection includes the ability to read nucleic acid sequences interspersed with peptide sequence (e.g., current DNA sequencing nanopore platforms) or else together with peptide sequence that has been reverse-translated into DNA (e.g., reverse translation platforms).
  • modules of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more bases are used.
  • DNA sample barcodes and TARGET vs STANDARD tags linked to peptides can be read directly by passage through a suitable nanopore sequencing system.
  • the accuracy of individual basecalls can be greater than 99%, and therefore the accuracy with which one of a small set of sequences (designed to be distinct) can be recognized is high.
  • the sample barcode (e.g., a binary number identifying the sample, an alphanumeric code taken from a physical sample label, or any type of computer encodable sample identification) can be encoded directly (2 bits per base) or with redundancy in the form of multiple bases per code bit, additional parity bits (including error detection and correction), or any information representation scheme that can be encoded in DNA or another nanopore-readable polymer with 2 or more distinguishable units.
  • the association of a code with an individual peptide molecule is accomplished through the covalent linkage between the two that is established in the peptide library preparation workflow of the invention.
  • the root oligo is prepared for conventional high-throughput DNA sequencing, generating a sequence comprising the peptide sequence (or a representation of it), its identity as a TARGET or STANDARD, and its sample of origin.
  • DNA sample barcodes can be detected by sequential hybridization with labeled oligos complementary to the sample barcodes.
  • Complementary oligos can be labeled with a variety of fluorescent or colored dyes, with quantum dots or other optically detectable nanoparticles, with enzymes capable of generating a localized signal (e.g., luminescence), or a variety of other compositions known in the art for the generation of a spatially localized externally detectable signal.
  • a set of barcode sequences is used that are designed to have high specificity (minimal cross-hybridization of one barcode with the probes complementary to the other barcodes).
  • the lengths of the barcode sequence modules generally impact the specificity with which they are recognized by complementary probes, the kinetics with which they bind and the temperature at which they can be removed after being read (i.e., analogous to the “melting temperature”).
  • modules of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more bases are used.
  • a larger set of barcodes is used, but constructs need include only a limited number of these. For example, if 11 distinct barcodes are used, but only 4 or fewer of these are included in any given construct, several hundred different samples can be uniquely identified. Use of fewer barcodes in a construct is advantageous since it reduces the length and cost of the sample barcoding oligos required: individual distinct barcodes may be 20-30 bases long.
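  • The count of identifiable samples in this example follows from simple combinatorics. The sketch below assumes that a sample is identified by which barcodes are present (order ignored) and that every construct carries at least one barcode; the function name is hypothetical.

```python
from math import comb


def n_sample_codes(n_barcodes, max_per_construct, min_per_construct=1):
    """Number of distinct barcode subsets, assuming a sample is identified
    by which barcodes are present, regardless of their order."""
    return sum(comb(n_barcodes, k)
               for k in range(min_per_construct, max_per_construct + 1))


print(n_sample_codes(11, 4))  # 11 + 55 + 165 + 330 = 561
```

  • Without redundancy, every one of these subsets is a valid sample code, so a single misread barcode maps one sample onto another.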
  • a further improvement in sample barcoding is provided by the use of Hamming codes.
  • For example, in a coding scheme with 10 data bits and 4 parity bits (14 bits total, here corresponding to 14 distinct DNA barcodes), and using only 3 or 4 of these barcodes in any individual peptide construct, it is possible, using the features of the Hamming scheme, to identify and correct any single error in detection of any of the individual barcodes. Error detection is of great value in preventing mis-attribution of molecules to the wrong sample, which could result in erroneous quantitative results derived from errors in the counts of TARGET and STANDARD molecules in a sample digest.
  • Figure 10 illustrates the process of decoding an example construct in 16 cycles using a Hamming code: in each cycle one of 16 oligo probes complementary to one of the DNA codes is applied, detected when present, and removed.
  • the first 2 cycles involve probes that determine whether the peptide is a TARGET or a STANDARD (either one probe binds or the other).
  • 14 probes complementary to each of the 14 DNA sequence modules described above are successively applied, detected when present, and removed.
  • the result is a 14-bit binary number that is capable of identifying one of 105 different samples with single (1-bit) error correction and double (2-bit) error detection (referred to as “SECDED” in the art).
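  • The 16-cycle readout can be sketched as follows. The code assumes that cycles 1 and 2 are the TARGET and STANDARD probes, that cycles 3 to 16 probe modules 1 to 14, and that module i carries the 4-bit check value i so that the XOR of the modules present (the syndrome) is zero for a valid code. These layout choices and all function names are illustrative assumptions, and only single-error correction is implemented here.

```python
N_MODULES = 14  # assumed: module i carries the 4-bit check value i


def hits_for(kind, modules):
    """Build the 16 per-cycle probe results for a construct (helper)."""
    hits = [kind == "TARGET", kind == "STANDARD"]
    hits += [i in modules for i in range(1, N_MODULES + 1)]
    return hits


def read_construct(probe_hits):
    """Decode 16 probe cycles into (kind, module set, status)."""
    if len(probe_hits) != 16:
        raise ValueError("expected results for 16 cycles")
    is_target, is_standard = probe_hits[0], probe_hits[1]
    if is_target == is_standard:
        raise ValueError("exactly one of the TARGET/STANDARD probes must bind")
    kind = "TARGET" if is_target else "STANDARD"

    modules = {i for i in range(1, N_MODULES + 1) if probe_hits[i + 1]}
    s = 0
    for m in modules:
        s ^= m  # syndrome: zero for a valid code
    if s == 0:
        status = "ok"
    elif s <= N_MODULES:
        modules ^= {s}  # a single misread module has value equal to the syndrome
        status = "corrected"
    else:
        return kind, frozenset(modules), "uncorrectable"
    if len(modules) not in (3, 4):  # only codes with 3 or 4 modules are in use
        status = "uncorrectable"
    return kind, frozenset(modules), status


# Modules {1, 2, 4, 7} form a valid code (1 ^ 2 ^ 4 ^ 7 == 0).
print(read_construct(hits_for("TARGET", {1, 2, 4, 7})))
print(read_construct(hits_for("TARGET", {1, 2, 7})))        # module 4 missed
print(read_construct(hits_for("TARGET", {1, 2, 4, 7, 9})))  # spurious module 9
```

  • Reads flagged as uncorrectable would be discarded rather than assigned to a sample, which serves the stated goal of avoiding mis-attribution.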
  • Some embodiments of the invention comprise a series of steps to transform a protein-containing sample into an enriched standardized digest sample, or a flattened enriched standardized digest sample, prior to sequence-sensitive single molecule detection and counting.
  • the invention is equally applicable to protein samples from sources such as blood, blood plasma and blood serum, as well as other sources, such as tissue homogenates, animal, plant or microbial samples, other body fluids, environmental samples and the like.
  • An important feature of the invention is its generality, allowing the design of similar protocols and using similar reagents and equipment to prepare peptide libraries suitable for analysis of a wide range of different proteins in a variety of single molecule detection systems.
  • the invention makes use of specific features of peptides generated by particular enzyme cleavages; features provided by multiple click chemistry pairs; requirements for specific peptide and oligo orientation; multiple levels of barcoding; and detailed control of the capture and enrichment of peptide:oligo constructs by specific affinity reagents.
  • steps of the general method may be carried out in a different order than that outlined below. Nevertheless the invention requires that steps of sample digestion to peptides, generation of TARGET constructs from TARGET peptides, and addition of STANDARD constructs to the digest (thus creating a standardized digest) must precede enrichment of TARGET constructs and STANDARD constructs from the standardized digest using BINDERS.
  • a general approach for sequence-based protein quantitation involves digesting sample proteins (e.g., with trypsin) into peptides.
  • disulfide bonds may be broken and proteins denatured to disrupt secondary and tertiary structure.
  • Samples can be any kind of protein-containing sample without limitation, including body fluids, tissues, tissue lysates, tissue extracts, bacterial, fungal, animal and plant samples, recombinant proteins including protein drugs, food products, and the like.
  • preparation of proteolytic peptides from a complex sample is carried out by a series of reagent addition steps which may include: denaturing a protein sample (e.g., with detergents such as deoxycholate or CHAPS, organic solvents, urea or guanidine HCl), reducing the disulfide bonds in the proteins (e.g., with tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol or mercaptoethanol), alkylating the cysteines (e.g., by addition of iodoacetamide, or iodoacetic acid, which react with the free -SH group of cysteine preventing reformation of disulfide bonds), quenching excess iodoacetamide by addition of more dithiothreitol or mercaptoethanol, and (after removal or dilution of the denaturant) addition of the selected proteolytic enzyme (e.g., trypsin), followed by incubation to allow digestion.
  • a chemical inhibitor e.g., TLCK
  • denaturation through heat or addition of denaturants, or both
  • removal of the trypsin if the trypsin is on a solid support.
  • proteolytic digestion including automated methods using only liquid addition steps (14). In some embodiments it has been shown that automated digestion of biological samples can be very reproducible, exhibiting minimal variations (e.g., CV < 2%) between replicate samples.
  • a desired peptide can be liberated by proteolysis without the need for disulfide reduction and alkylation (e.g., peptides that do not contain cysteine residues, and are not sterically constrained by nearby disulfide bridges), and in some cases without denaturation (e.g., peptides exposed on the surfaces of a protein).
  • a range of alternative proteolytic enzymes can be used instead of trypsin to produce peptides defined by specific cleavage sites (including GluC, Lys-C, Arg-C, chymotrypsin, papain, pepsin, V8 protease, and the like), and chemical agents can also be used (e.g., CNBr cleavage at methionine residues).
  • a simplified digestion protocol comprising addition of protease to a liquid protein-containing sample without prior denaturation, reduction of disulfides or blockage of resulting cysteines.
  • heat is used to improve protein digestion by partial denaturation of protein substrates before the trypsin (or other proteolytic enzyme) is denatured, often using a variable temperature profile such as a ramp from room temperature to a higher temperature (e.g., 70 °C for a plasma sample).
  • digestion can be carried out by immobilized proteolytic enzymes such as trypsin.
  • Trypsin has been immobilized in the art at very high concentrations (e.g., on derivatized porous nylon, PVDF or nitrocellulose membranes) and used to perform very rapid (e.g., < 1 minute) digestion of proteins.
  • Proteolytic digestion disrupts protein:protein interactions by largely if not completely eliminating tertiary structure when a large protein is reduced to short peptides free to diffuse apart.
  • This conversion of a large complex protein molecule to a series of short peptides offers a significant improvement in protein quantitation, since it removes the primary sources of assay interferences observed with immunoassays (in which a protein:protein interaction that blocks an epitope used by an assay antibody can result in a false negative result, while false positives can result from bridging interactions involving protein components not expected to be involved in the assay).
  • An example in which tryptic digestion overcomes such an interference is the SISCAPA assay for thyroglobulin (55).
  • An average-length human protein produces about 50 peptides upon tryptic digestion, from which an assay designer can choose one or more peptides suitable for specific applications.
  • This feature expands the range of detection alternatives compared to intact protein detection. Since intact proteins are so diverse in their physical properties, and therefore difficult to measure in many circumstances, the ability to select a proteotypic peptide from a range of alternatives as a stoichiometric surrogate for the intact protein is a major advantage of the digestion approach. It is often observed that within every “bad” protein there is at least one “good” peptide for a given application.
  • it is advantageous for TARGET peptides to have a single n-terminal amino group (i.e., “single amino peptides”), and in this case digestion is preferably carried out using an enzymatic protocol in which most peptides of an appropriate length do not contain lysine (which contains a side chain amino group), and thus have a unique amino group at the n-terminus of the peptide chain.
  • Another approach to decrease the proportion of double-amino peptides is to chemically convert the lysine epsilon-amino groups to homoarginine in a guanidination reaction with methylisourea (56).
  • digestion with Lys-C in place of trypsin typically generates “double amino peptides” having both an n-terminal amino group and a c-terminal lysine with its side chain amino group.
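The distinction between single amino and double amino peptides can be illustrated with a minimal sketch. It assumes unmodified peptides written in one-letter code, in which the only primary amines are the n-terminal amino group and lysine (K) side chains; the function names are hypothetical and not taken from the source.

```python
def count_amino_groups(peptide: str) -> int:
    """Count primary amines: one n-terminal amine plus one per lysine (K).

    Assumes an unmodified peptide (no blocked n-terminus, no guanidination).
    """
    return 1 + peptide.upper().count("K")


def classify_peptide(peptide: str) -> str:
    """Label a peptide by its number of reactive amino groups."""
    n = count_amino_groups(peptide)
    if n == 1:
        return "single-amino"
    if n == 2:
        return "double-amino"
    return "multi-amino"


print(classify_peptide("GTLEVR"))  # single-amino (no lysine)
print(classify_peptide("GTLEVK"))  # double-amino (c-terminal lysine)
```

A peptide ending in arginine and containing no lysine is single amino, while a Lys-C style peptide ending in lysine is double amino.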
  • it is advantageous to link peptides to a specific type of molecule at or near the n-terminus, and link the peptide to a different type of molecule at or near the c-terminus; such an approach can be used to generate a construct in which the peptide is “in-line” between preceding and following polymeric components (such as oligonucleotides) useful in detection and identification of the construct.
  • the invention provides an improved alternative method making use of sequential proteolytic cleavages and coupling procedures comprising the following steps: 1) cleavage of sample proteins c-terminal to lysine residues (e.g., using the enzyme Lys-C) to produce “Lys peptides”, a number of which may include internal Arg residues; 2) reaction of the lysine side chain amino groups and the exposed peptide n-terminal amino groups of these peptides, in one or more steps (which may, for example, include click chemistry ligations), with a first added molecule (e.g., a first oligonucleotide); 3) removal or depletion of any remaining uncoupled amount of this first added molecule (or intermediate chemical components); 4) cleavage of the Lys peptides at internal arginine residues when present in a second proteolytic step (e.g., by addition of enzymes such as trypsin, Arg-C, etc.), thereby
  • the single molecule methods used in some embodiments of the present invention are not typically biased against long peptides.
  • recognition of multiple epitopes in a peptide can provide increased specificity, as well as the possibility of leveraging avidity effects to improve BINDER capture efficiency.
  • Each of the two sequential linkages to ends of the peptide may, in some embodiments, involve a sequence of reactions, for example an initial reaction with an amine coupling reagent such as an NHS or sulfo-NHS conjugate of a click reagent (e.g., NHS-BCN, NHS-DBCO, NHS-TCO, NHS-tetrazine, NHS-azide, an NHS-alkyne, or the like), followed by reaction of the click group thus introduced with a corresponding click chemistry partner attached to the molecule to be added (e.g., reaction of a BCN-modified peptide with an azide-modified oligo tag or barcode).
  • the first linkage step makes use of an NHS-BCN reagent to activate peptide amino groups which are then reacted with an azide-activated oligo to accomplish the first linkage.
  • a second linkage is carried out by activating the freshly-exposed n-terminal amino group with NHS-tetrazine and subsequent reaction with a TCO-activated oligo. Because of the general orthogonality of BCN-azide and tetrazine-TCO click reactions, these two steps are unlikely to cross-react even if some amount of reagent persists from the first modification step.
  • the first step of the 2-step procedure is carried out to the point of activating the lysine amino group with a member of a first click pair (e.g., BCN) but without linkage to the first oligo, and a second activation step, following the second proteolytic step, is accomplished using a member of a second orthogonal click pair (e.g., tetrazine), after which both activated peptide groups can be reacted simultaneously with the respective orthogonal activated oligos (i.e., with an azide-activated oligo at the lysine site and a TCO-activated oligo at the n-terminus).
  • the overall yield of correctly modified products can be increased by removal of the modifying reagents (e.g., NHS-BCN and an azide-labeled oligo) used in the first step (e.g., coupling an oligo to the lysine amino group via BCN-azide click coupling) before executing the second cleavage to expose fresh reactive groups (e.g., n-terminal amino groups).
  • This removal step decreases the probability that the peptide reactive groups exposed by the second cleavage (e.g., the amino terminal NH2 groups shown among the peptides of Figure 6D) will be modified in the same way as the amino groups exposed after the first digestion (e.g., addition of group M1), and instead react only with a second reagent or reagents that introduce a different addition (labeled M2 in Figure 6E).
  • the reagents involved in adding group M1 to the peptides are removed by separation of peptides from the solution phase (e.g., by capturing the peptides on a suitable support such as a reversed phase or ion exchange support and then washing the soluble reagents away), by size exclusion separation to separate peptides from low molecular weight reagents (such as NHS-BCN, etc.), or by exposing the mixture to a solid support comprising a substantial content of free amino groups to which any un-reacted amino-modifying reagents can couple before removal of the support.
  • peptides whose sequences are bounded by a c-terminal lysine and an n-terminal amino acid that is immediately preceded in the protein sequence by an arginine will, with high likelihood, be modified with distinct added groups on the two termini, as desired.
  • specific peptides with these characteristics (i.e., a preceding arginine and c-terminal lysine) can be selected as TARGETS and efficiently incorporated in a predetermined orientation (i.e., n-term to c-term or vice versa) into constructs amenable to single molecule detection and counting.
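Candidate peptides with a preceding arginine and a c-terminal lysine can be enumerated in silico. The following is a simplified sketch, not the method of the source: it assumes complete cleavage c-terminal to every K in the first step and at the last internal R in the second step, ignores missed cleavages and sequence-context effects, and uses a made-up example sequence and hypothetical function names.

```python
def lys_c_digest(sequence: str) -> list[str]:
    """Cleave c-terminal to every lysine (K); simplified, no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "K":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # c-terminal remainder without K
    return peptides


def rk_candidates(sequence: str) -> list[str]:
    """Return sub-peptides immediately preceded by R and ending in K."""
    candidates = []
    for peptide in lys_c_digest(sequence):
        if not peptide.endswith("K"):
            continue  # no lysine to carry the first added molecule
        last_r = peptide.rfind("R")
        if last_r == -1:
            continue  # no internal arginine for the second cleavage
        candidates.append(peptide[last_r + 1:])
    return candidates


# Hypothetical sequence, for illustration only
print(rk_candidates("MARGTLKAAPRDEVKGGK"))  # ['GTLK', 'DEVK']
```

Each returned peptide would carry the first added molecule on its lysine and expose a fresh n-terminal amino group after the second cleavage.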
  • an in-line construct is assembled that comprises, in order, an oligonucleotide in 5’ to 3’ orientation, a peptide in C-term to N-term orientation (opposite to the conventional method of writing a peptide sequence) and a further oligonucleotide in 5’ to 3’ orientation.
  • when the final (3’) oligo is omitted from such a construct, the peptide’s n-terminus is exposed and available for sequential degradation by Edman (or alternative) chemistries used to read peptide sequence (e.g., Encodia or Quantum-Si technologies).
  • the use of the two enzymes in sequence, with the first enzyme cleavage step followed by coupling of an added molecule to available amino groups, while leaving the n-terminal amino group created by the second enzyme cleavage step unmodified, is a novel and effective method for the generation of a peptide-oligo construct having a free, unmodified n-terminal amino group available for cyclical degradative sequencing.
  • a linear construct comprising a leading oligonucleotide, a central peptide, and a trailing oligonucleotide according to the invention.
  • the method of the invention allows each segment to be assembled in a specific orientation as required by a detector such as a nanopore regulated by a DNA motor; e.g., a leading oligo oriented 5’-to-3’, followed by a peptide that is oriented c-terminal-to-n-terminal, followed by an oligo oriented 5’-to-3’ (as shown in Figure 12).
  • the second amino group modification can be omitted (i.e., just using sequential first digestion-modification-second digestion) to produce tethered peptides with free n-termini as required by some degradative sequencing single molecule detectors.
  • a c-terminal lysine is a preferred in-line linkage site in multiple embodiments of the sequential cleavage method described here
  • other cleavage sites can be used in place of Arg cleavage (as expected above in a second cleavage using trypsin), since many proteolytic enzymes, as well as chemical agents such as CNBr, generate a fresh n-terminal amino group when they cleave a polypeptide.
  • other cleavage specificities are available: for example, the enzymes AspN, GluC, chymotrypsin, elastase or even relatively non-specific proteinase K can be used in combination with Lys-C or equivalent enzymes to generate different sets of double-amino peptides and extend the applicability of the approach to proteins not well-covered by the R...K method.
  • a related alternative embodiment makes use of chemical linkages to peptide carboxyl groups instead of amino groups, and employs a series of steps to 1) cleave sample proteins n-terminal to Asp residues (e.g., using the enzyme Asp-N) to produce “Asp peptides” having an n-terminal Asp residue; 2) react the Asp side chain carboxyl groups and the exposed peptide c-terminal carboxyl groups of these peptides (and any internal Glu carboxyl side-chains), in one or more steps (which may, for example, include click chemistry ligations), with a first added molecule (e.g., a first oligonucleotide); 3) remove or deplete any remaining uncoupled amount of this first added molecule (or intermediate chemical components); 4) cleave the Asp peptides resulting from the first cleavage at one or more selected internal residues (e.g., by addition of trypsin to cleave K or R
  • preferred TARGET peptides are those that lack an internal Glu residue, since this would contribute a third carboxyl group when AspN is used initially.
  • a symmetrical situation would obtain if an enzyme with GluN specificity were used initially, and preferred TARGET peptides would be those lacking internal Asp residues.
  • appropriate STANDARDS can be produced according to any of the 2-step methods described above by carrying out a similar set of steps using a synthetic peptide with an extended n-terminal sequence providing the same second-step cleavage site as the process applied in processing samples.
  • the cognate STANDARD and TARGET constructs are distinguished by sequences incorporated into one or more oligos linked to the peptide.
  • one or more of the linkage steps in assembling STANDARD constructs makes use of a different click chemistry pair than that used in assembling TARGET constructs.
  • multiple different 2-step peptide modification processes as described above are carried out in parallel on a sample, and their results combined to provide a collection of TARGET peptide constructs providing improved protein coverage or detection performance compared to a single 2-step procedure (e.g., the R...K method described initially).
  • TARGET peptide constructs are created by linkage of TARGET tags to TARGET peptides in a sample digest.
  • TARGET tags are linked to common chemical features of peptides, for example peptide n-terminal and/or lysine epsilon amino groups. The common occurrence of such features implies that it would be advantageous to convert a large fraction, and potentially all, peptides in a digest into constructs of the form of TARGET constructs, irrespective of whether each such construct is to be measured against a STANDARD construct.
  • a TARGET peptide in a sample or set of samples is “standardized” by addition of a known quantity of its respective STANDARD (the STANDARD based on the sequence of the TARGET peptide with modification as disclosed above).
  • the resulting “standardized sample digest” may be so standardized with respect to one TARGET peptide, or to multiple TARGETS (requiring multiple cognate STANDARDS).
  • Multiple STANDARDS may be added together at one time, or at different times (e.g., after part of a standardized sample is analyzed, additional STANDARDS may be added to permit subsequent analysis of additional TARGET peptides).
  • the added quantity of a STANDARD is known in absolute quantitative terms (e.g., in grams/sample or grams/liter, in moles/sample or moles/liter, or molecules per sample or molecules per liter), and in some embodiments the amount of STANDARD added is known to be the same as, or have a defined ratio to, the amount of STANDARD added to other samples (thus allowing multiple samples to be compared on a consistent scale, particularly useful when samples run in a batch are compared with one another, or in longitudinal studies measuring changes in amounts of biomarker proteins occurring between serial samples from an individual).
  • a sample to which STANDARDS have been added corresponding to a set of TARGET peptides is considered a “standardized sample” with respect to those TARGET peptides.
  • one or more STANDARDS are added to a protein sample before or during digestion of the sample proteins to peptides.
  • one or more STANDARDS are added after digestion but prior to enrichment.
  • additional STANDARDS are added to a digest sample that has been previously analyzed according to the invention for an earlier set of TARGET peptides and STANDARDS, enabling cycles of measurement for successive panels of peptides in a sample.
  • the quantity of STANDARD added to a sample is chosen based on the amount of the respective TARGET peptide expected to be present in the sample(s). Specifically, the quantity of STANDARD may be based on the average or median amount of TARGET peptide observed or known to be present in similar samples, so that the ratio of TARGET peptide to STANDARD molecules falls in a range centered close to 1.0 (i.e., equal amounts). Depending on the variation in TARGET peptide amount in the samples, the ratio may for example range from 0.5 to 2, or 0.2 to 5, or 0.1 to 10, or 0.01 to 100.
  • an advantage of choosing the STANDARD amount based on TARGET peptide ranges is that it best avoids situations where the ratio is very large (e.g., 1,000:1).
  • Extensive investigations of the population ranges of clinical protein analytes (59) have shown that the observed range is protein-specific.
  • High-abundance blood proteins such as albumin or hemoglobin usually vary by small amounts (much less than 2-fold), while acute phase proteins such as C-reactive protein (CRP) or serum amyloid A (SAA) can increase by 1000-fold in a serious infection.
  • the optimal situation for efficient quantitation by counting TARGET and STANDARD peptide molecules is one in which the ratio is as close to 1:1 as practically possible.
  • a person skilled in the art could design an assay according to the invention to measure hemoglobin using the population average value for the TARGET peptides as the STANDARD amount, while for CRP it could be preferable to set the STANDARD amount higher than the population average TARGET amount so as to better center the TARGET to STANDARD ratio for this highly inducible protein closer to 1.0.
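One way to picture this choice is to place the STANDARD amount at the geometric midpoint of the expected TARGET range, so that the TARGET:STANDARD ratio is symmetric about 1.0 on a log scale. The geometric midpoint is an illustrative assumption, not a rule stated in the source, and the function names and example ranges (in arbitrary units) are hypothetical.

```python
import math


def centered_standard_amount(low: float, high: float) -> float:
    """Geometric midpoint of the expected TARGET range (assumed heuristic)."""
    return math.sqrt(low * high)


def ratio_range(low: float, high: float, standard: float) -> tuple[float, float]:
    """TARGET:STANDARD ratios at the two ends of the expected range."""
    return low / standard, high / standard


# Narrow range (hemoglobin-like, under 2-fold) vs wide range (CRP-like, 1000-fold)
for low, high in [(1.0, 2.0), (1.0, 1000.0)]:
    std = centered_standard_amount(low, high)
    lo_ratio, hi_ratio = ratio_range(low, high, std)
    print(f"range {low}-{high}: STANDARD={std:.2f}, ratios {lo_ratio:.3f} to {hi_ratio:.3f}")
```

For the wide range, the midpoint sits well above the low baseline value, which matches the qualitative suggestion of setting the STANDARD for a highly inducible protein above its typical level.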
  • the amounts of molecules of different TARGET peptides measured together in a multiplex assay should be as nearly equal as possible. This arrangement results in optimal precision achievable with a given total capacity for counting molecules according to counting statistics, and can be achieved by stoichiometric flattening during enrichment as described below.
  • two or more TARGET peptides are selected for a protein, yielding independent measurements of the protein’s amount that can be combined to deliver improved precision, or used to circumvent sequence (i.e., genetic) or post-translational variation in TARGET peptides in a population of samples.
  • different TARGET peptides from the same protein will be present in equal amounts after complete digestion (i.e., present in molar amounts equal to the molar amount of the parent protein).
  • multiple TARGETS are selected from a protein that exhibits highly variable amounts in relevant samples, and their respective cognate STANDARDS are added at different amounts so that the TARGET-to-STANDARD ratio of at least one of the TARGETS is close enough to 1:1 to be efficiently countable and thereby furnish an accurate ratio measurement.
  • three TARGET peptides are selected and their respective STANDARDS added to the sample digest at 0.1x, 1.0x, and 10x
  • the amounts of the respective BINDERS used to enrich the peptides are adjusted separately to bring the peptides into flat stoichiometry (near equivalence) before detection and counting.
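The multi-level STANDARD scheme described above can be sketched as a selection step: since TARGET peptides from one protein are expected to be equimolar after complete digestion, the peptide whose STANDARD lies closest to the actual protein amount gives the ratio nearest 1:1. The peptide names, amounts, and function name below are hypothetical, and distance from 1:1 is measured on a log scale as an assumption.

```python
import math


def best_pair(target_amount: float, standard_amounts: dict[str, float]) -> tuple[str, float]:
    """Pick the peptide whose TARGET:STANDARD ratio is closest to 1:1 (log scale)."""
    name, standard = min(
        standard_amounts.items(),
        key=lambda item: abs(math.log10(target_amount / item[1])),
    )
    return name, target_amount / standard


# STANDARDS spiked at 0.1x, 1.0x and 10x of a nominal level (arbitrary units)
standards = {"peptide_A": 0.1, "peptide_B": 1.0, "peptide_C": 10.0}
print(best_pair(6.0, standards))   # protein elevated about 6-fold: peptide_C is nearest 1:1
print(best_pair(0.2, standards))   # protein depressed about 5-fold: peptide_A is nearest 1:1
```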
  • the dynamic range of modern triple-quadrupole mass spectrometry (MS) instruments approaches 100,000-fold for a given analyte (e.g., peptide), and the MS (or LC-MS) system can accommodate two different peptide molecules that differ in abundance by 1 million-fold and still produce some quantitative information on both (e.g., by selecting a low-efficiency detection mode such as an infrequent MRM fragment mass for the high abundance molecule and a very efficient detection mode for the low abundance molecule).
  • the detection process is typically driven by a chromatographic separation of 2 to 60 minutes prior to introduction of separated peptides into the MS, and is essentially insensitive to the total number of molecules in the applied sample: the analytical run will “consume” all the applied sample and will occupy the same period of time regardless of the number of molecules being analyzed.
  • the situation pertaining in single molecule detection and counting is very different: the number of molecules sequenced and counted depends directly on time.
  • the number of molecules analyzed is a direct function of the time required to observe a typical molecule’s sequence (e.g., 0.25 sec for a 50 bp equivalent length oligo:peptide construct at a typical rate of 200 bp/sec through a single nanopore) and the number of pores operating in parallel: running the device twice as long will typically detect twice as many peptides.
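The arithmetic behind this throughput estimate can be written out as a short sketch. The 50 bp construct length and 200 bp/sec rate come from the example in the text; the pore count, the run time, and the assumption that every pore is continuously occupied are hypothetical simplifications.

```python
def molecules_read(run_seconds: float, n_pores: int,
                   construct_length_bp: float = 50,
                   rate_bp_per_s: float = 200) -> int:
    """Upper-bound count of constructs read, assuming fully occupied pores."""
    seconds_per_molecule = construct_length_bp / rate_bp_per_s  # 0.25 s in the example
    return int(run_seconds * n_pores / seconds_per_molecule)


# Hypothetical device with 500 active pores
print(molecules_read(3600, 500))      # 7200000 constructs in one hour
print(molecules_read(2 * 3600, 500))  # 14400000: twice the time, twice the reads
```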
  • nanopore detection methods are inherently limited by the number of molecules that can be sequenced per time.
  • the precision of ratios between TARGET and STANDARD peptides is largely determined by counting statistics, with more counts yielding higher precision.
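As a rough illustration of counting statistics, if TARGET and STANDARD counts are treated as independent Poisson variables (an assumption made here, not stated in the source), the relative uncertainty of their ratio is approximately sqrt(1/N_target + 1/N_standard). The function name is hypothetical.

```python
import math


def ratio_cv(n_target: int, n_standard: int) -> float:
    """Approximate CV of the TARGET/STANDARD ratio under Poisson counting error."""
    return math.sqrt(1.0 / n_target + 1.0 / n_standard)


# Same total of 10,000 counted molecules, split evenly vs very unevenly
print(f"{ratio_cv(5000, 5000):.4f}")  # even split
print(f"{ratio_cv(9900, 100):.4f}")   # 99:1 split gives a markedly larger CV
```

For a fixed total count, the even split minimizes this expression, which is consistent with the preference for ratios near 1:1.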
  • this principle is applied to generate enriched peptide samples in which a) the number of added STANDARD molecules is close to the average number of TARGET peptide molecules in typical samples, and b) the sum of TARGET + STANDARD molecules is approximately the same for all proteins and peptides being measured in a multiplex panel.
  • This principle, referred to herein as “Stoichiometric Flattening”, is described in greater detail below.
  • in degradative peptide sequencing methods, a series of sequential steps is applied in parallel to a large number of immobilized peptide molecules, with the number of molecules being determined by the physical scale of the device used (number of peptides that can be immobilized and resolved by the detector) and the number of such runs.
  • a large but limited number of immobilized peptide molecules is detected in a run, with the number of molecules being determined by the physical scale of the device used (number of peptides that can be immobilized and resolved by the detector) and the number of such runs.
  • the overall throughput in terms of the number of molecules analyzed per unit of time is determined by the number of molecules sequenced per run, the duration of a run, and the number of runs (for a batch method).
  • efficiency is maximized by sequencing similar numbers of each TARGET and STANDARD peptide, instead of allowing one or more high abundance peptides to occupy a large fraction of the capacity, reducing the numbers of molecules from lower abundance TARGETS and thus the precision of their measurement.
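The effect of one dominant peptide on a fixed counting capacity can be sketched numerically. This example assumes the detector samples molecules in proportion to their abundance and that per-peptide counting error is Poisson (CV of about 1/sqrt(count)); the peptide names, abundances, and capacity are hypothetical.

```python
import math


def expected_counts(capacity: float, abundances: dict[str, float]) -> dict[str, float]:
    """Split a fixed counting capacity in proportion to relative abundance."""
    total = sum(abundances.values())
    return {name: capacity * amount / total for name, amount in abundances.items()}


def worst_cv(counts: dict[str, float]) -> float:
    """Largest Poisson CV (1/sqrt(count)) across the panel."""
    return max(1.0 / math.sqrt(c) for c in counts.values())


capacity = 1_000_000
unflattened = {"high_abundance": 1_000_000.0, "medium": 1_000.0, "low": 1.0}
flattened = {"high_abundance": 1.0, "medium": 1.0, "low": 1.0}

print(f"unflattened worst CV: {worst_cv(expected_counts(capacity, unflattened)):.3f}")
print(f"flattened worst CV:   {worst_cv(expected_counts(capacity, flattened)):.5f}")
```

In the unflattened case the low abundance peptide receives roughly one count, so its measurement is essentially uninformative, whereas equal representation gives every peptide about a third of the capacity.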
  • a fixed amount of each STANDARD (e.g., an equal volume aliquot from the same STANDARD stock solution) is added to each of a multiplicity of samples.
  • This approach enables relative comparison of protein amounts between samples, but does not directly provide absolute quantitative information (e.g., in mass or concentration units) without the use of external calibrators (see e.g., US provisional patent application 63/213,371 - entitled Calibration of Analytical Results in Dried Blood Samples, filed 6/22/21, incorporated herein in its entirety).
  • the amounts of STANDARD molecules added represent known quantitative amounts (i.e., numbers of molecules, mass, or concentration), in which case the absolute amounts of TARGET peptides can be estimated by multiplying the STANDARD amounts by the observed TARGET peptide to STANDARD ratios.
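The estimate described here is a single multiplication, shown below as a minimal sketch. The units (fmol) and the example counts are hypothetical, and the function name is not from the source.

```python
def target_amount(standard_amount: float, target_count: int, standard_count: int) -> float:
    """Estimate the TARGET amount as STANDARD amount x observed TARGET:STANDARD ratio."""
    if standard_count <= 0:
        raise ValueError("STANDARD count must be positive to form a ratio")
    return standard_amount * target_count / standard_count


# 100 fmol of STANDARD added; 4200 TARGET and 3500 STANDARD molecules counted
print(target_amount(100.0, 4200, 3500))  # 120.0 (fmol)
```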
  • STANDARDS are generated from synthetic (e.g., recombinantly expressed) protein constructs whose digestion yields STANDARDS in relative ratios defined by their copy number in the construct’s sequence (see for example Patent Application US 2006/0154318).
  • the STANDARDS are provided in physical forms enabling simplified manipulation and addition (as described in US9274124).
  • the STANDARDS are added as peptides in solution.
  • the amounts of each STANDARD added can be determined according to (or equal to) the average or baseline levels of corresponding TARGET peptides observed in a subject’s prior samples, thus providing STANDARD levels that are more or less equal to the expected TARGET peptide levels for that subject.
  • This approach provides optimal precision and efficiency for the measurement of longitudinal samples from that subject, and represents an ideal case of personalized protein measurement with the potential to maximize precise detection of small protein changes from baseline levels over time.
7.4 ENRICHMENT OF TARGET PEPTIDES AND STANDARDS
  • peptide-oligo constructs according to the invention are purified by a process that reversibly captures peptides (e.g., reversed-phase adsorbents such as C18 resins) with the result that oligonucleotides and other non-peptide components not part of peptide-oligo constructs can be washed away and thus removed.
  • peptide-oligo constructs according to the invention are purified by a process that reversibly captures oligonucleotides (e.g., adsorbents such as Ampure resins) with the result that peptides, remaining proteins and other non-peptide components not part of peptide-oligo constructs can be washed away and thus removed.
  • additional features of the peptide-oligo constructs are used to isolate them; for example, a biotin group engineered into the oligo, the peptide or the linkage between them can be captured by immobilized streptavidin as a means of cleaning up the desired constructs.
  • BINDERS are used to carry out specific affinity enrichment of the respective cognate TARGET peptide and STANDARD pairs.
  • the TARGET peptide, its STANDARD and the BINDER designed to bind them collectively form a cognate set of molecules specialized for the measurement of a specific TARGET peptide and thus its parent protein.
  • BINDERS are immobilized on magnetic beads and these beads mixed with the standardized digest and incubated to allow binding of peptides to the BINDER.
  • the BINDERS can be bound to commercially-available Dynabeads G via Protein G’s affinity for the Fc domain of IgG, and optionally covalently linked to the beads using DMP crosslinking or other equivalent crosslinking methods forming bonds between the magnetic bead and the BINDER.
  • BINDERS are bound to other types of magnetic particles such as Tosyl-activated beads, or otherwise chemically bound to particles that can be manipulated to allow exposure to and removal from a sample.
  • the BINDERS bind the respective TARGET peptides and STANDARDS when placed in contact with them.
  • BINDERS are in contact with a standardized sample digest for a 30-minute incubation period with shaking to keep the beads suspended.
  • BINDERS are incubated with standardized digest for shorter periods (e.g., 1, 2, 5, 10, 15, 20 or 25 minutes) and in some embodiments, BINDERS are incubated with standardized digest for longer periods (e.g., 45, 60, 90, 120, or 180 minutes, or 4, 5, 6, 9, 12, 18 or 24 hours).
  • the BINDER beads can be collected together using a magnet and either removed from the digest (for example using a KingFisher device provided by ThermoFisher); held in a vessel (for example by magnetic attraction to the side of a well of a 96-well plate) while the digest solution is removed to another container by a pipetting device (e.g., an Agilent Bravo, Beckman Coulter Biomek, Hamilton, Tecan or other liquid handling robot); held magnetically in a conventional pipette tip while surrounding liquid is expelled and replaced (e.g., as in the established “Magtration” technology); processed in a “magnetic bead trap” (60); or processed manually (e.g., by manipulation of vessels, magnets and handheld pipettes).
  • the standardized digest remaining after separation from BINDERS may be preserved apart from the BINDERS and stored or subjected to additional processes to measure additional constituents at a later time.
  • the beads with BINDERS and specifically bound peptide cargo are washed by addition to, and mixing with, aliquots of a wash solution, after which the beads are recollected and separated from the wash.
  • the beads are washed 1, 2, 3, or 4 times with volumes of 50 to 400 µL of wash solution, which may include buffers (e.g., PBS or Tris, typically at pH between 6.0 and 8.5 when antibody BINDERS are used), gentle detergents (e.g., CHAPS or deoxycholate), and/or low concentrations of an organic solvent (e.g., 5-20% acetonitrile added to help remove peptides bound to beads non-specifically).
  • the BINDERS are designed or selected to have half-off-times (the time period over which half the bound molecules become unbound, i.e., the dissociation half-life) longer than the time required to execute the sequence of wash steps (typically 10-15 minutes using current laboratory automation systems).
  • TARGET and STANDARD peptides that leak from the BINDER can be re-bound by the same or other BINDER sites before being lost.
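The relationship between half-off-time and losses during washing can be estimated with a simple first-order dissociation model. This is an illustrative assumption: it ignores the re-binding noted above, so it gives a pessimistic lower bound on retention. The function name and example times are hypothetical.

```python
def fraction_retained(wash_minutes: float, half_off_minutes: float) -> float:
    """Fraction of bound peptide remaining after washing (first-order, no re-binding)."""
    return 0.5 ** (wash_minutes / half_off_minutes)


# A 15-minute wash sequence with BINDERS of different half-off-times
for half_off in (15, 60, 240):
    print(f"half-off {half_off} min: {fraction_retained(15, half_off):.2f} retained")
```

A BINDER whose half-off-time equals the wash duration loses half its cargo under this model, which is why half-off-times well beyond the wash time are preferred.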
  • experimentation with specific BINDERS, TARGETS and STANDARDS, specific wash solution compositions and temperatures, specific wash volumes and vessel geometries, and specific sample digest matrices is required to optimize a) the enrichment of the TARGETS and STANDARDS and b) the removal of the other digest components.
  • the purer the TARGETS and STANDARDS are after enrichment, the better the invention will function to measure them precisely.
  • the recovery of TARGETS and STANDARDS from a digest by enrichment using BINDERS is evaluated by successively contacting 2 or more separate aliquots of BINDER with the digest, and comparing the amounts eluted from the first and second BINDER aliquots.
  • an effective BINDER will typically capture 80% or more of these peptides on the first capture step, and the second (and possible subsequent) capture steps will capture successively smaller amounts of the peptides.
  • the amount of TARGET and STANDARD captured in the first capture divided by the sum of the amounts captured in the first and second captures provides a useful index of the overall recovery.
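The recovery index from two successive captures can be computed directly. The second function goes one step beyond the text: it assumes each capture removes the same fraction of whatever peptide remains, in which case that fraction equals 1 - second/first. Function names and example amounts (arbitrary units) are hypothetical.

```python
def recovery_index(first: float, second: float) -> float:
    """First-capture amount divided by the sum of first and second captures."""
    return first / (first + second)


def capture_efficiency(first: float, second: float) -> float:
    """Per-capture efficiency, assuming a constant captured fraction per step."""
    return 1.0 - second / first


# 80 units eluted from the first BINDER aliquot, 16 from the second
print(f"recovery index:     {recovery_index(80.0, 16.0):.3f}")
print(f"capture efficiency: {capture_efficiency(80.0, 16.0):.3f}")
```

Under the constant-fraction assumption, 80 units followed by 16 units corresponds to 80% capture per step from a starting pool of 100 units.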
  • BINDERS immobilized on magnetic beads are contained within microfluidic systems capable of moving the beads between different liquid volumes to effect the steps of the invention.
  • Microfluidic systems and technologies well known in the art allow use of reduced liquid volumes (and thereby less dilution of low-concentration analytes such as enriched peptides) and more complex, multi-step chemical processes with reduced losses as compared to conventional lab-scale (i.e., 5-500 µL) liquid handling processes (for example in a magnetic bead trap device (60)).
  • BINDERS are immobilized on columns through which sample digest and wash solutions can be passed (3), typically using a liquid chromatography system.
  • BINDERS are contacted with a standardized digest in solution (i.e., BINDERS free in solution, not bound to a support), thereby maximizing freedom of diffusion and potentially providing faster binding to TARGET and STANDARD peptides or respective VEHICLE constructs.
  • the BINDERS can be themselves captured on magnetic beads (e.g., protein G coated beads in the case of antibody BINDERS, streptavidin coated beads in the case of BINDERS that have been previously biotinylated, etc.) or on columns functionalized with equivalent capture functionalities.
  • peptides captured by and eluted from BINDERS are subjected to one or more additional cycles of BINDER enrichment.
  • when bound peptides are eluted from BINDERS by a change to specific elution solution conditions (e.g., pH 2.5 in the case of antibody BINDERS), reversal of these conditions (e.g., neutralization to pH near 7.0) can allow the peptides to bind to respective BINDERS once again.
  • BINDER typically fresh BINDER
  • a second BINDER enrichment cycle thus begins with a much smaller amount of total peptide material, in which the TARGET and STANDARD peptides represent a much larger fraction of dissolved material (mainly peptides).
  • a second BINDER enrichment cycle is carried out using fresh BINDER (i.e., BINDER that has not previously been exposed to complex digest), and this additional cycle further depletes non-target matrix peptides, while recovering a large fraction of the TARGET and STANDARD peptides, and resulting in a purer sample of the TARGET and STANDARD peptides of interest.
  • Increasing the fraction of peptides that are desired TARGET or STANDARD peptides directly improves the efficiency of single molecule detection aimed at measuring those peptides by decreasing the time and resources spent sequencing other peptides.
  • one or more additional enrichment cycles are carried out using a smaller amount of BINDER (e.g., a smaller volume of beads) than used in the initial enrichment cycle, resulting in an opportunity to reduce the volume in which peptide constructs are eluted and thereby increasing the concentration of TARGET and STANDARD peptide constructs introduced into a detector.
  • the ability to deliver TARGET and STANDARD peptides in a small volume, or on a small number of beads, can improve the efficiency of single molecule detection.
  • Providing peptides for single molecule detection in as concentrated a form as possible can be important in specific detection methods, for example in delivering peptides to the vicinity of a sequencing nanopore as described below.
  • a first BINDER capture step is carried out using BINDER immobilized on a large number of small (e.g., 1, or 2.8 or 5 micron diameter) magnetic beads, thus maximizing the dispersion of BINDER in the sample digest volume and decreasing the diffusion distance and time required for peptide capture, after which the captured peptides are eluted and recaptured by fresh BINDER immobilized on a smaller number of beads, with the result that the peptides captured in the second round are both purer (fewer non-TARGET peptide molecules) and more concentrated (e.g., when the beads are collected magnetically into a mass for removal from contact with the peptide-containing liquid).
  • recovery of TARGET and STANDARD peptides in a very small volume of beads allows the bound peptides to be exposed to very small volumes (equal to or slightly greater than the included volume of the bead mass) of reagents used in the preparation of peptides for linkage to VEHICLES as described below.
  • a second capture makes use of BINDER bound to a small number of larger (e.g., 10-40 micron diameter) beads each having a greater BINDER capacity, such that each bead can be taken through the series of chemical steps to prepare it for and/or complete VEHICLE linkage to it in a separate container, as described below.
  • a first BINDER enrichment cycle is carried out to recover and purify the TARGET and STANDARD peptides from a mass of sample digest, and the same or similar peptide-specific BINDERS are used to identify these TARGET and STANDARD peptides in a single molecule detection system.
  • the first BINDER enrichment cycle can thus serve to remove non-TARGET peptides present in the digest (thus minimizing analytical capacity wasted on irrelevant peptides), and optionally to improve the stoichiometric flatness of a series of different TARGETS to be detected and counted (as described in detail below).
  • a fluorescently-labeled BINDER can be used to detect its cognate TARGET and STANDARD peptides in an optical imaging system of the kind used in high-throughput DNA sequencing or in similar protein detection systems (e.g., US 2021/02397), and thereby used to count the numbers of such molecules.
  • a second class of BINDER that specifically recognizes a unique tag present in the STANDARD but not in the TARGET peptide is used to separately distinguish the STANDARD peptide molecules from the TARGET molecules.
  • this approach is used to identify all STANDARD molecules in one recognition step (using a BINDER specific to the STANDARD molecule tag), and each different TARGET + STANDARD pair is identified by a separate detection step using the TARGET-specific cognate BINDER.
  • the tag distinguishing the STANDARD molecules can be an added amino acid sequence on either amino or carboxy terminus of the TARGET peptide sequence (for example the well-known FLAG peptide sequence used in recovering expressed proteins), a chemical group bound to the STANDARD (such as biotin), or any of a variety of distinctive chemical structures unlikely to be found in the group of TARGET peptides.
  • this approach can be applied to count whole protein molecules instead of proteolytic peptides using peptide-sequence-specific BINDERS to identify their cognate linear sequence epitopes in intact target protein molecules.
  • a first BINDER enrichment cycle can use BINDERS specific for an intact protein as well as BINDERS specific for linear peptide epitopes, and STANDARDS can be versions of the intact protein with any of a variety of unique tags as for peptides.
  • a plurality of TARGET and STANDARD peptides is enriched by the corresponding plurality of BINDERS, and the relative amounts, kinetic properties or solution conditions of the BINDER enrichment are selected or adjusted so as to accomplish some degree of stoichiometric flattening; i.e., to diminish differences in the relative amounts of different TARGET + STANDARD peptide pairs.
  • RNA e.g., mRNA
  • the use of mass spectrometric detection makes it possible to achieve some degree of stoichiometric flattening by choosing target peptides that have very different ionization properties; i.e., choosing peptides with extremely high MS detection performance to represent low abundance proteins (where sensitivity is a major challenge) and choosing peptides with much lower performance to represent higher abundance molecules.
  • Such peptide choices are an important component of stoichiometric flattening in MS methods, in addition to the adjustment of relative BINDER enrichments as described above.
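One way to picture flattening by adjustment of relative BINDER amounts is a saturating-capture model, in which each BINDER captures at most its binding capacity. This model, the function names, and the numbers below are assumptions for illustration only and are not stated in the source. Each TARGET + STANDARD pair is treated as a single pool.

```python
def captured(amounts: dict[str, float], capacities: dict[str, float]) -> dict[str, float]:
    """Amount of each peptide pair captured, limited by the capacity of its BINDER."""
    return {name: min(amount, capacities[name]) for name, amount in amounts.items()}


def dynamic_range(values: dict[str, float]) -> float:
    """Ratio of the most abundant to the least abundant entry."""
    return max(values.values()) / min(values.values())


# Hypothetical digest amounts and BINDER capacities (same arbitrary unit).
digest_amounts = {"abundant": 1e6, "rare": 1e2}
binder_capacity = {"abundant": 1e3, "rare": 1e3}

after = captured(digest_amounts, binder_capacity)
print(dynamic_range(digest_amounts), dynamic_range(after))
```

In this toy case, limiting the BINDER for the abundant peptide reduces the spread between the two peptides from four orders of magnitude to one.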
  • each peptide molecule takes some time to be analyzed - during this time one pore is occupied with one molecule, or a fixed number of molecules requires a given block of time to analyze. While the throughput of such a process can be increased by providing multiple pores or molecule immobilization sites, or decreasing read time, there remains a direct relationship between the number of molecules detected and the time devoted to the analysis of that sample.
  • the detection of BINDER-enriched TARGET and STANDARD peptides can be facilitated by certain chemical modifications, including covalent linkage to polymeric molecules on one or both ends (i.e., on or near peptide n- or c-termini), or linkage to a support, surface or bead, resulting in constructs with improved uptake and sequence readout by a sequence-sensitive detector, and/or incorporating additional information beyond the peptide itself in the form of detectable polymer sequences (e.g., DNA sequence tags).
  • peptides are chemically modified while bound non-covalently to a BINDER (e.g., a BINDER that is used to enrich them from the peptide sample).
  • Modification of the peptides while thus non-covalently anchored to a BINDER facilitates exchange of reagents between steps of a multi-step series of chemical modifications, avoids the necessity for other more cumbersome purification methods between steps (e.g., to separate modified peptides from reagents and unmodified peptides), and allows the peptides to be concentrated when necessary (e.g., by gathering magnetic beads bearing the BINDER into a solid mass with minimal included liquid).
  • a series of chemical steps are used to prepare peptides for analysis, and it may be necessary to remove the chemical reagents required for one step, and in some cases wash them away, before adding reagents for the next step.
  • the off-rate of the BINDER is low, with a half-off-time for example longer than 10-15 minutes (as is typical for antibody BINDERS developed for use in SISCAPA)
  • one or more rapid chemical reactions can be carried out before peptide molecules dissociate from the BINDER.
  • BINDERS can be concentrated (e.g., by collecting magnetic beads into a mass or small volume, or using a column format bearing a high density of immobilized BINDERS) during steps of the process, and during these periods any peptide that dissociates from a BINDER is likely to quickly rebind to another BINDER site given the high local BINDER concentration. This kinetic effect effectively prolongs the time available for chemical modification of BINDER-bound peptides.
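The relationship between BINDER half-off-time and the time available for a chemical step can be sketched with simple first-order dissociation. This is an assumed model, not one given in the source, and it ignores the rebinding described above, so it gives a conservative (lower) estimate of the fraction of peptide still bound. The function name is illustrative.

```python
def fraction_still_bound(step_minutes: float, half_off_minutes: float) -> float:
    """Fraction of peptide still on the BINDER after a step of the given length,
    assuming first-order dissociation and no rebinding."""
    if half_off_minutes <= 0:
        raise ValueError("half-off-time must be positive")
    return 0.5 ** (step_minutes / half_off_minutes)


# A 1-minute reaction step on a BINDER with a 15-minute half-off-time.
print(round(fraction_still_bound(1.0, 15.0), 3))
```

With a half-off-time of 15 minutes, a reaction step lasting one minute leaves more than 95% of the peptide bound even before rebinding is taken into account.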
  • the BINDER is an antibody
  • solution conditions that would denature the antibody e.g., strong detergents such as SDS at high concentrations, extremes of pH, or high temperatures
  • the peptide to be released from the antibody e.g., pH below 3.5 or presence of 2M NH4SCN
  • the BINDERS themselves are modified prior to use in capturing TARGET and STANDARD peptides in order to prevent or diminish their reaction with reagents intended to react with the peptide cargo.
  • some or all of any free amino groups on an antibody BINDER can be blocked, for example by PEGylation using commercial NHS-PEG reagents.
  • modifications may be required to prevent hybridization between the aptamers and sequences being attached or ligated to peptides on a BINDER.
  • the peptide modifications can be carried out while the peptides are bound to a support by a general but less-specific mechanism (e.g., to a reversed-phase support such as C18 particles), or free in solution.
  • selecting TARGET peptides from the group of tryptic peptides with c-terminal lysine when preparing a construct with polymer additions at both ends (a “Double amino” peptide), or selecting TARGET peptides from the group of tryptic peptides with c-terminal arginine (which peptides will have a single n-terminal amino group available to react with a linker; i.e., a “Single amino” peptide) when an extension on only one end (the n-terminus) of the peptide is desired.
  • an advantage of linkage through amino groups is that the peptide subsequently has little or no positive charge (e.g., if it contains no His residues), and at least one (the c-terminal carboxyl) and perhaps more negative charges (if the peptide contains Asp or Glu amino acids).
  • Peptides with a net negative charge have the same charge polarity as nucleic acids (negative, on account of the phosphate groups), facilitating the movement of both types of polymers through a pore using the same polarity electric field.
  • linkage through peptide carboxyl groups can be used (61) but this approach has limited ability to distinguish between c-terminal carboxyls and side chain carboxyls of aspartic and glutamic acids, and thus could present additional constraints on peptide selection (i.e., de-selection of Asp and Glu containing TARGET peptides) or give rise to multiple constructs due to side reactions.
  • TARGET peptides devoid of aspartic and glutamic residues, and hence having a unique carboxyl group at the peptide c-terminus, are used with carboxyl coupling chemistries well-known in the art to link peptides through the c-terminus.
  • Linkage through a cysteine sulfhydryl group is frequently favored when a peptide’s sequence can be freely designed - however the occurrence of a cysteine residue at the n-terminus or c-terminus of a proteolytic peptide in a natural protein is infrequent, thus representing a limiting constraint on TARGET peptide selection.
  • Chemistries are known in the art for specific chemical modification of, and/or linkage to, histidine, tyrosine, tryptophan and other amino acids, and these can also be used in the invention.
  • Figure 14 illustrates an embodiment for modification of enriched TARGET and STANDARD molecules (either of these being labeled a “Peptide” in the Figure) by linkage of an n-terminal amino group to a single-stranded oligonucleotide Leader (labeled Oligo 1) and an epsilon-amino group of a c-terminal lysine to a single-stranded oligonucleotide Trailer (labeled Oligo 2).
  • a hybrid molecule can be considered an “in-line” peptide-oligonucleotide construct; i.e., one in which the construct forms a continuous backbone with “side-groups” consisting primarily of bases and amino acid side chains.
  • the linkages are carried out in a series of steps using “click” chemistry, a bio-orthogonal reaction chemistry (25, 62, 63).
  • a peptide’s n-terminal amino group is reacted with azidoacetic anhydride (AAA) under conditions of pH (e.g., pH 5.5 to 7.0, preferably 6.7) that effectively prevent or reduce reaction with a lysine’s epsilon-amino group (25, 64), resulting in the introduction of an azide functionality at the peptide’s n-terminus (shown in Step 2 of Figure 14).
  • this azide group is then allowed to react with an amount of oligonucleotide Oligo 1 (Step 3 in Figure 14), which has been prepared with an aza-dibenzocyclooctyne (ADIBO) functionality on the 3’ end, forming a “click” chemistry linkage of the Oligo 1 to the peptide n-terminus (25) (Step 4 in Figure 14).
  • the peptide’s c-terminal lysine epsilon-amino group is allowed to react with an azidoacetic acid NHS ester (Step 5 in Figure 14), thereby introducing an azide functionality on the peptide’s c-terminal lysine side chain (Step 6 in Figure 14).
  • this azide group is then allowed to react with an amount of oligonucleotide Oligo 2 (Step 7 in Figure 14), which has been prepared with an aza-dibenzocyclooctyne (ADIBO) functionality on its 5’ end, forming a “click” chemistry linkage of the Oligo 2 to the peptide’s c-terminal amino acid (Step 8 in Figure 14).
  • oligo polarities i.e., 5’ to 3’ or vice versa
  • the Oligos can each be single-stranded, or they can be rendered duplexes by hybridization with complementary sequences over part or all of their length(s).
  • the locations of the “click” reactive pairs can be inverted (e.g., modifying one or both of the peptide ends with ADIBO, and the Oligos with azide).
  • Linkage at the peptide c-terminus can alternatively be introduced through modification of the peptide c-terminal carboxyl group (e.g., of a c-terminal arginine, in which case the n-terminal amino group is the only amino group in the peptide) instead of a lysine epsilon-amino group.
  • some or all of the steps of the double ligation scheme shown in Figure 15 are carried out with peptides bound to a BINDER on magnetic beads.
  • peptides longer than the groove of a typical BINDER binding site typically 4-8 amino acids
  • a BINDER that binds an epitope that does not include either peptide terminus leaves both peptide termini available for reaction as shown.
  • Magnetic beads carrying the BINDER and bound peptides can be exposed stepwise to the sequence of reagents as shown by moving the beads from one reagent solution to the next, with optional wash steps in between as required, or alternatively the beads can be gathered to the side of a vessel using a magnet and the sequence of reagent solutions added and removed, or manipulated in a microfluidic device.
  • Examples include azide and tetrazine functionalities that are capable of specific, bio-orthogonal reactions with alkyne functionalities, some requiring Cu(I) catalysis (which is less preferred in some embodiments), or strain-promoted azide-alkyne cycloaddition (SPAAC) reactions with bicyclononyne (BCN) or cyclooctyne and derivatives such as dibenzocyclooctyne (DIBO) or aza-dibenzocyclooctyne (ADIBO, shown in Figure 14).
  • one or both of the oligos linked to the peptide is part of a multimolecular construct designed to facilitate nanopore sequencing.
  • Figure 15 illustrates a modification of the embodiment of Figure 14 in which the oligo joined to the N-terminal amino group of a peptide (Oligo 1) is part of a duplex having a site at which a “motor protein” can be (or in some embodiments already is) bound.
  • the overall construct shown in Figure 15 provides a leading oligo motor construct capable of controlling movement of the peptide through the pore (providing a stepwise ratchet motion as used, for example, in the commercially-available Oxford Nanopore sequencing platform “Y-adapter”; https://nanoporetech.com/sites/default/files/s3/literature/product-brochure.pdf), followed by a peptide to be sequenced, and a trailing oligo 2.
  • a tether molecule that hybridizes to some sequence in the construct may be used to associate the construct with the membrane in which the nanopore is located, thereby increasing the Leader’s probability of entering the nanopore.
  • alternative chemistries can be used to join the peptide and oligos, such as use of an NHS-functionalized oligo that can react directly with a lysine or n-terminal amino group.
  • these disclosed inventions do not include the use of BINDERS to carry and/or move TARGET peptides during one or more steps of oligo-peptide construct assembly, washing and purification, and do not include STANDARDS or use of BINDERS for enrichment or stoichiometric flattening.
  • a sequential enzymatic approach is used in some embodiments to derivatize peptides with two different added molecules, resulting in an ordered construct.
  • a peptide can be linked to a specific oligo on the n-terminus and a different oligo on the c-terminal lysine residue.
  • This approach is not completely general since it requires a specific order relationship between an arginine immediately preceding the peptide sequence and the peptide’s c-terminal lysine.
  • Such peptides are common but not universal in proteins, and when added to a requirement such as uniqueness in the human proteome, it is probable that only a subset of proteins can be represented by proteotypic peptides having such a structure.
  • Some embodiments therefore make use of the localization of TARGET and STANDARD peptide constructs on BINDERS, which themselves can be bound to particles such as magnetic beads, to transport the peptides (e.g., as peptides, or as part of peptide-oligo constructs as disclosed above) into close proximity to the site of detection (e.g., a nanopore, a surface on which they may be imaged, or a well in which they may be subjected to degradative sequencing).
  • the BINDERS are attached to spherical magnetic particles, which can be gathered together into a compact mass by magnetic forces. In such a mass of spherical particles, the particles occupy about 74% of the total volume. Thus elution of constructs from binders on beads in such a compact mass releases the constructs into about 26% of the volume of the mass, and given the sub-microliter volume of this mass in many practical embodiments, the constructs can be present in volumes of interstitial liquid in the range of tens to hundreds of nanoliters.
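The volume arithmetic above can be made concrete. The sketch below uses the 74% packing figure from the text; the bead-mass volume and peptide amount are hypothetical values chosen for illustration, and the function names are not from the source. Note that 1 fmol dissolved in 1 nL corresponds to 1 µM.

```python
PACKING_FRACTION = 0.74  # fraction of a compact bead mass occupied by the beads (from the text)


def interstitial_volume_nl(bead_mass_volume_nl: float,
                           packing_fraction: float = PACKING_FRACTION) -> float:
    """Liquid volume between the beads of a compact bead mass, in nanoliters."""
    return bead_mass_volume_nl * (1.0 - packing_fraction)


def concentration_um(amount_fmol: float, volume_nl: float) -> float:
    """Concentration in micromolar; fmol per nL is numerically equal to µM."""
    return amount_fmol / volume_nl


# Hypothetical example: a 500 nL bead mass releasing 13 fmol of construct.
volume = interstitial_volume_nl(500.0)
print(round(volume, 1), round(concentration_um(13.0, volume), 3))
```

In this example the constructs are released into roughly 130 nL of interstitial liquid, giving a concentration near 0.1 µM.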
  • the immobilization of TARGET peptides, STANDARDS and their derivatives and constructs by non-covalent binding to BINDERS that are immobilized on supports such as magnetic beads provides improved methods for delivery of these molecules to sequencing machinery.
  • magnetic beads carrying BINDERS and their peptide construct cargoes are added directly to the cis chamber of a nanopore sequencing device, where the beads sink under the influence of gravity and come to lie on the membrane in which the nanopore is located. As described in US 2021/0147904, this presentation of sequenceable polymers adjacent to the membrane improves capture by the pore by orders of magnitude.
  • the salt solution of the cis compartment acts to slowly release the peptides from the BINDERS, for example with an off-rate equivalent to elution over a period of 15 to 180 minutes.
  • Eluted peptide constructs are then captured by the membrane through a hydrophobic (e.g., cholesterol) tether as used in the commercial Oxford Nanopore device.
  • the peptide constructs diffuse in 2-dimensions on the membrane and are efficiently presented to the pore for capture and threading into the pore for sequencing.
  • the BINDERS are attached to magnetic beads (or other particles) by a cleavable link such as a disulfide-containing linker. Exposure to a disulfide reducer (e.g., TCEP or mercaptoethanol) can thus release the BINDERS and their cargo from the beads into solution.
  • the BINDER can be captured by the membrane and then diffuse in 2-dimensions, eventually bringing bound peptide constructs into close proximity to the nanopore.
  • the BINDERS are released free into solution and the peptide constructs are not eluted from the BINDERS under the conditions of the cis chamber (e.g., 0.4 M KCl), so that peptide constructs, still bound by BINDERS, are captured by and threaded into the nanopore.
  • the force with which the electric field acts on the construct to pull it through the nanopore causes the construct to be pulled free of the BINDER while the speed of this motion is regulated by a DNA motor acting on the construct’s oligonucleotide component.
  • tethered versions of the BINDERS are added into the cis chamber and allowed to contact and be retained by the membrane, forming a dense surface of BINDER binding sites on the membrane within which the nanopore lies.
  • This surface comprising a dense plane of binding sites for TARGET and STANDARD peptide molecules, is able to capture these molecules from the contents of the cis compartment and thus provide an increased local concentration of constructs in the plane of the pore, and with the ability to diffuse in the membrane plane so as to deliver constructs for threading into the pore.
  • peptide:oligonucleotide constructs can be propelled back and forth through a nanopore by reversal of the transmembrane electric potential (a process described as “flossing”) to repeatedly read and re-read a sequence to provide greater accuracy (65).
  • this flossing approach is used to read selected peptides multiple times under computer control. This approach is particularly useful in confirming the sequences of peptides present at low abundance; i.e., when few copies of the peptide have been encountered and the potential for quantitative error is large.
  • the nanopore control system can act on the observation of low frequency in real-time and implement a multi-read flossing protocol to verify the identity as a rare sequence. Achieving certain identification of a low frequency peptide sequence is more consequential than for high frequency peptides.
7.5.8 Inclusion of separative steps in addition to specific affinity capture and enrichment in a workflow.
  • proteolytic peptides in a sample digest are captured on a solid support (e.g., C18-coated magnetic beads), thus allowing non-peptide sample components to be removed, allowing the immobilized peptides to be reacted with chemical reagents (e.g., for click chemistry derivatization as described below), and excess reagents to be removed from the peptides prior to their elution (e.g., using 50% acetonitrile) for use in subsequent steps.
  • a conventional magnetic bead-based DNA “cleanup” (e.g., using commercially-available Ampure beads (Beckman Coulter)) can be carried out after assembly of oligo:peptide constructs generated according to the invention in order to remove any excess reagents, short oligos, un-linked peptides, and/or enzymes prior to delivery of peptide constructs to a single molecule sequencer (e.g., a nanopore).
  • Modern laboratory robotics provides means to automate workflows involving multiple such affinity separative steps as well as precise liquid additions.
  • a TARGET or STANDARD peptide is linked to a polymer at one or more sites (e.g., at one or both ends) by stable linkages (e.g., by covalent bonds or very stable non-covalent bonds).
  • these linkages are made between specific sites at, or near, the peptide’s terminus (or termini) and one or more polymer molecules (e.g., nucleic acids including DNA or RNA, chemical variants of these including phosphorothioate backbones, “locked” nucleic acids (“LNA”), peptides including polyglutamic or polyaspartic acids, and the like).
  • an object of the invention is to cause a peptide to pass through a nanopore in an extended conformation, allowing the sequence of amino acids to be “read” by measurement of current flowing through the pore or other equivalent means.
  • peptides of interest may be immobilized and subjected to a series of binding interactions or covalent modifications (e.g., stepwise degradation), and in such embodiments the linkage of peptides to a surface by a single unique site (e.g., a unique amino group such as an n-terminal amino group) may be preferred.
7.6.1 Peptide selection for pore signature and ligation properties.
  • Some embodiments make use of monitor peptides selected from among a target protein’s proteolytic fragments based on features including A) their ability to generate a distinct sequence or fingerprint in a single-molecule sequence-sensitive detector (e.g., a “squiggle” or ion current signature over time in a nanopore sequencing system) compared to other peptides (particularly other peptides that may be used with the selected peptide in a multiplex panel assay); B) their content of reactive groups (e.g., alpha (amino terminal) and epsilon (lysine) amines, primary (carboxy terminal) and side chain (aspartic and glutamic acid) carboxyl groups, cysteine sulfhydryls, etc.) of potential use in labeling or ligation reactions; C) their uniqueness to a target protein (i.e., they are typically required to be “proteotypic” so as to act as a surrogate of the pre-specified target protein exclusively); and other
  • Proteolysis using trypsin is common in peptide analysis for several reasons including low cost of the enzyme and its ability to generate peptides having a positive charge at the c-terminus (a useful feature in mass spectrometric analysis).
  • selection from a tryptic digest of TARGET peptides with a c-terminal lysine ensures the presence of amino groups at both peptide termini, while selection of c-terminal arginine peptides ensures the presence of only one amino group (at the n-terminus).
  • alternative enzymes can be used as well.
  • LysC cuts only at lysines (not arginine) and therefore on average generates longer peptides than trypsin, and each of these (apart from a protein’s c-terminal peptide) will have a lysine (and its epsilon-amino group) at the c-terminus (features that can be exploited for linking purposes in the invention).
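The effect of enzyme choice on the amino groups available for linkage can be sketched with a simplified in-silico digest. The code below is illustrative only: it assumes complete cleavage after every listed residue, ignores missed cleavages and any proline effects, and uses a made-up protein sequence. The function names are not from the source.

```python
def digest(protein: str, cleave_after: str) -> list[str]:
    """Split a protein after every residue listed in cleave_after (idealized digest)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # c-terminal peptide of the protein
    return peptides


def amino_class(peptide: str) -> str:
    """Classify a peptide by its c-terminal residue, following the text's terminology."""
    if peptide.endswith("K"):
        return "double amino"   # n-terminal amine plus c-terminal lysine side chain
    if peptide.endswith("R"):
        return "single amino"   # n-terminal amine only
    return "other"


protein = "MAGTKLLSREEVKAQR"  # hypothetical sequence for illustration
for enzyme, sites in (("trypsin", "KR"), ("LysC", "K")):
    print(enzyme, [(p, amino_class(p)) for p in digest(protein, sites)])
```

In this toy example trypsin yields four short peptides alternating between the two classes, while LysC yields fewer, longer peptides.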
  • TARGET peptide selection criteria in the present invention are therefore significantly different from selection criteria for mass spectrometry.
  • Some embodiments apply a strategy to connect polymeric molecules (e.g., nucleic acids, polypeptides, and other polymers) to either or both ends of a peptide to form a heteropolymer construct, including ligation of a polymer to the n-terminal amino group and another (or another molecule of the same) polymer to a site near the c-terminus (e.g., through the c-terminal carboxyl group or to the epsilon-amino group of a c-terminal lysine residue).
  • One such ligated polymer (the Leader, which may be highly charged to facilitate movement by an electrophoretic process) can be used to initiate threading of the construct into a sequencing nanopore and another (the Trailer, at the opposite end of the peptide) can be used to assist in ratcheting the peptide through the ion current sensing region of a nanopore.
  • Intervening polymer segments can comprise sequences that identify samples or other data associated with the peptides.
  • Concatamers can be assembled using means including “click” chemistry or DNA or peptide ligases, and can be comprised of any mixture of peptides, oligonucleotides, or other polymers.
  • Some embodiments make use of a continuous polymer backbone to which a peptide can be linked through a single site and “dragged” through a nanopore by movement of the backbone, which may include single or double-stranded regions that interact with DNA motors controlling this movement in a ratchet-like manner.
  • the invention makes use of real-time peptide sequence evaluation and counting (e.g., as provided in commercial DNA nanopore sequencers) to update counts of individual TARGET peptides and their cognate STANDARD peptides, and thereby estimate the precision with which each has been measured up to this point based on counting statistics and other statistical methods.
  • the availability of updated precision estimates allows the analytical device (e.g., a sequence-sensitive single molecule detector such as a nanopore sequencer) to terminate an analytical run when pre-defined precision criteria are met (e.g., when the coefficient of variation of one or more specific peptide counts, estimated for example as the square root of the number of peptides counted divided by the number of counts, is less than a target such as 5%).
  • Precision targets can be different for different TARGET peptides.
  • a minimum number of counts can be specified for a given TARGET peptide + STANDARD pair, such that the lower abundance of the pair must achieve this minimum number to satisfy a precision target for the ratio between the two (which represents the assay’s quantitative result).
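The counting-statistics criterion above can be sketched as follows. The square root of N divided by N equals 1/sqrt(N), so a 5% target corresponds to about 400 counts. The function names, the default threshold, and the choice of "less than or equal" are illustrative assumptions. The ratio formula is the standard error propagation for two independent Poisson counts and is included as an assumption about how the ratio's precision might be estimated, not as the source's stated method.

```python
import math


def counting_cv(count: int) -> float:
    """Relative counting error sqrt(N)/N = 1/sqrt(N); infinite when nothing has been counted."""
    return math.inf if count <= 0 else 1.0 / math.sqrt(count)


def pair_precision_met(target_count: int, standard_count: int, cv_target: float = 0.05) -> bool:
    """True when the lower-abundance member of a TARGET + STANDARD pair meets the CV target."""
    return counting_cv(min(target_count, standard_count)) <= cv_target


def ratio_cv(target_count: int, standard_count: int) -> float:
    """Approximate CV of the TARGET/STANDARD ratio, assuming independent Poisson counts."""
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


print(pair_precision_met(400, 10000), pair_precision_met(399, 10000))
print(round(ratio_cv(400, 400), 4))
```

A run-control loop could call `pair_precision_met` for every pair on the panel after each new read and stop when all pairs return true.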
  • the invention also makes use of real-time peptide sequence evaluation and counting (e.g., as provided in commercial DNA nanopore sequencers) to stop sequencing of peptides whose precision targets have already been met and eject these molecules from a sequencing pore in order to allow entry of a different molecule that may contain peptides whose precision targets remain unmet.
  • the analytical system is thereby able to focus its attention on peptides in need of more counts (e.g., low abundance peptides) at the expense of peptides whose count targets have already been met (e.g., higher abundance peptides). This increase in efficiency increases throughput and lowers cost of peptide counting.
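The selective-ejection logic described above can be sketched as a per-molecule decision made once a peptide has been identified in real time. Everything here is hypothetical: the class, the function names, the identifiers "T1" and "T2", and the policy of ejecting molecules that are not on the panel are assumptions for illustration and do not represent any sequencer's actual control API.

```python
from dataclasses import dataclass


@dataclass
class PeptideTally:
    count: int = 0
    min_count: int = 400  # assumed per-peptide count target

    @property
    def satisfied(self) -> bool:
        return self.count >= self.min_count


def decide(peptide_id: str, tallies: dict[str, PeptideTally]) -> str:
    """Return 'eject' or 'sequence' for a molecule identified in the pore."""
    tally = tallies.get(peptide_id)
    if tally is None or tally.satisfied:
        return "eject"      # off-panel (assumed policy) or target already met
    tally.count += 1        # assumes the read then completes successfully
    return "sequence"


def run_complete(tallies: dict[str, PeptideTally]) -> bool:
    """True when every peptide on the panel has reached its count target."""
    return all(t.satisfied for t in tallies.values())


tallies = {"T1": PeptideTally(count=400), "T2": PeptideTally(count=3)}
print(decide("T1", tallies), decide("T2", tallies), run_complete(tallies))
```

Here the high-abundance peptide is ejected while the low-abundance peptide continues to be counted.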
  • a peptide is inserted into a polymer chain by reacting two chemical groups at or near the peptide termini with adjacent or nearby residues of a polymer such as a nucleic acid, followed by cleavage of the polymer between these residues.
  • a peptide 71 having reactive groups (here amino groups in a double amino peptide) on both ends is activated by addition of groups 72 (for example BCN click groups, indicated in the Figure as X, that can be added to the peptide’s amino groups through reaction with an NHS derivative of BCN) in Figure 16B.
  • the resulting activated peptide is reacted with oligo 62 (Figure 16C) having complementary click groups 74 (e.g., azide, indicated in the Figure as Y) attached to adjacent bases (here shown as a dinucleotide TT).
  • T was selected as the attachment base because of the commercial availability of synthetic oligonucleotides having an azide attached to one or more T residues (e.g., from Integrated DNA Technologies, Inc.); however alternative attachment means involving other bases or non-base components capable of being incorporated into synthetic oligos can be used.
  • the two attachment sites can be adjacent bases as shown in Figure 16, or they can be separated by one or a few intervening bases (e.g., as shown in Figure 17).
  • the result upon reaction between X and Y groups to form covalent linkages 75 is a peptide loop conjugate shown in Figure 16D.
  • the linkage sites on the oligo 62 are designed such that the surrounding sequence (when hybridized to the complementary strand 65) is recognized by a restriction endonuclease (e.g., PacI) capable of cutting the oligonucleotide backbone between the two bases to which the peptide is linked (the enzyme PacI cuts between the second pair of T residues 76 in the sequence TTAATTAA, as shown in Figure 16E).
  • the result is a nanopore-sequenceable construct 77 comprising the peptide with leading and trailing oligo segments, either or both of which can comprise oligo sequence tags identifying the peptide as either a STANDARD or TARGET.
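The cleavage step can be illustrated with a short sketch that locates the TTAATTAA site in the peptide-bearing strand and splits it between the second pair of T residues. Only the top-strand cut (TTAAT^TAA) is modelled; the example oligo sequence and the function names are hypothetical.

```python
PACI_SITE = "TTAATTAA"
PACI_CUT_OFFSET = 5  # top-strand cut after TTAAT, i.e. between the second pair of T residues

def paci_cut_positions(seq: str) -> list:
    """Return the index of the first base after each top-strand cut."""
    seq = seq.upper()
    positions = []
    start = seq.find(PACI_SITE)
    while start != -1:
        positions.append(start + PACI_CUT_OFFSET)
        start = seq.find(PACI_SITE, start + 1)
    return positions

def linearize(seq: str) -> tuple:
    """Split an oligo with exactly one site into its 5' and 3' segments."""
    cuts = paci_cut_positions(seq)
    if len(cuts) != 1:
        raise ValueError("expected exactly one TTAATTAA site")
    return seq[:cuts[0]], seq[cuts[0]:]

# Hypothetical oligo; the peptide would be linked to the two T residues flanking the cut.
five_prime, three_prime = linearize("GGCCTTAATTAAGGCC")
```

In the loop construct, the two resulting segments remain joined through the peptide, so the cut converts the loop into a linear oligo:peptide:oligo molecule.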
  • peptides may be inserted in both orientations, i.e., n-term first (shown as 77 in Figure 16F), or c-term first (shown as 78 in Figure 16G).
  • Suitable data analysis algorithms are constructed to recognize a specific peptide’s squiggles in either orientation. Reaction of the activated peptide with the oligo bearing linkage sites can be carried out before or after the oligo is hybridized with the complementary strand to allow site-specific cleavage between the adjacent peptide-linked bases.
  • Linkage of the peptide to oligo can be carried out after peptides are enriched by BINDERS (provided that STANDARD peptides are identified by a structural difference from cognate TARGET peptides, which difference can be sensed by a nanopore), or before BINDER enrichment (in which case the oligo sequences 62 or 63 can be used as tags indicating whether a peptide is a STANDARD or a TARGET, and the BINDER is used to enrich assembled peptide-oligo constructs).
  • the loop method can provide a simpler method of inserting a peptide into an oligo sequence than other methods requiring stepwise linkage of linear peptide and oligo molecules.
  • linearization of a peptide loop construct is accomplished using chemical rather than enzymatic means.
  • Figure 17 shows an example similar to that of Figure 16, with the difference that the nucleic acid bases providing oligo attachment sites 74 (here shown as T residues) are separated by an intervening chemical structure (shown here as residue labeled “Z”).
  • this intervening structure is a base linked to either or both adjacent T’s by a phosphorothioate linkage.
  • the backbone of oligo 62 can be cleaved at the Z position using chemical means (e.g., using iodine, aqueous silver nitrate or mercuric chloride (66) or chloride assisted by myeloperoxidase (67)).
  • the intervening structure comprises a photocleavable spacer or linker (e.g., the linker designated /iSpPC/ available from Integrated DNA Technologies, or linker 26-6888 available from GeneLink) that can be cleaved by exposure to near UV light (e.g., 300-350nm wavelength).
  • Photochemical cleavage provides an extremely efficient way to effect the linearization of a peptide:oligo loop construct.
  • a variety of specific chemical and enzymatic cleavage mechanisms can be used to selectively cleave an oligo VEHICLE after insertion of a peptide loop to yield a linear oligo:peptide:oligo construct.
  • the spacing between two peptide attachment sites 74 can be as short as no intervening bases, or as long as 2, 3, 4, 5, 6 or up to 10, 20 or 30 bases.
  • An advantage of a short distance between sites 74 is the increased rate of formation of the second linkage after the first has formed: formation of the first linkage transforms the reaction from a bi-molecular (diffusion-limited) reaction between the peptide and oligo into a uni-molecular reaction that is likely to be more rapid, thereby increasing the probability that both nearby linkage sites are connected to opposite ends of the same peptide molecule (rather than two separate peptide molecules).
7.6. 7.3 Loop assembly without a disruptible linkage
  • a peptide loop construct is assembled by reacting a peptide having reactive groups on both ends (e.g., Figure 16B) with a double-stranded oligo similar to that shown in Figure 16C except that the adjacent modified T bases comprising reactive groups 74 (indicated as “Y” groups) are not linked by sugar-phosphate bond (i.e., the DNA backbone is interrupted between them) causing oligos 62 and 63 to be separate molecules.
  • Oligos 62 and 63 are held in place by their hybridization with the complementary continuous oligo 64 + 65, thereby holding the reactive sites 74 in close proximity, where they react preferentially with the activated groups 72 on either end of a peptide molecule, creating a continuous oligo-peptide-oligo construct amenable to single molecule detection using nanopores or other methods.
  • a variety of alternative chemical linkages can be used to connect the peptide and oligos, including click chemical linkages as described elsewhere herein.
  • the appropriate corresponding reactive groups can be placed on oligos 62 and 63 to enable peptide insertion with defined polarity (e.g., peptide inserted n-term to c-term in a 5’ to 3’ oligo VEHICLE).
  • the peptide n-term amino and c-term lysine epsilon amino groups can be labeled with BCN and TCO respectively, and the 5’ and 3’ T residues of the adjacent pair labeled respectively with azide and tetrazine.
  • the peptide n-term reacts with the 5’ T (BCN + azide) and the peptide c-term reacts with the 3’T (TCO + tetrazine) creating a construct with all oligo and peptide components in a unique defined order.
  • oligonucleotide sequences into which a peptide is inserted comprise encoded information (“tags”) identifying a peptide as a TARGET or a STANDARD (the STANDARD constructs being assembled in a separate process from the sample-derived TARGET constructs).
  • unique 16-base tags are provided, which can be incorporated either 5’ to the peptide attachment site (62) or 3’ to it (63), or both (as shown).
  • the tags must be long enough to provide reliable identification when read by a nanopore or other single molecule reader - the examples of Figures 16, 17 show 16-base tags on both 5’ and 3’ ends, while the examples of Figures 18, 19 and 20 show the use of 8-base tags on the 3’ end to distinguish TARGET and STANDARD peptides (Figure 18). If the accuracy of sequence recognition is very high, a short tag sequence of a few bases can suffice; however, for increased accuracy and reliability, a tag of at least 4, or 5, 6, 7, 8, or more bases can be used.
  • In some embodiments, the oligonucleotide sequence identifying a peptide as TARGET or STANDARD is placed in the oligo on the 3’ side of the peptide (i.e., the portion of the oligo that follows the peptide through the nanopore), since the oligo component on the 5’ side may not be read effectively when the adjacent peptide slips through the DNA motor without ratcheting.
  • Figure 18 shows successive steps in the assembly of TARGET constructs (Figure 18 A, B, C and D), and cognate STANDARDS (Figure 18 E, F, G, and H), using two different 8-base oligo tag sequences 63 and 67 to distinguish them (in this case the 5’ oligo 62 is the same for TARGET and STANDARD constructs, as it is unlikely to be readable during 5’-to-3’ transit through a nanopore).
  • Because the TARGET and STANDARD peptide molecules are distinguished by oligo tags 63 and 67, joined with the respective peptides in separate processes (the STANDARD constructs being assembled separately from sample digest preparation), the TARGET and STANDARD peptides are of identical chemical structure, thereby ensuring their equivalent binding by the cognate BINDER.
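The relationship between tag length and reliable identification can be illustrated by a simple nearest-tag classifier. The sketch below assumes two hypothetical 8-base tags and a tolerance of one miscalled base; neither the tag sequences nor the tolerance come from the specification.

```python
# Hypothetical 8-base tags that differ at every position.
TAGS = {"TARGET": "ACGTACGT", "STANDARD": "TGCATGCA"}

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))

def classify_tag(read_tag: str, max_mismatches: int = 1):
    """Return 'TARGET' or 'STANDARD', or None if the read is too far from both or ambiguous."""
    distances = {label: hamming(read_tag, tag) for label, tag in TAGS.items()}
    best = min(distances, key=distances.get)
    others = [d for label, d in distances.items() if label != best]
    if distances[best] > max_mismatches or distances[best] in others:
        return None
    return best
```

Longer tags allow a larger minimum distance between the TARGET and STANDARD sequences, which is why a tag of several bases tolerates read errors that a very short tag would not.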
  • In some embodiments, peptide:oligo constructs of small size are prepared first and then joined to larger oligos to provide sufficient oligo length upstream (i.e., 3’-wards in the case of a 5’-to-3’ nanopore reading system) to allow the pore to “read” a significant length of the peptide.
  • oligo tag to distinguish TARGET and STANDARD peptides
  • the oligo CCTGAACCTZTTATCCAGT has a molecular weight of approximately 5,700 daltons, and the complete oligo:peptide construct including the peptide ESDTSYVSLK is approximately 6,800 daltons.
  • a construct of this size diffuses significantly faster than a construct comprising an oligo as shown in Figure 16 (12,700 daltons of oligo and approximately 1,100 daltons of peptide, for a total of 13,800 daltons).
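The mass comparison above can be checked with a few lines of arithmetic. The diffusion estimate is only a rough sketch: it assumes compact, sphere-like molecules whose diffusion coefficient scales as mass to the power -1/3 (a Stokes-Einstein approximation that is not stated in the specification); flexible polymers generally show a stronger mass dependence.

```python
# Approximate masses in daltons, as given in the text.
SHORT_OLIGO_DA = 5_700
LONG_OLIGO_DA = 12_700
PEPTIDE_DA = 1_100

def construct_mass(oligo_da: int, peptide_da: int) -> int:
    """Total mass of an oligo:peptide construct (linker masses ignored)."""
    return oligo_da + peptide_da

def relative_diffusion(mass_small: float, mass_large: float, exponent: float = 1 / 3) -> float:
    """Ratio D_small / D_large under the ASSUMED scaling D ~ M ** -exponent."""
    return (mass_large / mass_small) ** exponent

small = construct_mass(SHORT_OLIGO_DA, PEPTIDE_DA)
large = construct_mass(LONG_OLIGO_DA, PEPTIDE_DA)
ratio = relative_diffusion(small, large)
```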
  • Figure 19 shows an embodiment to achieve this extension in which oligo:peptide:oligo constructs 91 (a TARGET with oligo tag 63 shown in Figure 19A) and 92 (a STANDARD with oligo tag 67 shown in Figure 19E) prepared according to the steps of Figure 18 are annealed with complementary strands 93 and 94 respectively ( Figure 19B and
  • peptide:oligo constructs are enriched by capture on specific BINDERS, and the kinetics of such capture can be improved by using relatively small constructs capable of rapid diffusion, and having less propensity to form large aggregates.
  • constructs 91 (TARGET) and 92 (STANDARD) are captured by interaction of the peptide components with BINDERS 98 (which may be sequence-specific anti-peptide antibodies, for example) attached to magnetic beads 51.
  • the capture reaction can be carried out either before (Figure 20A) or after (Figure 20B) cleavage of the oligo to “linearize” a peptide loop insertion construct (Figures 16, 17 and 18).
  • the capture can be carried out with the sequenceable construct alone or hybridized to a complementary oligo.
  • Those with skill in the art will understand that capture of smaller constructs is likely to proceed more quickly, with lowered potential for non-specific interactions.
  • extension of peptide:oligo constructs (as in Figure 19) and/or addition of complementary strands is carried out after the BINDER affinity enrichment step in order to take advantage of capturing smaller constructs. Nevertheless, extension of peptide:oligo constructs (as in Figure 19) and/or addition of complementary strands can also be carried out prior to the BINDER enrichment step.
  • the lengths of oligo segments in the constructs are optimized to minimize potential for failure to generate effective loop constructs.
  • Figure 21A shows an effective assembly in which both ends of the peptide connect to nearby sites on the oligo (following the processes shown in Figures 16, 17 and 18) generating a small construct capable of hybridizing with a complementary strand (Figure 21D) and thereafter forming a full-length sequenceable construct (e.g., through the extension steps shown in Figure 19).
  • Figures 21B and C show cases in which only one end of a peptide successfully links to a site on the oligo, and in these cases the resulting constructs contain only a 5’ or a 3’ oligo segment, or in which two different peptide molecules react with nearby oligo linkage sites (Figure 21D).
  • Figure 21E, F and G have low melting temperatures (e.g., below 20 °C, as in the examples shown), and therefore are unlikely to form in appreciable quantities at room temperature and above, compared to the construct in Figure 21E which is much more stable and thus capable of participating efficiently in the extension steps of Figure 19.
  • the polymers attached to the peptides can fulfill several important functions in the process of sequence-sensitive peptide detection according to the invention, as exemplified in Figure 22, including: A) providing a highly charged “guide thread” that rapidly threads through a sequencing nanopore (ahead of the peptide) as a result of a voltage potential across the membrane in which a nanopore is embedded; B) working together with a protein nanomachine (e.g., a DNA motor such as a helicase) adjacent to the sequencing pore to provide a molecular “ratchet” that moves the peptide (or allows it to move under the influence of a cross-membrane electric potential) through the pore in discrete steps at a controlled rate (i.e., slow enough to allow accurate measurements of ion current through the pore for each step, despite being stochastic in nature); and C) providing a pore-sequenceable “sequence tag” or “barcode” whose nucleotide (or amino)
  • a sequence tag identifies a construct containing a TARGET peptide (and/or its associated STANDARD) as coming from a specific sample among a multiplicity of samples whose enriched peptides have been pooled (“multiplexed”) together after addition of sample-specific tags and prior to sequencing together in one sequencing run.
  • a sample “barcode” can be included in a conventional sequencing adapter or provided as a separate polymer module to be linked with all of the constructs derived from a specific sample, as is commonly done using commercially available kits (for example https://store.nanoporetech.com/us/native-barcoding-expansion-l-12.html).
  • a tag for identification of a sample in a pool is well-known in the art as a means for multiplexing two or more samples to be combined in a pool for DNA sequencing: the tag allows sequences of the molecules from each sample to be separated after bulk sequencing. Sets of such tags are commercially available for the Oxford Nanopore system. Additional characteristics that may be encoded in the DNA Leader or Trailer tag include identification as a member of a specific TARGET or STANDARD peptide subset in cases in which subsets of TARGET peptides are separately extracted from a sample digest.
  • Functions A and B above can potentially be fulfilled by a homopolymer (e.g., a DNA homopolymer of one of the 4 bases, or a peptide homopolymer of glutamic or aspartic acid), while function C requires a polymer (like DNA or peptide) made of multiple different monomers that can be distinguished by a nanopore sequencing system.
  • sample barcodes are used together with TARGET vs STANDARD differentiating barcode tags (the former required in principle only once per sequenceable construct molecule, while the latter are required in association with each peptide molecule in a construct to identify its source). Barcodes provide an efficient way to record and recover information about the nature and source of individual peptide molecules in the invention, and thus exploit an advantageous feature of technologies such as nanopores that are capable of “reading” both oligonucleotide and polypeptide polymers.
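The combined use of sample barcodes and TARGET vs STANDARD tags can be sketched as a demultiplexing step followed by a per-sample ratio. The read records, barcode names, and function names below are hypothetical; in practice each field would come from basecalling and tag recognition.

```python
from collections import Counter

def demultiplex(reads):
    """reads: iterable of (sample_barcode, role, peptide); returns counts per combination."""
    counts = Counter()
    for sample, role, peptide in reads:
        counts[(sample, peptide, role)] += 1
    return counts

def target_to_standard_ratio(counts, sample: str, peptide: str):
    """TARGET / STANDARD count ratio for one peptide in one sample, or None without STANDARD reads."""
    standard = counts[(sample, peptide, "STANDARD")]
    if standard == 0:
        return None
    return counts[(sample, peptide, "TARGET")] / standard

# Hypothetical pooled reads from two barcoded samples.
reads = [
    ("BC01", "TARGET", "ESDTSYVSLK"),
    ("BC01", "STANDARD", "ESDTSYVSLK"),
    ("BC01", "STANDARD", "ESDTSYVSLK"),
    ("BC02", "TARGET", "ESDTSYVSLK"),
    ("BC02", "TARGET", "ESDTSYVSLK"),
    ("BC02", "STANDARD", "ESDTSYVSLK"),
]
counts = demultiplex(reads)
```

Because each read carries both a sample barcode and a role tag, the same peptide sequence can be quantified independently in every sample of the pool.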
  • Leader polymer 1 (labeled Oligo 1), linked to the n-terminal residue of peptide 3, threads through nanopore 5 in membrane 4 (shown in cross-section) as a result of its negative charge (i.e., the oligo’s phosphate backbone) and the application of an electric potential between the “cis” side 9 of the membrane 4 and the “trans” side 10, in this case with the “trans” side positive.
  • the peptide 3 follows Leader 1 into the pore.
  • a second oligo 2 (Oligo 2), attached to the c-terminal residue of the peptide 3, is part of a complex governing the movement of the peptide through the pore.
  • this complex includes a protein nanomachine 7 (e.g., a DNA motor) interacting with oligo 2, another oligo 6 with a sequence partially complementary to oligo 2, and another oligo 8 forming a “tether” promoting an association between the membrane 4 and the construct.
  • Tethering a construct close to the membrane (e.g., by providing a cholesterol functionality that adheres to the membrane while allowing free diffusion of the complex in the plane of the membrane while seeking an open pore) is known to increase the rate at which construct molecules thread into the nanopore by more than 1,000-fold.
  • the current passing through nanopore 5 in membrane 4 between the cis chamber 9 and the trans chamber 10 is measured (typically in picoamps given an electrolyte concentration in chambers 9 and 10 of 100-500mM salt).
  • This current changes as amino acids or nucleic acid bases transit the narrow throat of nanopore 5, with the speed of transit regulated by nanomachine 7.
  • a variety of DNA motor nanomachines 7 are known in the art and can be used, including helicases, polymerases, etc.
  • the nanopore is protein MspA or a derivative thereof, and nanomachine 7 is a processive enzyme motor protein such as a helicase, capable of regulating the passage of DNA through the nanopore 5.
  • the motor is pre-positioned on adapter Oligo 2, where specialized bases prevent it from contacting the rest of the DNA before loading into the nanopore. This scheme is commercially available (Oxford Nanopore Technologies).
  • the DNA motor that engages with the oligo and regulates its passage through the nanopore is offset from the region of the nanopore (the “throat”) where the bases and/or amino acids modulate the through-pore current (i.e., where the polymer is “read”).
  • This offset provides the means by which peptides can be sequenced, by allowing the motor to engage oligo bases “above” the peptide while the peptide is in the throat being read.
  • regions “below” the peptide by the offset distance cannot typically be read, but instead move rapidly through the nanopore until bases are once again engaged by the DNA motor.
  • Construct 81 (a typical rope-tow construct) produces sequence information when moving through a nanopore whose offset between motor and throat 82 is approximately 8 bases.
  • sequence is obtained in this example for regions of the construct about 8 bases 5’-ward of each base in the oligonucleotide sections 89, but not for the non-base-containing (e.g., abasic) sections 87.
  • non-base-containing (e.g., abasic) sections 87 can be read, as can peptide section 86, but region 83 produces no sequence information due to lack of DNA motor engagement.
  • this effect limits the length of peptide sequence that is observable using nanopore sequencing.
  • this offset limits the readable portion of a peptide linked near the 5’ end of an abasic stretch to a fixed length of amino acid chain measured from the 3’ end of the abasic stretch (i.e., the beginning of the oligo sequence that can engage the DNA motor).
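The effect of the motor-to-throat offset can be illustrated with a simplified positional model: a unit in the throat is under ratchet control only while the unit lying `offset` positions 3’-ward of it is a base engaged by the motor. The construct layout and the 8-unit offset below are illustrative assumptions, and peptide and abasic units are treated as having equal spacing.

```python
def controlled_positions(construct: str, offset: int = 8) -> list:
    """construct is written 5'->3': 'B' = base-bearing unit, '-' = abasic unit.

    Position i is read under motor control when position i + offset is a base.
    """
    n = len(construct)
    return [i + offset < n and construct[i + offset] == "B" for i in range(n)]

# Hypothetical layout: 4 bases, a 12-unit abasic stretch, then 10 bases for the motor.
construct = "B" * 4 + "-" * 12 + "B" * 10
flags = controlled_positions(construct, offset=8)
readable = [i for i, ok in enumerate(flags) if ok]
```

In this layout only the last 8 units of the abasic stretch (and the first bases after it) are read under control, which is the fixed readable length measured from the 3’ end of the abasic stretch described above.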
  • peptides having a constant length i.e., number of amino acids
  • It is anticipated that a section of peptide sequence of 8-12 amino acids is readable using current nanopore systems (extendable in future to greater lengths by providing longer pores), and that the length of an abasic region parallel to a linked peptide molecule can be optimized so as to match (and possibly slightly exceed) the linear extended length of a peptide of 8, or 9, 10, 11 or 12 amino acids.
  • This range of peptide lengths occurs very commonly in tryptic peptides of human and other proteins, indicating the likelihood that a proteotypic peptide of pre-specified length can be found for almost all target proteins.
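Selection of candidate peptides of pre-specified length can be sketched with a simple in-silico tryptic digest. The sketch uses the common rule of cleavage after K or R except before P and ignores missed cleavages; the protein sequence is a hypothetical example that embeds the peptide ESDTSYVSLK mentioned earlier.

```python
def tryptic_peptides(protein: str) -> list:
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def candidates(protein: str, min_len: int = 8, max_len: int = 12) -> list:
    """Tryptic peptides whose length falls in the readable range."""
    return [p for p in tryptic_peptides(protein) if min_len <= len(p) <= max_len]

# Hypothetical protein fragment.
peptides = tryptic_peptides("MKESDTSYVSLKAAPRPGGR")
selected = candidates("MKESDTSYVSLKAAPRPGGR")
```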
  • Nanopores such as aerolysin, α-hemolysin, fragaceatoxin C (FraC), and MspA can be used.
  • Motors working on DNA (e.g., Oligo 2 as shown) are known that can control movement of the construct towards the positively polarized trans side by “paying out” the oligo in steps while consuming chemical energy (e.g., ATP in the Oxford Nanopore system), as are motors such as phi29 DNAP (23) or helicase Hel308 (65) that “pull” an Oligo up against the electrophoretic force using ATP.
  • Non-enzymatic methods can also be employed, such as “unzipping” of a DNA duplex under the influence of an electrophoretic force, and variation of the cross-membrane potential to regulate transit speed (68).
  • Peptides can take the place of nucleic acids (e.g., replacing Oligo 2), in which case the motor function during readout as a construct transits a nanopore can be implemented using an “unfoldase” (69), or using ClpX on the “trans” side of the membrane.
  • An alternative nanopore technology using a “Field Effect Nanopore Transistor” has been described in US 9,341,592 B2 assigned to iNanoBio.
  • a variety of non-biological nanopore technologies have been disclosed, many using semiconductor technology to create thin inorganic membranes (e.g., of Si3N4, SiO2, graphene, or MoS2) with small holes that function as nanopores, and in some cases enabling use of quantum mechanical tunneling currents across the lumen of the hole in addition to measurements of current through the hole as signals indicative of the transiting polymer sequence.
  • the construct moves through the pore generating sequence information (as the timeline of pore current) from the peptide, and optionally Trailer Oligo 2 (e.g., containing a sequence tag), until the end of the construct is reached, at which point the construct is released by protein nanomachine 7 and the construct completes its movement through the pore and floats free into “trans” compartment 10. At this point another construct molecule can enter the pore from the “cis” side for sequencing, repeating the cycle.
  • sequence-related information can only be obtained from about 10-25 amino acids of the peptide 3 nearest to the linkage to the Trailer oligo 2 (i.e., at the c-terminal end of peptide 3 in Figure 22A).
  • a second set of constructs is generated similar to those described above, but in which the peptide is linked in the opposite orientation by swapping the linkages to the two oligos in Figure 15.
  • Such constructs (Figure 22C and D) allow sensing of sequence-related information from about 10-25 amino acids nearest to the n-terminal end of peptide 3. Combining results from the two reads, analogous to sequencing complementary strands of DNA, provides greater coverage of the peptide sequence as well as overlapping reads of the middle region of short peptides.
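The gain from combining the two orientations can be illustrated by computing which residues fall inside each read window. The sketch assumes a fixed readable window measured from each linkage point (the text gives a range of about 10-25 residues); the function name and the example lengths are hypothetical.

```python
def combined_coverage(peptide_len: int, window: int) -> tuple:
    """Return (covered, overlap) as sets of 0-based residue indices.

    One construct reads the `window` residues nearest the c-terminus,
    the oppositely linked construct reads those nearest the n-terminus.
    """
    c_read = set(range(max(0, peptide_len - window), peptide_len))
    n_read = set(range(min(window, peptide_len)))
    return n_read | c_read, n_read & c_read

# A 30-residue peptide with a 10-residue window: two separate end reads.
covered_long, overlap_long = combined_coverage(30, 10)
# A 15-residue peptide with the same window: full coverage with a shared middle region.
covered_short, overlap_short = combined_coverage(15, 10)
```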
  • means known in the art are applied to extend the length of a nanopore (and lengthen Oligo 2 if necessary) so as to be able to read longer amino acid sequences. These include stacking a spacer protein above the entrance to a nanopore as described, for example in WO 2021/111125, or construction of multi-component stacks (70).
  • a preferred alternative to the foregoing elaborate chemistry required to prepare “in-line” peptide-polymer constructs is implemented by linking peptides by only one end to a continuous polymer VEHICLE (e.g., an oligonucleotide) having a plurality of available linkage sites, thus forming a long continuous polymer chain with multiple peptide branches.
  • This VEHICLE construct is reminiscent of a rope-tow ski lift in which skiers can grasp any of a multiplicity of handles on a continuously moving rope to be pulled uphill.
  • the continuous polymer serves as the rope, and peptides are attached to the rope via linkers.
  • the motion of such a hybrid molecule traversing a nanopore can be regulated (e.g., to produce the desired ratchet motion) by interaction of the continuous polymer chain with a suitable motor (e.g., a continuous oligonucleotide interacting with a DNA motor).
  • Short complementary oligos can be hybridized to the long continuous oligo as necessary to facilitate its interaction with a motor such as a DNA helicase.
  • the attached peptides have a net positive or zero charge (i.e., experience an electrical force opposite to that of the polymer, or else no force, under the influence of the electric potential across the membrane) so that the polymer pulls them through the pore by the peptide end attached to the polymer, thus reducing the chance that the peptide’s free end will move forward and “bunch up” in the pore, potentially clogging it.
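Whether a candidate peptide meets the net-charge preference can be screened with a crude residue count. The sketch below assumes near-neutral pH, counts K and R as +1 and D and E as -1, includes free termini, treats histidine as neutral, and ignores the charge change caused by the linker at the attached end; these are simplifying assumptions rather than values from the specification.

```python
def approx_net_charge(peptide: str) -> int:
    """Approximate net charge near neutral pH from residue counts (free termini assumed)."""
    positive = sum(peptide.count(aa) for aa in "KR") + 1  # +1 for the n-terminal amine
    negative = sum(peptide.count(aa) for aa in "DE") + 1  # +1 for the c-terminal carboxyl
    return positive - negative

def suits_rope_tow(peptide: str) -> bool:
    """True when the approximate net charge is zero or positive."""
    return approx_net_charge(peptide) >= 0
```

For example, the peptide ESDTSYVSLK scores -1 under this approximation, whereas a hypothetical peptide such as AAPRPGGR scores +2.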
  • the region of polymer lying alongside a peptide is “featureless” allowing the peptide sequence signal to be recognized with minimal interference from the polymer background.
  • This featurelessness can be achieved through the use of a homopolymer stretch alongside the peptide, for example an “abasic” stretch completely devoid of bases in the case of an oligonucleotide polymer.
  • the unit-spacing of the extended form oligo backbone i.e., base pair spacing
  • a rope-tow VEHICLE can be created comprising i) an abasic stretch, ii) a reactive linker group within or adjacent to an abasic stretch and which is capable of making a covalent linkage to a peptide end, and iii) a stretch adjacent to the abasic stretch that is capable of engaging with a DNA motor (i.e., a stretch comprising bases, either single or double-stranded) to regulate movement of the oligo through a nanopore.
  • rope-tow constructs are formed so as to be capable of assembling into longer concatamers, through enzymatic ligation (e.g., via DNA ligase), transposase recombination, CRISPR insertion, or chemical coupling (e.g., using click chemistry).
  • a rope-tow construct has the advantage that it requires only a single linkage to a peptide, usually at either the peptide’s n- or c-terminus, instead of two sites as required when a peptide is inserted into an oligo “in-line” with the oligo backbone (the arrangement described in the prior art).
  • the oligo backbone through the abasic stretch provides a uniformly distributed negative charge (e.g., the canonical sugar-phosphate backbone of DNA) that largely masks the net charge and charge distribution of an attached peptide, and allows the electric potential applied between cis and trans sides of a nanopore to exert a near-uniform force on the construct irrespective of the peptide’s composition.
  • the abasic stretch has a length equal to or greater than the length of selected TARGET and STANDARD peptides that can be linked to the linkage site, such that a peptide in an extended configuration can lie “alongside”, and parallel to, the extended oligo backbone without overlapping nucleic acid bases (i.e., such that the cross-sectional area of the linked peptide:oligo construct along the peptide is that of the peptide plus the backbone).
  • one or more additional dsDNA segments are included in the oligo that contain sequence information useful to identify and establish registration of ionic signatures in a nanopore, to identify a construct as associated with a particular sample in a pool, or analyte in a panel (e.g., by identifying specific STANDARDS), or for quality control.
  • the oligo comprises one or more regions to which a DNA motor can bind, or where one can be pre-loaded (which regions may not be limited to natural DNA bases, abasic structures, or to a conventional sugar: phosphate backbone).
  • the oligo comprises regions of DNA or RNA that can interact with a DNA motor to regulate passage of the oligo through a nanopore in a ratchet motion.
  • multiple peptides are linked to multiple abasic sites on a prepared rope-tow oligo to form a peptide:oligo concatamer.
  • such an extended rope-tow “template” oligo VEHICLE having a plurality of empty linkage sites and abasic stretches, is present in the cis compartment of a nanopore and reacts with TARGET and STANDARD peptides (introduced into the compartment in solution, or bound to BINDER from which they dissociate under conditions prevailing in the compartment, or on other solid supports) to form rope-tow peptide:oligo constructs in the vicinity of a nanopore.
  • an empty extended “template” rope-tow oligo VEHICLE is bound to the nanopore’s membrane by a tether.
  • the empty rope-tow oligo continues to react with and accumulate attached peptides after the commencement of a sequencing run.
  • a rope-tow oligo VEHICLE allows peptides functionalized with one member of a “click” reagent pair to react directly with a “click” site (comprising the other member of a “click” pair) on a pre-synthesized VEHICLE, which can be of any convenient length, and contain any number of abasic stretches, thus creating a continuous polymer capable of threading and being read continuously by a nanopore having a processive DNA motor.
  • the peptides are attached to the VEHICLE backbone in an orientation such that they are dragged through the pore alongside an abasic stretch of sugar-phosphate backbone.
  • the nanopore current trace describing the peptide sequence thus reflects the combined areas and chemical properties of backbone and amino acids passing through the reading region (i.e., the “throat” of the nanopore) as the two parallel polymer segments are ratcheted through the pore by the interaction of a DNA motor at the entrance to the pore with an oligo region following the abasic stretch.
  • oligonucleotide sequences can be implemented with alternative oligonucleotide sequences, backbone chemistries (e.g., PNA, etc.), base-free regions (e.g., positions with side groups smaller than normal bases or abasic) of various lengths designed to accommodate peptides of various lengths, and chemical connecting groups (e.g., various click combinations, amino-reactive groups, etc.).
  • tryptic peptides ending in arginine are preferred, as these have a single amino group (the n-terminal amine), which conveniently provides a single specific site for attachment of a click group for facile linkage to a “rope-tow” oligo.
  • tryptic peptides ending in lysine are connected by one of the peptide’s two amino groups following blockage of the other amino group (e.g., blockage of the n-terminal amino group by reaction at near-neutral pH), thus allowing a peptide to be coupled in the opposite orientation (c-term-first into the pore) compared to linkage via the n-terminal group, and providing the ability to “read” peptide sequences in both directions.
  • aspartic and glutamic acid-free peptides are linked to the oligo via the unique carboxyl group at the peptide’s c-terminus.
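The three attachment cases described above (single amino group, two amino groups with one blocked, and single carboxyl group) can be sketched as a residue-count classifier. The sketch assumes unmodified peptides in which amino groups arise only from the n-terminus and lysine side chains, and carboxyl groups only from the c-terminus and aspartic or glutamic acid side chains; the option labels are hypothetical.

```python
SINGLE_AMINE = "n-terminal amine"
BLOCK_ONE_AMINE = "one of two amines after blocking the other"
SINGLE_CARBOXYL = "c-terminal carboxyl"

def linkage_options(peptide: str) -> list:
    """List the single-site linkage strategies available for a peptide."""
    amines = 1 + peptide.count("K")                          # n-terminus + lysine side chains
    carboxyls = 1 + peptide.count("D") + peptide.count("E")  # c-terminus + acidic side chains
    options = []
    if amines == 1:
        options.append(SINGLE_AMINE)
    elif amines == 2 and peptide.endswith("K"):
        options.append(BLOCK_ONE_AMINE)
    if carboxyls == 1:
        options.append(SINGLE_CARBOXYL)
    return options
```

Under these assumptions the arginine-terminated example AAPRPGGR (hypothetical) offers both a unique amine and a unique carboxyl, while ESDTSYVSLK would require blocking one of its two amino groups.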
  • peptides are attached to the oligo via a site near the 5’ end of an extended abasic region so that as the oligo is drawn 5’-first through a nanopore, the peptide (whether n-terminal first, via an n-terminal linkage, or c-terminal first via a c-terminal linkage) is pulled through the pore lying alongside the oligo backbone.
  • a method of concatenating peptides for nanopore sequencing in which a continuous polymer is prepared comprising single-stranded oligonucleotide segments 23 with stretches of abasic sites 21 (positions in which there is no base attached to the continuous sugar-phosphate backbone) and through which the chain therefore has a diminished cross-sectional area on account of the lack of bases.
  • the length of the abasic stretch is designed to be longer than any of the peptides to be linked to it (including the length of any chemical linkers).
  • a nucleic acid residue 22 preceding (e.g., 5’ to) and adjacent to one or more abasic regions 21 is provided (during synthesis or through subsequent modification) that includes a reactive chemical linking group (e.g., one of a pair of click-chemistry groups, or a reactive group such as an amino group where a click functional group can be installed), capable of combining with the other of the pair of click groups that is attached to one terminus (e.g., the n-terminus) of a peptide molecule 3.
  • abasic stretches provide a length of backbone devoid of bases, whose diminished cross-sectional area allows a peptide chain attached at the leading (typically 5’) end of the stretch to be pulled through a nanopore parallel to the abasic backbone.
  • the nanopore occlusion in the area of the abasic region is thus due to the peptide chain plus the parallel oligonucleotide backbone, and through this stretch the amino acid sequence can be read from the changes in nanopore ion current during transit under control of the DNA motor interacting with the DNA sequence to the 3’ end of the abasic region.
  • the polymer chain is synthesized chemically or ligated together from chemically-synthesized units, for example using oligonucleotide synthesis and incorporating DNA, RNA, modified DNA and abasic synthons (such abasic sequences can be obtained commercially from, e.g., Integrated DNA Technologies as dSpacer, rSpacer or Abasic II residues).
  • alternatives to the common DNA or RNA backbones are used, such as peptide nucleic acid or phosphorothioate backbones, or any of a variety of linear polymers that can be joined to oligonucleotide backbones to form a continuous molecule.
  • polymer VEHICLE constructs described herein as comprising “oligos” or DNA can alternatively be formed of other polymers, other backbones and a variety of natural and modified bases or side-groups.
  • linkage groups described for coupling peptides to rope-tow constructs can be any of a variety of coupling chemistries including “click” chemistry, amine-reactive chemistries (e.g., NHS esters), carboxyl-reactive chemistries, etc.
  • Reactive sites may be created in synthetic oligos by a variety of means, for example by including an amino-modified version of an internal base such as 5' Amino Modifier C6 dT or a 3’ Amino Modifier (both available commercially e.g., through Integrated DNA Technologies, Inc. custom DNA synthesis service). These amino groups can be converted to NHS derivatives as part of oligo manufacture, and can be further converted to click chemistry groups where required.
  • the abasic stretches as described can be any structure that preserves the continuity of the backbone and preferably has a smaller cross-sectional area than canonical single-stranded DNA or RNA.
  • the backbone in the abasic stretches comprises negative charges, mimicking the uniform negative charge distribution of a sugar-phosphate backbone.
  • the modified residues comprising linkage sites can be any of a variety of residues comprising a linker group, and the linker may be attached to a base, to a sugar, or to a phosphate group.
  • the linkage site is preceded (i.e., in the 5’ direction) by one or more abasic sites.
  • Such preceding abasic sites can be provided to generate a high-current (almost open-pore) start signal preceding the current profile attributable to a linked peptide and parallel polymer backbone.
  • a complementary oligonucleotide 24 is generated that hybridizes with oligo 23, except through the abasic regions (where there are no bases on 23 with which to hybridize), and in the region of the 5’ terminus of oligo 23 (where the 5’ region comprises a leader sequence, or non-oligo charged polymer, that threads through the nanopore initially and may comprise a site for binding a DNA motor 7). In some embodiments only a fraction of the corresponding residues hybridize.
  • the complementary oligo 24 is interrupted, comprising only segments that hybridize with the non-abasic (“natural”) segments of oligo 23, leaving the abasic stretches single-stranded.
  • the number of complementary strand bases aligned with abasic stretches is not the same as the number of abasic sugar-phosphates, such that one strand is longer than the other in the abasic region, leading to a kink in the duplex at abasic regions, increased exposure of the region to the environment and lesser steric hindrance in the reaction of the oligo linkage site with that of the peptide.
  • the length of abasic regions can be set to accommodate the length of peptides that are intended to be read (e.g., the TARGET and STANDARD peptides), so that the abasic regions are at least as long as the extended length of the peptides plus any linking groups, thereby ensuring that a full-sized nucleic acid base and an amino acid do not transit the throat of the nanopore together, potentially clogging it.
  • Abasic regions are typically 6 to 50 backbone (sugar-phosphate) units long.
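The sizing rule above (an abasic stretch at least as long as the extended peptide plus any linking groups) can be sketched as a small calculation. This is a minimal illustration only: the per-residue length, the per-backbone-unit length, the default linker length, and the function names are all assumptions introduced here, not values or methods taken from this disclosure.

```python
import math

# Illustrative assumptions (not values from the source text):
PEPTIDE_NM_PER_RESIDUE = 0.36  # assumed length per residue of an extended peptide
BACKBONE_NM_PER_UNIT = 0.65    # assumed length per sugar-phosphate backbone unit


def min_abasic_units(peptide_residues, linker_nm=1.0):
    """Smallest number of abasic units at least as long as peptide plus linker."""
    if peptide_residues <= 0:
        raise ValueError("peptide_residues must be positive")
    if linker_nm < 0:
        raise ValueError("linker_nm must be non-negative")
    required_nm = peptide_residues * PEPTIDE_NM_PER_RESIDUE + linker_nm
    return math.ceil(required_nm / BACKBONE_NM_PER_UNIT)


def design_abasic_stretch(peptide_lengths, linker_nm=1.0):
    """Size a single stretch so that it accommodates the longest peptide in a panel."""
    return max(min_abasic_units(n, linker_nm) for n in peptide_lengths)
```

Sizing for the longest peptide in a panel (TARGET or STANDARD) keeps one VEHICLE design usable for every peptide that may be attached to it.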
  • nanopores with throats of larger dimensions can be used to accommodate oligo and attached peptides in parallel (i.e., roughly double the throat area of nanopores currently used for DNA sequencing), in which case the stretches of abasic sites alongside the peptides in Figure 23 can be replaced with canonical DNA with backbone present with normal nucleotides attached.
  • the stretch of oligo alongside the peptide is a homopolymer, thus providing a consistent background against which the variations of nanopore current due to different amino acids of the peptide can be detected.
  • peptides are prepared for conjugation with rope-tow oligo VEHICLES by functionalizing the n-terminal amino group with a reactive moiety such as a click chemistry reagent suitable for joining to the modified oligo attachment site.
  • TARGET peptides ending in Arginine are preferred since they have only a single amino group (the n-terminus) and therefore are derivatized at only one site by reaction with an amino selective reagent.
  • a peptide amino group is derivatized while the peptide is localized on a BINDER, and later released into solution to react with a rope-tow oligo VEHICLE.
  • a peptide amino group is derivatized while the peptide is localized on a BINDER, and subsequently reacted with a solution of rope-tow oligo molecules, after which any unreacted rope-tow oligo is washed away prior to elution from the BINDER.
  • This approach has the advantage that a large majority of the rope-tow molecules will have attached peptides (i.e., most or all attachment sites will be “loaded” with peptides), and subsequent concatenation of these loaded rope-tow oligos will generate a fully loaded construct capable of yielding a plurality of TARGET and STANDARD peptide counts in one nanopore read.
  • peptide amino groups are not modified but rather react directly with an oligo whose reactive sites can react directly with an amino group (e.g., the oligo VEHICLE has NHS linkage sites).
  • peptides are eluted from BINDERS using a competing (i.e., “displacer”) peptide of the same or similar sequence to the TARGET and STANDARD peptides and introduced, when elution is required, at higher concentration than the TARGET and STANDARD peptides or BINDER binding sites. After a duration of one or a few BINDER half-off-times, the BINDER is saturated with the displacer peptide and the TARGET and STANDARD peptides will be free in solution.
  • Displacer peptides can be modified so as to be unable to participate in linkage reactions taking place after elution of TARGET and STANDARD peptides, e.g., by blocking the n-terminal amino group and/or the c-terminal carboxyl, and any lysine amino group.
  • a displacer peptide of the same sequence as the TARGET (or STANDARD), but with the n-terminal amino group acetylated during or after synthesis can be used to displace and thus release the bound peptides without interfering in the amino group chemistry.
  • An advantage of eluting TARGET and STANDARD peptides using a displacer peptide is that no other solution conditions need be changed, reducing the likelihood of eluting non-specifically bound materials from the BINDERS, carrier beads or other supports.
  • use of a displacer peptide with a net positive charge results in no displacer peptide migrating towards or into the nanopore.
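The guidance that displacer elution takes "one or a few BINDER half-off-times" can be made concrete with a minimal sketch. It assumes simple first-order dissociation and no rebinding of released peptide (plausible only when the displacer is in large excess); the function names are hypothetical and the model is an illustration, not a kinetic description given in this disclosure.

```python
import math


def fraction_released(elapsed, half_off_time):
    """Fraction of bound peptide released after `elapsed` time units.

    Assumes first-order dissociation with no rebinding, i.e. every vacated
    BINDER site is immediately occupied by displacer peptide.
    """
    if half_off_time <= 0:
        raise ValueError("half_off_time must be positive")
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    return 1.0 - 0.5 ** (elapsed / half_off_time)


def half_times_needed(target_fraction):
    """Number of half-off-times needed to release `target_fraction` of the peptide."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return math.log(1.0 - target_fraction) / math.log(0.5)
```

Under these assumptions, half of the bound peptide is free after one half-off-time and 87.5% after three, which is consistent with waiting a few half-off-times before the linkage step.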
  • not all reactive sites on the rope-tow oligo VEHICLE react with peptides, leaving empty peptide-accommodating sites. Empty sites are easily recognized in nanopore ion current traces by their short duration and lack of major current modulation.
  • a rope-tow oligo (with peptides attached) and its complementary strand can be joined to a prepared adapter 25 comprising the components required to present the oligo to a sequencing nanopore, start threading into the pore and regulate movement into the pore (e.g., using a DNA motor).
  • the adapter is modeled after a commercially available “Y-adapter” provided by Oxford Nanopore, which can be ligated to a DNA duplex (e.g., aligned by a T-overhang) via a simple kit procedure according to the manufacturer.
  • DNA motors are loaded at periodic sites along the rope-tow oligo between the peptide attachment sites in order to provide means to continue ratcheting the rope-tow VEHICLE through a nanopore if a motor falls off when transiting a peptide-loaded or abasic site.
  • rope-tow constructs are prepared by joining a peptide to an oligo having a single abasic stretch (i.e., with a capacity of a single peptide), and these short rope-tow constructs are subsequently assembled into concatamers using linkage methods described above and illustrated in Figures 23 and 24 (e.g., click chemistry or enzymatic ligation).
  • such short, single attachment site oligos are reacted with peptides while the peptides remain on the BINDER, allowing unreacted oligos to be washed away prior to elution of peptides from the BINDER, and resulting in subsequent concatamerization of only “loaded” oligos, thus avoiding empty abasic sites and wasted sequence reads.
  • double-stranded oligo-peptide rope-tow VEHICLE constructs are introduced into prepared double-stranded sequenceable constructs (e.g., the products of well-known commercially-available library preparation kits and methods used with the Oxford Nanopore system) by recombinant processes (e.g., by use of transposases, CRISPR mechanisms, “tagmentation”, hybridization and repair, and the like) to form sequenceable concatamers.
  • a bead functionalized with a single type of BINDER is used to capture and transport molecules of a single TARGET + STANDARD pair to the vicinity of a nanopore, where the peptides are eluted by one of the methods described above and allowed to combine (e.g., via “click” linkages as described above) with a VEHICLE construct molecule (e.g., a “rope-tow” construct) pre-positioned at (i.e., already threaded into or directly available to) the nanopore.
  • an incubation period can optionally be included to allow eluted peptide molecules to couple with the VEHICLE prior to the start of motion through the nanopore.
  • Motion of the construct through the pore can be initiated by an increase in the trans-membrane voltage pulling the typically negatively-charged construct through the nanopore, at which point a ratchet mechanism (e.g., a DNA motor) begins to feed the construct through the nanopore for reading.
  • each nanopore is prepared with a single VEHICLE in place, and is used to sequence only that VEHICLE molecule - in such an embodiment it can be advantageous to use a long VEHICLE with a large number of peptide binding sites, for example a VEHICLE of 100 kb equivalent length with abasic stretches comprising peptide linkage sites every 100 b and therefore able to accommodate 1,000 peptide molecules (a number sufficient to provide a precise TARGET-to-STANDARD ratio, and thus protein amount, when the TARGET and STANDARD are in approximately equal amounts).
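The arithmetic in the long-VEHICLE example can be sketched as follows. The capacity calculation follows the figures given above; the precision estimate assumes independent Poisson-distributed counts, and the amount calculation assumes the STANDARD was added in a known amount. Both are assumptions introduced here for illustration, as are the function names.

```python
import math


def vehicle_capacity(vehicle_length_b, site_spacing_b):
    """Number of peptide linkage sites on a VEHICLE with one site per spacing interval."""
    if vehicle_length_b <= 0 or site_spacing_b <= 0:
        raise ValueError("lengths must be positive")
    return vehicle_length_b // site_spacing_b


def ratio_cv(target_count, standard_count):
    """Approximate coefficient of variation of the TARGET/STANDARD count ratio.

    Assumes independent Poisson counting noise on both counts.
    """
    if target_count <= 0 or standard_count <= 0:
        raise ValueError("counts must be positive")
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


def target_amount(target_count, standard_count, standard_amount):
    """TARGET amount, assuming a known amount of STANDARD was added to the sample."""
    if standard_count <= 0:
        raise ValueError("standard_count must be positive")
    return standard_amount * target_count / standard_count


sites = vehicle_capacity(100_000, 100)      # 100 kb VEHICLE, one site every 100 b
cv = ratio_cv(sites // 2, sites // 2)       # TARGET and STANDARD in equal amounts
```

With 1,000 sites split equally between TARGET and STANDARD, this approximation gives a ratio CV of roughly 6%, and the CV rises as the two peptides become more unequal in abundance.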
  • such pre-positioned VEHICLES associated to nanopores are combined with mixtures of different TARGET + STANDARD pairs.
  • a bi-functional support in which a BINDER 52 specific for a TARGET 48 + STANDARD 49 peptide pair is immobilized on a support (e.g., magnetic beads 51) that also carries immobilized VEHICLE molecules comprising a tag (e.g., a sequenceable DNA tag 55) that is assigned to the TARGET.
  • An example using specific sequences is shown in Figure 25A to illustrate the concepts, while not limiting the scope of sequences, binders and chemistries that can be used.
  • An object of this arrangement is to capture a TARGET + STANDARD peptide pair via the cognate BINDER, separate these peptides from other peptides, and subsequently allow the captured peptide molecules to react with the VEHICLE comprising the TARGET’s tag (which can be considered a VEHICLE cognate to the TARGET sequence), producing a construct whose sequenceable (e.g., DNA) tag identifies the TARGET and STANDARD peptides expected to be attached.
  • a bi-functional support (which may, for example, be one or more magnetic particles) carries multiple molecules of a BINDER and multiple molecules of a VEHICLE (e.g., a rope-tow construct 45) incorporating a sequence tag 55 indicative of the BINDER (and thus TARGET) identity.
  • Figure 25A shows a rope-tow VEHICLE comprising a nanopore sequencing adapter 41 (including a DNA motor 50) followed by two tandem copies of a rope-tow construct, each of which comprises a linkage site 47 to which a peptide (48 or 49) is covalently linked by linkage 46.
  • the construct can include tens, hundreds or thousands of repeats of a rope-tow construct, providing the capability to attach tens, hundreds or thousands of peptide molecules to a single VEHICLE molecule.
  • Figure 25B shows a magnetic bead 51 to which are attached multiple copies of the VEHICLE construct 53 (e.g., the construct shown in Figure 25A) and multiple copies of BINDER (52) with bound TARGET (48) and STANDARD (49) peptides.
  • This configuration results from exposure of the bifunctional bead to a standardized sample digest containing TARGET and STANDARD peptides, allowing the BINDERS to capture these peptides specifically, and subsequent removal of the beads from the digest and washing unbound peptides away.
  • Figure 25C shows the configuration of the bifunctional bead after the peptides are eluted from the BINDERS and allowed to react with the linkage sites 47 on the VEHICLE constructs 53.
  • the VEHICLE constructs are subsequently released from the beads using any of a variety of well-known chemical (e.g., reduction of an S-S bond) or enzymatic (e.g., cleavage at a restriction endonuclease site) means and delivered to a sequencing nanopore.
  • the peptides can be chemically modified while bound to the BINDERS in order to introduce a linkage group capable of combining with the VEHICLE linkage sites (e.g., site 47).
  • either the peptide linker group or the VEHICLE linker group, or both, are present while TARGET and STANDARD peptides are bound to the BINDERS in a form that is not reactive with the counterpart group, thereby reducing or eliminating premature reaction of free reagents with either of the linking groups.
  • an activation step is carried out to convert the inactive linker forms to active forms capable of reacting with counterpart groups to form linkages 46 between peptides and VEHICLES.
  • the linkages of peptides to VEHICLES result from reactions between a pair of “click” chemistry groups i.e., one on the peptide (e.g., on its n-terminal amino group) and one on the VEHICLE (e.g., at linker site 47), at least one of which was prepared initially in an unreactive form and converted afterwards to an active form able to react to form linkage 46.
  • the yield of peptides linked to VEHICLES is further improved by carrying out the elution step (and subsequent reactions to form linkages 46) after the bead has been placed in a nanowell, thus restricting diffusion of eluted peptides away from the bead and VEHICLES to a very small volume.
  • the VEHICLES are released from the beads before, during or after elution of the peptides from the BINDERS.
  • the BINDERS are either selected so as to contain no reactive amino groups (e.g., nucleic acid aptamers contain no free amino groups), or else the amino groups of the BINDERS are blocked to avoid creation of active linkage-capable groups on the BINDERS that could compete for reaction with the VEHICLE reactive sites.
  • oligo:peptide constructs of the invention are purified before sequencing by binding to and elution from a support designed to capture and enrich nucleic acids from a sample (e.g., Agencourt AMPure XP beads (Beckman Coulter)).
  • peptide constructs are concatenated to optimize detection performance, specifically throughput. Presentation of peptides as concatamers allows a nanopore to continuously read molecules, and avoids delays that can arise if a pore must wait for each of a series of short constructs to approach and thread the nanopore to be read.
  • the ligation approach can be modified to enable assembly of long strings of covalently linked molecules, in this case with polymer linkers (e.g., DNA) between peptide molecules in an alternating pattern.
  • the DNA linker is modified compared to the Oligos of Figure 15 by providing ADIBO “click” functionality on one end (the 3’ end in this example) and an amine reactive NHS functionality on the other (5’) end.
  • the individual peptides having a c-terminal lysine residue are modified to introduce azide functionality on the n-terminus (as shown in Step 3 of Figure 28) while on the BINDER, and then, after removal of the unreacted AAAH, released from the BINDER into solution and exposed to the modified oligos (here labeled “Linker”)
  • the result is rapid formation of “click” links to form extended chains of peptide+linker repeats.
  • the motor might be expected to fall off the concatamer when it encounters the peptide segment (thus requiring a new motor assembly on each DNA linker as shown in Figure 28B) - however as disclosed in WO 2021/111125 there are existing motor proteins that can slide over a peptide and engage a subsequent DNA segment.
  • “click” chemistry is employed to assemble concatamers without intervening DNA linkers.
  • one end of each peptide (e.g., the n-terminus) is derivatized with one of a pair of “click” reagents (e.g., azide functionality introduced using azidoacetic anhydride), while the other end (e.g., the epsilon amino group of a c-terminal lysine residue) is derivatized with aza-dibenzocyclooctyne (ADIBO).
  • alternative molecular motors capable of stepwise, ratchet-like processing of polypeptide chains are used (70) instead of the enzymes used with nucleic acids.
  • constructs are concatenated by hybridization as shown in Figure 24.
  • the constructs of Figure 15 are joined in series by hybridization with an oligo (labeled Oligo 3) that has sequence regions complementary with both Oligo 1 and Oligo 2.
  • an enzymatic ligation is carried out (at site indicated by the black triangles) to covalently join the successive constructs into a continuous single stranded molecule comprising repeats of Oligo 1, peptide and Oligo 2.
  • Motor proteins are pre-positioned on the constructs prior to ligation as shown.
  • a small proportion of the Oligo 1 Linkers of Figure 28 are specialized as “guide threads” optimized to engage and enter sequencing pores most efficiently (e.g., by having optimal means for engaging tethers to bring the concatamer into contact with the membrane and allow its diffusion to a pore).
  • Such Linkers have “click” linkage sites on only one end, and so are incorporated only at one end of a concatamer (typically the 5’ end).
  • the resulting extended chains will be comprised of this TARGET peptide and STANDARD molecules in the same ratio as present in the original sample digest (provided that the reactivities of the TARGET and STANDARD with the other polymer components of the concatamer are equal, or nearly so - this equivalence being a criterion for selection of the STANDARD structure with respect to the TARGET peptide).
  • TARGET peptides and their respective STANDARDS e.g., forming a protein biomarker panel
  • the concatamers that form in solution comprising these peptides will be composed of random mixtures of the different TARGET peptides and their STANDARDS, and each nanopore read will include a variety of TARGET peptides (and cognate STANDARDS).
  • if each concatamer molecule comprised only one type of TARGET and its STANDARD, then recognition and classification of the peptides using a nanopore current trace would be simplified, based on the a priori expectation that the peptide sequences read would all be variations of a single TARGET sequence.
  • the constructs described above are concatenated so as to join into a given chain only molecules of a given TARGET peptide and its STANDARD (i.e., each chain being homogenous with respect to the TARGET to be read out), and will generate counts of only these two peptides.
  • this objective is achieved by concatenating together constructs bound to an individual bead that is functionalized (or coated) with molecules of a single type (specificity) of BINDER.
  • this can be achieved by fixing each BINDER to beads separately (such that each bead has copies of only one BINDER on it) and subsequently pooling the beads to capture the various TARGET and STANDARD peptides.
  • the beads can be distributed into very small (e.g., femtoliter) containers (such as those used in Illumina DNA sequencing technology) or droplets (as commonly used in digital PCR and associated microfluidic methods), effectively isolating each bead in a separate container.
  • addition of linking groups to the peptides can be carried out prior to distribution of beads into individual containers (i.e., while the peptides remain on the BINDER on the beads), or after the beads are distributed to containers.
  • the peptides are eluted from the BINDER (by exposure to eluting conditions or reagents, including displacer peptides), combined with the derivatized Oligos, and allowed to concatenate, forming sequenceable concatamer constructs as described in the invention. Since the peptides in each tiny container arise from one bead, and this bead bore only a single specificity of BINDER, the container’s peptide contents can be joined into concatamers containing only that TARGET peptide and its STANDARD.
  • Concatamers comprising copies of only one TARGET peptide and its STANDARD are advantageous for two reasons: a) the probability of correctly recognizing the peptide sequence can be increased because multiple copies of the same (or similar STANDARD) sequence are detected successively in the current trace (“squiggle”) from a single nanopore and these jointly used to form a consensus assignment (e.g., by machine learning algorithms well-known in the nanopore sequencing art) to one of a set of pre-selected TARGET peptide sequences, and b) if a peptide sequence can be determined early in the processing of a long concatamer through the nanopore (i.e., in the first few of a large number of concatenated peptides), and that sequence has already been detected enough times to achieve the required assay sensitivity and/or precision, the concatamer can be ejected from the nanopore to enable the nanopore to begin reading a different sequenceable construct (a concept referred to as “computational enrichment of target sequences”).
  • the tiny container into which a single bead (with a single specificity of BINDER) is directed also comprises one or more sequencing nanopores, which are thereby devoted to sequencing a single TARGET and STANDARD pair.
  • microfluidic means are employed to control the movement of such beads into separate pore containers and optionally deliver one or more successive reagents into the container.
  • each bead described above, bearing a single BINDER specificity and carrying bound molecules of a single TARGET peptide and STANDARD pair, is placed in the region of a single nanopore.
  • the nanopore region can be a container in which one nanopore is present, and may be electrically isolated or have liquid or electrical connection to other nanopores, while having little or no diffusion between nanopore regions.
  • the BINDER-bound peptides are eluted from the bead in the nanopore vicinity, and, due to their physical proximity to the nanopore and the nanopore’s isolation from other beads, the bead’s peptide cargo is detected, recognized as a specific TARGET peptide or STANDARD, and counted by passage of the constructs containing them through the nanopore.
  • each BINDER is used separately to extract its TARGET peptides and STANDARD from a digest (a process that can be implemented as a sequence of separate BINDER captures), and then these peptide cargoes are processed separately to form similar homogenous concatamers.
  • the BINDERS can be immobilized on physically separate supports (e.g., on separate porous affinity supports including column chromatography beads, or as separate zones on a porous membrane such as nitrocellulose as used in conventional lateral flow immunoassays), or the different BINDERS can be exposed to a sample digest sequentially, one at a time, to produce separate captures.
  • beads (e.g., Pierce™ Protein A/G Magnetic Agarose Beads, diameter 10-40 microns) having BINDER capacity larger than typical magnetic beads (e.g., Dynabeads, 2.8-micron diameter) are used in order to collect more molecules for presentation to a single nanopore.
  • Methods for placing beads near nanopores to increase the rate and/or probability of sequenceable constructs entering the pore have been described in the art (72, 73) but they fail to encompass the current purpose of presenting one (or a small subset) of homologous constructs on one or a small number of beads to a given pore.
  • microfluidic means known in the art are used to distribute single beads to individual nanopores.
  • Nanopore sequencing adapters can be ligated to one or a series of peptide-oligonucleotide constructs in tandem using a commercial ligase (e.g., T4 DNA ligase) capable of joining a 5’ phosphate of one oligo with a 3’ hydroxyl of another.
  • the linkage can be facilitated by providing a single base sticky end, for example the T/A overhang at sites 45 shown in Figure 7B.
  • splint oligos having a 5’-end complementary to the 5’ end of a VEHICLE or sequencing adapter, and a 3’ end complementary to the 3’ end of another VEHICLE (shown in Figure 29).
  • Annealing of two peptide-oligo constructs (such as the rope-tow constructs of Figure 26) with a splint such as 56 in Figure 29C allows formation of a head-to-tail chain of peptide-oligo constructs amenable to ligation by T4 ligase into a continuous long nanopore sequenceable molecule.
  • Ligation of this chain with an appropriate sequencing adapter, optionally using a short splint 57, renders this molecule ready for entry into a nanopore.
  • double-tag Rope-tow constructs according to the invention are directly suitable for ligation to a sequencing adapter (Figures 30A and B), and head-to-tail ligation in series (Figures 30C and D).
  • each Rope-tow construct (the “right-hand” construct being ligated) has a 5’-phosphate and is hybridized to a complementary strand having a projecting 3’ A that base pairs with a projecting 3’ T on the “left-hand” construct being ligated.
  • a ligase e.g., T4 ligase
  • a multiplex panel of proteins can be measured in the sample, and the different TARGET peptides (and their STANDARDS) can be enriched separately, for example using different populations of magnetic beads for each different TARGET peptide or by placement of loaded beads in separate tiny containers, with the chemical modifications and reaction with linkers to form chains also carried out separately for each TARGET peptide and STANDARD, as described above. Then each concatenated chain will contain only one type of TARGET peptide sequence and STANDARD. Pooling these separately processed concatamers will create a sample comprised of multiple peptides, but in which each concatamer molecule chain will contain only one predominant TARGET peptide and its STANDARD.
  • the first peptide sequences read by a nanopore from a concatamer molecule will identify the type of TARGET peptide (and/or STANDARD) comprising the whole molecule, making it possible for the sequencing system software to decide whether or not further counts of that TARGET peptide or its STANDARD are required, i.e., whether the minimum molecule counts for the peptide and respective STANDARD required to achieve the desired measurement precision have already been achieved or not (whether from this pore or others).
  • this approach has been implemented in devices for nanopore DNA sequencing, and shown to decrease repeated re-sequencing of the same sequences and to improve coverage of rarer sequences in a given total sequence output.
  • this ability to reject already-well-measured peptides improves throughput substantially, and the benefit grows the longer the concatamers are (i.e., the more peptides they have attached).
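The read-or-eject decision described above can be sketched as a small bookkeeping class. This is a hypothetical illustration, not the sequencing system software: the class and method names, the use of a single minimum count per TARGET/STANDARD pair, and the choice to eject concatamers whose TARGET is not in the panel are all assumptions made for the example.

```python
from collections import Counter


class EnrichmentController:
    """Tracks TARGET and STANDARD counts pooled across all pores."""

    def __init__(self, min_counts):
        # min_counts maps a TARGET identifier to the minimum number of reads
        # required for both the TARGET and its STANDARD (assumed criterion).
        self.min_counts = dict(min_counts)
        self.target_counts = Counter()
        self.standard_counts = Counter()

    def record(self, target_id, is_standard):
        """Register one confidently recognized peptide read."""
        if is_standard:
            self.standard_counts[target_id] += 1
        else:
            self.target_counts[target_id] += 1

    def should_eject(self, target_id):
        """Return True if further reads of this concatamer are not needed."""
        needed = self.min_counts.get(target_id)
        if needed is None:
            return True  # assumed policy: off-panel concatamers are ejected
        return (self.target_counts[target_id] >= needed
                and self.standard_counts[target_id] >= needed)
```

A pore would call `should_eject` once the first peptides of a homogeneous concatamer have identified its TARGET, and reverse the voltage to free the pore when the answer is true.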
  • peptide movement through a nanopore can be facilitated by addition of charged molecules that bind to, but do not covalently react with, peptides (e.g., sodium dodecyl sulfate, and similar charged detergent molecules).
7.6.15 Data analysis.
  • TARGET and STANDARD sequences which in many cases will total 4 to 50 peptides
  • the primary requirement is to accurately classify a peptide’s sequence as either i) confidently recognized as one of a limited set whose nanopore signatures have been extensively characterized before (i.e., TARGETS and STANDARDS), and for which a machine learning method has been optimized, or ii) a molecule whose sequence has not been confidently recognized.
  • such recognition and classification of ion current signatures is used to count the confidently recognized TARGET and STANDARD peptides and eliminate the signatures that are not confidently recognized.
  • a similar approach has been used to distinguish limited sets of DNA barcodes used to tag DNA libraries from different samples that are then pooled together for analysis. In such strategies, DNA reads that are not assigned to a barcode with sufficient certainty can be discarded, improving the overall quality of results.
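The count-or-discard logic described above can be sketched as follows. The sketch assumes that a trained classifier has already produced a probability-like score for each panel peptide for every read; the score format, the 0.9 threshold, the label names, and the function names are illustrative assumptions rather than parameters from this disclosure.

```python
from collections import Counter

UNRECOGNIZED = "unrecognized"


def classify_read(class_scores, min_confidence=0.9):
    """Assign a read to its best-scoring peptide, or reject it if not confident.

    class_scores maps each TARGET or STANDARD label to a score in [0, 1].
    """
    if not class_scores:
        return UNRECOGNIZED
    label, score = max(class_scores.items(), key=lambda item: item[1])
    return label if score >= min_confidence else UNRECOGNIZED


def count_confident_reads(reads, min_confidence=0.9):
    """Count confidently recognized peptides and report how many reads were discarded."""
    counts = Counter(classify_read(scores, min_confidence) for scores in reads)
    discarded = counts.pop(UNRECOGNIZED, 0)
    return counts, discarded


example_reads = [
    {"TARGET_A": 0.97, "STANDARD_A": 0.03},
    {"TARGET_A": 0.55, "STANDARD_A": 0.45},  # ambiguous, so it is discarded
    {"TARGET_A": 0.02, "STANDARD_A": 0.98},
]
example_counts, example_discarded = count_confident_reads(example_reads)
```

Raising the threshold trades fewer counted molecules for fewer misassignments, which mirrors the barcode-demultiplexing practice of discarding low-certainty reads.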
  • a machine learning system is trained to recognize and classify the ion current signatures of a set of TARGET peptides and their STANDARDS using large numbers (e.g., thousands to millions) of “reads” (or “traces” or “signatures” or “squiggles”) of known peptide sequences transiting sequencing nanopores.
  • Recognition based directly on machine learning evaluation of the ion current traces i.e., current measurements over time, typically generating 100-1,000 current measurements during transit of a peptide
  • recognition based on amino acid sequences deduced from the traces and therefore represents the preferred method of peptide recognition.
  • This training can be accomplished using libraries of nanopore current signatures generated by constructs made from pure synthetic peptides having the TARGET and STANDARD peptide sequences.
  • Large training sets of pure TARGET and STANDARD peptide constructs are used to select optimal recognition algorithms (e.g., machine learning methods including convolutional neural nets, etc.) and iteratively improve the classification accuracy of these methods to provide accurate counts of the various peptide sequences.
  • the type of pore used is selected based on recognition performance of machine learning systems trained with a specific set of TARGET and STANDARD peptides on the various candidate pores.
  • multiple types of nanopores are used in a system, allowing recognition of specific TARGET and STANDARD peptides by a type of nanopore best able to accurately recognize them.
  • novel nanopores are designed and tested to optimize performance in recognizing specific sets of TARGET and STANDARD peptides
  • the accuracy of counting TARGET and STANDARD peptides is further improved by “counter-training” a machine learning system to reject peptide sequences other than TARGET and STANDARD peptides that may be present as low-abundance contaminants after enrichment of the TARGET and STANDARD peptides from digests of complex biological samples.
  • a library of peptide sequences coded for by the relevant genome and sharing partial sequence or specific sequence motifs with members of a set of TARGET and STANDARD peptides is created and used to counter-train a peptide recognition system to avoid mistaking these sequences for authentic TARGET and STANDARD peptides.
  • the results of such training can be expected to improve peptide recognition with time and the accumulated learning from increasing sample numbers, providing the potential to retrospectively improve the precision of past assay results by reanalysis with updated software.
  • a plurality of candidate TARGET peptide sequences are prepared as constructs for nanopore sequencing, and libraries of nanopore current reads collected using these molecules. This data is used to determine the accuracy with which specific peptide read signatures can be distinguished from other sequences, and this information used in the selection of a set of most accurately classifiable TARGET peptide sequences to represent the target proteins in subsequent routine analyses. Specific affinity reagents can then be generated to bind epitopes in the middle region of these sequences, providing optimal analytical performance.
  • a plurality of TARGET peptide sequences derived from a panel of target proteins are prepared as constructs for nanopore sequencing, and libraries of nanopore current read signatures collected using these molecules. Classification accuracy data derived from these signatures is used to select a set of most accurately classifiable TARGET peptide sequences spanning the set of desired protein panel members.
  • a plurality of candidate STANDARD sequences cognate to one or more TARGET peptide sequences is included in a set of constructs used to generate libraries of nanopore current signatures, and STANDARD sequences are selected for each TARGET peptide so as to provide a set of most accurately classifiable STANDARDS that minimize errors in classifying a TARGET peptide’s STANDARD in relation to other TARGET peptides and STANDARDS.
  • peptide:oligo constructs are constructed with recognizable ion current signals (e.g., a high current associated with an abasic stretch) either before or after the peptide, or both before and after.
  • TARGET and STANDARD peptide sequences: use of the methods described above for selection of most-accurately classifiable TARGET and STANDARD peptide sequences provides information about each selected peptide and its likelihood of misclassification within a panel of TARGET and STANDARD peptides.
  • additional signature and classification accuracy data is generated by analysis of sets of relevant biological samples (e.g., plasma or dried blood spot samples) and versions of these into which selected TARGET and STANDARD peptides have been spiked at known levels.
  • STANDARD sequences are unlikely to exist among proteolytic fragments of naturally-occurring proteins (a supposition that is easily tested by bioinformatics analysis of the relevant genome sequences, allowing any naturally-occurring sequences to be rejected as STANDARD candidates) and therefore detection of apparent STANDARD signatures in digests of natural samples that have not been spiked with STANDARD provides a direct estimate of the “false positive” detection rate for STANDARDS.
  • Comparisons of molecule counts among a set of STANDARDS spiked into sample digests at the same (or different but known) levels provides a means of estimating STANDARD “false negative” detection rates (e.g., any STANDARD showing fewer counts than other STANDARD spiked at the same level is likely to be affected by false negative detection errors).
  • Because TARGET peptides are likely to be detectable in digests of natural samples from the relevant species, false positive and negative detection rates can be estimated by comparing TARGET and STANDARD peptide detection rates in samples spiked with equal amounts of TARGET and STANDARD peptides: any excess of TARGET peptide counts over STANDARD counts provides an estimate of the TARGET peptide false positive rate, and any deficit of TARGET peptide counts compared to STANDARD counts provides an estimate of the TARGET peptide false negative rate (in each of these cases taking into account the independently determined false positive and negative detection rates of the STANDARDS).
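The three error estimates described above can be sketched as simple ratios. The normalizations used here (false positives per total reads, deficits relative to the best-counted STANDARD, TARGET excess or deficit relative to the STANDARD count) are assumptions made for illustration, and the sketch omits the correction for the STANDARD's own error rates.

```python
def standard_false_positive_rate(apparent_standard_counts, total_reads):
    """Apparent STANDARD detections in an unspiked digest, per read examined."""
    return apparent_standard_counts / total_reads

def standard_false_negative_rates(counts_by_standard):
    """Deficit of each STANDARD relative to the highest count among STANDARDS spiked at the same level."""
    reference = max(counts_by_standard.values())
    return {name: 1.0 - n / reference for name, n in counts_by_standard.items()}

def target_error_estimate(target_count, standard_count):
    """Excess (false positive) or deficit (false negative) of TARGET vs an equal-amount STANDARD spike."""
    diff = (target_count - standard_count) / standard_count
    return {"false_positive": max(diff, 0.0), "false_negative": max(-diff, 0.0)}

# Hypothetical counts.
print(standard_false_positive_rate(3, 100_000))
print(standard_false_negative_rates({"STD_1": 1000, "STD_2": 900}))
print(target_error_estimate(target_count=1100, standard_count=1000))
```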
  • alternative indices of sequence error can be used, e.g., an experimentally determined confusion matrix among amino acids, and/or an experimentally determined confusion matrix among the selected TARGET and STANDARD peptides.
  • the detection approach can be modified, e.g., by extending the sequence acquisition to more residues (e.g., when using sequential degradative readouts), by alteration of the sequence or modification of a STANDARD involved in a confusion uncertainty, by selection of an alternate TARGET sequence from a target protein, or by other means known in the art.
7.7 USE OF THE INVENTION WITH SINGLE MOLECULE IMAGING AND COUNTING TECHNOLOGIES.
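An experimentally determined confusion matrix among peptides can be tabulated from reads of pure constructs whose identity is known in advance. The sketch below assumes each read is available as a (true identity, called identity) pair; the labels and function name are hypothetical.

```python
from collections import Counter, defaultdict

def confusion_matrix(pairs):
    """Row-normalized confusion matrix: fraction of reads of each true peptide called as each label."""
    counts = defaultdict(Counter)
    for true_label, called_label in pairs:
        counts[true_label][called_label] += 1
    return {
        true_label: {called: n / sum(row.values()) for called, n in row.items()}
        for true_label, row in counts.items()
    }

# Hypothetical reads of pure TARGET_A and STANDARD_A constructs.
reads = [("TARGET_A", "TARGET_A"), ("TARGET_A", "TARGET_A"),
         ("TARGET_A", "STANDARD_A"), ("STANDARD_A", "STANDARD_A")]
matrix = confusion_matrix(reads)
print(matrix)
```

Large off-diagonal entries flag the peptide pairs for which an altered STANDARD or an alternate TARGET sequence may be worth considering.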
  • TARGET and STANDARD constructs prepared according to the invention can be immobilized (e.g., on glass or quartz slides) and, after staining with fluorescently labeled reagents (e.g., BINDERS for peptides and complementary oligos for flags and barcodes), imaged using this technology to count molecules.
  • peptide molecules are immobilized (e.g., on a surface) and their identities (e.g., as TARGET or STANDARD peptides) determined by optically detecting the binding (or lack of binding) of a series of one or more specific and/or possibly promiscuous affinity reagents with optically-detectable labels (e.g., BINDERS and oligonucleotides complementary to barcode sequences that are labeled with fluorescent dyes or proteins) applied to the surface one after another (e.g., in a flowcell) with the option of removing each affinity reagent before application of the next, and using recognition techniques (including machine learning) to decipher peptide identity based on the pattern of affinity reagents that do, or do not, bind detectably.
  • a system for example that described in US Patent Application 16/659,132, can also be used to count TARGET and STANDARD peptide molecules of the invention (or to count intact target protein molecules in the event that BINDERS
  • linkage chemistries can be employed to connect a peptide construct to an imageable surface, including through direct reaction with peptide amino groups (e.g., using NHS esters), with carboxyl groups (e.g., using carbodiimide chemistries), with cysteine sulfhydryl groups, and with biotin, click chemistry and other groups that have previously been introduced into a peptide construct.
  • Chemistries such as click chemistry involve modification of a site or sites on the peptide as well as providing a connecting site on the surface.
  • the required modification(s) of the peptide are carried out while the TARGET and STANDARD peptides are bound to the BINDER (e.g., during the capture stage of the enrichment process).
  • TARGET and STANDARD peptides or peptide:oligo constructs comprising a click attachment group can be eluted from BINDERS (after enrichment) in the presence of a concentrated suspension of SNAPs comprising a TCO click attachment group, resulting in the covalent coupling of one peptide to each SNAP, after which large numbers (e.g., 10 billion) of SNAPs can be arrayed for affinity reagent imaging in a suitable optical detection system.
  • the elution of peptides (and associated VEHICLES) from BINDERS under acidic conditions can occur before, at the same time as, or after the peptides couple to the SNAPs.
  • this elution and coupling can take place in a very small volume, e.g., within the interstitial volume of a packed mass of magnetic beads on which the BINDERS are immobilized (i.e., in 0.1 to a few microliters of liquid).
  • constructs bound to BINDERS on magnetic beads are reacted with SNAPs, and the magnetic beads carrying the SNAP:construct complexes moved into close proximity with an imageable surface before release (i.e., elution) of the complexes from the BINDERS on the beads, after which they need only migrate a very short distance by diffusion to reach and bind to the imageable surface. This approach significantly diminishes losses of molecules in the workflow, and thereby maximizes detection sensitivity.
  • peptide detection is accomplished using BINDERS modified to comprise detectable labels (e.g., fluorescent dyes or proteins such as GFP, nanoparticles comprising fluorescent dyes, enzymes that generate optically detectable products, and the like) to visualize TARGET and STANDARD peptides on a support.
  • specific nucleic acid sequences are detected by means of hybridizing complementary probes comprising optically detectable labels, for example labels like those used in optical genome mapping (76).
  • Well-known methods of optical detection using microscopic systems are able to detect individual bound labels and associate the resulting optical signals with discrete locations on a surface, thereby allowing a sequence of binding events to be constructed for each bound analyte molecule (e.g., TARGETS and STANDARDS).
  • a specific STANDARD label functionality e.g., biotin, a fluorescent label, a unique short peptide segment, or a unique oligonucleotide sequence
  • a reagent capable of specifically binding to the label or direct optical detection e.g., of a fluorescent label
  • Figure 31 schematically illustrates the use of such a multi-step detection approach to characterize a standardized sample digest, focusing on a region where 96 peptide:oligo construct molecules prepared according to the invention and arrayed on a surface are probed, and the results decoded to provide a quantitative estimate of the amount of a TARGET molecule.
  • In Fig 31A and B, two different BINDERS are used and optically detected (shown in green) where present. These signals establish the array sites having each of the two TARGET peptides.
  • In Fig 31C and D, BINDERS (or oligos complementary to construct DNA sequences identifying TARGET and STANDARD molecules) are separately applied and imaged to determine which constructs are TARGET and STANDARD molecules (in this case irrespective of which peptide they represent).
  • Fig 31E and F show detection results of separately applying oligos complementary to construct DNA barcode sequences identifying molecules recovered from two different sample digests (Samples 1 and 2). Using the digital information provided by the binary “optically detected or undetected” signals recorded for each arrayed molecule during these 6 detection cycles, the number of molecules of each TARGET and STANDARD version of each peptide in each sample can be directly tabulated, and the ratio of TARGET to STANDARD counts computed. This ratio, multiplied by the known amount (in relative or absolute terms) of the STANDARD added during standardization of the sample digest, provides a measure of TARGET abundance.
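The decoding of the six binary detection cycles can be sketched as follows. The cycle order, the site data and the rule of discarding sites with ambiguous or missing signals are assumptions made for illustration; they are not taken from the figure.

```python
from collections import Counter

# Assumed cycle order: two peptide BINDERS, TARGET/STANDARD codes, two sample barcodes.
CYCLES = ("peptide_A", "peptide_B", "TARGET", "STANDARD", "sample_1", "sample_2")

def decode(site_signals):
    """Map one site's six binary signals to (sample, peptide, kind), or None if ambiguous."""
    s = dict(zip(CYCLES, site_signals))
    peptide = [p for p in ("peptide_A", "peptide_B") if s[p]]
    kind = [k for k in ("TARGET", "STANDARD") if s[k]]
    sample = [x for x in ("sample_1", "sample_2") if s[x]]
    if len(peptide) == len(kind) == len(sample) == 1:
        return sample[0], peptide[0], kind[0]
    return None

def target_amounts(sites, standard_amount):
    """Tabulate molecules and compute (TARGET / STANDARD) * known STANDARD amount."""
    counts = Counter(d for d in map(decode, sites) if d is not None)
    amounts = {}
    for sample, peptide in {(s, p) for s, p, _ in counts}:
        n_standard = counts[(sample, peptide, "STANDARD")]
        if n_standard:
            ratio = counts[(sample, peptide, "TARGET")] / n_standard
            amounts[(sample, peptide)] = ratio * standard_amount
    return counts, amounts

# Hypothetical sites: 3 TARGET and 2 STANDARD molecules of peptide A from sample 1,
# plus one site that lit up for both peptide BINDERS and is therefore discarded.
sites = [(1, 0, 1, 0, 1, 0)] * 3 + [(1, 0, 0, 1, 1, 0)] * 2 + [(1, 1, 1, 0, 1, 0)]
counts, amounts = target_amounts(sites, standard_amount=10.0)
print(amounts)
```

Here `standard_amount` stands for the known amount of STANDARD added during standardization, in whatever relative or absolute unit the assay uses.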
  • Figure 11 shows schematically a series of 6 such sequential detection steps or cycles, each using different binders to identify, or help identify, a specific peptide sequence.
  • peptide A is recognized first by an anti-peptide antibody BINDER specific for an internal peptide epitope.
  • BINDERS specific for short trimer amino acid sequences present in the peptide are used to support peptide identification.
  • an antibody specific for the c-terminal amino acid (or amino acids) is used to further support peptide identification.
  • additional anti-peptide antibody BINDERS specific for other peptide sequences are used to identify these molecules coupled to a support.
  • Each of the bound peptides is part of a larger peptide:oligo construct comprising DNA sample barcodes (Codes 1, 5, 7 and 11 in the Figure) and a DNA barcode identifying each molecule as either a TARGET or STANDARD (shown as the OLIGO-EITHER barcode).
  • BINDERS as a means of identifying immobilized TARGET (and cognate STANDARD) peptides (e.g., imaging methods)
  • a variety of means may be employed to probe specific features of peptides, making use of the fact that their sequences are known a priori (i.e., established during initial TARGET peptide selection).
  • Figure 32 shows additional methods of confirming or improving the specificity of BINDER interactions with peptides by detecting the effect of a change in peptide structure on the binding.
  • antibody BINDERS to an internal epitope, to 2 short trimer epitopes, and to a c-terminal epitope are applied and read as in the example of Figure 11.
  • a proteolytic enzyme capable of cleaving a specific site in some of the peptides is then applied to the immobilized peptide constructs, resulting in release of a c-terminal fragment from the peptide shown (Peptide- A).
  • if the peptide is linked to the remainder of the construct by its c-terminal end (e.g., through linkage to the amino group of a c-terminal lysine), then a cleavage within the peptide will result in release of an n-terminal fragment, and loss of epitopes involving this fragment.
  • the addition of the proteolytic step enables “mapping” epitope locations within the peptide, further strengthening the identification.
  • peptide identification is strengthened by observing the effects of one or more peptide alterations, including peptide cleavage (as shown in Fig 32), chemical or enzymatic removal of one or two terminal amino acids (e.g., using Edman degradation), enzymatic or chemical removal of a phosphate group from Ser, Thr or Tyr, chemical modification of one or more amino acids (e.g., alkylation of a free cysteine, etc.), or any of a very large repertoire of amino acid modifications known in proteomics and protein chemistry.
  • the binding of specific BINDERS is further characterized by observation of the effects of altered solution conditions on the binding to individual peptide molecules.
  • a change from near-neutral to acidic (or basic) pH can result in the dissociation of some binders (but not others) from some peptide epitopes.
  • a change from near-neutral to acidic (or basic) pH can result in the dissociation of some BINDERS from less-preferred peptide epitopes (e.g., sequences similar to but not the same as the cognate TARGET sequence) while remaining bound to the true cognate sequences.
  • a chaotropic agent such as NH4SCN
  • an organic solvent e.g., acetonitrile
  • a detergent can reduce binding of some BINDERS to some targets while allowing other, stronger interactions to persist.
  • a change in temperature can differentially affect various BINDER interactions.
  • changes (particularly temperature) may be employed that affect interactions between oligonucleotide components of a construct and complementary probes used to read sample barcodes, BINDER barcodes, or TARGET/STANDARD codes.
  • any of the changes employed for detection of differential effects can be applied stepwise (i.e., as an abrupt change), or as a gradient of change over time - in which case the degree of change (determined as a function of time across a gradient) at which a BINDER interaction is affected can serve as a highly specific indicator of the affinity and/or specificity of the interaction, and hence its contribution to a correct identification.
  • peptide molecules or their constructs are positioned at points on a predetermined lattice of locations on a planar support (e.g., like the system described in Patent Application 16/659,132).
  • peptide molecules or their constructs are positioned through hybridization of construct DNA sequences to complementary sequences in extended nucleic acid molecules produced by techniques such as “optical genome mapping” (OGM: e.g., US Patent 9,536,041).
  • Such OGM implementations can use naturally-occurring DNA molecules, or DNA molecules designed to comprise tens, hundreds, thousands, or tens of thousands of repeating complementary sequences at appropriate intervals (e.g., 0.1, 0.5, or 1.0 microns separation) along the length of the molecules.
  • Long DNA molecules linearized by OGM methods can be transferred to, and immobilized on, a planar support having appropriate reactive groups, thereby creating a regular array of complementary sites on a surface within a flowcell for optical imaging during application and removal of a series of optically labeled peptide BINDERS and oligonucleotide probes useful in characterizing bound peptide:oligo constructs.
  • peptide molecules and their constructs are positioned randomly, but at spacings that are typically optically resolvable, on a planar support through binding to sites previously established on the support, e.g., by coating the support with BINDER molecules, by coating the support with molecules having an affinity for some chemical feature of a peptide construct (including oligonucleotides complementary to components of a peptide construct, biotin labels, or the like), or with chemically reactive sites such as click chemistry groups capable of reacting with click groups on peptide constructs, etc.
  • an optical detection system capable of simultaneously and separately detecting these labels based on differences in their excitation and/or emission wavelengths (i.e., multicolor imaging).
  • the use of multiple labels with separate detection wavelengths allows BINDERS to be multiplexed, thereby decreasing the number of binding and elution cycles required to observe a given set of BINDERS.
  • identifying e.g., by amino acid sequence, partial amino acid sequence, or presence of sequence-related features detected by binding reactions
  • individual TARGET or STANDARD peptide molecules or derivatives of these that preserve their individual identities
  • a plurality of BINDERS are used to increase recognition specificity by increasing the number of amino acids involved in interactions with (i.e., “recognized by”) the BINDERS. This effectively increases the peptide sequence coverage of the BINDER(s).
  • 2 or more such single-epitope BINDERS are stably joined to form a single molecule, and the well-known “avidity effect” results in a much higher overall affinity for the peptide than would be seen with any of the BINDERS individually.
  • multiple single-epitope BINDERS with distinct optical (e.g., fluorescent) labels are used together, and peptides that bind the set of BINDERS cognate to the peptide’s epitopes are identified as those exhibiting the correct label emissions.
  • multiple single-epitope BINDERS are labeled with distinct fluorophores such that one BINDER is labeled with a fluorophore acting as a FRET donor and another BINDER is labeled with a fluorophore acting as a FRET acceptor.
  • the proximity of the donor and acceptor fluorophores enables detection of this inter-epitope proximity relationship through detection of emission by the acceptor when the donor is illuminated at its excitation wavelength (i.e., a FRET signal is generated).
  • BINDERS are used whose binding to a cognate peptide epitope is characterized by a rapid off-rate (e.g., in the range of 20 msec to 60 sec half-off-times).
  • the optical signal from such a BINDER will appear and disappear as it repeatedly binds to, dissociates from, and re-binds to (etc.) an immobilized peptide construct.
  • the number of transitions between bound (localized fluorescent signal) and dissociated (no localized signal) states per unit time serves as a quantitative kinetic parameter of the strength of binding (77) which can be used to differentiate binding events to the correct cognate epitope from binding events to a similar but slightly different epitope. This fine level structural recognition further amplifies the specificity of peptide detection.
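The kinetic parameter described above can be computed from a per-frame record of whether a localized signal is present at one site. The sketch assumes the image series has already been thresholded into a binary bound/unbound trace; the trace and frame interval are hypothetical.

```python
def transition_rate(bound_trace, frame_interval_s):
    """Transitions between bound (1) and dissociated (0) states per second of observation."""
    if len(bound_trace) < 2:
        raise ValueError("need at least two frames to observe a transition")
    transitions = sum(a != b for a, b in zip(bound_trace, bound_trace[1:]))
    duration_s = (len(bound_trace) - 1) * frame_interval_s
    return transitions / duration_s

# Hypothetical trace for one immobilized construct, one frame every 0.5 s.
trace = [0, 1, 1, 0, 0, 1, 0, 0, 0]
print(transition_rate(trace, frame_interval_s=0.5))
```

Sites whose rate falls outside the range measured for the true cognate epitope can then be treated as probable binding to a similar but different epitope.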
  • peptides are chemically modified before, during or after a series of imaging detection steps. In some embodiments these modifications alter the detectability of specific peptides, such that a method of detection (e.g., imaging of a BINDER bound to an epitope of the peptide) that produces a positive signal when used before the modification does not produce a signal after the modification has taken place (or vice versa).
  • a modification that perturbs, disrupts or cleaves the peptide in an epitope can result in the failure of a BINDER specific for the original intact epitope to bind to the peptide after the modification has taken place (or for the BINDER to exhibit altered binding kinetics as discussed above).
  • a sequence-specific proteolytic cleavage is used as a modification - in this case the cleavage can result in release of the end portion of the peptide that is not immobilized on the support.
  • Sequence-specific enzymes such as trypsin, ArgN, AspN, GluC, chymotrypsin, pepsin, papain and the like may be used to cleave peptides at specific sites - only peptides comprising such sites will be cleaved, and the positions of the sites in the cleaved peptide sequences, in relation to the BINDER epitopes, determine whether or not BINDER binding is affected.
  • specific amino acids within a peptide sequence are modified.
  • protein kinase enzymes may be used to add a phosphate group to specific serine, threonine or tyrosine residues within a sequence. Addition of a phosphate group to an amino acid within a binding epitope is likely to have a significant impact on BINDER binding to the epitope (typically diminishing binding).
  • BINDERS are used that specifically bind to a phosphorylated epitope but do not bind to the unphosphorylated epitope, and in this case the BINDER binds to the peptide only after the kinase modification has taken place.
  • cysteine SH groups are modified, e.g., by reaction with iodoacetamide, acrylamide, or any of a variety of n-ethylmaleimide compounds.
  • a peptide containing cysteine in a BINDER’s epitope can be kept unmodified (including un-oxidized) for recognition by the BINDER and subsequently modified covalently by reaction with iodoacetamide, after which reprobing with the same BINDER results in weaker (or no) binding.
  • a free n-terminal amino group (or similarly a free c-terminal carboxyl group) can be modified in a way that impacts the binding of a BINDER whose epitope included that terminal group (e.g., by acetylation of the amino group, by removal of a terminal group, or by enzymatic addition of a terminal amino acid).
  • the first of a pair of FRET donor-acceptor fluorophores is added to a site on the peptide (e.g., the n-terminal amino group, a cysteine SH group, a linker joined to the peptide) and the second to a BINDER capable of binding to an epitope near the site of the first.
  • the intensity of the resulting FRET fluorescence provides a measurement of the distance between the two fluorophores that can contribute to the identification of the peptide.
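The relationship between FRET signal and distance can be made concrete with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where E is the transfer efficiency, r the donor-acceptor distance and R0 the Förster radius of the dye pair. The sketch assumes that E has already been derived from the measured intensities and that R0 is known for the fluorophores used; the numerical values are illustrative only.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Förster relation: transfer efficiency as a function of donor-acceptor distance."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)

def distance_from_efficiency(efficiency, forster_radius_nm):
    """Invert the Förster relation to recover the donor-acceptor distance."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)

# Illustrative values: a dye pair with an assumed Förster radius of 5 nm.
print(fret_efficiency(distance_nm=4.0, forster_radius_nm=5.0))
print(distance_from_efficiency(efficiency=0.5, forster_radius_nm=5.0))
```

Because efficiency falls off with the sixth power of distance, the signal is most informative when the labeled site and the BINDER epitope lie within roughly one Förster radius of each other.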
  • one member of a pair of FRET donor-acceptor fluorophores is added to each of 2 BINDERS specific for adjacent epitopes on a peptide. Binding of the two BINDERS in proximity to one another (i.e., to their adjacent epitopes) creates the conditions required for FRET detection, thus confirming correct binding to these epitopes.
  • one or more BINDERS capable of distinguishing terminal amino acids (or the terminal pair of amino acids) is used to determine this feature of the peptide sequence, thus adding considerable specificity to the overall detection scheme.
  • BINDERS such as those described above for use in peptide sequencing by cyclical degradation/identification can be used for this purpose.
  • repeated cycles of Edman or enzymatic removal of one or two terminal amino acids allows identification of multiple terminal amino acids.
  • the binding of specific BINDERS is further characterized by observation of the effects of altered solution conditions on the binding to individual peptide molecules. A change from near-neutral to acidic (or basic) pH can result in the dissociation of some binders (but not others) from some peptide epitopes.
  • a change from near-neutral to acidic (or basic) pH can result in the dissociation of some BINDERS from less-preferred peptide epitopes (e.g., sequences similar to but not the same as the cognate TARGET sequence) while remaining bound to the true cognate sequences.
  • a chaotropic agent such as NH4SCN
  • an organic solvent e.g., acetonitrile
  • a detergent can reduce binding of some BINDERS to some targets while allowing other, stronger interactions to persist.
  • a change in temperature can differentially affect various BINDER interactions.
  • changes may be employed that affect interactions between oligonucleotide components of a construct and complementary probes used to read sample barcodes, BINDER barcodes, or TARGET/STANDARD codes.
  • Any of the changes employed for detection of differential effects can be applied stepwise (i.e., as an abrupt change), or as a gradient of change over time - in which case the degree of change (determined as a function of time across a gradient) at which a BINDER interaction is affected can serve as a highly specific indicator of the affinity and/or specificity of the interaction, and hence its contribution to a correct identification.
  • peptide molecules can be “reverse-translated” into nucleic acid sequences using a cyclic procedure involving recognition of peptide n-terminal amino acid residues, or a pair of n-terminal residues, and using this recognition to add or transfer an oligo sequence tag specific for the detected amino acid (or pair) to a growing DNA oligo, which is subsequently sequenced to identify and count the reverse-translated TARGET and STANDARD sequences.
  • This technology, described in US Patent Application 16/760,028, can also be used to identify and count TARGET and STANDARD peptide molecules of the invention.
  • additional information comprising a peptide’s identity as a TARGET or STANDARD, sample identity (e.g., a sample barcode), identity of the BINDER that bound the peptide during an enrichment step (e.g., a BINDER barcode), and other pertinent information can be added to a growing DNA oligo using any of the methods well-known in the art, including copying of a sequence (e.g., by a polymerase), ligation of an oligo onto the growing chain, insertion of a sequence using CRISPR and related technologies, etc.
  • the information thus collected characterizes the peptide in a variety of ways useful in the interpretation of the peptide molecule’s sequence and its significance in an assay.
  • This information may be read out using any of the well-known nucleic acid detection (e.g., PCR) or sequencing methodologies (nanopores, sequencing by synthesis, etc.). This readout can be accomplished either with or without first removing the peptide from the nucleic acid component of the construct.
  • the TARGET and STANDARD peptide molecules of the invention can be arrayed by binding to a surface, or by distribution in an array of pre-formed wells or zones on a surface, and the molecules can be observed individually by a position-sensitive detection means, e.g., optical detection means or electronic detection means.
  • Appl.No.: 16/686,028 describes such a method that can be used to decode the sequence of individual peptide molecules anchored in individual wells of an array of wells in a semiconductor chip. The method enables identification of individual molecules by matching to TARGET peptide or STANDARD sequences, and tabulation of the numbers of such molecules occurring in the array of wells (described as millions of wells on a semiconductor chip substrate) (35).
  • peptide molecules and/or VEHICLE constructs are immobilized on a surface and their identities (e.g., as TARGET or STANDARD peptides) determined by electronic detection of the presence of BINDERS recognizing peptide n-terminal amino acid residues (35).
  • Analogous technological means can be used in the same or similar platforms to read DNA sequences present in peptide:oligo constructs (79).
  • a majority of biomarker tests for proteins deliver a result based on quantity (e.g., the concentration of the target protein in a biological sample) rather than reporting a sequence.
  • Calibrator and control samples: use of external calibrator and control samples, analyzed alongside experimental samples, is well known in the analytical art, and widely used for specific assays (e.g., immunoassays) in clinical diagnostics and in research.
  • data obtained by analysis of a calibrator is used to determine one or more adjustable parameters that bring the system’s analytical result into concordance with an established external reference system.
  • a measurement system is inherently linear, a single point calibration can be used to provide a calibration factor by which detector output is multiplied to yield standard abundance or concentration units.
  • calibrators with multiple levels of analyte may be used to produce a non-linear “standard curve” to translate detector output into an accurate abundance or concentration value.
  • Control samples are typically provided to confirm that calibration has been effective: values obtained by analysis of one or more control samples are compared, after calibration adjustments, to pre-assigned values as a test of the calibration validity (i.e., controls provide quality control for the assay and its calibration).
  • calibrators and controls of this type are provided to be analyzed in the same sequence sensitive single molecule detection workflow as experimental samples, and thus provide calibration of the entire workflow.
  • an additional level of calibration and control is provided to ensure optimal operation of the sequence sensitive single molecule detector itself (i.e., focusing on the detector alone, instead of the entire workflow that includes digestion, any chemical modifications, etc.).
  • sequence sensitive single molecule detectors can produce errors, specifically misidentification of nucleic acid bases, amino acids, or whole molecules.
  • errors can arise from several sources, including errors in assembly of peptides into sequenceable constructs, the movement of molecules through the pore, defects in a nanopore itself, statistical fluctuations in current flowing through a nanopore, electronic noise in the device measuring through-pore current, and a variety of errors contributed by the complex mathematical algorithms, including deep multi-layer machine learning software systems, used to interpret the current traces.
  • calibrator TARGET and STANDARD constructs are provided to address this issue, and these can perform either or both of at least two functions: 1) calibration of the relationship between the numbers of TARGET and STANDARD molecules reported by the detection system and the numbers expected based on prior validated measurement of the numbers present in the calibrator material, and 2) tuning and assessment of the accuracy with which TARGET and STANDARD molecules are classified.
  • a calibrator sample is provided that is capable of being read by a nanopore under conditions that are the same as, or similar to, those pertaining when sample peptides are read using workflows of the invention.
  • a calibrator comprises a polymer VEHICLE (which may comprise polymer segments capable of threading a nanopore and oligonucleotide segments capable of engaging an oligonucleotide motor to control movement through a nanopore), with TARGET and STANDARD peptide molecules attached or incorporated therein, and in which the ratio between the numbers of TARGET and STANDARD peptide molecules in the sample’s population of calibrator constructs is known.
  • Nanopore analysis and current trace interpretation of such a calibrator sample will generate an experimentally determined TARGET:STANDARD ratio, which may be compared to the ratio known a priori to be present in the calibrator. Any discrepancy can be used as a basis for calculating and applying a correction factor to the TARGET:STANDARD ratio reported by the analytical system on other samples. For example, if the known TARGET:STANDARD ratio in the calibrator is 1.0, and the nanopore result (TARGET:STANDARD ratio calculated from the counts of TARGET and STANDARD molecules by the analytical system) is 1.2, then the measured ratios for other samples can be multiplied by 1/1.2 to provide a calibrated result.
  • a calibrator is used to tune the detection system itself.
  • the calibrator comprises constructs of TARGET and STANDARD peptide molecules on a VEHICLE in a manner that identifies TARGET and STANDARD molecules to the detection system.
  • the calibrator can comprise a mixture of two constructs consisting of a) a plurality of TARGET peptides on a type of VEHICLE and b) a plurality of STANDARD peptides on the same or a different type of VEHICLE.
  • each construct comprises multiple copies of either the TARGET or the STANDARD peptides, but not both.
  • TARGET and STANDARD peptides are coupled to different VEHICLES that provide independent identification of which peptide is present (e.g., by incorporating different DNA or other recognizable sequences).
  • TARGET and STANDARD peptides are present in the construct in an order or arrangement (e.g., alternating order) that allows the sequencing systems to accurately infer the identity of each peptide. In analyzing the calibrator, the detection system is able to recognize a separate set of valid current traces for each of the two types (TARGET and STANDARD).
  • collections of constructs each comprising only one or a few peptide molecules are provided, and in such cases each peptide’s true identity is determined from highly reliable barcode labels in the constructs.
  • a plurality of calibrator constructs is provided for the calibration and/or optimization of detection of a plurality of TARGETS and cognate STANDARDS.
  • a peptide digest prepared from a complex protein sample can be processed according to the invention to create a large collection of different TARGET constructs (e.g., incorporating a TARGET code).
  • a second aliquot of the same protein sample can be processed according to the invention to create a large collection of constructs labeled with a different code (e.g., incorporating a STANDARD code). Any pair of distinct codes can be used instead of TARGET and STANDARD codes for this special purpose.
  • the two preparations can be mixed in a specified ratio (e.g., 1 part of the first mixture and 10 parts of the second) to provide a calibrator in which two construct versions (e.g., labeled with TARGET and STANDARD tags) of many different peptides can be detected. Observation of the expected ratio (1:10 in this example) for each detected peptide confirms the linearity of a single molecule detection system.
  • one or more calibrators are analyzed separately from experimental samples (e.g., before or after a run of experimental samples), and the results used for the purposes described above.
  • one or more calibrators are mixed with an experimental sample to provide calibration within a nanopore run.
  • calibrator construct nanopore current traces are evaluated for individual nanopores among a plurality of available nanopores in a device and used to deactivate or otherwise suppress data from such nanopores. In some embodiments, calibrator construct current traces are used to optimize the machine learning algorithms used to analyze the data from each individual nanopore.
  • algorithm parameters are adjusted based on evaluation of calibrator traces to provide a specified level of certainty of peptide assignment. For example, when few copies of a TARGET peptide are detected, it can be preferable to ensure that these few copies are correctly identified and are not incorrectly assigned STANDARD (or other) molecules.
  • the current trace interpretive algorithm can be modified to count only high confidence identifications while assigning lower confidence identifications to an “unassigned” category. This modification increases the certainty that these TARGET peptide molecules are correct identifications, at the cost of reducing the number of identifications, and hence increasing the CV of the measurement. It will be clear to those skilled in the art that tradeoffs between the accuracy of nanopore trace identification on the one hand and overall TARGET:STANDARD ratio precision on the other result from such adjustments, and that these must be taken into account in the overall optimization of assay performance.
  • the false positive and negative detection rates of TARGET peptides and STANDARDS are used in statistical calculations to provide improved estimates of the precision of the respective molecule counts and the precision of the ratio between TARGET and STANDARD counts.
  • Those knowledgeable in the art will understand that a variety of advanced statistical methods exist for the incorporation of multiple measures of uncertainty and error into an overall estimate of precision.
  • a fully elaborated model of assay precision is of considerable importance in establishing the clinical utility of assays according to the invention
  • one or more well-characterized samples similar to or representative of the experimental samples to be analyzed are used as “controls”.
  • quantitative results provided by use of the invention include the ratio of the number of molecules identified and counted as TARGET peptide constructs to the number of molecules identified and counted as STANDARD constructs (STANDARD being present in known or at least consistent amount across a set of samples being analyzed).
  • the precision afforded by counting molecules, where counts are distributed in an approximately Gaussian manner, is estimated to be governed by the ratio of the square root of the number of counts to the number of counts (a ratio often referred to as the Coefficient of Variation, or CV).
  • CVs of 20% or less (for research assays), of 5% or less (for critical diagnostic assays) or 2-3% or less (for sensitive longitudinal tracking of biomarker levels) are desired.
  • the number of counts theoretically needed to achieve a target CV is (1/CV)^2: thus, CVs of 20%, 5% and 2% would require respectively 25, 400 and 2,500 molecule counts for a single TARGET or STANDARD construct.
  • the CV of the ratio of the number of molecules identified and counted as TARGET peptide to the number of molecules identified and counted as STANDARD is more complicated, but is dominated by the count with the higher CV (i.e., the peptide with the fewer molecules counted, and hence the lower precision).
  • the amount of STANDARD added to a sample as internal standard may be set approximately equal to the average level of the TARGET peptide observed in a set of samples from a relevant human population (i.e., at the population average level, such that the averaged TARGET:STANDARD ratio is 1.0).
  • biomarkers exhibit different levels of quantitative variation among individuals (59); in most cases normal variation occurs within a range of 10-fold below and 10-fold above the population average (i.e., from 10 to 1,000 units for a biomarker whose average level is 100 units in a relevant population (59)), though some biomarkers show less variation (e.g., occur within a range of 0.5-2.0-fold from the mean) and a few others can change by >1,000-fold (e.g., CRP in cases of extreme inflammation). In some samples, more TARGET peptide molecules will be counted than STANDARD, and in some samples the reverse. The CV of the ratio will be dominated by the CV of the variable with the fewer counts, since the variable with more counts will have a smaller CV.
  • the CV of the ratio is no more than 1.5 times the larger of the TARGET peptide and STANDARD CV’s.
  • The total number of molecules of the TARGET peptide to be counted (TARGET + STANDARD) would be a maximum of 25,250 molecules in the first case (10x range) and 252,500 molecules in the second case (100x range). This requirement for counting large numbers of peptide molecules, and in particular larger numbers to achieve better precision (lower CV’s) and wider dynamic range, provides strong motivation to optimize the design of assays using the stoichiometric flattening method of the invention.
  • proteins of diagnostic interest can vary in abundance by more than 10^10 (10 billion-fold) (1, 8), a range that significantly exceeds the practical dynamic range of available detection technologies, including mass spectrometry and molecule counting.
  • peptide quantitation according to the invention makes use of counts of TARGET peptides compared to counts of STANDARD internal standard molecules (e.g., as a ratio between the two), it is not necessary to capture all, or even a large fraction, of the molecules of a high-abundance peptide in order to accurately measure its concentration in a sample.
  • the invention instead provides for adjustment of the amount of each peptide TARGET + STANDARD pair captured, e.g., by adjusting the amount of each peptide’s specific enrichment reagent (e.g., amount of cognate BINDER) or the circumstances of enrichment (e.g., duration of binding and washing steps, solution conditions, etc.) so as to capture only the amount of the cognate TARGET peptide and STANDARD pair that is needed to allow counting the minimum required number of peptide molecules (specifically the minimum number required to deliver the desired measurement precision for the less abundant of the TARGET and STANDARD molecules: the more abundant of the two will by definition have more counts and thus a better precision, so that the ratio of TARGET and STANDARD measurements will have a precision similar to that of the less abundant molecule alone).
  • stoichiometric flattening enriches low-abundance peptides, and specifically “de-enriches” or depletes selected high-abundance peptides to a relative abundance level specified in the assay design (typically much less than 100% but greater than 0% of the initial amount), and is therefore distinct from the general concept of “enriching” TARGET peptides as a means of increasing assay sensitivity by capturing all of a rare analyte from a large sample.
  • the amounts of the respective BINDERS are adjusted so as to deliver approximately equal numbers of TARGET plus STANDARD peptide molecules for each TARGET peptide, assuming that STANDARDS are added to the sample at levels approximately equal to the expected level of the cognate TARGET peptide.
  • the process of adjusting BINDER amounts is carried out in a series of steps, beginning with a combination of the BINDERS in certain amounts (which may for convenience initially be equal amounts), measuring the numbers of each TARGET peptide (or STANDARD) molecule detected after enrichment and then reducing the amount of BINDERS for which a large number of peptides were detected and/or increasing the amount of BINDERS for which few peptide molecules were counted.
  • this empirical method allows progressive adjustments of the relative amounts of the BINDERS towards the goal of similar peptide counts for each TARGET peptide and STANDARD pair.
  • the recipe can be locked down as a reproducible product until changes in one or more BINDERS (e.g., development of different BINDER reagents), STANDARD amounts, or panel composition are required.
  • two or more stages of BINDER capture are used: a first capture to collect TARGET and STANDARD peptides from a standardized sample digest (i.e., having hundreds of thousands of different peptides), and one or more secondary BINDER capture steps to further purify or concentrate these relatively pure peptides, or transfer them to a different immobilized format (e.g., a smaller number of larger beads).
  • the process of stoichiometric flattening as described is carried out by adjustments of relative amounts of different BINDERS in the first capture stage, or else in the second capture stage, or in multiple capture stages.
  • a first BINDER capture stage is used to collect amounts of the TARGET and STANDARD peptides from a complex standardized digest, and may not, because of variations in the character of different samples, yield the desired level of stoichiometric flattening (i.e., roughly equal amounts of all the peptides); however, adjustments of BINDER amounts or properties in a second stage capture, which begins with a relatively pure peptide sample, can provide a much flatter stoichiometry and thus better detection efficiency.
  • Figure 33 illustrates the value of stoichiometric flattening in reducing the number of molecules that must be counted to ensure precise measurement of peptides present in a sample in widely disparate amounts.
  • Hb hemoglobin derived from red blood cells
  • sTfR soluble transferrin receptor
  • Figure 34 presents a more complete example in which stoichiometric flattening is used to improve measurement of a panel of 26 proteins measured in small human blood samples.
  • the lowest abundance protein is sTfR and the highest is HbA, with a series of clinically relevant protein biomarkers occurring at various abundance levels in between.
  • Without stoichiometric flattening, a total of approximately 1,000,000,000 peptide molecules must be counted while 913 counts of sTfR are accumulated (as in the previous example).
  • the increased efficiency provided by stoichiometric flattening translates directly into a dramatic reduction in the time and number of pores required to analyze a sample, which in this case is the time required to accumulate the required numbers of counts.
  • multiple samples can be analyzed together (i.e., multiplexed) using some form of molecular barcoding technology (as used for example in genomic sequencing on Oxford Nanopore platforms), and given sufficient pore throughput capacity, this enables more samples in a given time and thus higher overall throughput.
  • the capability of a nanopore reader to identify portions of a sequence early during the read operation, and eject a molecule whose sequence is not of interest (or is surplus to requirements in the current context) can be used to further reduce the stoichiometric differences between high and low abundance peptide reads.
  • This approach, termed “computational enrichment of target sequences” or “Read Until” (74), can provide modest (e.g., max 10-fold) improvements in yield of target sequences in a DNA context, but its value depends on having long reads in order to have the opportunity of “rejecting” a significant amount of sequenceable material. In the context of the invention, this approach would yield little or no benefit for constructs carrying one or a few peptide molecules.
  • the number of peptides counted may be determined by the capacity of the detection system (e.g., millions of peptide sites on arrays used by Quantum-Si, Encodia or Google platforms), and the time required for analysis of an initially fixed number of molecules is determined by the number of amino acids that must be serially decoded to accurately identify the TARGET and STANDARD peptides for counting.
  • Example sets of TARGET and STANDARD peptide sequences designed to measure a panel of proteins can be constructed so as to allow recognition of each peptide by sequencing only 3 or 4 amino acids from either terminus. In some embodiments it will be advantageous to sequence further (more amino acids) in order to decrease potential for misidentification and/or provide for recognition of any unwanted peptides with sequences similar to, but different from, the expected TARGET and STANDARD sequences.
  • the time required may be somewhat adjustable (e.g., by adjusting the number of amino acids required to be read) but the overall number of peptide molecules being processed is determined by the geometry of the detection system itself. For this reason, stoichiometric flattening is key to ensuring that there is sufficient number capacity to provide acceptable precision in the measurement of a series of target proteins.
  • sequence-sensitive single molecule detection approach of the invention can distinguish between different peptide sequences, it can be used to measure multiple different TARGET peptides and their respective STANDARDS, potentially representing multiple different sample proteins, at the same time in the same sample.
  • multiple specific affinity reagents (e.g., BINDERs) can be used together (e.g., immobilized on magnetic beads) to enrich their cognate peptide sequences from a complex sample digest without significant interference between peptides.
  • Figure 35 illustrates a multiplex panel embodiment in which 10 peptides, along with their respective STANDARD peptides designed according to the invention are measured by nanopore sequencing in the form of concatamers and counted to provide quantitative measurements of the presence in a clinical sample of SARS-CoV-2 NCAP protein, antibodies to SARS-CoV-2 NCAP and Spike proteins, levels of three host inflammation markers (CRP, LPSBP and Hp), and the RNA genome of SARS-CoV-2.
  • This collection of analytes, determined by a single nanopore sequencing run, provides broad coverage of COVID-19 infection and patient response.
  • the invention provides novel components and workflows for modifying proteolytic peptides to create heterogeneous molecular constructs suitable for single molecule detection using several different detector technologies.
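The counting-statistics statements in the list above (required counts of (1/CV)^2, a ratio CV no more than 1.5 times the larger single-count CV, and the 1/1.2 calibrator correction) can be checked with a short sketch. It assumes independent, Poisson-distributed counts and a first-order error propagation for the ratio; those assumptions and all function names are illustrative and are not taken from the source.

```python
import math

def counts_for_cv(cv):
    """Counts N needed so that sqrt(N)/N = 1/sqrt(N) reaches the target CV."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(1.0 / cv ** 2, 6))

def count_cv(n):
    """CV of a single molecule count under Poisson statistics (assumption)."""
    return 1.0 / math.sqrt(n)

def ratio_cv(n_target, n_standard):
    """First-order CV of TARGET/STANDARD for independent counts (assumption)."""
    return math.sqrt(1.0 / n_target + 1.0 / n_standard)

def calibration_factor(known_ratio, measured_ratio):
    """Multiplier that brings a measured calibrator ratio back to its known value."""
    return known_ratio / measured_ratio

# Counts required for CVs of 20%, 5% and 2%
required = [counts_for_cv(cv) for cv in (0.20, 0.05, 0.02)]
print(required)

# Worst case for the ratio is equal counts: sqrt(2) times the single-count CV
equal_counts = ratio_cv(2500, 2500) / count_cv(2500)
# With a 10x excess of one species the ratio CV approaches the smaller count's CV
skewed_counts = ratio_cv(2500, 25000) / count_cv(2500)
print(round(equal_counts, 3), round(skewed_counts, 3))

# Calibrator known to be 1.0 but read as 1.2: scale other samples by 1/1.2
factor = calibration_factor(1.0, 1.2)
corrected = 0.6 * factor  # a sample measured at 0.6 is reported as 0.5
print(round(factor, 4), round(corrected, 4))
```

Under these assumptions the ratio CV never exceeds about 1.41 times the larger single-count CV, which is consistent with the 1.5x bound stated above.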

Abstract

The inventions herein, e.g., relate to quantitative measurement of proteins, and provide significant improvements, e.g., in the sensitivity, accuracy, throughput and cost of measuring clinically important proteins in biological samples such as blood. The inventions herein, e.g., also relate to peptide library preparation for quantitative single molecule analysis.

Description

ENRICHED PEPTIDE DETECTION BY SINGLE MOLECULE SEQUENCING
1 BACKGROUND
The entire content of each patent document and publication referenced in this application, including but not limited to those listed below, is hereby incorporated by reference herein in its entirety.
1.1 BACKGROUND PATENTS
U.S. Provisional Patent Application No. 63/284,990, filed 12/1/21
U.S. Provisional Patent Application No. 63/288,987, filed
U.S. Provisional Patent Application No. 63/296,196, filed 1/4/22
U.S. Provisional Patent Application No. 63/303,417, filed 1/26/22
U.S. Provisional Patent Application No. 63/313,760, filed 2/25/22
U.S. Provisional Patent Application No. 63/348,213, filed 6/2/22
U.S. Provisional Patent Application No. 63/352,925, filed 6/16/22
U.S. Provisional Patent Application No. 63/373,875, filed 8/30/22
U.S. Provisional Patent Application No. 63/381,722, filed 10/31/22
US Patent No. 7,632,686 (application no. 10/676,005, entitled High Sensitivity Quantitation of Peptides by Mass Spectrometry filed 2 October 2003)
International Application No. PCT/US11/028569 (entitled Improved Mass Spectrometric Assays for Peptides filed 15 March 2011)
International Application No. PCT/US 13/48384 (entitled Multipurpose Mass Spectrometric Assay Panels for Peptides)
International Application No. PCT/US 12/042,931 (entitled Magnetic Bead Trap and Mass Spectrometer Interface)
1.2 SINGLE MOLECULE SEQUENCING USING NANOPORES:
https://nanoporetech.com/
PCT/GB2020/053082
US11098355
US11168363
US20200239950A1
US 2017021955
US010814298
US20210147904
1.3 REVERSE TRANSLATION OF PEPTIDE SEQUENCE TO DNA (OR RNA, ETC.)
FOLLOWED BY DETECTION USING HIGH-THROUGHPUT NUCLEIC ACID
SEQUENCING PLATFORMS:
“Proteocode” technology developed by Encodia (https://www.encodia.com/technology).
US20180201980A1
US20180328936A1
US20200348308A1
US20210254047A1
US20210302431A1
WO2017192633A1
“ProtSeq” technology developed by Google:
US 2021/0102248
US 2021/0079557
US 2021/0079398
US 2021/0171937
1.4 AFFINITY REAGENT IMAGING PLATFORMS
Nautilus: https://www.nautilus.bio
US 2020/0318101
US 2021/0358563
US 2021/0239705
US 2021/0101930
US 2020/0082914
US 10,948,488 B2
PEPTIDE DEGRADATION WITH OPTICAL DETECTION OF TERMINAL AMINO ACIDS
Quantum-Si: https://www.quantum-si.com/products-and-technology/ and the following patent filings:
US11175227
US20160041095A1
US20200123593A1
US20200123594A1
US20200395099A1
US20210121875A1
US20210139973A1
US20210217800A1
US20210270740A1
US20210331170A1
US20210354134A1
WO2021086945A1
WO2021086954A1
WO2021146475A1
WO2021216763A1
FLUOROSEQUENCING AND FRET FINGERPRINTING:
US 9,625,469
US 10,545,153
US 2021/032536
US2018/0201980
US2021/03024
US2021/0254047
US20150087526A1
US20180328936A1
US20200018768A1
US20200123593A1
US20200123594A1
US20200124613A1
US20200231956A1
US20200348308A1
US20200400677A1
US20210221839A1
US20210331170A1
WO2016164530A1
WO2017192633A1
WO2019222527A1
WO2020014586A9
WO2021086908A1
WO2021216763A1
WO2021111125A1
2 FIELD OF THE INVENTION:
2.1 PROTEIN QUANTITATION.
The inventions herein relate to quantitative measurement of proteins, and provide significant improvements in the sensitivity, accuracy, throughput and cost of measuring clinically important proteins in biological samples such as blood. More than 100 different proteins are currently measured by clinical diagnostic tests in blood (1), each requiring a separate test and a separate aliquot of sample. Such tests are typically immunoassays, and make use of indirect detection of protein targets by antibodies, opening the door to a variety of interferences and associated clinical errors (2). The cost and complexity of this paradigm for clinical laboratory testing severely constrain the health benefits obtainable from measurement of clinical biomarker proteins, and effectively preclude emerging applications such as high frequency longitudinal testing to establish personal biomarker baselines and health models. The inventions herein also relate to peptide library preparation for quantitative single molecule analysis.
2.2 EARLIER AFFINITY ENRICHMENT METHODS: SISCAPA-MS.
In the past, significant progress was made in improving the specificity, multiplexability and sensitivity of protein tests through the introduction of quantitative mass spectrometric protein assays, particularly those using specific proteolytic peptides as quantitative surrogates for their parent proteins and enriching those peptides using peptide-specific enrichment reagents such as anti-peptide antibodies (e.g., the SISCAPA technology, (3, 4)). These advances have improved patient care (e.g., the SISCAPA test for the thyroid cancer marker thyroglobulin performed by leading clinical reference labs in the US and Canada (5, 6), and the recently introduced SISCAPA assay for SARS-CoV-2 NCAP protein (7)), and are widely used in pharmaceutical research and development to measure biomarkers, therapeutic proteins and drug targets with high precision.
However, the requirement for a mass spectrometer as the final detector in such assays represents a significant barrier to adoption due to capital cost (typically ~$500,000), operator expertise required, limited throughput (typically 2-15 minutes per sample), and unsuitability for ultimate point-of-care use. In addition, the practical sensitivity of mass spectrometers for peptide detection is limited, with the best current instruments requiring at least 10 amol (~6 million molecules) of a peptide for reliable detection, far more than would be required in principle if individual molecules could be counted reliably. It is an object of the present invention to overcome these barriers by enabling the use of single molecule detection techniques in quantitative protein assays.
2.3 SINGLE MOLECULE METHODS.
Several technologies are being advanced for sequence-sensitive single molecule peptide characterization and detection using concepts and methods initially developed for DNA and RNA sequencing. These include nanopore sequencing, “reverse translation” of peptide to DNA sequences, cyclical degradation with electronic detection of terminal amino acids, optical detection of single molecule epitopes, etc., as described below. Nucleic acid versions of such methods typically aim to provide enormous throughput (e.g., gigabases per run) of sequence (i.e., digital) data - a feature required to address whole genome, whole exome, or RNASeq sequencing requirements - but are not focused on precise quantitative (i.e., analog) measurements of the amount of a particular type of molecule. Publications in these fields rarely (if ever) use statistical terms related to quantitative precision (such as variance, coefficient of variation (CV), or accuracy): in contrast, these terms are commonly used in discussions of protein quantitation methods where diagnostic accuracy often requires preset precision (e.g., a CV of 5%).
Several significant barriers impede the use of these technologies for the analysis of peptides and proteins. Peptides and proteins are made of 20 common amino acids, as opposed to only 4 common bases in either RNA or DNA, requiring a much greater degree of analytical discrimination to sequence peptides as opposed to DNA. An additional consequence of the greater variety of amino acids, compared to bases, is that peptides and proteins have a very wide variety of physical properties (e.g., number, polarity and localization of electric charges, solubility, chemical reactivity, inter-molecular interactions, etc.) as compared to most nucleic acids, which are uniformly negatively charged along their lengths, with generally unreactive bases attached. In addition, there is no biological equivalent of the polymerase chain reaction (PCR) for peptides and proteins, or of any enzymatic process capable of copying a protein molecule directly, thus eliminating many of the most powerful methods used in nucleic acid preparation for sequencing. Perhaps most importantly, without PCR or an equivalent amplification method for peptides, these methods have limited dynamic range. Genomic DNA has a very limited dynamic range (genes are generally present at near equal stoichiometry). While RNA or cell-free DNA can be present at widely varying abundances, PCR can be used to amplify low abundance molecules. In contrast, important protein-containing samples such as blood, plasma or tissues, have documented dynamic ranges in excess of 10^11 (difference in molar amounts between the highest and lowest abundance proteins for which there is a clinical need to measure), with no practical method of amplifying the numbers of low abundance molecules. As a result, there are major gaps in the generality, specificity, and robustness of peptide methods derived from nucleic acid technologies, and thus significant barriers to their applicability to the problem of quantitative protein measurement.
2.4 THE PRESENT INVENTIONS.
To overcome these limitations and enable use of a “sequence-sensitive single molecule detector” instead of a mass spectrometer to detect and measure peptide molecules, major challenges in the preparation of peptide samples and their presentation to such detectors must be addressed. The present invention provides a general approach to the preparation of peptide libraries for quantitative single molecule analysis, and specific implementations appropriate for use with several alternative single molecule detectors (nanopores, optical imaging systems, and single molecule stepwise sequencing systems).
A key obstacle in formulating the invention has been the multi-dimensional nature of the problem, encompassing as it does the areas of protein and peptide chemistry, oligonucleotide chemistry and sequence design, antibody selection, single molecule detection by optical, chemical and electrical technologies, and requirements of specific clinical diagnostic assays. Key aspects of the invention involve adaptation of technologies from each of these areas in a novel combination.
The invention preserves the fundamental benefits of direct detection of analyte molecules (a strength of mass spectrometry in comparison with indirect detection methods such as immunoassays), while offering the potential for improved test sensitivity, sequence specificity, and lower cost - all of which improve commercial competitiveness against legacy immunoassay technologies and enable expanded use of protein biomarkers in medicine and pharmaceutical R&D. Using reagents and methods of the present invention, a substantial improvement in the throughput, cost and sensitivity of protein analysis can be achieved. In the case of nanopore sequencing, taken as an example of sequence-sensitive single molecule detection, we can, in principle, estimate the performance of a system optimized for peptide quantitation. Assuming that 1) peptides can be delivered as oligonucleotide constructs of about 50 bases in length; 2) nanopore sequencers process (read) oligonucleotides at a rate of approximately 400 bases/sec; 3) accurate measurement of the amount of a peptide requires detection and counting of less than 5,000 molecules; and 4) a commercially available nanopore cartridge contains 3,000 simultaneously readable nanopores, it would be theoretically possible to identify and precisely measure 30 peptide targets (representing 30 distinct clinically-relevant proteins) in a single sample in approximately 6 seconds. Using DNA barcoding methods described herein to multiplex samples, 96 samples could be analyzed in 10 minutes using a single cartridge, compared to approximately 10 hr using liquid chromatography - mass spectrometry (LC-MS): an advantage of 60-fold in speed with less than 1/10th the equipment cost.
Relatively inexpensive benchtop commercial devices exist capable of operating 48 such cartridges simultaneously, providing potential throughput for such a 30-plex biomarker panel of 4,608 samples in 10 min, or more than 600,000 samples per 24-hr day (assuming that sample preparation could keep up with this throughput!).
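By way of illustration only, the nanopore throughput estimate above can be reproduced with a few lines of arithmetic. The following sketch uses only the figures quoted in the text; the function and constant names are illustrative, and the model deliberately ignores pore idle time, failed reads and sample preparation time.

```python
# Figures taken from the estimate in the text; the model is a simplification
# (no pore idle time, no failed reads, perfect barcode assignment).
BASES_PER_CONSTRUCT = 50            # length of each peptide-oligonucleotide construct
READ_RATE_BASES_PER_SEC = 400       # per-pore read speed
MOLECULES_PER_MEASUREMENT = 5_000   # molecules counted per peptide
PORES_PER_CARTRIDGE = 3_000         # simultaneously readable pores
PEPTIDES_PER_PANEL = 30
SAMPLES_PER_CARTRIDGE_BATCH = 96
CARTRIDGES_PER_DEVICE = 48


def seconds_per_sample() -> float:
    """Time for one cartridge to read the whole panel for one sample."""
    bases_to_read = PEPTIDES_PER_PANEL * MOLECULES_PER_MEASUREMENT * BASES_PER_CONSTRUCT
    bases_per_second = PORES_PER_CARTRIDGE * READ_RATE_BASES_PER_SEC
    return bases_to_read / bases_per_second


def minutes_per_batch() -> float:
    """Time for one cartridge to read a barcoded batch of samples."""
    return seconds_per_sample() * SAMPLES_PER_CARTRIDGE_BATCH / 60


def samples_per_day() -> int:
    """Samples per 24-hr day on a device running all cartridges in parallel."""
    batches_per_day = (24 * 60) / minutes_per_batch()
    return int(batches_per_day * SAMPLES_PER_CARTRIDGE_BATCH * CARTRIDGES_PER_DEVICE)


print(seconds_per_sample())   # 6.25
print(minutes_per_batch())    # 10.0
print(samples_per_day())      # 663552
```

The result of 6.25 seconds per sample corresponds to the "approximately 6 seconds" stated above, and 663,552 samples per day corresponds to "more than 600,000".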
Using a completely different single molecule detection method based on optical imaging of molecules immobilized on a planar array, similar increases in analytical throughput can be obtained. An array capacity of 2,000,000,000 individual molecules would allow 400,000 peptide abundance measurements, assuming that 5,000 molecules need to be counted for each measurement and that all the different molecules can be brought to near equal stoichiometry. A single run of such a system, requiring approximately 1 day for optical readout, could therefore measure 40 proteins in 10,000 samples per day.
Furthermore, the ability to recognize and count individual analyte molecules can, at least in theory, offer the maximum assay sensitivity possible by any method, approximately 1,000 times as sensitive as mass spectrometry (~5,000 vs 6,000,000 molecules required for quantitative measurement, respectively). Such an improvement in sensitivity would enable precise measurement of almost all the 100+ clinically-established blood protein biomarkers in much less than 1 microliter (1/20th of a drop) of blood.
These advances in throughput and sensitivity can translate into significant improvements in the cost of measuring the current menu of approximately 115 blood protein biomarkers. Current clinical laboratory analyzers measure one protein at a time, typically by means of specific immunoassays, at an average cost of $5-10 per protein per sample, and requiring 50-100 uL of sample per measurement (which explains why 5-10 mL of blood is typically drawn by venipuncture when blood tests are required). A significant downside of this paradigm is the limitation it places on use of protein panels: measured one at a time, a panel of proteins would cost $5-10 times the number of proteins - a strong disincentive impeding the application of panels despite their greater diagnostic information content. Using single molecule methods, it is estimated that a single run costing $10,000 and yielding 400,000 protein measurements would lower the cost per individual clinical protein measurement to $0.025. This cost structure would revolutionize clinical diagnostics and enable major advances in disease detection and management.
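As a non-limiting sketch, the planar-array capacity and cost figures given above can be checked as follows. All numerical values are those quoted in the text; the variable names are illustrative, and the comparison with immunoassay pricing simply multiplies the quoted $5-10 per protein by the panel size.

```python
ARRAY_CAPACITY_MOLECULES = 2_000_000_000  # molecules readable in one optical run
MOLECULES_PER_MEASUREMENT = 5_000         # molecules counted per peptide measurement
PROTEINS_PER_PANEL = 40
RUN_COST_USD = 10_000
IMMUNOASSAY_COST_USD = (5, 10)            # quoted per-protein, per-sample range

# Assumes all molecules are brought to near-equal stoichiometry, as stated in the text.
measurements_per_run = ARRAY_CAPACITY_MOLECULES // MOLECULES_PER_MEASUREMENT
samples_per_run = measurements_per_run // PROTEINS_PER_PANEL
cost_per_measurement = RUN_COST_USD / measurements_per_run
panel_cost_single_molecule = cost_per_measurement * PROTEINS_PER_PANEL
panel_cost_immunoassay = tuple(c * PROTEINS_PER_PANEL for c in IMMUNOASSAY_COST_USD)

print(measurements_per_run)                 # 400000
print(samples_per_run)                      # 10000
print(f"${cost_per_measurement:.3f}")       # $0.025
print(f"${panel_cost_single_molecule:.2f}") # $1.00
print(panel_cost_immunoassay)               # (200, 400)
```

On these figures a 40-protein panel would cost about $1 per sample by single molecule counting, against $200-400 if each protein were measured by a separate immunoassay.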
However, this enormous advance in performance provided by sequence-sensitive single molecule detectors is only realizable if several very challenging problems can be solved that currently impede their use to measure a wide range of clinically-important proteins in diagnostic samples such as plasma and whole blood. These problems relate to the preparation of peptide libraries for quantitative single molecule analysis, and are successfully overcome by the present invention, which provides improvements in quantitation (by providing novel internal standards), sensitivity (by enriching low-abundance targets), dynamic range (by enabling stoichiometric flattening), and analytical workflow (by providing applicable chemistries for sample preparation).
3 BACKGROUND OF THE INVENTION:
The present inventions address several major challenges in protein analysis, making use of a number of methods well-known in the art, in novel combinations and in combination with entirely novel concepts disclosed herein.
3.1 PROTEIN QUANTITATION CHALLENGES
Quantitative measurements of protein biomarkers, drugs and drug targets are important in many areas of medical practice, pharmaceutical trials and biological research. The importance of improving such measurements in terms of sensitivity, specificity, and generality is nowhere more significant than in the context of blood, the primary clinical specimen. Blood represents the largest and deepest version of the human proteome present in any sample: in addition to the classical “plasma proteins” and cellular proteins of red cells, white cells and platelets, it contains all tissue proteins (as leakage markers) plus very numerous distinct immunoglobulin sequences (8). In addition to the large number of proteins present, proteins in plasma exhibit an extraordinary dynamic range in abundance: more than 10 orders of magnitude in concentration separate albumin and the rarest proteins now measured clinically. Abundant scientific evidence, from proteomics and other disciplines, suggests that among these are proteins whose abundances and structures change in ways indicative of many, if not most, human diseases. Nevertheless, only about 100 proteins are currently used in routine clinical diagnosis ( ), while the rate of introduction of new protein tests approved by the US FDA has paradoxically declined over the last two decades to about one or two new protein diagnostic markers approved per year. Furthermore, it appears that the clinical value of most such tests would be substantially improved if the results were interpreted in terms of patient-specific (i.e., personalized) baselines (rather than population reference intervals) - an advance that is currently inhibited by the cost and inconvenience of collecting a series of baseline samples from each patient before the emergence of major disease processes (9).
Major advances in clinical diagnostics and pharmaceutical research are to be expected if certain technical problems in sample collection, preparation and analysis are solved.
Current methods of protein analysis, including those used in clinical laboratories for analysis of samples like blood, have limitations that significantly impact their utility, both in research and clinical practice. High-precision tests for clinical use are expensive, limited to a small menu of proteins, and require expensive equipment. Research methods, e.g., those of proteomics, measure many proteins, but with limited precision, low throughput and high cost. There is therefore a need for improvements in protein measurement.
3.2 PEPTIDES AS QUANTITATIVE SURROGATES FOR PROTEINS
In many applications aimed at protein quantitation, one can use a single peptide as a quantitative surrogate for the parent protein, provided that there is one copy (or some other known number of copies) of the peptide per protein molecule; i.e., that the peptide molar amount (or number of molecules) is equal to (or some known multiple of) the protein’s molar amount (or number of molecules). Using the known sequence of a target protein and/or experimental data, one can select one or more proteolytically-derived peptide segments within it as "target peptides" (herein referred to as TARGET(s)) to be measured as surrogates for their parent proteins. A good target peptide for quantitation purposes is one that a) is proteotypic for the protein (i.e., occurs in no other protein of the species from which the sample is derived); b) occurs a known number of times (usually once) in the protein sequence, allowing the peptide to be used as a surrogate measure of the molar amount of the protein; c) is efficiently detected by a chosen detector; and d) behaves reliably in a practical sample preparation workflow appropriate to the assay objectives (which may include, for example, specific binding and enrichment compared to other un-selected peptides). Methods for selection of TARGET peptides from a wide range of target proteins for conventional mass spectrometric detection are well-known in the art, but are not directly relevant to selection of optimal peptides for single molecule detection.
3.3 PROTEOLYTIC DIGESTION
Digestion of proteins to peptides serves to “simplify” the structure of a protein sample, by eliminating complicated protein shapes (and their associated unique physical properties and protein:protein interactions), at the expense of increasing the numbers of molecules present. In other words, the immense variety of folded protein structures present in a biological sample is transformed by digestion into a larger set of essentially unstructured short, linear peptides. Proteins exhibit a very wide range of physical properties, ranging from soluble to insoluble, compact to extended, positively to negatively charged, with half-lives of seconds to months, and thus each protein represents an individual challenge in terms of handling and measurement. However, proteolytic digestion of a given protein to peptides generally yields a mixture of peptide molecules from which an example can almost always be chosen that is unique to a given target protein (and thus can serve as a quantitative surrogate for it) and has properties compatible with a selected measurement method (encapsulated by the aspirational phrase “in every bad protein there is at least one good peptide”). For this reason, peptide-level detection is less susceptible to interferences, and more compatible with universal sample preparation methods, than protein-level detection. A typical human protein yields about 50 peptides upon digestion with trypsin, and thus a sample containing, for example, 5,000 proteins is likely to yield a tryptic digest containing 250,000 different peptides. Peptides of the length of typical tryptic peptides (5 to 25 amino acids in a typical tryptic digest) do not generally exhibit stable folded structures and thus do not generally interact with one another to form stable multi-peptide structures. This overall absence of stable interactions between digest peptides overcomes the major source of interference and error in technologies such as conventional immunoassays.
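For illustration, the conversion of a protein sequence into a set of linear peptides can be sketched in silico. The example below assumes the commonly used simplified trypsin rule (cleavage on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P)), which is not recited in the text above; the input sequence and function names are hypothetical, and missed cleavages are ignored.

```python
def tryptic_digest(protein):
    """Split a one-letter protein sequence after K or R, unless followed by P.

    Simplified rule assumed for illustration; real digests also contain
    missed-cleavage and non-specific products.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def in_length_window(peptides, shortest=5, longest=25):
    """Keep peptides within the 5 to 25 residue range mentioned in the text."""
    return [p for p in peptides if shortest <= len(p) <= longest]


example_protein = "AGKLLERPDSTRMK"  # hypothetical sequence
peptides = tryptic_digest(example_protein)
print(peptides)                    # ['AGK', 'LLERPDSTR', 'MK']
print(in_length_window(peptides))  # ['LLERPDSTR']
```

The digest always reassembles to the original sequence, which reflects the point made above that digestion changes the number and form of molecules rather than the sequence content of the sample.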
Proteolytic digestion is widely used in proteomics to fragment proteins for analysis by mass spectrometry (10) and other analytical methods. Digestion of a sample such as plasma is typically carried out by first denaturing the sample proteins (e.g., with detergents such as deoxycholate, organic solvents, urea or guanidine HCl), reducing the disulfide bonds in the proteins (e.g., with tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol or mercaptoethanol), alkylating the cysteines to prevent re-formation of disulfides (e.g., by addition of iodoacetamide which reacts with the free -SH group of cysteine), quenching excess iodoacetamide by addition of more dithiothreitol or mercaptoethanol, and finally (after removal or dilution of the denaturant) addition of the selected proteolytic enzyme (e.g., trypsin, Lys-C, etc.), followed by incubation to allow digestion. Following incubation, the action of trypsin is terminated, either by addition of a chemical inhibitor (e.g., TLCK) or by denaturation (through heat or addition of denaturants, or both) or removal (if the trypsin is on a solid support) of the trypsin. Digestion destroys protein:protein interactions and thus generally eliminates interferences that occur in conventional immunoassays.
A very wide variety of proteolytic digestion protocols have been developed, and some have been shown to exhibit extremely high quantitative reproducibility when implemented on automated platforms (4). Most such protocols involve use of a single proteolytic step with a single enzyme (typically trypsin), while in a few cases two enzymes are used together (e.g., Lys-C and trypsin) in order to improve efficiency: Lys-C is smaller than trypsin and more stable at elevated temperature and in the presence of denaturants, and therefore able to cleave proteins that are otherwise relatively resistant to trypsin attack. In some cases this approach makes use of sequential digestion by Lys-C followed by trypsin, with these two steps carried out at different temperatures or at different denaturant concentrations. In contrast to embodiments described herein, the sequential use of Lys-C and trypsin to improve digestion efficiency does not allow oriented construction of peptide-polymer constructs as disclosed in the present invention.
3.4 THE DYNAMIC RANGE PROBLEM
Quantitative detection of peptides by any method faces a major challenge in the form of the vast dynamic range of protein concentrations in samples of interest. In the main clinical specimen (blood serum or plasma), proteins of clinical interest span more than 10 orders of magnitude (10 billion-fold) between the highest abundance proteins (albumin in plasma, or hemoglobin in whole blood) and low abundance proteins of interest (e.g., thyroglobulin (Tg) in the blood of a thyroid cancer survivor). Thus calculations based on the known amounts of various proteins in human plasma show that for every molecule of a selected peptide unique to thyroglobulin in the tryptic digest of a given volume of plasma, there are ~40 billion molecules of other peptides. Comparing the selected thyroglobulin peptide abundance with that of a selected peptide from albumin, there are more than 400,000,000 copies of the albumin peptide for each molecule of the Tg peptide. An example panel of clinically important proteins is shown in Figure 1, comprising a high-abundance protein (transferrin), and lower abundance proteins soluble transferrin receptor (sTfR) and hepcidin, as well as thyroglobulin (present at 0.5 ng/ml - a level indicative of thyroid cancer recurrence in someone treated for that cancer). In a plasma digest, a transferrin peptide is expected to outnumber sTfR peptides by almost 1,000 to 1; to outnumber hepcidin peptides by almost 5,000 to 1, and to outnumber Tg peptides by 28,000,000 to 1. These proteins are measured today in separate assays, each optimized for a different abundance level, and typically offering an assay dynamic range of ~1,000 (i.e., a range in which most clinical specimen concentrations of that protein are expected to fall).
Wider dynamic ranges have been achieved in detection systems reliant on amplification (e.g., PCR assays for nucleic acids and proximity ligation (11) or Somascan (12) assays for proteins). However, amplification-based assay systems sacrifice certainty regarding analyte identity, since the actual molecular targets are not themselves observed or measured by the ultimate detectors used in such assays (which detect binding reagents such as antibodies instead), with the result that unexpected interfering molecules can be measured and genuine analyte molecules can fail to generate signal. Confidence that the intended analyte, and only this analyte, is being measured requires direct analyte detection by a detector capable of discriminating the correct analyte from all others, as exemplified by sequence sensitive single molecule detectors used in the present invention.
3.5 LIMITATIONS OF MASS SPECTROMETRY AND SISCAPA
Using direct analyte detection (whether by mass spectrometry or single molecule detection) with samples such as blood plasma comprising a very wide dynamic range of protein targets represents a major technical challenge. A practical technology for measuring the clinical plasma proteome should be capable of accurately quantitating panels of specific biomarker proteins spanning all abundance levels in a single aliquot of sample. To date, the favored approach to this problem has been a combination of differential enrichment of peptides during sample preparation (bringing target peptides into more equal ratios) with mass spectrometry detection (providing a close approximation to sequence-based analyte identification) - an approach termed SISCAPA. The SISCAPA method described in previous disclosures (US7632686) and publications (3, 9, 13-15) is a general approach for protein quantitation involving digesting proteins (e.g., with trypsin) into peptides that can be enriched by specific affinity capture and further fragmented in a mass spectrometer (e.g., by LC-MS/MS) to generate a sequence-based identification and a measure of amount by comparison to an internal standard. This approach combines the advantages of classical immunoassays (sensitivity, throughput) with those of mass spectrometry (specificity, multiplexability, wide linear dynamic range), while overcoming the limitations of each. Challenges remain, however, in sensitivity (mass spectrometry typically requires millions of molecules to generate a reasonably precise signal), throughput (MS-based assays typically require 5-30 minutes per sample), cost (mass spectrometers are expensive, e.g., $500,000, and require expert operators), and robustness (MS-based systems are typically confined to large institutions with sophisticated infrastructure).
A major improvement would result from the replacement of mass spectrometry by a sequence-sensitive single molecule detector, coupled with sample preparation technology capable of delivering to the detector the small numbers (e.g., 100-10,000) of purified analyte molecules that, when counted by the detector, generate the required assay precision (in this case determined by counting statistics).
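The relationship between the number of molecules counted and assay precision can be illustrated with a short calculation. The sketch below assumes that counting follows Poisson statistics (independent detection events), so that the coefficient of variation (CV) of a count N is 1/sqrt(N); this model and the function names are assumptions made for illustration, and real assays carry additional sources of variation.

```python
import math


def poisson_cv(molecules_counted):
    """Relative standard deviation of a Poisson-distributed count."""
    return 1.0 / math.sqrt(molecules_counted)


def ratio_cv(target_count, standard_count):
    """First-order CV of the ratio of two independent Poisson counts."""
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


for n in (100, 1_000, 5_000, 10_000):
    print(f"{n:>6} molecules counted -> CV {100 * poisson_cv(n):.1f}%")
# Prints CVs of 10.0%, 3.2%, 1.4% and 1.0% respectively.

print(f"{100 * ratio_cv(5_000, 5_000):.1f}%")  # 2.0%
```

Under this assumption, counting 100 molecules limits precision to about 10% CV, whereas counting 5,000 molecules gives about 1.4% CV for a single count and about 2% CV for a ratio of two such counts.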
No mature, practically-implementable technology currently exists that is capable of preparing peptide libraries from complex sample digests for single molecule analysis over such a wide dynamic range.
3.6 PEPTIDE CHEMISTRY
A wide range of chemical modifications have been devised that can be of use in the preparation of peptides and digests for analytical applications of the invention. In some embodiments described herein a peptide is chemically modified, for example to create a linkage to another molecule during assembly of a novel multi-part construct. Site specific linkage chemistries are known for amino groups (e.g., the n-terminal amino group, and the epsilon amino group of lysine); for carboxyl groups (e.g., the c-terminal carboxyl, and sidechain carboxyls of aspartic and glutamic acids); sulfhydryl groups of cysteine residues; and a variety of other less frequently used chemistries. Of these, only the amino and carboxyl groups are available on almost all peptides, making them the preferred attachment points for general methods applicable to a wide variety of peptides. While many reagents have been identified that react preferentially with amino groups, the most common chemistries are N-hydroxysuccinimide (NHS) esters and their more soluble sulfo-NHS derivatives. The n-terminal amino group and epsilon amino group of lysine have significantly different pK values, offering the potential to modify one in preference to the other, though this distinction is not absolute. Nevertheless, protocols have been devised (16) that couple NHS-derivative small molecules (e.g., TMT labels used in mass spectrometric detection) to peptide amino groups with very high efficiency (>95%) using very little excess reagent (less than 2-fold excess of reagent over amino groups), illustrating the feasibility of quantitative modification of amino groups in complex peptide mixtures such as proteolytic digests.
Carboxyl groups can also be modified using chemistries involving carbodiimides (e.g., EDC: 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide); however the lack of site specificity of such reactions (between c-terminal and amino acid side chain carboxyls) restricts their usefulness for site-specific approaches. A variety of reagents exist that are capable of introducing “click” chemistry functional groups into peptides, often by reaction of an NHS-derivative of a click functionality with a peptide amino group, and into oligonucleotides, often by incorporating an amino-derivatized base in a DNA sequence during synthesis and subsequently adding a click group by reaction with this amino group (17). Examples of effective click reagent pairs useful in creation of constructs according to the invention include i) reaction of an azide with an alkyne functionality (some requiring Cu(I) catalysis, which is less preferred in some embodiments); ii) reaction of an azide with a cyclooctyne such as DBCO (dibenzocyclooctyne, also called DIBO), aza-dibenzocyclooctyne (ADIBO) or BCN (bicyclo[6.1.0]non-4-yne) by means of a strain-promoted azide-alkyne cycloaddition (SPAAC) reaction without the need for a Cu catalyst; and iii) reaction of a tetrazine (Tz, such as methyltetrazine) with a trans-cyclooctene (TCO), also without the need for a Cu catalyst.
3.7 INTERNAL STANDARDIZATION
Use of an internal standard in an analytical assay is highly desirable as it provides a stable reference against which the desired analyte can be measured. In the case of mass spectrometric detection of peptides, a synthetic stable isotope labeled version of a target peptide can easily be made and used as an internal standard (the well-known method of “isotope dilution mass spectrometry”). The approach works well because the labeled and unlabeled peptides are chemically and structurally identical, and thus behave the same through any sample preparation protocol, yet can be distinguished reliably by measuring their masses in the final mass spectrometer detection step. Since the labeled peptide is added at a known concentration, the ratio between the amounts of the natural and isotopically labeled forms detected by the final MS analysis allows the concentration of the natural peptide in the sample mixture to be calculated. The approach can be multiplexed to cover multiple peptides measured in parallel, and can be automated through computer control to afford a general system for protein measurement (13).
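The ratio calculation underlying internal standardization can be sketched as follows. The peptide names, signal values and spiked amounts are hypothetical, and the function is an illustration of the ratio principle described above rather than any particular instrument's data processing.

```python
def target_amount(target_signal, standard_signal, standard_amount):
    """Amount of the natural peptide, in the same units as standard_amount.

    Relies on the labeled and unlabeled forms behaving identically through
    sample preparation, so that their signal ratio equals their molar ratio.
    """
    if standard_signal <= 0:
        raise ValueError("internal standard was not detected")
    return target_signal / standard_signal * standard_amount


# Hypothetical multiplexed panel:
# (target signal, standard signal, spiked standard amount in fmol)
panel = {
    "peptide_A": (1_200, 4_800, 10.0),
    "peptide_B": (9_000, 3_000, 2.0),
}
results = {name: target_amount(*values) for name, values in panel.items()}
print(results)  # {'peptide_A': 2.5, 'peptide_B': 6.0}
```

Because only the ratio enters the calculation, losses that affect both forms equally during sample preparation cancel out.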
Single molecule detectors are unable to measure peptide mass accurately enough (or in most cases at all) to use stable isotope versions as internal standards in this manner. Hence there is a need for an alternative peptide labeling strategy to create single molecule internal standards capable of a) behaving like the targeted peptide analyte during the steps of sample preparation, while b) being clearly distinguishable from the target by the chosen single molecule detection technology. Use of the term “standard” in this specific sense is distinct from other forms of “standards” that can be introduced into workflows for quality control of separations, monitoring of chemical reaction yields, etc., rather than improving quantitation of a single specific analyte.
3.8 PEPTIDE-SPECIFIC BINDERS
A variety of types of biologically derived antibodies (e.g., polyclonal, monoclonal and oligoclonal antibodies derived from mice, rabbits, humans, camelids and other species), molecules derived from antibodies by molecular biology techniques (e.g., antibodies selected from libraries using phage display and other techniques), aptamers (based on DNA, or RNA, and including a variety of modified bases and backbones), and other molecular constructs can be created that are capable of specifically binding a peptide (“BINDERs”). For example, a TARGET peptide can be coupled to a carrier protein (e.g., keyhole limpet hemocyanin: KLH) and used to immunize an animal (such as a rabbit, mouse, chicken, goat, camelid or sheep) by one of the known protocols that efficiently generate anti-peptide antibodies. Experience with the SISCAPA technology (3, 9, 13-15) has shown that antibodies, preferably monoclonal antibodies, can be developed that bind and capture a specific low abundance tryptic peptide from the digest of a very complex sample such as human blood plasma (which may contain 250,000 distinct peptides, some at very high abundance), and thereby enrich the peptide substantially (e.g., more than 10,000-fold). Discovery of such BINDERS (e.g., antibodies) requires use of very specific screening processes to find reagents that do not bind non-TARGET peptides and retain the TARGET peptide long enough to wash non-binding peptides away (typically 10-15 minutes in many automated protocols). The screening process does not assess equivalence of binding TARGET and STANDARD (stable isotope-labeled peptide) since it is known there will be no difference (at least for 15N and 13C isotopic labels).
If, however, a peptide TARGET and its cognate internal STANDARD molecule are not chemically identical (as they are in the case of stable isotope labeled standards), the very specificity of effective BINDERS creates a major problem: if a BINDER binds a STANDARD more or less tightly (or with different kinetics) than its cognate TARGET peptide, then binding will impact the ratio of TARGET to STANDARD molecules and lead to an incorrect assay result. The selection of TARGETS, STANDARDS and BINDERS that successfully preserve the quantitative ratio is therefore critical for enablement of quantitative internally-standardized single molecule detection.
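The effect of unequal binding on the measured ratio can be illustrated numerically. The sketch below assumes a deliberately simple model in which the BINDER recovers a fixed fraction of each species in a single capture step; the recovery values and function names are hypothetical.

```python
def observed_ratio(true_ratio, target_recovery, standard_recovery):
    """TARGET:STANDARD ratio after capture, given fractional recoveries (0-1)."""
    return true_ratio * target_recovery / standard_recovery


def relative_bias(target_recovery, standard_recovery):
    """Fractional error introduced into the ratio by unequal recovery."""
    return target_recovery / standard_recovery - 1.0


# Equal recovery, even if incomplete, preserves the ratio.
print(f"{relative_bias(0.8, 0.8):+.0%}")          # +0%

# Hypothetical case: the BINDER holds the STANDARD less tightly than the TARGET.
print(f"{observed_ratio(0.5, 0.9, 0.6):.2f}")     # 0.75
print(f"{relative_bias(0.9, 0.6):+.0%}")          # +50%
```

In this model the absolute recovery is immaterial; only the difference in recovery between TARGET and STANDARD distorts the result, which is why equivalence of binding must be established when the two are not chemically identical.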
3.9 STRENGTHS AND LIMITATIONS OF SINGLE MOLECULE DETECTION
The ability to detect a single molecule of an analyte provides the maximum theoretically possible analytical sensitivity. While only a few current clinical assays are sensitivity-limited, sensitivity is nevertheless a major barrier for many emerging and potentially clinically-useful protein biomarkers. Single molecule sensitivity would also allow measurement of the 100+ proteins of the current clinical laboratory menu in much smaller samples than the 10-100 uL plasma currently required, and therefore reduce the need for phlebotomy by enabling use of tiny samples such as dried blood microsamples. The few single molecule methodologies so far developed for practical application to protein analytes (e.g., Quanterix SIMOA technology: https://www.quanterix.com/simoa-technology/) make use of indirect detection (e.g., using antibodies to identify analyte molecules) and are therefore subject to the well-known range of immunoassay interferences, non-linear responses, and limited multiplexability.
The ability to determine the partial or full sequence of a biomolecule confers a further major advantage: improved confidence in its identity. For biopolymers such as nucleic acids and proteins, sequence information can identify the analyte unambiguously, thus enabling direct analyte detection. Methods that make use of antibodies to recognize intact protein analytes (e.g., immunoassays) cannot provide this level of certainty, and are classified as indirect detection methods.
It is thus highly desirable in protein analysis to employ single-molecule methods that provide sequence information (18, 19) or structural information closely tied to sequence. A number of such methods are being developed, in most cases making use of technology foundations created for use in nucleic acid (e.g., DNA and RNA) analysis, where sequence information is the primary deliverable.
The present invention provides reagents, methods, and kits for the preparation of peptide libraries suitable for quantitative analysis by a variety of sequence-sensitive single molecule detection technologies, including, but not limited to, the following examples:
3.9.1 Single molecule sequencing using nanopores.
Biopolymers (nucleic acids and polypeptides) can pass through nanopores (both biological and inorganic) in suitable membranes. Signals (e.g., through-pore ion current, or cross-pore tunneling current) recorded during transit of the analyte molecule through the pore reflect differences in blockade of ions flowing through the pore by the side chains of the biopolymer (i.e., bases or amino acids) in the pore’s throat region. Nanopore methods of DNA and RNA sequencing have been developed and successfully commercialized (20, 21).
Nanopore analysis of peptides and proteins is advancing rapidly (19, 22-25), but discrimination of 20 different amino acids presents a far greater challenge than discrimination of 4 nucleic acid bases. The most mature approach for nanopore analysis of peptides is one involving in-line linkage of peptides and nucleic acids into a hybrid polymer, allowing some features of a successful commercially-available DNA sequencing platform to be applied to peptides (e.g., international publication WO 2021/111125 A1). Similar methods are likely to work with a variety of alternative platforms including, but not limited to, alternative biological nanopores (26, 27), inorganic nanopores (28, 29), DNA-origami nanopores (1, 2) and the like.
3.9.2 Single molecule characterization using an Affinity Reagent Imaging Platform (ARIP) in which fluorescent BINDERS recognize molecular features.
Single protein molecules can also be arrayed in a regular pattern on a planar surface and probed by a succession of “promiscuous” binding agents to build up a pattern of epitope occurrences in each molecule (30-32). Machine learning approaches can be used to interpret these epitope occurrence patterns to identify most proteins produced in a given organism despite the stochastic nature of individual binding events. In the context of short peptides, rather than whole proteins, this approach does not deliver direct peptide sequence information.
3.9.3 “FRET” fingerprinting of peptides moving through a protease
A limited but sequence-specific fingerprint of a peptide can be obtained by detecting the order of fluorophores coupled to specific amino acids on a single TARGET peptide molecule (33). Such a technology has been developed by functionalizing a peptide with one type of fluorophore (Cy3) at the N-terminal site and a second type of fluorophore (Cy5) on an internal cysteine residue. The method then monitored the order in which the two fluorophores passed through Alexa488-labeled ClpP14 protease, as detected using the separation-dependent Förster resonance energy transfer effect (“FRET”). This approach provides less information than is potentially available by nanopore sequencing.
3.9.4 Single molecule degradative sequencing.
Several methods have been developed for recognizing, detecting and removing one amino acid at a time from a peptide and either recording (e.g., in DNA) or detecting (e.g., by optical readout methods) the result, providing a single molecule version of classical Edman peptide sequencing. These methods have in common the need to recognize individual terminal amino acids, a problem that has not been completely solved to date, with the result that only an “approximate” version of a peptide’s sequence is obtained. As described below, even this approximate information may be sufficient to allow use of these detection technologies with the invention, since the invention requires discrimination among only a relatively small number of different TARGET peptides. The instrument platforms associated with these technologies typically provide for simultaneous “sequencing” of millions to billions of peptide molecules, with successive amino acids decoded in successive cycles of reagent recognition and terminal amino acid removal. Thus any of them can be used to generate sequence data (or “approximate” sequence data) from the peptide libraries prepared using the invention.
3.9.4.1 Single molecule degradative sequencing by reverse translation.
The concept of “reverse-translation” of a peptide sequence into a DNA sequence is of course a noteworthy contradiction of the central dogma of molecular biology (DNA makes RNA makes protein), and no such biological system is believed to exist. However, by coupling the recognition of a specific terminal amino acid on an immobilized peptide (by a recognition molecule specific for one or more terminal amino acids) with a transfer of a DNA code from the recognition molecule to a nearby DNA molecule, one position of the peptide’s sequence can be converted into a DNA code. By then removing the terminal amino acid (e.g., by chemical Edman degradation, or limited enzymatic attack by an exoprotease, or other equivalent means) and repeating the recognition and transfer process, a DNA sequence can be progressively generated that encodes all or part of the peptide amino acid sequence (the peptide being destroyed in the process: i.e., the reading process is degradative). The DNA molecule can subsequently be read using any of the established DNA sequencing methodologies. In the event that some amino acids are not clearly discriminated, or that some recognition molecules only recognize a class of amino acids (e.g., those having positive charge, or those with negative charge, or those with uncharged hydrophilic side chains, etc.), the resulting “approximate” sequence information may nevertheless be sufficient to recognize one peptide sequence among a limited set of expected alternatives. A variety of methods can be used to immobilize millions of individual peptide molecules and adjacent DNA molecules so as to produce a DNA library encoding sequence information from the original peptide library. In this technology, peptides are typically linked to the solid support via the c-terminal carboxyl group, leaving the n-terminus free.
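The sufficiency of “approximate” sequence information for discriminating among a limited set of expected peptides can be illustrated as follows. In this sketch each amino acid is reduced to a class symbol and a partial read from the free n-terminus is compared with the reduced sequences of the expected TARGET peptides. The class assignments, peptide names and sequences are hypothetical and chosen only for illustration.

```python
# Hypothetical reduced alphabet: which residues a recognition molecule
# might lump together. Anything not listed falls into the class "x".
RESIDUE_CLASSES = (
    ("KRH", "+"),    # positively charged side chains (assumed grouping)
    ("DE", "-"),     # negatively charged side chains
    ("STNQ", "p"),   # uncharged hydrophilic side chains
)
CLASS_OF = {aa: symbol for residues, symbol in RESIDUE_CLASSES for aa in residues}


def to_classes(sequence):
    """Translate a one-letter peptide sequence into its class string."""
    return "".join(CLASS_OF.get(aa, "x") for aa in sequence)


def matching_targets(partial_read, expected_targets):
    """Names of expected peptides whose n-terminal classes agree with the read."""
    return [name for name, seq in expected_targets.items()
            if to_classes(seq).startswith(partial_read)]


EXPECTED_TARGETS = {        # hypothetical TARGET peptides
    "PEP_A": "AEDKLR",
    "PEP_B": "GSTNQK",
    "PEP_C": "LLDEVR",
}

print(to_classes("AEDKLR"))                       # x--+x+
print(matching_targets("x", EXPECTED_TARGETS))    # ['PEP_A', 'PEP_B', 'PEP_C']
print(matching_targets("x-", EXPECTED_TARGETS))   # ['PEP_A']
```

In this example a single degradation cycle cannot distinguish the three peptides, but two cycles already identify one of them uniquely, even though no individual amino acid has been identified.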
Significant progress towards reverse translation of peptide libraries has been reported by several groups, e.g., the “Proteocode” technology (https://www.encodia.com/technology: US 2021/0208150) and the “ProtSeq” technology ((34); patent publication US 2021/01022).
3.9.4.2 Single molecule degradative sequencing using fluorescence detection.
Other degradative single peptide molecule methods have been reported that make use of optical detection of fluorescent labels. In one such method, the dynamics of binding of recognition reagents to terminal amino acids of single peptide molecules located in individual wells on a semiconductor chip are sensed and interpreted as peptide sequence data (35). In this technology, peptides are typically linked to the solid support via the C-terminal carboxyl group (PCT/US2021/028471).
Another method (“fluorosequencing”, (36-38)) makes use of fluorescent labels attached to specific amino acids (e.g., cysteine SH, lysine NH2, etc.) by chemical methods, and records the disappearance of these fluorescent signals when a labeled amino acid is cleaved off during a sequence of degradative (e.g., Edman) steps.
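As an illustration of the kind of pattern such a method records, the following sketch lists the degradation cycles at which a fluorescent signal would be lost for a peptide labeled on chosen residue types. It assumes ideal labeling and cleavage and that every residue, including the last, is removed; the function name and example sequence are hypothetical.

```python
def fluorosequence(peptide, labeled):
    """Map each labeled residue type to the 1-based degradation cycles at which
    one of its fluorophores is lost, reading from the N-terminus."""
    drops = {aa: [] for aa in labeled}
    for cycle, aa in enumerate(peptide, start=1):
        if aa in drops:
            drops[aa].append(cycle)
    return drops

# Hypothetical peptide labeled on cysteine (C) and lysine (K).
pattern = fluorosequence("ACDKFCK", "CK")
print(pattern)
```

Two peptides are distinguishable by this readout only if their drop patterns differ, which is why the choice of labeled residue types matters for a given set of TARGET peptides.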
3.10 SAMPLE BARCODING
The ability to manipulate DNA sequences, particularly to synthesize, sequence, and splice together DNA sequences of various lengths, enables the attachment of designed, recognizable sequence tags (i.e., “barcodes”) to the DNA recovered from biological samples. Once sample DNA is barcoded, then DNA from multiple samples can be combined for further processing, such as next generation sequencing (“NGS”), and afterwards attributed to the correct original sample (i.e., demultiplexed). A variety of DNA barcode systems have been developed with the object of reliable identification of the original source sample in NGS applications. Note: this use of the term barcode (meaning a designed tag used for labeling) is distinct from an alternative usage applied to endogenous DNA sequences found to be characteristic of biological species and used to identify presence of a species in a sample comprised of multiple organisms.
For high-throughput DNA sequencing applications, Xu et al. devised a library of 240,000 orthogonal 25mer DNA barcodes in 2009 (39), and continuing work by a number of investigators has resulted in barcode libraries of improved reliability and error-resistance (40-42). In these applications, where the analytical system has the ability to directly sequence the barcode along with the sample DNA, the barcodes can be relatively short, and sophisticated mathematical approaches for error-detection and correction can be employed to reduce the likelihood of incorrect barcode assignments due to incorrect base calls, insertions, deletions, etc. Sophisticated software, including machine learning, has been developed to improve correct DNA barcode assignment in multiplexed nanopore DNA sequencing (e.g., (43)).
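Demultiplexing as described above can be sketched in a few lines. The 6-base barcodes, the sample names and the assumption that the barcode sits at the start of each read are all hypothetical choices for illustration; real barcode systems differ in length, placement and error handling.

```python
from collections import defaultdict

# Hypothetical 6-base sample barcodes placed at the start of each read.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
BARCODE_LEN = 6

def demultiplex(reads):
    """Group pooled reads by exact match of their leading barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        by_sample[BARCODES.get(tag, "unassigned")].append(insert)
    return dict(by_sample)

pooled = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACTTTGGG", "NNNNNNAAAA"]
result = demultiplex(pooled)
print(result)
```

Exact matching discards any read with a single barcode error, which is the motivation for error-tolerant barcode designs.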
DNA barcodes are also used in other applications where the barcodes are “read” by hybridization of a complementary probe that can be detected by optical or other detection means (44) without sequencing (e.g., using a fluorescently labeled complementary-sequence probe reagent detected by single molecule microscopic imaging). Such methods have been successfully applied for single molecule fluorescence detection of up to 1,000 different mRNA sequences in single cell images using 16 different 30-mer readout probes in a 16-bit modified Hamming distance 4 code (45). Such coding methods enable efficient sample barcoding and demultiplexing in single molecule imaging platforms.
Potential errors in barcode identification resulting from sequencing errors, hybridization failures, etc., have been dealt with by application of techniques derived from information theory. A particularly effective approach has been the use of error-correcting codes (ECC) used in digital information storage systems (e.g., computer memory), generally stemming from the work of Hamming (46). By incorporation of extra parity bits in an encoded signal, Hamming codes can be designed to allow detection and repair of single-bit errors, and detection (and in some cases repair) of two-bit errors. Given the potential for error in barcode readout by nucleic acid sequencing (e.g., in single molecule nanopore methods) or hybridization (e.g., in optical imaging applications), error detection and correction can be critical when single molecules are being detected and counted to determine a quantitative result.
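The single-bit repair property can be shown with the textbook Hamming(7,4) code, in which three parity bits protect four data bits. This is a generic sketch of that classical code, not the barcode scheme of any cited reference; detecting two-bit errors would require an additional overall parity bit, which is omitted here.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming codeword.
    Codeword positions 1..7 hold p1, p2, d1, p3, d2, d3, d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(codeword):
    """Return (repaired codeword, 1-based error position, or 0 if none found)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome read as a binary position
    if pos:
        c[pos - 1] ^= 1          # flip the erroneous bit back
    return c, pos

def decode(codeword):
    """Repair at most one bit error and return the 4 data bits."""
    c, _ = correct(codeword)
    return [c[2], c[4], c[5], c[6]]

word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                  # simulate a single misread bit at position 5
print(word, damaged, decode(damaged))
```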
Yet other barcoding strategies have been created using DNA barcodes to label synthetic chemical libraries (47) and peptide barcodes detected by mass spectrometry (48).
3.11 MACHINE LEARNING FOR SIGNAL CLASSIFICATION.
Machine learning methods have been successfully developed that allow the identities and/or sequences of individual molecules to be deduced from complex signal patterns. Nucleic acid sequences can be derived from current traces measured as DNA or RNA molecules pass through nanopores using highly-trained neural networks to recognize and interpret conductivity transitions (49). Proteins can be recognized by machine learning based on optically-detected stochastic binding of multiple promiscuous affinity reagents to single molecules (31). In general, machine learning approaches make it possible to improve the recognition of molecules by all the above single molecule technologies by building mathematical models based on large numbers of reference examples, and incorporating more data for each example than is practical in human-designed programs.
3.12 LIMITATIONS OF EXISTING TECHNOLOGIES AND OBJECTIVES OF THE PRESENT INVENTION
The current dominant methods for direct detection of peptide molecules by MS, including SISCAPA and related methods, have significant limitations. These include A) sensitivity limited by the performance of available mass spectrometers (currently limited to 10-100 amol of peptide, equivalent to 6 million to 60 million molecules of a peptide); B) low throughput (largely due to the limited speed of typical liquid chromatography systems employed); C) lack of robustness of the liquid chromatography systems used to separate peptides and introduce them into the MS; D) level of expertise required to operate LC-MS systems; E) high cost of LC-MS systems and the consequent limited adoption in clinical laboratories; and F) impracticality of use in low-technology environments. In addition, there is the fundamental limitation that MS typically resolves and identifies analytes based on one or a few parameters that are derived from the peptide sequence (typically its mass and the masses of one to three of its specific fragments), but it does not typically determine the entire peptide sequence and is therefore susceptible to various forms of identification error.
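The conversion behind the sensitivity figures in item A) is a one-line calculation using the Avogadro constant; the short sketch below reproduces it. The function name is chosen here for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def amol_to_molecules(amol):
    """Convert an amount in attomoles (1e-18 mol) to a number of molecules."""
    return amol * 1e-18 * AVOGADRO

for amol in (10, 100):
    print(f"{amol} amol = {amol_to_molecules(amol):.2e} molecules")
```

This gives roughly 6 million molecules for 10 amol and 60 million for 100 amol, matching the range stated above.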
Existing methods of single molecule detection likewise have significant limitations restricting their application to peptide quantitation. These include A) absence of internal standards that could provide a quantitative reference for comparison of samples; B) limited dynamic range (inability to count sufficient numbers of molecules to estimate frequency of very low abundance targets); C) lack of efficient sample preparation protocols to deliver peptides for single molecule detection; and D) limited ability to recognize many of the amino acids in a peptide sequence (i.e., limited specificity).
A recent analysis (50) of the limitations of single molecule methods in comparison with mass spectrometry for the general characterization of complex proteomes concluded that the above-referenced limitations, and specifically the dynamic range limitation, effectively prevent current single molecule methods from achieving a general analysis of sample proteins.
It is an object of the present invention to transcend these limitations and others. The invention provides significant improvements in assay sensitivity by making use of single molecule counting technologies instead of mass spectrometry detection, with the potential to make quantitative measurements at the level of hundreds to thousands of analyte molecules (i.e., >1,000-fold improvement compared to MS methods, including SISCAPA-MS). The invention provides sequence-based assay specificity through direct detection and counting of analyte molecules without the use of liquid-chromatography or expensive mass spectrometer instruments. The invention makes use of certain technologies and platforms that have been extensively developed for nucleic acid applications (e.g., DNA and RNA sequencing), some of which have been implemented commercially as small, inexpensive instruments capable of generating accurate results in low-technology environments. A further object of the invention is to significantly lower the cost of making precise measurements of protein biomarkers, drugs and targets, and thereby to enable expanded use of quantitative protein tests in diagnostics and in longitudinal health monitoring. The invention provides methods for improved protein quantitation by adapting a novel specific affinity enrichment strategy to allow detection of enriched peptides by technologies other than mass spectrometry - specifically technologies that enable counting individual peptide molecules in a sequence-specific manner. In adapting the specific affinity enrichment strategy to these alternative detection means, significant novel changes as described herein are required in the selection and treatment of peptides, in the generation of suitable internal standards as substitutes for stable isotope labeled versions, in the generation of sequence-specific binding reagents, in the preparation and delivery of peptides for single molecule detection, and in the analysis of resulting data.
The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto but only by the claims. It is to be understood that not necessarily all aspects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other aspects or advantages as may be taught or suggested herein.
The invention, both as to organization and method of operation, together with features and advantages thereof, may best be understood by reference to the following detailed description illustrated in the accompanying drawings. The aspects and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter. Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment.
It should be appreciated that "embodiments" of the disclosure can be specifically combined together unless the context indicates otherwise. The specific combinations of all disclosed embodiments (unless implied otherwise by the context) are further disclosed embodiments of the claimed invention.
DEFINITIONS OF TERMS
Key terms used frequently herein:
Figure imgf000028_0001
Figure imgf000029_0001
The term "amino acid" in the context of the present disclosure is used in its broadest sense and is meant to include organic compounds containing amine (NH2) and carboxyl (COOH) functional groups, along with a side chain (e.g., a R group) specific to each amino acid. In some embodiments, the amino acids refer to naturally occurring L amino acids or residues. The commonly used one and three letter abbreviations for naturally occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu; F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro; Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A. L., (1975) Biochemistry, 2d ed., pp. 71-92, Worth Publishers, New York). The general term "amino acid" further includes D-amino acids, retro-inverso amino acids as well as chemically modified amino acids such as amino acid analogues, naturally occurring amino acids that are not usually incorporated into proteins such as norleucine, and chemically synthesized compounds having properties known in the art to be characteristic of an amino acid. For example, analogues or mimetics of phenylalanine or proline, which allow the same conformational restriction of the peptide compounds as do natural Phe or Pro, are included within the definition of amino acid. Such analogues and mimetics are referred to herein as "functional equivalents" of the respective amino acid. Other examples of amino acids are listed by Roberts and Vellaccio, The Peptides: Analysis, Synthesis, Biology, Gross and Meienhofer, eds., Vol. 5 p. 341, Academic Press, Inc., N.Y. 1983, which is incorporated herein by reference.
The term “analyte” may refer to any of a variety of different molecules, or components, pieces, fragments or sections of different molecules that one desires to measure or quantitate in a sample.
The term “anti-peptide antibody” (a class of specific binding agent, or BINDER) as used herein means a macromolecule capable of non-covalently and reversibly binding to a peptide in a manner that is specific to all or a portion of the peptide’s sequence. The term includes a variety of types of macromolecules as indicated in the definition of “antibody” above, and is not limited to the proteins conventionally considered antibodies.
The term “barcode” includes any distinguishing physical, chemical or sequence characteristic of a peptide construct capable of having multiple values that can be determined by a single molecule detection method. Nucleic acid sequences can be used as barcodes, for example by providing a set of distinguishable sequences, a different one of which can be linked to the peptides of each sample, identifying (“decoding”) the source of these peptides after the peptides from multiple samples are pooled for efficient processing in a single molecule detection system. Other forms of molecular barcodes can be used as well, including sets of glycan structures (which can be decoded using various specific lectins, for example), peptides (when these can be linked to and sequenced with the TARGET peptides); non-biological polymers distinguishable by length or content of alternative polymer units; and small molecules including colored or fluorescent dyes.
The term “bind” includes any physical attachment or close association, which may be permanent or temporary. Generally, reversible binding includes aspects of charge interactions, hydrogen bonding, hydrophobic forces, van der Waals forces, etc., that facilitate physical attachment between the molecule of interest and the analyte being measured. The “binding” interaction may be brief as in the situation where binding causes a chemical reaction to occur. Reactions resulting from contact between the binding agent and the analyte are also within the definition of binding for the purposes of the present invention, provided they can be later reversed. The terms “BINDER”, “antibody”, “anti-peptide affinity reagent”, “specific affinity reagent”, “specific binding reagent”, “affinity capture reagent” and “anti-peptide antibody” as used herein mean a reagent having the ability to reversibly bind to a specific TARGET peptide (and its cognate STANDARD) in a manner that is specific to all or a portion of the peptide’s sequence. Such a BINDER will typically bind a TARGET peptide with greater affinity, greater kinetic on-rate or lower kinetic off-rate than a majority of the other peptides present in samples, sample digests, or other sources of contamination. The terms include antibodies and fragments thereof as well as non-naturally occurring or synthetic antigen binding molecules. Thus, included are IgG antibodies (polyclonal, monoclonal, oligoclonal, etc.), and other antibody isotypes, fragments thereof, such as Fab fragments, murine, chimeric, and other non-human or not fully human antibodies and fragments thereof, synthetic (non-naturally occurring) antigen binding formats such as single chain antibodies and bispecific antibodies, as well as aptamers (including DNA, RNA and other polymeric aptamers) and binding proteins built from non-antibody structures (e.g., nanobodies).
The term “BINDER ID” means a molecular barcode identifying the BINDER to which a molecule bound in an enrichment step.
The term “biologic” means a drug produced by a biological mechanism, such as a protein; i.e., a protein therapeutic, or protein drug.
The term “biomolecules” refers to any molecule present in a biological system, and includes proteins, nucleic acids (specifically DNA and RNA in its various forms, both intracellular and extracellular), complex sugars (glycans and the like), lipids, and a variety of metabolites.
The term “denaturant” includes a range of chaotropic and other chemical agents that act to disrupt or loosen the 3-D structure of proteins without breaking covalent bonds, thereby rendering them more susceptible to proteolytic treatment. Examples include urea, guanidine hydrochloride, ammonium thiocyanate, trifluoroethanol and deoxycholate, as well as solvents such as acetonitrile, methanol and the like. The concept of denaturant includes non-material influences capable of causing perturbation to protein structures, such as heat, microwave irradiation, ultrasound, and pressure fluctuations.
The term “click chemistry” means the use of pairs of chemical groups that react with each other but not with other chemical groups commonly found in biomolecules: i.e., they are bio-orthogonal coupling mechanisms. Commonly used click chemical pairs include, but are not limited to, a 3’ trans-cyclooctene (TCO) group reacting bio-orthogonally with a tetrazine group (e.g., methyltetrazine (Me-TZ)), and bicyclononyne (BCN) reacting bio-orthogonally with an azide group. In some instances copper (Cu) ions serve as catalysts for a click reaction, and in other instances, typically involving a strained cyclic alkyne, a catalyst is not required.
The term “clonotypic” means uniquely characteristic of a clonal product, typically referring to a peptide sequence unique to a specific monoclonal antibody.
The term “cognate” as used herein means a relationship between molecules in which either 1) the molecules each contain a region that has the same structure as the other, or 2) the molecules can bind together by a specific interaction. In the case of peptides (e.g., a TARGET and STANDARD peptide pair), cognate peptides can share a region of identical sequence, which may be from 2 amino acids up to the full length. The difference between cognate peptides can be a difference in sequence, or a difference due to attachment or removal of some atom(s) or groups (including one or more entire amino acids), or the addition to the peptide or a chemical group of any size (including oligonucleotides, peptides, “handles” such as biotin, and reactive groups able to subsequently bond to other molecules).
The term “cognate BINDER” or “cognate affinity capture reagent” means a specific affinity reagent (e.g., a specific binding reagent, BINDER) that is capable of specifically binding a cognate TARGET peptide and/or cognate STANDARD, in the sense that the cognate affinity capture reagent is designed, generated or selected to have a specific affinity for an epitope comprising part or all of its cognate peptide sequence.
The term “degradative sequencing technology” means a technology in which peptide molecules are disassembled one amino acid at a time (or in some cases two amino acids at a time), typically from one end, and the terminal amino acid identified, e.g., as one of the 20 common amino acids found in proteins, or as one of a subset of amino acids. In some cases, the identification can be obtained directly by optical or electrical readout, and in some cases the amino acid identity is translated into another molecular form (e.g., DNA) for later readout using a different technology.
The terms “drug” and “therapeutic” mean a type of molecule that may, under appropriate circumstances of dosing and timing, interact with components of a subject’s body to modify biological processes, including disease processes, normal processes, aging and the like. A drug may be a small molecule such as aspirin, or a macromolecule such as a protein (e.g., insulin), a nucleic acid (such as an anti-sense drug), or carbohydrate (such as heparin). Drugs that comprise or are derived from monoclonal antibodies represent a growing class of therapeutic agents with particular advantages in terms of extreme specificity for endogenous protein and other targets involved in disease processes.
The term “electrospray ionization” (ESI) refers to a method for the transfer of analyte molecules in solution into the gas and ultimately vacuum phase through use of a combination of liquid delivery to a pointed exit and high local electric field.
The term “elution” means the release of a bound peptide or construct from a BINDER.
The term “flag” is used herein as equivalent to Barcode, and may be any type of distinguishing molecular feature including, but not limited to, a polymer of dissimilar subunits encoding an identification relevant to sample analysis.
The terms “Förster resonance energy transfer” or FRET refer to energy transfer between two light-sensitive molecules (chromophores, typically fluorescent molecules). A donor chromophore, initially in its electronic excited state, may transfer energy to an acceptor chromophore through nonradiative dipole-dipole coupling. The efficiency of this energy transfer is inversely proportional to the sixth power of the distance between donor and acceptor, making FRET extremely sensitive to small changes in distance, generally on scales of 1 to 10 nm.
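The steep distance dependence can be made concrete with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where R0 is the donor-acceptor distance at which the efficiency is 50%. The sketch below evaluates it for an assumed R0 of 5 nm, a value chosen purely for illustration; real Förster distances depend on the dye pair.

```python
def fret_efficiency(r_nm, r0_nm):
    """Standard Förster relation: transfer efficiency at separation r_nm,
    given the Förster distance r0_nm (where efficiency is 0.5)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

R0_NM = 5.0  # assumed Förster distance, for illustration only
for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r, R0_NM):.3f}")
```

Doubling the separation from R0 to 2·R0 drops the efficiency from 0.5 to about 0.015, which is why FRET acts as a sensitive ruler only over a range of a few nanometers.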
The term “immobilized enzyme” means any form of enzyme that is fixed to the matrix of a support by covalent or non-covalent interaction such that the majority of the enzyme remains attached to the support of the membrane.
The term “ligation” as used herein means the joining of an end of a polymer chain (such as a nucleic acid) to an end of another polymer chain to form a combined linear polymer. The term includes joining by enzymatic means (such as that of a DNA ligase, splicing means such as CRISPR, and other well-known molecular biology techniques for joining and splicing nucleic acid sequences) and chemical means (such as the use of click chemistry).
The term “Linkage” means a connection between originally separate molecules, and includes common covalent connections between units found in biopolymers and man-made polymers, as well as connections made using chemistries such as the well-known “click” chemistries, reactions such as those between amino groups and NHS esters, and formation of sugar-phosphate bonds when oligonucleotides are ligated together, as well as strong but non-covalent connections such as the interaction between biotin and streptavidin. The related term “Linker” means a segment of a molecule comprising an atomic configuration capable of, or arising from, formation of a linkage between two or more initially separate molecules.
The term “MALDI” means Matrix Assisted Laser Desorption Ionization and related techniques such as SELDI, and includes any technique that generates charged analyte ions from a solid analyte-containing material on a solid support under the influence of a laser or other means of imparting a short energy pulse.
The term “Mass spectrometer” (or “MS”) means an instrument capable of separating molecules on the basis of their mass m, or m/z where z is molecular charge, and then detecting them. In one embodiment, mass spectrometers detect molecules quantitatively. An MS may use one, two, or more stages of mass selection. In the case of multistage selection, some means of fragmenting the molecules is typically used between stages, so that later stages resolve fragments of molecules selected in earlier stages. Use of multiple stages typically affords improved overall specificity compared to a single stage device. Often, quantitation of molecules is performed in a triple-quadrupole mass spectrometer using the method referred to as “Multiple Reaction Monitoring” or “MRM mass spectrometry” in which measured molecules are selected first by their intact mass and secondly, after fragmentation, by the mass of a specific expected molecular fragment. However, it will be understood herein that a variety of different MS configurations may be used to analyze the molecules described. Possible configurations include, but are not limited to, MALDI instruments including MALDI-TOF, MALDI-TOF/TOF, and MALDI-TQMS, and electrospray instruments including ESI-TQMS and ESI-QTOF, in which TOF means time of flight, TQMS means triple quadrupole MS, and QTOF means quadrupole TOF.
The terms “molecular tag”, “molecular flag”, or “molecular feature” mean a structural component of a molecular construct that can be detected by a single molecule detector and assigned a significance in the interpretation of counted molecules (e.g., distinction between TARGET and STANDARD tags, barcodes identifying samples, barcodes identifying BINDERS, etc.).
The terms “particle” or “bead” mean any kind of particle in the size range between 10 nm and 1 cm, and includes magnetic particles and beads.
The term “peptide library preparation” means a method used to convert the proteins in a biological sample into a collection of peptides modified so as to be detectable, identifiable and countable by a sequence sensitive single molecule detector.
The term “proteolytic treatment” or “proteolytic enzyme” may refer to any of a large number of different enzymes, including trypsin, chymotrypsin, LysC, ArgC, AspN, GluC, v8 and the like, as well as chemicals, such as cyanogen bromide, that, in the context of the methods described herein, acts to cleave peptide bonds in a protein or peptide in a sequence-specific manner, generating a collection of shorter peptides (a digest).
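Sequence-specific cleavage of this kind is often modeled in silico. The sketch below applies a commonly used simplified trypsin rule (cleave after K or R unless the next residue is P) to a made-up sequence; the rule is an approximation assumed here for illustration, and the function name and example are hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein sequence after each K or R that is not followed by P
    (a simplified trypsin rule; missed cleavages are not modeled)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if aa in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])   # remaining C-terminal peptide
    return peptides

digest = tryptic_digest("AAKRPGGRCCK")  # made-up sequence
print(digest)
```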
The term “proteotypic peptide” means a peptide whose sequence is unique to a specific protein in an organism, and therefore may be used as a stoichiometric surrogate for the protein, or at least for one or more forms of the protein in the case of a protein with splice variants.
The term “sample” means any complex biologically-generated sample derived from humans, other animals, plants or microorganisms, or any combinations of these sources. “Complex digest” means a proteolytic digest of any of these samples resulting from use of a proteolytic treatment.
The term “SAMPLE ID” means a molecular barcode identifying the sample from which a molecule was obtained, i.e., its sample of origin. A sample barcode present in a construct identifies the sample of origin and allows this identity to be recovered after constructs from multiple samples have been pooled and analyzed together in a single molecule detector.
The terms “ratchet mechanism”, “protein nanomachine” or “molecular motor” mean a molecular-scale device capable of pulling, pushing, unzipping (in the case of complementary strands of nucleic acids), or otherwise regulating the motion of linear molecules in discrete steps.
The term “sequence-sensitive single molecule detection” or “SSSMD” means detection and counting of individual molecules using a method capable of differentiating between different linear biopolymer sequences occurring in the molecules. A “sequence-sensitive single molecule peptide detector” means a detector, instrument, technology, chemistry, or multi-component system that is able to achieve sequence-sensitive single molecule detection of peptides. Such a detector need not achieve 100% accuracy to accomplish the objectives of the invention, since the number of different peptide sequences that must be distinguished from one another and counted in the invention is a small number (e.g., 1, 1-5, 1-10, 5-20, 10-50, 25-100, 50-200, or more peptides) compared to the number of peptides present in a digest of a complex biological sample (typically hundreds of thousands of peptides in the digest of a sample such as blood plasma). The term includes nanopore-based sequencing of nucleic acids, proteins and peptides; fluorescence-based methods such as fluorosequencing (36-38) including Edman methods; “reverse-translation” of peptide sequencing into DNA sequences followed by DNA sequencing (the “Proteocode” technology developed by Encodia: https://www.encodia.com/technology); “FRET” fingerprinting of peptides (36, 51); single molecule imaging methods (31); and other related methods.
The terms “sequencing nanopore” and “nanopore” as used herein refer to ion-conductive pores capable of functioning in an ion-impermeable membrane or vessel wall, and through which linear polymers can pass. Typical nanopores are of biological origin (e.g., MspA), comprising one or more protein molecules, or created by engineering (e.g., versions of biological nanopores modified by mutation, rearrangement or combination of proteins; very small holes etched or drilled in thin metallic or ceramic substrates; or DNA assemblies). A recording of the current flowing through a nanopore over time is referred to as a “trace” or “squiggle”.
The term “sequential degradation” refers to a process in which amino acid residues are removed, in sequence order, from one terminus of a peptide. In the context of the invention, sequential degradation can be employed in a process in which a peptide’s terminal amino acid is “recognized” (e.g., by binding of one of a series of affinity reagents specific for the various amino acids presented at the terminus) and its identity determined or recorded for later evaluation, after which the terminal amino acid can be cleaved off (e.g., using enzymes such as exoproteases, classical Edman chemistry, or other chemistries capable of removing a terminal amino acid) and the process repeated to determine a sequence of amino acids from the peptide’s terminus. Similarly, a process can employ recognition reagents that report information on two or more terminal amino acids at a time, and a cleavage process can be employed that removes two or more terminal amino acids per cycle. The process need not sequence all amino acids in a peptide to generate TARGET peptide or STANDARD identifications and single molecule counts that are useful in the invention.
The term “SISCAPA” means the method described in US Patent No. 7,632,686, and in Mass Spectrometric Quantitation of Peptides and Proteins Using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA) (Journal of Proteome Research 3: 235-44 (2004)).
The term “small molecule” or “metabolite” means a multi-atom molecule other than proteins, peptides and DNA; the term can include but is not limited to amino acids, steroid and other small hormones, metabolic intermediate compounds, drugs, drug metabolites, toxicants and their metabolites, and fragments of larger biomolecules.
The term “stable isotope” means an isotope of an element naturally occurring or capable of substitution in proteins or peptides that is stable (does not decay by radioactive mechanisms) over a period of a day or more. The primary examples of interest in the context of the methods described herein are C, N, H, and O, of which the most commonly used are 13C and 15N.
The term “solubilized tissue sample” means a liquid sample generated from a sample of a solid biological tissue (e.g., liver, brain, skin, etc.) by a method that results in a solution containing tissue molecules. Depending on the method of solubilization, a solubilized tissue sample may contain one, a few, many or almost all tissue molecules in solution. Tissue solubilization can be achieved by a variety of methods including grinding, pulverization, ultrasonication, homogenization, and similar mechanical methods, as well as exposure to liquid solutions including detergents, solvents, protease inhibitors, salts, buffers, and the like.
The terms “STANDARD”, “internal standard peptide”, “internal standard”, “labeled TARGET”, or “labeled TARGET peptide” may be any altered version of the respective TARGET fragment or TARGET peptide that is 1) bound by the appropriate BINDER with an affinity and kinetics very similar to that with which the cognate TARGET fragment or TARGET peptide is bound, and 2) differs from it in a manner that can be distinguished from the cognate TARGET peptide by a sequence-sensitive single molecule peptide detector (e.g., by means of some sequence difference, amino acid modification, inclusion of a non-natural chemical group), or a mass spectrometer (either through direct measurement of molecular mass or through mass measurement of fragments, e.g., through MS/MS analysis), or by another equivalent means. In the case of a nanopore detector, for example, a suitable TARGET peptide and its STANDARD would produce distinguishable ion current signatures while passing through the nanopore.
The term “STANDARD tag” or “STANDARD flag” means a molecular tag or feature within or attached to a STANDARD peptide enabling a single molecule detector to distinguish the STANDARD tag from a TARGET tag. The STANDARD tag may be the absence of any TARGET tag. A STANDARD tag may consist of the absence of a feature present in a cognate TARGET tag, the presence in the STANDARD tag of a feature absent in the cognate TARGET tag, or the presence of different features in the STANDARD tag and TARGET tag. Multiple different STANDARD tags may be used, provided that the STANDARD tags are distinguishable from any TARGET tags.
The term “standardized sample digest” or “standardized sample” means a protein or peptide sample to which one or more STANDARD version(s) of one or more TARGET peptide or protein analytes have been added in an amount that is a) known (in terms of concentration, mass, moles or other physical units) or b) consistent between samples (allowing quantitative comparison of TARGET peptide amounts between samples even if the absolute amount of the STANDARD added is not known). Once the sample is standardized with respect to a given TARGET peptide, then the ratio between TARGET peptide and STANDARD represents and preserves information concerning the amount of the TARGET peptide in the sample, allowing this information to be recovered by later quantitative analysis even if a variable amount of the TARGET peptide and STANDARD pair is recovered during a suitable enrichment process (i.e., a process that does not distinguish between the TARGET peptide and STANDARD peptides) prior to analysis.
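A short numerical sketch may help show why the TARGET:STANDARD ratio survives variable recovery. It assumes an idealized enrichment that recovers the same fraction of both peptides; the function name and all numbers are illustrative assumptions, not values from the disclosure.

```python
def enrich(target_molecules, standard_molecules, recovery):
    """Apply one fractional recovery to both TARGET and STANDARD (non-discriminating enrichment)."""
    return target_molecules * recovery, standard_molecules * recovery


# Hypothetical standardized sample: 3e6 TARGET molecules and 1e6 STANDARD molecules.
target, standard = 3.0e6, 1.0e6

ratios = []
for recovery in (0.9, 0.5, 0.1):           # recovery varies from run to run
    t, s = enrich(target, standard, recovery)
    ratios.append(t / s)                   # the ratio does not depend on the recovery
```

Every entry in `ratios` equals the original ratio of 3, which is the information that later quantitative analysis recovers.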
The term “stoichiometric” refers to relationships between quantities of different molecules. In some chemical contexts, the word stoichiometry refers to presence of different elements or compounds in simple integral ratios, as prescribed by an equation or formula. Thus a TARGET peptide sequence that occurs once in the sequence of a parent protein target has a 1 : 1 stoichiometric relationship with the target, and can therefore be used as a quantitative surrogate to measure amounts of the protein. In the more general sense used in this disclosure, stoichiometry means a ratio relationship between molecules (or elements) that may have any numerical value, including non-integer values. In biological samples, two different proteins in blood or in a cell can have a relative stoichiometry extending over a very broad range, in principle from one molecule (the lower limit) of a low abundance protein to hundreds of billions of molecules (or more) for a high abundance protein in the same sample.
The terms “stoichiometric flattening”, “normalization”, “equalization” or “differential enrichment” refer to processes by which different molecules (e.g., peptides) that are present in a sample (e.g., a biological sample digest) at different concentrations (or in different amounts in mass or molar terms) are brought closer to equal concentrations or amounts. An example of such a process is an affinity enrichment method in which a larger relative fraction of a low abundance molecule is captured while a smaller relative fraction of a higher abundance molecule is captured (e.g., by adjusting the amounts of the corresponding affinity reagents, such as antibodies, used to accomplish this capture), the captured molecules being then separated from the sample and released from capture, resulting in a more nearly equivalent amount of the molecules in the processed sample. In order to preserve information as to the relative abundances of the molecules in the original sample, internal standard versions of the molecules (e.g., STANDARD versions of TARGET peptides) are added before this enrichment step, and both TARGET peptide and STANDARD are measured in the resulting enriched sample. Given an estimate of the amount of STANDARD added to the sample, the ratio of TARGET peptide to STANDARD can be used to calculate the TARGET peptide abundance in the original sample.
The terms “Subject” or “Patient” mean a biological individual such as an individual human being or an animal.
The terms “TARGET” or “TARGET peptide” mean a peptide chosen as a TARGET fragment of a protein or peptide. The TARGET may be any piece of a protein or peptide which can be produced by a reproducible fragmentation process (e.g., digestion using a proteolytic enzyme, or without a fragmentation if the TARGET fragment is the whole analyte) and whose abundance or concentration can be used as a surrogate for the abundance or concentration of the analyte.
The term “TARGET tag” or “TARGET flag” means a molecular tag or feature within or attached to a TARGET peptide enabling a single molecule detector to distinguish the TARGET tag from a STANDARD tag. The TARGET tag may be the absence of any STANDARD tag. A TARGET tag may consist of the absence of a feature present in a cognate STANDARD tag, the presence in the TARGET tag of a feature absent in the cognate STANDARD tag, or the presence of different features in the TARGET tag and STANDARD tag. Multiple different TARGET tags may be used, provided that the TARGET tags are distinguishable from any STANDARD tags.
The term “tag” is used herein as equivalent to Barcode, and may be any type of distinguishing molecular feature including, but not limited to, a polymer of dissimilar subunits encoding an identification relevant to sample analysis.
The term “T/S tag” or “T/S flag” means either a TARGET or a STANDARD tag, or a set of mixed TARGET and STANDARD tags, as will generally be present in a standardized sample.
The term “VEHICLE” means a molecule (for example a polymer such as an oligonucleotide or a polyethylene glycol, a linker comprising chemically reactive sites such as NHS or click chemistry groups, or a macromolecular carrier such as a bead or a “SNAP” particle), to which TARGET and STANDARD peptides (together with their associated distinguishing tags) can be linked in order to facilitate single molecule detection. A VEHICLE can include barcodes identifying a sample of origin, barcodes identifying a BINDER used to enrich specific cognate TARGET and STANDARD peptides. VEHICLES can also include one or more additional molecular structures that facilitate the transport of TARGET and STANDARD peptides to a single molecule detector, their presentation to such a detector, their transport through a detector (such as a nanopore), or their immobilization to a site or in a region observed by such a detector.
The use of the singular herein in any instance (e.g., “a” construct or “a” peptide), unless otherwise indicated, is intended to mean one or more and is not intended to be limited to only one.
5 THE PRESENT INVENTION
The inventions herein provide improved quantitative measurement of proteins, and peptides derived from them, through improvements to previous methods including the replacement of mass spectrometric detection by other detection techniques capable of identifying and counting individual molecules.
In the descriptions that follow, quantitation of proteins, peptides and other biomolecules is addressed in a general sense, and hence the invention disclosed is in no way limited to the analysis of blood, plasma and other body fluids.
5.1 BRIEF DESCRIPTION OF THE DRAWINGS:
Figure 1: Examples of abundances, TARGET peptide and STANDARDS for proteins in human plasma
Figure 2: Adding Amino Acids to a TARGET to Create a STANDARD
Figure 3: Rope-Tow Constructs with STANDARD and TARGET Sequence Tags
Figure 4: Two-Step Rope-Tow Constructs with Double Tags
Figure 5: TARGET/STANDARD Barcoding Followed by Enrichment
Figure 6: Two-step Digestion for Differential Modification of Peptide Ends
Figure 7: Rope-Tow Constructs with Sequencing Adapters: Ligation of rope-tow constructs with nanopore sequencing adapters
Figure 8: Multi-Epitope Binders
Figure 9: DNA Sample Barcodes (16-bit example)
Figure 10: Decoding Peptide-A and Sample Barcodes
Figure 11: Identifying Peptides Using Epitopes
Figure 12: Assembly of a Peptide:Oligo Construct for Nanopore Detection
Figure 13: Details of a Peptide:Oligo Construct for Nanopore Detection
Figure 14: Scheme for double ligation of tryptic peptides with c-terminal lysine (i.e., two amino groups) using “Click” chemistry
Figure 15: Click Ligation of Tryptic (Lys) Peptides to Motor Assemblies
Figure 16: Peptide Loop Insertion with Enzymatic Cut: Insertion of TARGET Peptide Into Oligo VEHICLE as a Loop, Followed by Sequence-Specific Enzymatic Oligo Cleavage
Figure 17: Peptide Loop Insertion with Chemical or UV Cleavage
Figure 18: Peptide Loop Preparation of TARGET and STANDARD
Figure 19: Parallel TARGET and STANDARD Constructs
Figure 20: Specific Affinity Capture of TARGET and STANDARD
Figure 21: Peptide Loop: Effect of Failure to Insert Fully
Figure 22: Nanopore sequencing of constructs
Figure 23: Rope-tow Constructs
Figure 24: Concatenation of constructs by hybridization and ligation
Figure 25: Constructs Prepared Using Bi-functional Supports
Figure 26: Example Rope-Tow Construct: Detailed example structure of a rope-tow peptide:oligo construct
Figure 27: Concatenation of Tryptic (Lys) Peptides Using Click Chemistry
Figure 28: Concatenated constructs
Figure 29: Rope-Tow Ligation with Splints
Figure 30: Ligation of Double-Tag Rope-Tow Constructs
Figure 31: Analysis of Detection Events in Affinity Imaging Detection
Figure 32: Identifying Peptides: Before and After Cleavage
Figure 33: Stoichiometric Flattening
Figure 34: Equalization: Stoichiometric Flattening
Figure 35: Multiplex Test Panel for SARS-CoV-2: A multiplex combination of molecules detectable using nanopore sequencing, including a) peptides from the SARS-CoV-2 NCAP protein; b) SARS-CoV-2 Spike and NCAP protein linear epitopes whose binding by patient antibodies indicates vaccination or exposure to the virus; c) proteotypic peptides of three proteins used as plasma biomarkers of inflammation; and d) the RNA genome of SARS-CoV-2.
Figure 36: Design and Nanopore Analysis of Loop-Insertion Constructs
Figure 37: Nanopore Traces of Loop-Insertion Constructs
Figure 38: Preparation of Constructs for Reverse Translation Detection
Figure 39: Assembly of a Peptide:Oligo Construct for Reverse Translation Detection
6 DETAILED DESCRIPTION OF THE INVENTION:
6.1 SUMMARY OF THE INVENTION.
1. A molecular construct and vehicle comprising:
(a) a molecular construct comprising a peptide comprising a target peptide sequence derived from proteolytic cleavage of a target protein and a molecular tag defining the source of said peptide, and
(b) a vehicle capable of presenting the construct for analysis by a sequence-sensitive single molecule detector.
2. The molecular construct and vehicle of paragraph 1, wherein the molecular tag is a target tag that identifies the peptide as a peptide created by proteolytic digestion of a biological sample.
3. The molecular construct and vehicle of paragraph 1, wherein the peptide comprises a synthetic peptide and the molecular tag is a standard tag that identifies the synthetic peptide as an internal standard.
4. The molecular construct and vehicle of paragraph 1, wherein the vehicle is capable of binding said construct either to a support or to a soluble adapter capable of presenting the construct for analysis by a sequence-sensitive single molecule detector.
5. The molecular construct and vehicle of paragraph 2, wherein more than 90% of the target molecules present in said sample digest are linked to target tags.
6. The molecular construct and vehicle of paragraph 2, further comprising a SAMPLE barcode identifying the sample of origin.
7. The molecular construct and vehicle of paragraph 1 further comprising a BINDER barcode identifying a binder to which the construct has been bound.
8. The molecular construct and vehicle of any of the preceding paragraphs wherein the barcode or the tag is an oligonucleotide.
9. The molecular construct and vehicle of any of the preceding paragraphs wherein the sequence-sensitive single molecule detector comprises a nanopore, a single molecule imaging system, or a single molecule degradative peptide sequencer.
10. A plurality of reagents, comprising: the molecular construct and vehicle of paragraph 3, a tag reagent capable of reacting with a target peptide in a proteolytic digest of a biological sample to create the molecular construct of paragraph 2, and a binder that binds to said molecular constructs of paragraph 2 and paragraph 3 with similar affinity and kinetics.
11. The plurality of reagents of paragraph 10, wherein the binder contacting a standardized sample digest comprising the molecular construct and vehicle of paragraph 2 and paragraph 3 binds the molecular construct of paragraph 2 and paragraph 3 in a ratio equal within 2%, 5%, 10% or 20% to the ratio in which they are present in said standardized sample digest.
12. The plurality of reagents of paragraph 10, further comprising: one or more reagents capable of proteolytic fragmentation of sample proteins, and one or more solid supports for binders, including magnetic beads, non-magnetic beads, porous supports usable in packed columns, or chemical reagents capable of introducing reactive groups into peptides.
13. A calibrator sample for peptide quantitation by a sequence-sensitive single molecule detector, comprising an amount of the molecular construct and vehicle of paragraph 2 and paragraph 3 in a known ratio.
14. The calibrator sample of paragraph 13, wherein at least one of the constructs is present in known amount or concentration.
15. A standardized sample digest derived from a proteolytic digest of a biological sample, comprising: an amount of a molecular construct comprising a target tag and a target peptide, said construct being a target peptide construct and an amount of a molecular construct comprising a standard tag and a peptide whose sequence is the same or similar to the sequence of said target peptide, said construct being a standard peptide construct, wherein the target peptide is generated by proteolytic digestion of a target protein in said biological sample, wherein said target and standard tags can be distinguished by a single molecule detector and comprise chemical or structural groups covalently joined to peptides in their respective constructs, wherein said target tag is covalently attached to a plurality of the peptides present in said sample digest, wherein said target peptide construct comprises more than 90% of the target peptide molecules present in said sample digest and wherein said standard peptide construct is prepared separately and added to said digest in a known amount, or in a consistent relative amount across a multiplicity of samples.
16. The standardized sample digest of paragraph 7, wherein the number of molecules of the standard peptide construct added to the sample digest differs by no more than a factor of 100 from the number of molecules of the target peptide construct in said sample digest.
17. The standardized sample digest of any one of paragraphs 15-16, further comprising one or more additional standard peptide constructs having a different standard tag from each other and with each construct at a different relative abundance.
18. The standardized sample digest of any one of paragraphs 15-17, wherein the target tag is covalently attached to a majority of the peptides generated by proteolytic digestion of said sample.
19. The standardized sample digest of any one of paragraphs 15-18, wherein said tags are oligonucleotides.
20. An enriched standardized sample digest, comprising a bound fraction of the standardized sample digest of any one of paragraphs 15-16 bound by a binder, wherein said bound fraction comprises a target peptide construct and a standard peptide construct in a ratio equal within 2%, 5%, 10% or 20% to the ratio in which they are present in said standardized sample digest.
21. A stoichiometrically-flattened standardized sample, comprising a plurality of pairs of cognate standard and target peptide constructs enriched from a standardized proteolytic digest of a biological sample by binding to their respective cognate binders, wherein a pre-enrichment ratio calculated by dividing the number of molecules of a first target peptide construct that is the most numerous of said target peptide constructs in the standardized sample digest by the number of molecules of a second target peptide construct that is the least numerous of said target peptide constructs in the standardized sample digest is more than 10 times larger than a post-enrichment ratio calculated by dividing the number of molecules of said first target peptide construct by the number of molecules of said second target peptide construct in said enriched sample.
22. A method for measuring the amount of a selected target protein in a biological sample, comprising: proteolytically digesting said sample, modifying a plurality of peptides in the digested sample by adding a target tag to form a plurality of constructs comprising a selected target peptide derived from, and proteotypic of, said target protein, said plurality of constructs being target construct molecules, adding an amount that is known and/or consistent between a set of samples of a prepared standard peptide construct that is a cognate of said selected target peptide construct and comprises a standard tag, forming a standardized digest, enriching said cognate target and standard peptide constructs by contacting said standardized digest with a cognate binder, forming bound constructs, separating said bound constructs from unbound constructs to form enriched constructs, releasing said enriched constructs from said binder, linking said enriched constructs to a vehicle capable of presenting said enriched constructs to a sequence-sensitive single molecule detector, counting said enriched target construct molecules and said enriched standard construct molecules using a sequence-sensitive single molecule detector capable of distinguishing said target and standard tags and identifying said peptides, calculating the amount of said protein in said sample.
23. The method of claim 22, wherein the calculating is performed by multiplying the amount of standard construct added by the ratio of the number of target construct molecules counted to the number of standard construct molecules counted by said detector.
24. The method of claim 22 or 23, wherein, independently or in any combination:
• said binder is attached to a support,
• said linking of said vehicle to said constructs occurs while said constructs are bound to said binder,
• said presenting of enriched constructs to said detector occurs while said constructs are bound to said binder,
• peptides are bound to the binder, the binder washed, and the peptides eluted in two or more successive cycles of enrichment,
• said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides are covalently modified between two such steps (or wherein said first sequential step cleaves at lysine residues),
• said tags are added to said peptides by reaction with peptide amino groups,
• said tags are added to said peptides by chemical reaction at a single site in an n-terminal amino acid,
• said tags are added to said peptides by chemical reaction at a c-terminal lysine residue,
• said constructs comprise a non-peptidic component attached to an n-terminal amino acid and a different non-peptidic component attached to a c-terminal lysine residue,
• said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides retain an unmodified n-terminal amino group when presented to said detector,
• a sample barcode is linked to said constructs encoding the identity, or relative position within a sample set, of said standardized samples; a plurality of said standardized samples is pooled; said sample barcodes associated with construct molecules are read using a sequence-sensitive single molecule detector; and the counts of target and standard construct molecules for each sample are separated based on said sample ID barcode identifying the sample from which they were enriched, and wherein said barcode may be an oligonucleotide,
• a binder barcode is linked to said constructs identifying the binder by which they were enriched, and wherein said barcode may be an oligonucleotide,
• said construct molecules are joined together into concatamers prior to presentation to said detector,
• said detector determines all or part of the amino acid sequence of the peptide components of said construct molecules by a stepwise degradative process (or wherein the sequence of said target peptide is encoded in a nucleic acid component linked to target tags, standard tags, sample tags, binder tags and vehicles, and is read, and counted, by conventional DNA sequencing),
• said detector recognizes and decodes the target and standard constructs comprising target peptides, tags and optional additional barcodes using time-dependent variations in an electrical or optical parameter measured while the construct molecules transit a nanopore (or wherein said nanopore is biological, e.g., the common protein nanopores occurring in nature or derivatives thereof, or nucleic acid constructs, or a hole in a solid state inorganic material, e.g., Si3N4, SiO2, graphene, or MoS2) (or wherein said target and standard constructs comprise non-peptide polymers including nucleic acids that engage with a molecular motor, e.g., a polymerase or helicase, to regulate the speed at which the constructs move through a nanopore) (or wherein the nanopore detection is continued, and construct counts accumulated, until reaching pre-determined threshold numbers of counts, which may be based on counts required for each peptide sequence, e.g., to provide a pre-determined precision according to counting statistics, or counts required to achieve a pre-determined precision in a ratio, e.g., target/standard counts),
• said constructs are located on a support and detected by sequential binding of a plurality of binders comprising a detectable label, wherein independently or in any combination:
o peptides in constructs are identified by using cognate binders labeled with optically detectable moieties including fluorescent dyes or proteins
o multiple binders recognize distinct epitopes within a target peptide
o kinetic binding analysis of binder-construct interactions is used to improve the specificity of detection
o detection of binding or lack of binding by binders is interspersed with sequential removal of n-terminal amino acids or peptide segments
o one or more of the said target tags, standard tags, sample tags, binder tags, or vehicles is an oligonucleotide and is detected by hybridization of an optically-labeled complementary oligonucleotide.
25. A method for measuring amounts of a plurality of selected target proteins in a biological sample, comprising independently or in any combination: proteolytically digesting proteins in said sample to yield peptides, modifying said peptides by covalent chemical addition of a target tag to form a plurality of constructs, including target construct molecules comprising selected proteotypic target peptides derived from said target proteins, adding prepared standard constructs that are cognates of said target constructs and comprise a standard tag in amounts that are known and/or consistent between a set of samples, forming a standardized digest, enriching said cognate target and standard construct pairs by contacting said standardized digest with cognate binders, forming bound constructs, separating said bound constructs from unbound constructs to form enriched constructs, releasing said enriched constructs from said binder, linking said enriched constructs to a vehicle capable of presenting said enriched constructs to a sequence-sensitive single molecule detector, recognizing and counting said enriched target construct molecules and said enriched standard construct molecules using a sequence-sensitive single molecule detector capable of distinguishing said target and standard tags and identifying peptides, calculating the amount of said proteins in said sample by multiplying the amount of each standard construct added by the ratio of the number of cognate target construct molecules counted to the number of cognate standard construct molecules counted by said detector, wherein a pre-enrichment ratio calculated by dividing the number of molecules of a first target peptide construct that is the most numerous of said target peptide constructs in the standardized sample digest by the number of molecules of a second target peptide construct that is the least numerous of said target peptide constructs in the standardized sample digest is more than 10 times larger than a post-enrichment ratio calculated by dividing the number of molecules of said first target peptide construct by the number of molecules of said second target peptide construct in said enriched sample.
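The flattening condition stated above (a pre-enrichment ratio more than 10 times larger than the post-enrichment ratio) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the construct names and molecule numbers are hypothetical, and `flattening_factor` is a name introduced here for convenience.

```python
def flattening_factor(pre, post):
    """Pre-enrichment ratio divided by post-enrichment ratio.

    `pre` and `post` map construct names to molecule numbers before and after
    enrichment. The "first" and "second" constructs are the most and least
    numerous target constructs in the standardized digest (i.e., in `pre`).
    """
    first = max(pre, key=pre.get)
    second = min(pre, key=pre.get)
    pre_ratio = pre[first] / pre[second]
    post_ratio = post[first] / post[second]
    return pre_ratio / post_ratio


# Hypothetical molecule numbers for three target peptide constructs.
pre = {"high": 10**12, "mid": 10**9, "low": 10**6}
post = {"high": 2 * 10**6, "mid": 10**6, "low": 5 * 10**5}

factor = flattening_factor(pre, post)   # 1e6 / 4 = 250000.0
is_flattened = factor > 10              # the condition recited above
```

Because each TARGET is paired with a STANDARD added before enrichment, the original abundances remain recoverable even though the enriched sample is nearly equalized.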
The inventions herein provide improved quantitative measurements of the amounts of proteins, in a way that is highly specific, extremely sensitive, multiplexable with wide dynamic range, capable of very high throughput with low cost per measurement, and amenable to implementation on compact, inexpensive equipment.
The invention combines a series of known and new processes in a novel combination, and provides novel advantages over existing protein measurement methods. In its most basic aspect, the invention comprises proteolytic fragmentation of target protein(s), addition of internal standard versions of one or more peptides in known amount (these standard peptides being detectably different from the sample digest peptides based on incorporation of molecular tags into either sample digest peptides, added standard peptides, or both), enrichment of selected sample peptides (TARGETs) and cognate internal standards (STANDARDS) by specific affinity selection on BINDERS, and single molecule identification and counting of the resulting enriched peptides. By counting individual peptide molecules, the invention provides the maximum sensitivity attainable by direct analyte detection (i.e., detection without amplification). The following general features, and others described herein, are included in the invention:
Proteolytic digestion of proteins in a sample to yield a peptide digest.
Selection of TARGET peptides from among the candidate peptides produced by digestion of a target protein, based on theoretical (e.g., in silico) and/or experimentally determined features and performance as a quantitative surrogate of the protein analyte.
Design of internal standard (STANDARD) versions of TARGET peptides (which may be identical sequences).
Coupling TARGET molecular tags to TARGET peptides to form TARGET constructs, coupling STANDARD molecular tags to STANDARD peptides to form STANDARD constructs, or both.
Addition of known or reproducible amounts of STANDARD constructs to a digest prior to enrichment, creating a standardized sample digest.
Specific enrichment of TARGET and STANDARD peptide constructs, removal from digest matrix, washing, chemical modification as needed, and elution, creating an enriched standardized sample digest. This specific enrichment step may be carried out with amounts of BINDERS adjusted to achieve some degree of stoichiometric flattening among a series of TARGET peptides, creating an enriched and flattened standardized sample digest.
Optional addition of BINDER-identifying barcodes to TARGET and STANDARD peptide constructs in a sample digest.
Optional addition of sample-identifying barcodes to TARGET and STANDARD peptide constructs in a sample digest when samples are to be pooled prior to single molecule detection. Sample barcode addition can be carried out either before or after specific enrichment by BINDERS.
Presentation of enriched TARGET and STANDARD peptide constructs to a sequence-sensitive single molecule detector in an appropriate chemical form or structure, typically by linkage to a VEHICLE.
Detection and counting of peptide TARGET and STANDARD molecules using the sequence-sensitive single molecule detector.
Estimation of the amount(s) of TARGET peptide(s) (and thus parent proteins) in a sample based on the ratios between respective cognate TARGET and STANDARD peptide counts, with pooled samples decoded when necessary using sample barcodes.
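The final two features (counting, then ratio-based estimation with sample barcode decoding) can be sketched as follows. This is a minimal illustration under stated assumptions: each detection event is assumed to have already been decoded into a sample barcode, a peptide identity, and a tag ("T" for TARGET, "S" for STANDARD). The event format, the function name `quantify`, and the amounts are hypothetical.

```python
from collections import Counter


def quantify(events, standard_added):
    """Estimate TARGET amounts per sample from single molecule counts.

    events: iterable of (sample_barcode, peptide_id, tag), tag being "T" or "S".
    standard_added: dict mapping peptide_id to the STANDARD amount added to each
        sample (any consistent unit, e.g., fmol).
    Returns {(sample_barcode, peptide_id): estimated TARGET amount, or None when
    no STANDARD molecules were counted}.
    """
    counts = Counter(events)
    pairs = {(sample, peptide) for sample, peptide, _tag in counts}
    amounts = {}
    for sample, peptide in sorted(pairs):
        n_target = counts[(sample, peptide, "T")]
        n_standard = counts[(sample, peptide, "S")]
        if n_standard == 0:
            amounts[(sample, peptide)] = None   # ratio undefined without STANDARD counts
            continue
        # amount of TARGET = amount of STANDARD added x (TARGET count / STANDARD count)
        amounts[(sample, peptide)] = standard_added[peptide] * n_target / n_standard
    return amounts


# Hypothetical pooled run: two barcoded samples, one peptide, 100 fmol STANDARD per sample.
events = (
    [("S1", "pepA", "T")] * 30 + [("S1", "pepA", "S")] * 10
    + [("S2", "pepA", "T")] * 5 + [("S2", "pepA", "S")] * 10
)
amounts = quantify(events, {"pepA": 100.0})
```

Grouping by barcode before forming the ratio is what allows pooled samples to be separated after detection; the estimate for each sample depends only on its own TARGET and STANDARD counts.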
Several technologies among those known in the art can be used for sequence-specific single molecule detection and counting of biological macromolecules. Some of these have been developed for peptide sequencing, while others have been developed primarily for sequencing nucleic acids, but show potential for application to peptides.
The present disclosure provides methods for preparation and analysis of protein-containing biological samples compatible with any of the sequence-sensitive single molecule detection methods.
In some embodiments the invention provides a means to measure the amount of a peptide molecule (termed a “TARGET” peptide), typically a proteolytic fragment of a sample protein resulting from proteolytic digestion of a biological sample. In some embodiments, sample proteins are proteolytically digested and standardized by addition of an internal standard peptide or peptide construct (STANDARD) to create a standardized sample, from which the TARGET peptide and STANDARD are enriched and individually counted using sequence-sensitive single molecule detection (e.g., nanopore sequencing). The resulting counts of TARGET peptide and STANDARD molecules allow estimation of the absolute or relative amount or concentration of the target protein in the sample, given knowledge of the amount of STANDARD present in the digest.
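Because the estimate rests on molecule counts, its precision can be approximated from counting statistics. The sketch below assumes that TARGET and STANDARD counts are independent and Poisson-distributed, which is a modelling assumption made here rather than a statement from the disclosure. Under that assumption the coefficient of variation (CV) of the TARGET/STANDARD ratio is approximately sqrt(1/T + 1/S). Both function names are hypothetical.

```python
import math


def ratio_cv(n_target, n_standard):
    """Approximate CV of the TARGET/STANDARD count ratio for independent Poisson counts."""
    return math.sqrt(1.0 / n_target + 1.0 / n_standard)


def counts_needed(target_cv):
    """Counts required of EACH species (assuming equal TARGET and STANDARD counts)
    for the ratio CV to reach target_cv: n = 2 / cv^2, rounded up."""
    return math.ceil(2.0 / target_cv ** 2)


n = counts_needed(0.05)        # counts per species for a 5% CV on the ratio
achieved = ratio_cv(n, n)      # CV actually obtained at that count
```

A calculation of this kind indicates how long detection would need to continue to reach a pre-determined precision, and why adding STANDARD in an amount close to the expected TARGET amount makes efficient use of the counts collected.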
6.2 COMPONENTS USED IN THE INVENTION
Four molecular components used in some embodiments of the invention are described below: a) one or more TARGET peptides (the analytes to be measured); b) internal standard (STANDARD) versions of TARGET peptides or peptide constructs used as internal standards; c) specific affinity reagents (BINDERS) for capture and enrichment of TARGET peptides and STANDARD prior to detection; and d) barcodes used to distinguish TARGET and STANDARD constructs and/or to distinguish peptides on constructs derived from different samples. In general, these components are selected, prepared or optimized in ways surprisingly distinct from earlier work in which mass spectrometry has been used for peptide detection.
6.3 TARGET PEPTIDE.
Using the known sequence of a target protein, one or more peptide segments within it are selected as “TARGET” peptides. Selection can be accomplished using an “in silico” approach (e.g., by “digesting” the sequence of a target protein, known for example from the genome sequence of the relevant species, using a computer to cut the sequence at sites predicted based on the known cleavage specificities of a selected protease or chemical fragmentation method), an experimental approach (e.g., from a list of peptide fragments actually observed in a digest of the protein or a sample containing it), or both. A preferred TARGET peptide can be defined in some embodiments by criteria selected from a set intended to identify peptides that optimize the performance of the assay. In some embodiments it is preferred that a TARGET peptide has one or more characteristics that improve its performance in an assay according to the invention, including, but not limited to, peptides that are or have:
Length: A length of about 4 to 50 amino acids, or more preferably 6 to 24 amino acids;
Efficient digestion: Produced rapidly and in high (ideally >90%) yield by digestion of the target protein through an efficient, inexpensive proteolytic treatment with an enzyme (e.g., a protease such as trypsin) or a high-yield chemical treatment (e.g., CNBr cleavage);
Proteotypic sequence: A sequence that is unique to the target protein (unless measurement of a family of proteins is an object of the assay), i.e., that it is "proteotypic" for the protein and appears in no other proteins likely to be found in the intended sample (or more preferably, that it occurs in no other natural protein coded for by the genome of the species of interest), and that it occurs in the target protein sequence in a known number of locations (typically one location, but potentially more than one if the peptide sequence is repeated in the protein);
Few variants: A sequence that is relatively consistent across the population of target protein molecules occurring in a sample, i.e., occurs with relatively few sequence variants (unless measurement of such sequence variants is an object of the assay).
Few post-translational modifications: Peptides with relatively few post-translational modifications (unless measurement of such modifications is an object of the assay).
Favorable epitopes: A sequence containing structural features (e.g., “immunogenic” epitopes in the case of antibody affinity reagents) that facilitate development of specific affinity reagents (e.g., monoclonal antibodies, aptamers, etc.) capable of binding the peptide with high affinity, and specifically with a slow off-rate.
Solubility: Favorable physico-chemical properties, including solubility in aqueous solutions, little or no binding to materials used in sample preparation and analysis vessels and devices (e.g., nanopores), and little or no tendency to aggregate.
Stability: Low rate of spontaneous chemical degradation (e.g., by methionine or tryptophan oxidation, asparagine or glutamine de-amidation, etc.).
Recognizable sequence: A sequence that has features making it easily distinguishable from other sequences using the chosen sequence-sensitive single molecule detection means. If, for example, the detection means is nanopore sequencing, then peptide sequences that produce distinctive current traces as molecules pass through a nanopore, thus allowing them to be distinguished from other peptides, are preferred. Typical nanopores produce current signals that are reflective of a stretch of 3-6 contiguous amino acids (a “kmer”) inside the pore. Amino acids have a multiplicity of different side-chain volumes, and while these volumes do not always directly determine the nanopore “blockade currents”, sequences with more variable patterns of side chain volume are preferred. If the detection means involves recognition and recording of a terminal amino acid and its removal in a cyclical process to expose a new terminus (e.g., using degradative sequencing technology), then sequences that include amino acids for which the terminal recognition is most accurate, and/or least confusing, are likely to be preferred. If the detection means involves binding of recognition molecules to peptide epitopes, then peptides with multiple distinct epitopes are preferred.
Amino acid constraints: Presence or absence of specific amino acids that impact assay performance. In some embodiments cysteine is avoided due to its potential to form bridges between peptides, or alternatively a step is included in the sample preparation to block cysteines (e.g., via alkylation by iodoacetamide). In some embodiments, one or more cysteine residues present in a peptide are used as reactive sites for introduction of linkages to other molecules, or labels that can assist in peptide recognition by a sequence-sensitive single molecule detector.
Amino groups: Specific numbers and sites of amino groups (lysine side chain and n-terminal). As the most preferred sites for chemical linkage to other molecules, the number and position of amino groups is an important factor in the design of some constructs required for efficient presentation of peptides to a sequence-sensitive single molecule detector. For example, a tryptic peptide with c-terminal arginine and no internal lysine residues has a single amino group located at its n-terminus (i.e., it is “single amino”), and therefore has a unique site at which certain amino-reactive chemistries can establish a covalent linkage between the peptide and another molecule. Alternatively, a tryptic peptide having a c-terminal lysine and no internal lysines has two amino groups: one at the n-terminus and one at the epsilon amino group of the lysine side chain, thus facilitating methods in which a peptide is linked into a longer polymer by coupling at both ends.
Carboxyl groups: Specific numbers and sites of carboxyl groups (glutamic and aspartic acid side chains and at the c-terminus). As an alternative preferred site for chemical linkage to other molecules, the number and position of carboxyl groups is an important factor in the design of some constructs required for efficient presentation of peptides to a sequence-sensitive single molecule detector.
Electric charges: Specific numbers of charged amino acids (e.g., lysine, arginine, histidine, and the n-terminus with positive charges; and glutamic and aspartic acids and the c-terminus with negative charges) and the sum of these (i.e., the net charge of the peptide at the working pH of a sequence-sensitive single molecule detector). The total charge of a peptide can significantly affect its movement through a nanopore under the influence of an electric potential between cis and trans compartments, between which the nanopore serves as a conduit: in some embodiments a net negative peptide charge is preferred, such that the peptide is pulled through the pore from cis to trans (i.e., in the same direction a negatively charged oligonucleotide would be). In some embodiments a net positive peptide charge is preferred, such that a peptide is dragged through the pore by another molecule to which it is attached (and which has a net negative charge). In some such embodiments, it is preferred that positive charge(s) be localized towards the end of the peptide that is last to enter the pore (i.e., the trailing end of the peptide) so as to help maintain the peptide in an extended, linearized form as it passes into and through a nanopore.
These characteristics vary considerably, and in many cases independently, across the population of potentially selectable TARGET peptides to which the invention may be applied, as well as between single molecule detection technologies, and as a result the selection of optimal TARGET peptide sequences appropriate for some embodiments involves a complex weighting of these and other characteristics.
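As an illustration of the “in silico” selection approach and a few of the criteria listed above, the following minimal Python sketch digests a protein sequence at tryptic sites and filters the resulting peptides by length and by absence of cysteine, then reports a simple net charge. It is a sketch under stated assumptions, not a definitive selection procedure: the toy protein sequence is hypothetical, cleavage is assumed to be blocked before proline, histidine is not counted as charged by default, and the function names are illustrative only.

```python
import re

def tryptic_digest(protein: str) -> list[str]:
    """Cut c-terminal to K or R, but not before P (a simplifying assumption)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def net_charge(peptide: str, count_his: bool = False) -> int:
    """Integer charge estimate: K, R and the free n-terminus are +1;
    D, E and the free c-terminus are -1. Histidine is optional because
    its charge depends strongly on the working pH."""
    positive = peptide.count("K") + peptide.count("R") + 1
    if count_his:
        positive += peptide.count("H")
    negative = peptide.count("D") + peptide.count("E") + 1
    return positive - negative

def candidate_targets(protein: str, min_len: int = 6, max_len: int = 24,
                      forbidden: str = "C") -> list[str]:
    """Keep peptides within the preferred length range and lacking forbidden residues."""
    return [p for p in tryptic_digest(protein)
            if min_len <= len(p) <= max_len
            and not any(aa in p for aa in forbidden)]

# Hypothetical toy sequence, used only to exercise the functions.
TOY_PROTEIN = "MKTAYLLGPHVEGLKDECAGRSVLTPNQAGHIR"

peptides = tryptic_digest(TOY_PROTEIN)
candidates = candidate_targets(TOY_PROTEIN)
charges = {p: net_charge(p) for p in candidates}
print(peptides)
print(charges)
```

In practice the remaining criteria (proteotypic uniqueness, epitope quality, stability, detector-specific recognizability) would be applied as further filters or weights on the candidate list.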
6.3.1 Distinction from selection for mass spectrometry
It is useful to note that in many embodiments, commonly-considered features of proteolytic peptides in conventional peptide quantitation methods are NOT included among peptide characteristics considered important or limiting in the present invention: these include i) performance of a peptide in chromatographic separations (e.g., favorable elution behavior in reversed-phase liquid chromatography, elution time separated from other peptides used in panel combinations, etc.); ii) ionization efficiency in a mass spectrometer source (including the preference for a c-terminal positive charge); iii) fragmentation behavior in a mass spectrometer (important for example when using triple quadrupole “MRM” quantitation methods), and iv) peptide size (i.e., as a limitation due to the preferred mass range of a mass spectrometer). In view of the inapplicability of these criteria, some embodiments of the present invention will make use of TARGET peptides different from those preferred in mass spectrometry-based detection systems (including peptides that would be unusable in MS detection).
6.3.2 Peptides with 1 or 2 amino groups
In some embodiments TARGET peptides are generated through cleavage of proteins by trypsin (an inexpensive and well-understood protease that cuts polypeptide chains preferentially c-terminal to lysine and arginine residues). Tryptic TARGET peptides can be selected to contain either 1 (“Single amino”) or 2 (“Double amino”) amino groups. Numerous alternative proteases (e.g., Lys-C, Arg-C, pepsin, papain, chymotrypsin, etc.) and chemical cleavage reactions (cyanogen bromide (CNBr) cleaving at methionine (Met) residues; BNPS-skatole cleaving at tryptophan (Trp) residues; formic acid cleaving at aspartic acid-proline (Asp-Pro) peptide bonds; hydroxylamine cleaving at asparagine-glycine (Asn-Gly) peptide bonds; and 2-nitro-5-thiocyanobenzoic acid (NTCB) cleaving at cysteine (Cys) residues) can also generate peptides with characteristics allowing their use as TARGET peptides.
In some embodiments, peptides with a single amino group are preferred because this provides a unique and chemically convenient site that can be covalently coupled to other molecules used in enrichment or detection of peptides. Tryptic peptides ending in c-terminal arginine (R) and having no internal lysine(s) possess only one amino group (at the n-terminus) and are termed “Single amino” peptides. Reaction with this group provides a geometrically well-defined “handle” on one end of the peptide. Such peptides are used in some embodiments to create “rope-tow” constructs for nanopore sequencing through combination with oligonucleotides, as described below. In some embodiments, it is preferred that the TARGET peptide has either no net charge or a net positive charge in order to facilitate peptide movement within a nanopore. Peptides of the “Single amino” group may be selected to contain no aspartic or glutamic acids so as to minimize the contribution of negative charges on the peptide (i.e., the peptide preferably has zero or net positive charge that, in some embodiments, helps resist being pulled through a nanopore by its attachment to a negatively charged polymer construct), and no cysteine (so as to avoid the necessity for a method step to block these reactive groups). Single amino peptides with no aspartic or glutamic acid residues also have a single carboxyl group, which is useful in some embodiments that rely on anchoring a peptide to a support via its c-terminus, leaving the unmodified n-terminus available for binding of affinity reagents or sequential degradation (e.g., by Edman chemistry).
In some embodiments requiring linkage sites on both ends of a peptide, it can be convenient to select TARGET peptides (e.g., “Double amino” peptides) with a single lysine residue present at or near the c-terminus, whose epsilon-amino group provides a second reactive amino group, in addition to the n-terminal amino group at the opposite end of the peptide molecule. Linkage through these two amino groups allows a peptide to be coupled “in-line” with preceding and succeeding polymers via amine-reactive chemistry to form a continuous thread. Peptides with an internal lysine, or multiple lysines, are less preferred in such embodiments due to the potential for multiple non-linear constructs.
In some embodiments TARGET peptides are selected for other configurations of reactive sites. In some embodiments it is preferred that a peptide have a single carboxyl group and this criterion can be met by peptides with no aspartic or glutamic acid residues while possessing a free c-terminal carboxyl. In some embodiments other specific amino acids are desired so as to facilitate labeling of peptide molecules with amino acid-specific detection reagents.
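The “Single amino” and “Double amino” definitions, and the single-carboxyl criterion, reduce to simple residue counts, which the following Python sketch makes explicit. It assumes unmodified peptides with free termini, and it treats “Double amino” strictly as exactly one lysine located at the c-terminus (the text also allows a lysine near the c-terminus); the function names are illustrative, and the second and third example sequences are hypothetical.

```python
def count_amino_groups(peptide: str) -> int:
    """Free n-terminal amino group plus one epsilon-amino group per lysine."""
    return 1 + peptide.count("K")

def count_carboxyl_groups(peptide: str) -> int:
    """Free c-terminal carboxyl plus one side-chain carboxyl per D or E."""
    return 1 + peptide.count("D") + peptide.count("E")

def amino_class(peptide: str) -> str:
    """Classify a peptide by its amino groups (assumes free, unmodified termini)."""
    lysines = peptide.count("K")
    if lysines == 0:
        return "single amino"
    if lysines == 1 and peptide.endswith("K"):
        return "double amino"
    return "other"  # internal or multiple lysines: less preferred for in-line coupling

for seq in ("LLGPHVEGLK", "SVLTPNQAGHIR", "AKLLGPHVEGLK"):
    print(seq, amino_class(seq), count_amino_groups(seq), count_carboxyl_groups(seq))
```

Here LLGPHVEGLK is the mesothelin peptide discussed in connection with Figure 2; it is classified as “double amino” and has two carboxyl groups because of its single glutamic acid.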
In some embodiments in which a net positive peptide charge is preferred, aspartic or glutamic acid carboxyl groups in the peptide can be converted to positive charged sites by a chemical modification, e.g., activation by a carbodiimide and reaction with a reagent having an amino group (that couples to the carboxyl) and a second positively charged group.
In some embodiments, peptides having post-translational modifications are selected. For example, the n-terminal amino group of tryptic peptide VHLTPEEK from the beta chain of human hemoglobin (Hb) is modified by glycation in a fraction of molecules in the blood as a result of slow reaction with blood glucose (the modified Hb is referred to as HbA1c, and is used clinically as a measure of average blood glucose over time in a test for diabetes). The unmodified form of this peptide is thus “Double amino”, while the modified form is “Single amino”. In some embodiments designed to measure the fraction of Hb that is modified by this glycation, the n-terminal amino group of unmodified peptide is first blocked by reaction under conditions favoring reaction with the lower pK n-terminal amino group, and subsequently both modified and unmodified forms of the peptide are coupled to other molecules by the single remaining amino group of the c-terminal lysine.
6.4 STANDARD VERSION OF A TARGET PEPTIDE FUNCTIONING AS AN INTERNAL STANDARD.
6.4.1 History in mass spectrometry
In mass spectrometry (MS) as used for peptide quantitation (52), an internal standard is typically a synthetic, same-sequence version of a TARGET peptide including one or more amino acids comprising stable isotope labels (typically referred to as a Stable Isotope Standard or SIS) that allow it to be distinguished from the sample-derived TARGET peptide by mass measurement in the MS instrument (i.e., the well-known method of “isotope dilution”). Given the chemically identical structures of TARGET peptides and stable isotope labeled peptide internal standards, there is no basis on which to suspect that specific capture reagents (e.g., anti-peptide antibodies or other BINDERS) can distinguish between them, and thus such capture reagents will bind TARGET and stable isotope labeled peptides in whatever ratio they exist in the surrounding solution (e.g., the sample digest). A key feature of this approach to internal standardization is that the same method can be used to create the standard in all cases: for example, with tryptic peptides an effective standard can be made by synthesizing the TARGET sequence with a c-terminal amino acid (typically either lysine or arginine) containing stable isotopes (e.g., all 12C replaced by 13C and all 14N replaced by 15N). Therefore no advanced design or experimental testing and selection is required in a particular case: one approach works in all cases. This is not the case in the present invention, in which some embodiments require an involved selection and/or manufacturing strategy to identify and produce cognate TARGETS, STANDARDS and BINDERS that function properly together in the invention.
6.4.2 Limited use of internal standards in nucleic acid-based technologies
Internal standards that are truly identical, or almost identical (i.e., “cognate”), to the target analyte sequence are not typically used in genomics and nucleic acid technologies. Since the genome has very flat stoichiometry to begin with (most genes present in equal copy numbers, or if not, present in small integer ratios), internal standardization is typically not required for copy number quantitation. In quantitative methods such as RNAseq, where quantitation is important and mRNA levels can vary widely, current approaches typically sequence deeply and use total counts of specific sequences without differential enrichment, while focusing on between-sample normalization methods to reduce the effect of systematic factors that influence the results, such as RNA length, GC content, structure formation potential, etc. Where nucleic acid quantitation is important (e.g., in detection of one or more specific sequences such as SARS-CoV-2 sequences), PCR and related technologies are typically employed that rely on amplification, and the result is expressed in terms of the number of amplification cycles required to achieve a certain detection threshold (e.g., a “Ct value”). In none of these cases is a true internal standard (one chemically equivalent to the analyte in the assay but distinguishable by the detector) required or employed.
Since most of the emerging single molecule peptide technologies are evolutions of genomic methods, and/or are aimed at biomarker discovery rather than precise quantitation over a wide dynamic range, internal standardization with cognate STANDARDS has not previously been a significant objective of single molecule peptide analysis by these methods. It is thus a novel object of the present invention to provide effective internal standardization for single molecule peptide methods so as to enable accurate peptide quantitation by single molecule methods.
6.4.3 Need for novel internal standards for quantitative single molecule detection
In the present invention, a single molecule detection technology is not expected to be able to reliably detect small mass differences (a few atomic mass units) between the otherwise identical chemical structures of a TARGET peptide and an isotopically-labeled STANDARD with the same sequence (i.e., same chemical structure), and therefore other differences in molecular character besides isotopic mass must be employed. In this case a difference in chemical structure is required, and it would be expected based on experience with the extreme specificity of high-affinity specific capture reagents that a structural difference would in general lead to a significant difference in binding by a specific capture reagent, which in turn would interfere with the ability of a chemically modified peptide to function as an effective internal standard by preserving the ratio of TARGET to STANDARD during enrichment by BINDER from the digest.
In some embodiments of the invention, a STANDARD is identified and prepared for each TARGET peptide and added to the sample in known or constant amount before, during or after digestion, but before enrichment, to act as a quantitative reference at the detection step. A sample digest to which STANDARDS corresponding to cognate TARGETS have been added in known or constant amount is referred to herein as a “standardized sample digest”. A sample digest may be standardized with respect to a single TARGET, or with respect to multiple TARGETS. The amount of a TARGET peptide can be compared with the amount of added STANDARD, and thereby measured, by multiplying the amount of STANDARD by the observed ratio of TARGET peptide to STANDARD in a sample. In some embodiments, the STANDARD is very similar to the TARGET peptide, i.e., as close as possible to being indistinguishable from it during steps of the workflow before the detection step, while being clearly distinguished from it at the detection step - in other words a cognate sequence peptide standard herein referred to as a STANDARD.
In some embodiments, the STANDARD serves as an internal standard against which the TARGET peptide amount is compared, for example by comparing the number of TARGET peptide molecules to the number of STANDARD molecules, providing a ratio measurement. In some embodiments in which a known amount of STANDARD is added (e.g., a known mass, or molar amount, or known number of molecules), multiplication of the ratio by this amount yields the amount of TARGET peptide (or mass or the number of TARGET peptide molecules) in the sample digest. In the case where the same amount of STANDARD (although not necessarily an amount whose value in moles, grams or molecules, or whose concentration, is known) is added to multiple samples, the presence in each of a consistent amount of STANDARD allows the amounts of TARGET peptide to be compared between these samples (using the shared amount of STANDARD as basis) to provide relative quantitation within a sample set. Samples can be compared using this approach by addition of the same amount of STANDARD (i.e., the same mass or the same volume of the same or equivalent solution), or using different amounts in different samples so long as the amounts added to different samples are known in relative terms (e.g., twice as much STANDARD added to sample 2 as to sample 1). Methods for the use of STANDARDS at various levels of monitor peptide quantitation and calibration are described in detail in Provisional patent application 63/213,371 - Calibration of Analytical Results in Dried Blood Samples, which is incorporated by reference herein in its entirety.
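The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the molecule counts and the STANDARD amounts are hypothetical values chosen to exercise the calculation, and no correction for counting noise or detector bias is attempted.

```python
def target_amount(target_count: int, standard_count: int,
                  standard_amount: float) -> float:
    """Amount of TARGET = amount of STANDARD x (TARGET count / STANDARD count).

    The result carries whatever unit standard_amount is expressed in
    (mass, moles, molecules, or an arbitrary relative unit)."""
    if standard_count <= 0:
        raise ValueError("at least one STANDARD molecule must be counted")
    return standard_amount * target_count / standard_count

# Absolute quantitation: hypothetical counts with 10 fmol of STANDARD added.
absolute_fmol = target_amount(target_count=1500, standard_count=3000,
                              standard_amount=10.0)

# Relative quantitation: the STANDARD amount is known only in relative terms
# (sample 2 received twice as much STANDARD as sample 1).
sample_1 = target_amount(target_count=200, standard_count=400, standard_amount=1.0)
sample_2 = target_amount(target_count=300, standard_count=400, standard_amount=2.0)
fold_change = sample_2 / sample_1

print(absolute_fmol, fold_change)
```

With these illustrative inputs the TARGET amount is half the STANDARD amount in the first case, and sample 2 contains three times as much TARGET as sample 1 in the second.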
It is advantageous for a STANDARD construct and the STANDARD peptide it comprises to be as similar as possible to the respective TARGET construct and the TARGET peptide it comprises, since this similarity minimizes the probability that the ratio between them (which encodes the desired quantitative result of the analytical process) will be skewed or altered by some physical or chemical process in any step of an analytical workflow prior to detection, including enrichment by a cognate BINDER. As noted above, since a BINDER selected to enrich cognate TARGET and STANDARD constructs (or the respective peptides) must be highly specific in order to bind these peptides and not the enormous variety of other peptides present in a digest of a biological sample, some embodiments of the invention make use of TARGET and STANDARD peptides that are identical (i.e., perfect cognates). Alternative approaches, in which limited modifications of peptide sequence or structure distinguish TARGET and STANDARD peptide components, are less ideal and less general, but in some cases may be practically useful.
Similarly, for TARGET and STANDARD constructs, where a structural or chemical distinction is required in order that they be separately countable by a single molecule detector, the non-peptidic components of the TARGET and STANDARD constructs should also be cognates, though with relaxed similarity constraints. It is therefore advantageous for the non-peptidic components of the TARGET and STANDARD constructs to have similar physical properties such as mass, physical dimensions, shape, charge, hydrophobicity, solubility, etc. In some embodiments of the invention these constraints are addressed by the use of oligonucleotide TARGET and STANDARD tags, wherein the tags have the same length and may have the same base composition (implying the same molecular mass), but different sequences, allowing them to be distinguished by DNA sequencing or by specific hybridization to complementary probes. Linkage of such oligo tags to TARGET and STANDARD peptides can be accomplished using bifunctional linkers (for example including flexible polymer components such as polyethylene glycol between the oligo and peptide attachment sites) that reduce any steric hindrance the oligo may exert on the peptide that could affect binding to a BINDER. Such a level of similarity reduces the probability of skewing of the TARGET to STANDARD construct ratio because of differences in the diffusion, charge repulsion, epitope-masking, or solubility of the two constructs.
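The tag constraints described above (same length, same base composition, different sequence) can be checked mechanically. The following Python sketch is illustrative only: the tag sequences are hypothetical, and the minimum Hamming distance parameter is an added assumption reflecting the practical need for tags to remain distinguishable despite read errors, not a requirement stated in this disclosure.

```python
from collections import Counter

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def is_cognate_tag_pair(target_tag: str, standard_tag: str,
                        min_distance: int = 1) -> bool:
    """True if the tags share length and base composition (hence molecular
    mass) but differ in sequence at no fewer than min_distance positions."""
    if len(target_tag) != len(standard_tag):
        return False
    if Counter(target_tag) != Counter(standard_tag):
        return False
    return hamming_distance(target_tag, standard_tag) >= min_distance

# Hypothetical 8-mer tags used only for illustration.
print(is_cognate_tag_pair("ACGTACGT", "TGCATGCA", min_distance=4))
print(is_cognate_tag_pair("ACGTACGT", "ACGTACGG"))
```

The first pair passes because both tags contain two of each base and differ at every position; the second fails because the base compositions differ.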
Suitably similar TARGET and STANDARD constructs form a cognate construct pair. The TARGET and STANDARD constructs, together with a BINDER capable of binding them without skewing the ratio, form a set of cognate molecules. A STANDARD construct, the cognate BINDER, a TARGET tag and any linker required to link the TARGET tag to TARGET peptide molecules in a sample digest form a cognate reagent set useful for specific measurement of the TARGET peptide and its parent target protein in a sample (i.e., they can serve as a kit for measuring the protein).
Achieving the goal of construct cognate equivalence nevertheless remains challenging because of the interdependence of constraints governing cognate constructs and BINDERS, and the absence of successful attempts to solve this problem in the past.
6.4.4 Standards with altered TARGET amino acid sequence
In some embodiments, the STANDARD is created by replacement or alteration (e.g., by chemical modification) of one or more amino acids in the TARGET peptide sequence, or by addition of amino acids or other chemical structures. In some embodiments it is preferred that the replacement, addition or alteration a) does not result in any significant difference in binding of the TARGET peptide and the cognate STANDARD to the cognate BINDER, and b) results in an easily detected change in the result from a sequence-sensitive single molecule detector (e.g., a different ion trace during transit of a nanopore compared to the TARGET peptide, or a different amino acid sequence detected by a degradative sequencing process, or a difference in the set of epitope-specific binders detected by a single molecule imaging platform).
In some embodiments, one or more amino acids or other chemical groups can be added to either the n-terminal or c-terminal end of the TARGET peptide to create a STANDARD, with the same constraints (e.g., an easily detected change in the result of a sequence-sensitive single molecule detector, but no significant difference in binding of the TARGET peptide and STANDARD to the cognate specific BINDER). In some embodiments, these replacements and/or modifications are made to residues outside the peptide epitope to which a selected BINDER binds - such epitopes are typically linear contiguous regions of 4-8 amino acids in the case of IgG antibody BINDERS, leaving numerous potential modification sites available outside this region in a TARGET peptide 8-25 amino acids long.
In some embodiments a single serine (S) residue may be added to either the n-terminus or c-terminus of the sequence of a TARGET peptide to create a cognate STANDARD. Any other amino acid, or sequence of amino acids, that is clearly recognized by a sequence-sensitive detector can in theory be used instead of serine, the choice of added amino acid(s) being free, constrained only by the requirements of STANDARDS generally (i.e., BINDER binding equivalent to that of the cognate TARGET peptide, etc.) and any sequence constraints arising from any chemistry required to present peptides for detection. In some embodiments that make use of coupling reactions at two linkage sites on the peptide provided by the n-terminal amino group and the epsilon amino groups of a c-terminal lysine residue, addition of a residue after the lysine (a serine residue in this embodiment) provides a STANDARD that is chemically identical to the TARGET peptide along the entire chain of connected atoms between the peptide’s two amino groups (the n-terminal amino and lysine epsilon amino group) while comprising an appended serine residue “side chain”. In some embodiments, an amino acid such as serine can be added to the n-terminus. In other embodiments any amino acid(s) or chemically linkable group of atoms can be added to one or the other terminus, to an internal amino acid, or to both termini, to create a STANDARD version of a TARGET peptide sequence.
Figure 2 illustrates the challenge in practice of designing a simple addition of amino acids to the n-terminus or c-terminus of a peptide TARGET to create a cognate STANDARD while preserving equivalent binding to a BINDER. Here a series of versions of the peptide LLGPHVEGLK (proteotypic for human mesothelin) was synthesized with each of the 20 amino acids added to the n-terminus, and dipeptides added to the c-terminus (in each case a proline was added after the lysine and ahead of the variable amino acid in order to prevent removal of the added amino acid by trypsin cleavage, were the STANDARD added to a sample prior to digestion as is often the case). Each variant was mixed with a similar amount of the unmodified TARGET (LLGPHVEGLK), and the ratio of variant candidate STANDARD and TARGET signals measured by mass spectrometry before and after enrichment by a rabbit monoclonal antibody with specific affinity for this peptide. All n-terminal additions result in a dramatic decrease in binding of candidate STANDARDS compared to the TARGET: none are enriched to more than 4% of the level of the TARGET. The epitope recognized by this antibody thus probably includes the n-terminus and cannot accommodate an added amino acid. C-terminal additions are successfully enriched, with recoveries compared to TARGET of 27% (-PC) to 5386% (-PW); i.e., widely varying depending on the specific amino acid added after the proline. The antibody binding therefore appears to be affected by c-terminal additions, and in some cases (e.g., -PW) these c-terminal variants bind in preference to the original TARGET against which the antibody was made. For only 2 of the 39 variants examined does the antibody bind the TARGET and the candidate STANDARD at near-equivalence: c-terminal -PP and -PQ additions bind with approximately 99% and 102% recovery relative to the TARGET sequence.
This example demonstrates that a large majority of modifications made by adding amino acids to the n-terminus or c-terminus of a 10 amino acid long TARGET peptide are unlikely to yield STANDARDS that bind equivalently to a given BINDER, preserving the TARGET/STANDARD ratio present in the standardized sample digest. In addition, those STANDARD versions that appear to satisfy this simple version of the equivalence requirement (-PP and -PQ in this case), must pass further tests of equivalence under varying solution conditions, workflow timelines, and sample matrices, further restricting the range of choices.
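The recovery figure used in this kind of screen is the STANDARD-to-TARGET signal ratio after enrichment divided by the same ratio before enrichment. The Python sketch below shows that calculation and a simple near-equivalence test. It is illustrative only: the function names and the 5% tolerance are assumptions introduced here rather than values from this disclosure, and the signal intensities in the first call are hypothetical. The four recoveries checked at the end are the ones reported above for -PP, -PQ, -PC and -PW.

```python
def relative_recovery(standard_before: float, target_before: float,
                      standard_after: float, target_after: float) -> float:
    """(STANDARD/TARGET after enrichment) / (STANDARD/TARGET before enrichment).

    A value of 1.0 means the BINDER enriched both peptides equally."""
    ratio_before = standard_before / target_before
    ratio_after = standard_after / target_after
    return ratio_after / ratio_before

def is_near_equivalent(recovery: float, tolerance: float = 0.05) -> bool:
    """True if recovery lies within +/- tolerance of 1.0 (tolerance is an assumption)."""
    return abs(recovery - 1.0) <= tolerance

# Hypothetical signal intensities: equal amounts before enrichment, STANDARD
# recovered at one quarter of the TARGET level afterwards.
print(relative_recovery(1000.0, 1000.0, 500.0, 2000.0))

# Recoveries reported in the text, expressed as fractions.
reported = {"-PP": 0.99, "-PQ": 1.02, "-PC": 0.27, "-PW": 53.86}
acceptable = [name for name, r in reported.items() if is_near_equivalent(r)]
print(acceptable)
```

A candidate passing this single test would still need to be re-evaluated under different solution conditions, workflow timelines and sample matrices, as noted above.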
In some embodiments using nanopore sequencing, and in which peptides transit the pore from n-terminus to c-terminus, it can be advantageous to use a STANDARD with an added residue at the c-terminus (the end closest to a DNA motor on the cis-side of the pore) so as to ensure that the STANDARD variation is read, even if the peptide is longer than the nanopore’s read depth and as a result some of the peptide’s n-terminal residues are not read during a period of controlled movement of the peptide through the nanopore. Sets of STANDARDS generated by addition of a constant c-terminal residue or residue pair to form the STANDARDS will in general require accurately reading a minimal subsequence of 2 amino acids more than the minimum required to distinguish the TARGETS themselves. Given the likelihood of imperfect reads, and the potential contamination with other, un-selected peptides, longer reads of 5, 6, 7, 8, or 10 amino acids, or the entirety of the peptide’s sequence may be required to identify the TARGET peptide and STANDARD molecules with sufficient accuracy (e.g., 99.5%, 99%, 98%, or 95% accuracy) to enable use of the ratio of TARGET-to-STANDARD molecule counts to calculate a precise estimate of TARGET peptide amount.
In some embodiments, for example degradative methods in which a peptide is immobilized by the c-terminus and read by successively removing amino acids from the n-terminus, an added residue indicating STANDARD status is preferred at the n-terminus so as to ensure that the distinction between TARGET and STANDARD peptides is read at the beginning, and does not require sequencing to the end of the entire peptide. Accurately reading a minimal subsequence of 3 amino acids starting from the n-terminus is often sufficient to distinguish among a small set (e.g., 20) of TARGET peptides and their respective STANDARDS. Given the likelihood of imperfect reads, and the potential contamination with other, un-selected peptides, longer reads may be required to confidently identify the TARGET peptide and STANDARD molecules. In some embodiments, for example those that employ a sequential enzymatic (e.g., exoprotease) or chemical (e.g., Edman) process to remove single amino acid residues from one terminus of a peptide, the advantage of rapid definitive identification of TARGET and STANDARD sequences based on just a few terminal residues is substantial, since it could allow early termination of the cyclical read process, thus leading to a significant decrease in the number of cycles required and thus in analysis time, with associated decreases in cost and increased throughput.
In some embodiments, larger numbers of TARGET peptides and STANDARDS are used and need to be discriminated: for example, 25, or 50, or 100, or 200, or 400, or 600, or 800, or 1,000 TARGET peptides and their cognate STANDARDS; in such cases, based on an analysis of the uniqueness of the sequences, it may be desirable or required that a detector determine more of the peptide sequence, up to a complete sequence of some or all of the peptides.
In some embodiments, initial studies can be undertaken in which the TARGET peptides are sequenced beyond 3 or 4 residues, up to complete sequences, in order to detect the presence of any interfering peptides (i.e., peptides that share short n-terminal or c-terminal sequences with TARGET or STANDARD sequences, or otherwise generate output that can be confused with the pre-selected TARGET and STANDARD sequences) likely to be present in a given sample type. If such interfering sequences are commonly detected, deeper sequencing can be applied to distinguish the interfering sequences from TARGET or STANDARD sequences (i.e., sequencing up to or beyond the amino acid residue where the interfering peptide is no longer identical to a TARGET or STANDARD sequence), an approach with a high probability of success given that TARGET peptides are typically selected to be proteotypic in the species of interest.
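The sequencing depth needed to separate an interfering peptide from a TARGET or STANDARD can be estimated as the position of the first residue at which the two sequences differ. The sketch below is an illustration under the assumption of error-free n-terminal reads; the function names and the interfering sequences are hypothetical.

```python
def residues_needed(reference, interferer):
    """1-based position of the first residue at which the two sequences
    differ. Running off the end of the shorter sequence counts as a
    difference. Returns None when the sequences are identical."""
    if reference == interferer:
        return None
    for position, (a, b) in enumerate(zip(reference, interferer), start=1):
        if a != b:
            return position
    return min(len(reference), len(interferer)) + 1


def required_depth(references, interferers):
    """Read depth that separates every reference from every interferer,
    or None if any pair is identical and cannot be separated."""
    depths = [residues_needed(r, i) for r in references for i in interferers]
    if any(d is None for d in depths):
        return None
    return max(depths, default=0)


# Hypothetical interfering peptides sharing a short n-terminal sequence.
print(required_depth(["GFVEPDHYVVVGAQR"], ["GFVDPK", "GLSK"]))  # prints 4
```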
STANDARDS generated by modification of a TARGET amino acid sequence face several challenges that motivate exploration of alternative approaches. These include rarity (the low probability of finding a modified sequence that binds equivalently to a cognate BINDER); lack of generality (the fact that each TARGET and cognate BINDER represent a separate case that must be individually optimized); and the fact that only some single molecule detection technologies are likely to be able to detect such a sequence difference reliably.
6.4.5 Chemically modified peptides as standards
In some embodiments one or more amino acid residues of a TARGET peptide may be modified to generate a STANDARD. A large number of non-canonical amino acids that are known in the biochemical literature can be substituted for residues of the TARGET peptide or added to its sequence. Likewise, a large number of naturally occurring chemical modifications of amino acids are known and can be introduced into residues of the TARGET peptide during or after synthesis to form a STANDARD. Likewise, a large number of artificial chemical modifications can be made to amino acids of the TARGET peptide to form a STANDARD. Two examples of small but significant modifications are terminal blockages: 1) acetylation of an n-terminal amino group or 2) amidation of a c-terminal carboxyl group, both of which can be carried out easily during synthesis of a STANDARD peptide having the same sequence as a cognate TARGET, and both of which represent small alterations in the peptide structure. These small alterations can be “read” at a later stage of a single molecule workflow by reaction of peptides with a chemical reagent capable of efficiently combining with exposed amino or carboxyl groups (respectively). In some embodiments making use of amino groups to link peptides to oligonucleotides, blockage of a STANDARD’S c-term carboxyl can prevent reaction with a reagent that nevertheless reacts with a TARGET’S c-terminus: if the result of reaction with the reagent (which may for example add polymer or other structures to the TARGET’S structure) is detectable by a single molecule detector such as a nanopore, then the distinction between TARGET and STANDARD required by the invention can be provided. Any of these modifications may be used to create a STANDARD, provided that it meets the criteria described above (equivalent binding to a specific enrichment reagent, and equivalent reactivity in any required chemical reactions involved in sample preparation).
In some embodiments, for example those that employ nanopore sequencing, TARGET and STANDARD molecules may be “read” completely during passage through a nanopore, reducing the potential for confusion between expected TARGET and STANDARD sequences, or with potentially interfering sequences. In some nanopore sequencing embodiments capable of halting the reading of a peptide after reading a small number of amino acids and ejecting the peptide from the nanopore based on confidently identifying it as a specific TARGET or STANDARD sequence, the uniqueness of n-terminal or c-terminal sequences remains important and provides an opportunity to reduce time spent on unproductive sequence reading and therefore increase throughput of molecule counting.
6.4.6 Discrimination of STANDARDS and TARGETS using members of different “click” chemical pairs
In some embodiments, STANDARDS and cognate TARGETS share an identical amino acid sequence but differ in an attached chemical group. For example, STANDARDS can comprise a peptide sequence linked to one member of a pair of “click” chemistry groups (e.g., TCO, capable of reacting bio-orthogonally with molecules comprising a tetrazine group, the other member of the click pair, or vice versa), while cognate TARGETS comprise the same (or very similar) peptide sequence linked to one member of a different pair of “click” chemistry groups (e.g., BCN, capable of reacting bio-orthogonally with molecules comprising an azide group, the other member of that click pair, or vice versa). Because the components of the two click pairs generally react only with the other pair member, but not between click pairs, such click-activated TARGETS and STANDARDS are generally inert until they encounter a molecule comprising the opposite pair member, at which time they spontaneously react forming a covalent linkage. Such click-activated TARGETS and STANDARDS are therefore each capable of reacting specifically with different additional molecules (e.g., oligonucleotides comprising the appropriate different click groups) at a later stage of a sample preparation workflow. In some embodiments, TARGETS and STANDARDS comprise different chemical linkage groups (e.g., selected from the above-mentioned click pairs) connected to the peptide by similar or identical spacers (e.g., polyethylene glycol of length 1, 2, 3, 4, 5 or more polymer units) thus reducing any potential impact of the difference in chemical linker structures (e.g., TCO and BCN as mentioned above) on the relative binding of TARGETS and STANDARDS to a cognate BINDER.
6.4.7 Identification of STANDARDS by linkage to non-peptide flags or barcodes
In some embodiments a peptide is attached to another molecule to label it as a TARGET vs a STANDARD, to barcode it (e.g., to identify the sample from which it came), to facilitate or regulate its passage through a nanopore, or a variety of other purposes useful in a single molecule detection workflow. In some embodiments the distinction between TARGETS and STANDARDS is encoded in an attached, non-peptidic “tag” component rather than in the peptides’ structures themselves or in chemical linkage groups (e.g., click groups) they comprise. In some embodiments this is accomplished by preparing the STANDARD prior to its addition to a sample digest in a form that is already attached to a detectable tag (e.g., a nucleic acid sequence tag) that specifically indicates its status as a STANDARD. In an example of such an embodiment shown in Figure 3 A, an oligonucleotide VEHICLE comprises a 5’ phosphate 52 (to facilitate ligation with other nucleic acid chains), a preceding sequence 29, a residue 33 (indicated by X) capable of forming a linkage 34 with a terminal residue of peptide 52 (in this case a STANDARD peptide having the same sequence as a cognate TARGET), an abasic stretch 36 running alongside the peptide (forming a rope-tow construct as described herein), and a following sequence 30 comprising a tag sequence 54 (indicated by a box) that identifies the construct as containing a STANDARD peptide. In the example shown, the peptide GFVEPDHYVVVGAQR is a member of the class of “single amino” peptides, and thus comprises only a single amino group which is located at its n-terminus. Cognate TARGET peptide 53 (example shown in Figure 3B) in such an embodiment is attached to a VEHICLE of similar overall structure as the STANDARD construct, but comprising a different nucleic acid sequence tag 55 that indicates its status as TARGET. 
During single molecule detection, the VEHICLE nucleic acid sequences can be read by a nanopore and their location in relation to a peptide (e.g., preceding or following with pre-determined proximity) can be used to identify each peptide molecule as a TARGET or STANDARD molecule. The overall similarity of the VEHICLES attached to the pre-prepared STANDARD (Figure 3A) and sample digest-derived TARGET peptides (Figure 3B) minimizes any potential difference in binding of the peptide portions of the constructs (STANDARD and TARGET peptides 52 and 53, respectively, being structurally identical) to cognate peptide sequence-specific BINDERS. In some embodiments the sequence tags distinguishing the TARGET and STANDARD VEHICLES are optimized for high sequence accuracy in a given sequence-specific detection system (e.g., a nanopore reading system, or an affinity reagent imaging system).
The primary function of STANDARD tags and TARGET tags is to distinguish peptide constructs added to a sample as internal standards (STANDARD constructs) from peptide constructs that incorporate peptides created by proteolytic digestion of the sample proteins (TARGET constructs). In some embodiments the TARGET tag may be the absence of any STANDARD tag. In some embodiments the STANDARD tag may be the absence of any TARGET tag. A STANDARD tag may consist of the absence of a feature present in a cognate TARGET tag, the presence in the STANDARD tag of a feature absent in the cognate TARGET tag, or the presence of different features in the STANDARD tag and TARGET tag. In some embodiments, such presence/absence features may include differences in the sequence of oligonucleotide tags. The importance of maintaining unbiased (unskewed) ratio relationships between TARGET and STANDARD constructs (i.e., preserving their cognate character, specifically in regard to interaction with a cognate BINDER) argues against large structural differences between the TARGET and STANDARD tags (e.g., presence vs absence of a sizable chemical group). Multiple different STANDARD tags may be used, provided that the STANDARD tags are distinguishable from any TARGET tags. Multiple different TARGET tags may be used, provided that the TARGET tags are distinguishable from any STANDARD tags.
In some embodiments each different peptide STANDARD is prepared attached to a respective cognate VEHICLE that comprises a nucleic acid sequence tag that specifically identifies that STANDARD amino acid sequence and distinguishes it from a plurality of other STANDARDS that may be used in the same workflow. Use of different nucleic acid sequence tags for each STANDARD provides an orthogonal method for identifying these peptides, and this information makes it possible to assess the reliability of both methods (i.e., the degree of agreement between the peptide detection and tag detection), and to optimize the respective detection methods to improve accuracy.
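The degree of agreement between peptide detection and tag detection can be summarized as a simple concordance fraction. The following sketch assumes each read has already been assigned one identity from the peptide signal and one from the tag signal; the function name and the example calls are hypothetical.

```python
def agreement_rate(calls):
    """Fraction of reads whose peptide-based identity matches the
    tag-based identity. `calls` is an iterable of (peptide_call, tag_call)."""
    calls = list(calls)
    if not calls:
        raise ValueError("no reads supplied")
    agreeing = sum(1 for peptide_call, tag_call in calls if peptide_call == tag_call)
    return agreeing / len(calls)


# Hypothetical per-read identifications: (from peptide signal, from tag).
reads = [
    ("STD_1", "STD_1"),
    ("STD_2", "STD_2"),
    ("STD_1", "STD_2"),  # disagreement between the two detection methods
    ("STD_3", "STD_3"),
]
print(agreement_rate(reads))  # prints 0.75
```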
In some embodiments the oligo tag sequences used to identify and distinguish cognate STANDARDS and TARGETS in a cognate group are selected so as to be chemically very similar (e.g., same length and base composition) while being reliably distinguishable (e.g., different base sequence). By being chemically similar, and not located in close proximity to the peptide, the tags are unlikely to have any differential effect on the binding of cognate TARGET and STANDARD molecules to the cognate BINDER, thus preserving the ratio of TARGET to STANDARD in the standardized digest. By having distinguishable sequences, as detected by any of the single molecule methods herein, the TARGET and cognate STANDARD molecules can be identified and counted reliably, thus providing an accurate value for the ratio of TARGET to STANDARD.
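As an illustration of the two requirements above (chemical similarity and reliable distinguishability), the sketch below checks that two tags have the same length and base composition but different sequences, and then converts tag counts into a TARGET amount using the known amount of added STANDARD. The 8-base tag sequences, the function names, and the femtomole unit are assumptions made for the example.

```python
from collections import Counter

# Hypothetical 8-base tags with identical length and base composition.
TARGET_TAG = "ACGTTGCA"
STANDARD_TAG = "ACGTGTCA"
TAG_TO_LABEL = {TARGET_TAG: "TARGET", STANDARD_TAG: "STANDARD"}


def similar_but_distinct(tag_a, tag_b):
    """True when the tags share length and base composition but differ in order."""
    return (
        len(tag_a) == len(tag_b)
        and Counter(tag_a) == Counter(tag_b)
        and tag_a != tag_b
    )


def target_amount(tag_reads, standard_amount_fmol):
    """TARGET amount = (TARGET count / STANDARD count) x STANDARD amount added.
    Reads with unrecognized tags are ignored."""
    counts = Counter(TAG_TO_LABEL.get(tag, "UNKNOWN") for tag in tag_reads)
    if counts["STANDARD"] == 0:
        raise ValueError("no STANDARD molecules counted")
    return counts["TARGET"] / counts["STANDARD"] * standard_amount_fmol


# Hypothetical run: 30 TARGET reads, 60 STANDARD reads, 5 unrecognized reads.
tag_reads = [TARGET_TAG] * 30 + [STANDARD_TAG] * 60 + ["GGGGGGGG"] * 5
print(similar_but_distinct(TARGET_TAG, STANDARD_TAG))  # prints True
print(target_amount(tag_reads, 100.0))                 # prints 50.0
```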
In some embodiments STANDARD-VEHICLE constructs (e.g., Figure 3A) are prepared and added to the sample digest after sample digest peptides have been incorporated into similarly-structured TARGET constructs (e.g., Figure 3B). In these embodiments, the structure of the STANDARD peptide molecule may be identical to the TARGET peptide structure (e.g., it can be a synthetic version of a known cognate TARGET peptide sequence), while their respective STANDARD and TARGET VEHICLES comprise distinct nucleic acid sequence tags (54 and 55 in Figure 3), thus ensuring that the cognate BINDER will bind the attached peptides equivalently, and thereby accurately preserve the TARGET-to-STANDARD ratio present in a standardized sample digest.
Given such an encoding scheme to identify STANDARDS and TARGETS, the method requires that these peptides be joined to their respective tags prior to enrichment and selection using the cognate BINDERS. A specific advantage of this approach is the identical structures of cognate TARGET and STANDARD peptides, and thus the high likelihood that the cognate BINDER binds them with identical affinity and kinetics, thus preserving the TARGET-to-STANDARD ratio present in the sample digest. The potential disadvantage of this approach is the expense involved in using sufficient TARGET vehicles to incorporate all the digest peptides. A further simplification of this approach to address this issue is described below.
Some embodiments make use of a further simplification of the VEHICLE-encoding scheme shown in Figure 3 and described above. Figure 4 shows a method in which a short identifying oligo (STANDARD tag 62) is attached to a STANDARD peptide 52, in this case by an amine-reactive N-hydroxysuccinimide (NHS) group 61 attached by linker 34 to a suitable DNA nucleotide of the oligo (for example an amino-modified C6 dT base to which NHS functionality has been added during manufacture). The 16 base long oligo tag 62 has a molecular weight of about 5,000 daltons, substantially less than the VEHICLES described in the embodiment shown in Figure 3, and is therefore less expensive and also able to diffuse and bind BINDERS more rapidly in solution. Those skilled in the art will be able to design oligonucleotide tag sequences of reduced (or longer) length capable of specifically hybridizing with complementary sequences as required in the steps of Figures 4C and 4D. TARGET and STANDARD oligonucleotide tags (e.g., 62 or 63) may be provided of lengths ranging from 4 to 30 bases, more preferably 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16-30 bases. Oligo tag sequences are designed according to well-known principles to maximize specific binding to a complementary sequence (e.g., 64 or 68) and dissociate at a reasonable (melting) temperature, while minimizing the potential to hybridize with other oligo sequences used in the workflow, or to form self-associations (e.g., intra-molecule hairpins or inter-molecular hybrids).
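The approximate mass and melting temperature of a candidate tag can be estimated with widely used rules of thumb. The sketch below is illustrative only: it assumes the common anhydrous molecular weight approximation for unmodified single-stranded DNA and the Wallace rule (2 °C per A/T, 4 °C per G/C), which is a rough guide for short oligos; the 16-base sequence is hypothetical, and the added NHS linker is not included in the mass.

```python
# Approximate per-nucleotide masses (daltons) for unmodified ssDNA oligos.
NUCLEOTIDE_MASS = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def approx_molecular_weight(oligo):
    """Anhydrous molecular weight estimate for an unmodified DNA oligo."""
    return sum(NUCLEOTIDE_MASS[base] for base in oligo) - 61.96


def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule."""
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc


def reverse_complement(oligo):
    """Sequence of the strand that hybridizes to the oligo, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(oligo))


tag = "ACTGGTCAATCGGCTA"  # hypothetical 16-base tag, four of each base
print(round(approx_molecular_weight(tag), 2))  # about 4,900 Da
print(wallace_tm(tag))                         # prints 48
print(reverse_complement(tag))
```

The estimate of roughly 4,900 daltons for this assumed sequence is in line with the figure of about 5,000 daltons given above for a 16 base tag. A practical design would also screen candidates for hairpins and cross-hybridization using nearest-neighbor thermodynamic models rather than these simple rules.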
The product of the reaction of the oligo tag’s NHS and the peptide’s NH2 groups of Figure 4A is shown in Figure 4B, while the equivalent reaction for TARGET peptide 53 with TARGET tag 66 is shown in Figures 4E and 4F. The STANDARD construct of Figure 4B is added to a sample digest (in this example a tryptic digest) whose sample peptides (including the TARGET) are prepared as TARGET-tag constructs (as in Figure 4F), thus forming a standardized sample digest. In order to preserve the quantitative relevance of the TARGET amount, it is desirable that the reaction of peptide 53 with VEHICLE 66 goes to completion (or if not, that the proportion of the TARGET peptide incorporated into the construct of Figure 4F is consistent between samples).
At this point TARGET and STANDARD peptides are linked to, and thus identified by, oligo sequence tags of the respective TARGET and STANDARD tags (oligos 62 and 66 in Figure 4). Enrichment of the TARGET and STANDARD peptides using the cognate BINDER isolates these two peptides and their attached respective oligo tags (the peptides having identical structures but derived from different sources), preserving the TARGET-to-STANDARD ratio in the standardized sample. Subsequent to the enrichment step, after removal of unbound sample peptide constructs, the enriched bound TARGET and STANDARD constructs can be “completed” by hybridization and ligation to respective “secondary VEHICLES” as needed for various single molecule detection methods. Completed constructs shown in Figures 4C and 4D (for the STANDARD peptide) and Figures 4G and 4H (for the TARGET peptide) are particularly useful in the case of nanopore sequencing. In summary, short oligo tag 62 (which identifies peptide 52 as a STANDARD molecule) hybridizes with a complementary sequence 64 bringing the 3’ residue of the oligo 62 into proximity with the phosphorylated 5’ end of an oligo comprising an abasic region 36 (abasic backbone links are symbolized by “o”) and a following sequence 63. In some embodiments the 5’ terminus of the secondary VEHICLE comprises one or more bases before the abasic region (e.g., AA at the 5’ end of secondary VEHICLE oligo 36 in Figure 4) that are capable of hybridizing with complementary bases in oligo 70 (TT in oligo segment 70 of Figure 4D), and these bases are different in STANDARD and TARGET secondary VEHICLES so as to minimize potential hybridization of a secondary STANDARD oligo (e.g., 36 + 63) with a TARGET complementary sequence 70, and vice versa.
The number of such hybridizing bases at the 5’ end of segments 36 may be selected to have an extended length that is less than the total length of linker 34 in order to avoid overlap of the peptide with these hybridizing bases when the complete construct passes through a nanopore. Similarly, the length of abasic region 36 is chosen so as to avoid overlap with the peptide 52 as it transits a nanopore, and oligo sequence 63 is designed so as to engage with a DNA motor and regulate pore transit of the peptide, allowing measurement of its current trace (“squiggle”). To optionally further reinforce the identification of the peptide as a STANDARD, oligo sequence 63 may be a unique sequence also indicating status as a STANDARD molecule (i.e., different from the sequence of oligo 67 in the TARGET VEHICLE construct) resulting in a “double-tag” (i.e., redundantly tagged) construct. An advantage of the double-tag approach is that each peptide is identified as STANDARD or TARGET by sequence information both before and after the peptide. In some embodiments, alternative linkage chemistries, including various click chemistry linkages as described elsewhere herein, are used to create a linkage between a peptide and a nucleic acid tag. Figure 5 shows an example in which a tag oligo prepared with a 3’ trans-cyclooctene (TCO) group is reacted with a peptide derivatized at its n-terminus with methyltetrazine (Me-TZ) to yield an “in-line” construct (one in which the oligo and peptide sequences form a single continuous polymer). Figure 5A shows preparation of a TARGET construct using a tag sequence identifying the construct as a TARGET (labeled OLIGO-TARGET tag).
In some embodiments, this modification (labeling peptides with the OLIGO-TARGET tag) is applied to all peptides in a sample digest (for example by derivatizing all digest peptide amino groups with NHS-tetrazine and reacting these with an excess of the reactive oligo comprising 3’ TCO), thereby ensuring that all molecules of the TARGET are taken into account in subsequent steps of an analytical workflow. Figure 5B shows preparation of a STANDARD construct using a tag sequence identifying the construct as a STANDARD (i.e., the OLIGO-STANDARD tag). In some embodiments, the STANDARD construct is prepared separately, for example using synthetic STANDARD peptide having the same (or generally similar) sequence as the TARGET, and is added to sample digests to serve as an internal quantitative standard (i.e., to standardize the sample digests). In some embodiments the physical properties of the TARGET and STANDARD oligo tags (e.g., length, base composition, and melting temperature) are selected to be similar so as to minimize the impact of their different sequences (necessary to distinguish them during peptide single molecule analysis) on relative binding to the cognate BINDER. Figure 5C shows TARGET and STANDARD constructs bound to cognate antibody BINDER molecules immobilized on a magnetic bead. Since the difference between TARGET and STANDARD is encoded by the oligo component, the peptide components can be chemically identical (same sequence) and thus capable of identical interaction with the BINDER. In some embodiments, the peptide is linked to the oligo via its c-terminus (e.g., via a c-terminal carboxyl and the epsilon amino group of a c-term lysine) rather than the n-term as shown. In some embodiments the oligo is linked to the peptide via the 5’ end rather than the 3’ as shown. These and other functionally equivalent physical arrangements will be apparent to those skilled in the art.
In some embodiments, such as nanopore sequencing applications, that may require addition of molecular components such as oligonucleotides to both ends of a peptide, STANDARD constructs can be prepared in a stepwise process comprising 2 discrete attachment steps, thereby allowing different molecular components to be added at (or near) the peptide N-terminus and C-terminus, and establishing a consistent orientation of the peptide with respect to the overall construct (thereby avoiding the need to recognize a peptide in two different polarities). In some embodiments the 2-step digestion process described herein for preparing sample peptide libraries is used to generate STANDARD constructs. In this case, a synthetic peptide comprising the cognate TARGET sequence ending in Lys (i.e., a 2-amino peptide), and comprising an added n-terminal sequence ending in Arg (e.g., GSGR in the case of trypsin second cleavage, or any suitable peptide ending in the amino acid at whose c-terminal position a second proteolytic enzyme cleaves), can be processed in a series of steps similar to those used to process sample peptides as shown in Figure 6. For example, for a generic TARGET peptide sequence XXXXXXK, a synthetic STANDARD precursor can be generated with the sequence XXXRXXXXXXK. This peptide has both an n-terminal and a lysine epsilon amino group, both of which can be reacted with a suitable linkage reagent such as NHS-BCN, thus adding a click linker to both ends of the peptide, after which excess NHS-BCN can be removed. A suitably activated flag group (e.g., an oligonucleotide) can be linked to the peptide at this point using an appropriate click partner (e.g., azide to react with BCN) in order to identify the STANDARD (either generically using the same flag for all STANDARDS, or specifically, using a flag that identifies which STANDARD peptide sequence is involved).
Cleavage with a second enzyme (e.g., trypsin) then reveals a fresh n-terminal amino group (in the peptide XXXXXXK) which is available to react with a second linkage reagent such as NHS-TCO, thereby activating the n-terminus for potential connection to a second tetrazine-activated oligo via the TCO-tetrazine click reaction. Use of different mutually orthogonal click groups at the two termini offers the ability to add distinct oligos to the two ends via the distinct click pair reactions, and to postpone one or both oligo additions until after BINDER enrichment of the TARGET and cognate STANDARD peptides. Likewise, one or both of the oligos may be coupled to the peptide prior to enrichment on a BINDER, provided that the same process is applied to the peptides of the digest (including TARGET peptides) prior to enrichment by a BINDER. It will be apparent to those skilled in the art that alternative linkage chemistries, including different click pairs or application of click pairs in alternate order, can be employed; that the steps of adding the first oligo for click linkage and second cleavage can be carried out in either order, and that the addition of the second linker group and/or second oligo may be delayed until after the BINDER enrichment step or omitted altogether when the single molecule sequencing and detection means requires an available n-terminal amino group (e.g., for Edman degradation).
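The sequence logic of the stepwise process described above can be sketched as follows. This is a simplified illustration and not a model of the chemistry: it assumes that the TARGET ends in Lys and contains no other Lys or Arg, that the derivatized Lys is not cleaved by the second enzyme, and that cleavage occurs after every internal Arg (the usual exception for Arg followed by Pro is ignored). The function names and the example TARGET sequence are hypothetical.

```python
def make_standard_precursor(target, n_extension="GSGR"):
    """Prepend an Arg-terminated extension to a Lys-terminated TARGET sequence."""
    if not target.endswith("K"):
        raise ValueError("TARGET is expected to end in Lys (K)")
    if "R" in target or "K" in target[:-1]:
        raise ValueError("TARGET is expected to contain no internal Lys or Arg")
    if not n_extension.endswith("R"):
        raise ValueError("extension is expected to end in Arg (R)")
    return n_extension + target


def cleave_after_arg(peptide):
    """Split after every internal Arg; Lys is assumed blocked by derivatization."""
    fragments, start = [], 0
    for index, residue in enumerate(peptide):
        if residue == "R" and index + 1 < len(peptide):
            fragments.append(peptide[start:index + 1])
            start = index + 1
    fragments.append(peptide[start:])
    return fragments


precursor = make_standard_precursor("AEFAEVSK")  # hypothetical TARGET sequence
print(precursor)                    # prints GSGRAEFAEVSK
print(cleave_after_arg(precursor))  # prints ['GSGR', 'AEFAEVSK']
```

The second fragment corresponds to the peptide whose newly exposed n-terminal amino group is available for the second linkage reagent.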
An important advantage of carrying out BINDER enrichment after assembly of peptide-oligo constructs such as those shown in Figures 3, 4, 5 and 7 is that constructs taken forward for sequencing after enrichment are very likely to contain a peptide (otherwise they would not be bound by and recovered from the BINDER).
In some embodiments, a TARGET peptide, its cognate STANDARD and the cognate specific affinity BINDER reagent(s) (i.e., those molecules used together in the present invention to quantitate the TARGET peptide) may be developed together (co-evolved or co-optimized) to achieve optimal performance; i.e., through an iterative process comparing assay performance of various combinations of versions of the reagents, through a full-matrix comparison of all available variants of each, or by molecular engineering guided by chemical knowledge and experimental results.
6.4.8 Multi-level standards
In some embodiments, multiple distinguishable STANDARDS are provided for a single TARGET, and added in different amounts so as to establish a standard curve against which the TARGET can be quantitated. In such embodiments, the multiple STANDARDS may be distinguished by connection to distinct oligo sequence tags, or they may contain different and distinguishable structural modifications of the TARGET peptide (e.g., different amino acids added to its sequence). The different STANDARDS may be added to a sample digest in different amounts to standardize it: for example, three STANDARD versions (A, B, C) of a given TARGET peptide may be added to a digest in 0.1 : 1.0 : 10 relative amounts, thus generating a 3-point calibration curve. Multi-level STANDARDS increase the likelihood that at least one STANDARD will be present in an amount (and thus number of molecules) close to the amount of a TARGET in an unknown sample. Likewise, the pre-established ratio between different STANDARD versions provides an internal check on the quantitative precision and linearity of a single molecule detection system.
6.5 SPECIFIC AFFINITY REAGENTS (BINDERS) TO CAPTURE AND ENRICH PEPTIDE TARGETS AND STANDARDS
One or more specific affinity reagents, capable of binding the peptide TARGET and STANDARD specifically (i.e., while not binding a potentially vast number of other peptides that can be present in a sample digest) are used in some embodiments of the invention to capture the TARGET peptide and STANDARD prior to single molecule detection. We refer to such a reagent generically as a BINDER, and include within that term not only canonical antibodies such as IgG, but also numerous types of proteins and other macromolecules (e.g., aptamers) known in the art to be able to bind to particular peptide sequences with specific affinity.
Experience with the SISCAPA technology (3, 9, 13-15) has shown that antibodies, preferably monoclonal antibodies, can be developed that bind and capture a specific low abundance tryptic peptide from the digest of a very complex sample such as human blood plasma (which may contain 250,000 distinct peptides, some at very high abundance), and thereby enrich the peptide substantially (e.g., more than 10,000-fold). A variety of types of biologically derived antibodies (e.g., polyclonal, monoclonal and oligoclonal antibodies derived from mice, rabbits, humans, camelids and other species), molecules derived from antibodies by molecular biology techniques (e.g., antibodies selected from libraries using phage display and other techniques), aptamers, and other molecular constructs can be created to achieve the purpose of specific peptide binding - all of these are included within the term BINDER as used herein.
6.5.1 Antibody BINDERS
Many methods exist for generating antibodies to a peptide in animals. For example, a synthetic peptide having the TARGET peptide sequence can be coupled to a carrier protein (e.g., keyhole limpet hemocyanin: KLH) and used to immunize an animal (such as a rabbit, mouse, chicken, goat, camelid or sheep) by one of the known protocols that efficiently generate anti-peptide antibodies. For convenience, the peptide used for immunization and antibody purification may contain additional c-terminal or n-terminal residues (e.g., cysteine) added to the TARGET peptide sequence. The resulting extended TARGET peptide can be conveniently coupled to carrier KLH that has been previously reacted with a heterobifunctional reagent such that multiple SH-reactive groups are attached to the carrier. In classical immunization with the peptide (now constituting a hapten on the carrier protein), a polyclonal antiserum can be produced containing antibodies directed to the peptide, to the carrier, and to other non-specific epitopes.
Specific polyclonal anti-peptide antibodies can be prepared from an immunized animal’s serum by affinity purification on a column containing tightly-bound peptide. Such a column can be easily prepared by reacting an aliquot of synthetic TARGET peptide made with a cysteine residue added to one end with a thiol-reactive solid support. Crude antiserum can be applied to this column, which is then washed and finally exposed to 10% acetic acid (or other elution buffer of low pH, high pH, or high chaotrope concentration) to specifically elute anti-peptide antibodies. These antibodies are neutralized or separated from the elution buffer (to prevent denaturation), and the column is recycled to physiological conditions for application of more antiserum if needed.
In some embodiments, one of a variety of methods known in the art (e.g., B-cell cloning, hybridoma generation, recombinant expression, etc.) is used to generate candidate clonal antibody proteins, genes or gene sequences (from the cells, genes, or proteins of an immunized animal, or from natural or artificial protein binder libraries such as phage display libraries) that can be screened to select a monoclonal antibody (or other BINDER molecule) with the ability to enrich the TARGET peptide and cognate STANDARD from a complex peptide mixture (e.g., a sample digest) under a specified set of solution conditions. Such screening can be carried out using the method of the invention (i.e., screening in the assay of ultimate use), or by alternative methods such as the SISCAPA method with MS detection. Monoclonal antibodies, particularly those produced by recombinant methods, have the advantages of homogeneity, superior performance, scalable production and longevity compared to polyclonal mixtures.
In some embodiments a preferred method of selecting a monoclonal (homogeneous) anti-peptide antibody BINDER for use in the invention includes testing whether a candidate antibody (e.g., product of a clone) binds the TARGET peptide and STANDARD equally, and selecting for use the antibody product of a clone or clones that bind these peptides most equally (i.e., with least bias towards one or the other, and thus capable of capturing both from a mixture without changing the ratio between them). Preserving the ratio of TARGET to STANDARD unchanged during capture and enrichment by the BINDER is desirable since the invention involves measuring this ratio (by counting TARGET peptide and STANDARD molecules) to calculate the amount of TARGET peptide (and hence target protein) in the original sample. In the event that BINDERS are not found that preserve the TARGET to STANDARD ratio precisely, some embodiments make use of BINDERS that exhibit differential, but reproducible, binding, thus allowing correction of the ratio as described above.
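The selection criterion described above (least bias between TARGET and STANDARD) and the correction for a reproducible bias can be illustrated with a short calculation. The sketch assumes each candidate clone is tested on a mixture with a known input TARGET-to-STANDARD ratio and that bias is expressed as recovered ratio divided by input ratio; the clone names, numerical values, and function names are hypothetical.

```python
import math


def binding_bias(input_ratio, recovered_ratio):
    """Bias factor of a BINDER: 1.0 means the ratio is preserved exactly."""
    return recovered_ratio / input_ratio


def pick_least_biased(recovered_by_clone, input_ratio=1.0):
    """Clone whose recovered ratio is closest to the input ratio on a log scale."""
    return min(
        recovered_by_clone,
        key=lambda clone: abs(
            math.log(binding_bias(input_ratio, recovered_by_clone[clone]))
        ),
    )


def corrected_ratio(observed_ratio, bias):
    """Undo a known, reproducible bias in an observed TARGET-to-STANDARD ratio."""
    return observed_ratio / bias


# Hypothetical screening results for a 1:1 input mixture.
recovered = {"cloneA": 1.30, "cloneB": 0.95, "cloneC": 0.70}
print(pick_least_biased(recovered))   # prints cloneB
print(corrected_ratio(2.6, 1.30))     # observed 2.6 with cloneA bias 1.30
```

Comparing on a log scale treats a 2-fold preference for TARGET and a 2-fold preference for STANDARD as equally undesirable.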
6.5.2 Multi-epitope BINDERS
In some embodiments multiple BINDERS are used to enrich a single peptide. In some embodiments, TARGET peptides are sufficiently long to include multiple epitopes: a typical antibody linear epitope is 4-6 amino acids long, while TARGET peptides may be 6-30 amino acids in length - those of length 12-30 amino acids have a high likelihood of comprising 2 or more non-overlapping epitopes. In some embodiments, multiple BINDERS targeting linear, non-overlapping epitopes in a peptide can be generated and used to increase enrichment specificity. In some embodiments monoclonal antibodies to multiple linear, non-overlapping epitopes of a peptide are made using conventional hybridoma or other cloning techniques to select among clones created by immunization of an animal with peptide and/or derivatives of it. In some embodiments, antibodies to multiple linear, non-overlapping epitopes of a peptide are selected from libraries such as naive or immunized phage display libraries of single-chain antibodies. In some embodiments, aptamers to multiple linear, non-overlapping epitopes of a peptide are selected from libraries or evolved using iterative selection approaches well-known in the art. In some embodiments, multiple BINDERS of different types (e.g., polyclonal or monoclonal antibodies, aptamers, etc.) targeting linear, non-overlapping epitopes are used together for increased affinity and/or specificity.
In many cases, the cost involved in creation of multiple monoclonal antibodies or aptamers for each TARGET peptide can present a practical barrier to such an approach. Some embodiments therefore make use of polyclonal antibodies purified from sera of immunized animals, or, alternatively, oligoclonal mixtures of antibody-like molecules extracted from large libraries (e.g., naive or immunized phage display libraries). In some embodiments methods well known in the art are used for affinity purification of multiple distinct antibody specificities from such polyclonal antisera using multiple affinity media, each comprising a linear peptide subsequence of a TARGET peptide sequence.
In some embodiments, multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are simultaneously affinity purified by binding to synthetic TARGET peptide. In this approach, a TARGET peptide with multiple epitopes binds multiple BINDERS from a mixture until each epitope is saturated with BINDER, thereby establishing a balanced mixture of BINDERS to the epitopes. Figure 8B shows 3 BINDERS (I, II and III) bound to different linear epitopes (1, 2, 3) on a single peptide molecule (the peptide shown in Fig 8A). In some embodiments, this is achieved by affinity purification of BINDERS from a polyclonal antiserum (or pool of antisera) generated in response to immunization with the peptide, or multiple fragments of it. In some embodiments, this is achieved by capture of a mixture of BINDERS from a variety of sources selected to bind to the peptide, or fragments of it. By saturating the TARGET peptide epitopes with BINDERS competing to bind to various epitopes, a population of BINDERS is produced that covers the peptide (i.e., has a BINDER bound to all, or at least a substantial fraction, of the epitopes present on each peptide molecule). The BINDERS are expected to have varying affinities and specificities, some or most of which may be individually insufficient to effectively capture the peptide from a sample digest. However, in some embodiments, a combination of these BINDERS can be linked together covalently (e.g., using bi-functional chemical crosslinkers or click connectors) or non-covalently (e.g., by reaction of biotin-labeled BINDERS with multivalent streptavidin) into a multivalent (e.g., bi-specific, tri-specific, or quadra-specific) BINDER (Fig 8C), that can be eluted from the peptide and subsequently used for affinity enrichment of TARGET and cognate STANDARD from standardized sample digests.
This novel approach makes use of a multi-epitope peptide as a “scaffold” or “template” on which a plurality of BINDERS to short epitopes are assembled and then linked to form a larger multi-epitope BINDER. In some embodiments a multi-epitope (or multi-valent) BINDER is immobilized (e.g., on magnetic beads, a column, a surface, etc.) and used to capture and enrich TARGET and STANDARD peptides and peptide constructs according to the invention.
Such a multi-valent BINDER can bind the peptide with much higher affinity than any of the individual BINDERS. This effect is well known in the art as “avidity”: the increased binding efficacy of a multivalent BINDER compared to a monovalent BINDER. Natural antibodies exploit this effect, comprising 2 binding sites in IgG and 5 binding sites in IgM. In some embodiments, this avidity effect is exploited by crosslinking the individual epitope BINDERS while in proximity to one another (in situ in Figure 8C) to create a multi-BINDER construct as shown in Figure 8D. In some embodiments, a similar approach is carried out using individual monoclonal BINDERS, either crosslinked together or expressed in a combined recombinant product similar to well-known “bi-specific” or “tri-specific” therapeutic antibody constructs.
In some embodiments, multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are separately affinity purified by binding one after another to a series of immobilized synthetic TARGET peptides. In some embodiments, multiple BINDERS with affinities for distinct, ideally non-overlapping, epitopes on a TARGET peptide are used to sequentially affinity purify TARGETS and cognate STANDARDS from a standardized digest: TARGETS and cognate STANDARDS that bind to (and are subsequently eluted from) each successive BINDER specific to one of multiple TARGET epitopes are rendered purer than the set of TARGETS and cognate STANDARDS that bind to and are eluted from a BINDER to a single epitope.
In some embodiments, the higher peptide sequence specificity obtainable with multi-epitope BINDERS (through their interaction with a larger number of amino acids in the peptide portions of TARGETS and STANDARDS) provides an important addition to the overall specificity of a range of single molecule detection technologies. By increasing the enrichment discrimination between authentic pre-selected TARGET/STANDARD peptide molecules and the multitude of potential similar sequence “off-target” peptides present in digests of complex samples, potential assay interferences (particularly false-positive detection of TARGET-like peptides) are reduced for all single molecule detection methods.
Multi-epitope BINDERS are described in more detail in U.S. Provisional Patent Application No. 63/381,722, incorporated herein by reference in its entirety.
6.5.3 Enrichment of TARGETS and STANDARDS from standardized sample digests
The invention makes use of a specific enrichment step to enrich both TARGET peptide and cognate STANDARD from a sample digest, thereby creating an “enriched standardized digest sample” (or “enriched standardized sample digest”). In some embodiments the enrichment step is carried out by a cognate affinity capture reagent (BINDER) to which the two peptides bind equivalently, such that the ratio of their amounts after enrichment is the same or nearly the same as before enrichment. In some embodiments, the TARGET and STANDARD peptides bind to the cognate affinity capture reagent with identical affinity and kinetics, preserving the ratio between them exactly. In some embodiments the TARGET to STANDARD ratio after enrichment is within 2%, within 5%, within 10%, within 20%, or within 30% of the ratio before enrichment. In some embodiments, enrichment using a BINDER results in a change in the TARGET to STANDARD ratio, which change is consistent across a range of samples and assay replicates. In this case, prior knowledge of the factor by which the ratio is changed by enrichment (established by measurements of TARGET to STANDARD ratios in a sample or digest before and after enrichment) allows correction of the ratio observed after enrichment to yield the correct relative amounts of TARGET and STANDARD in the sample digest.
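The ratio correction described above can be illustrated with a short numerical sketch. The function names and all numerical values below are hypothetical; the sketch assumes that the enrichment bias is a single multiplicative factor that is reproducible across samples, as the paragraph above describes.

```python
def correction_factor(ratio_before, ratio_after):
    """Factor by which enrichment shifts the TARGET/STANDARD ratio.

    Established once, from calibration measurements made before and
    after enrichment of the same standardized digest.
    """
    return ratio_after / ratio_before

def corrected_ratio(observed_ratio, factor):
    """Undo a known, reproducible enrichment bias."""
    return observed_ratio / factor

def within_tolerance(ratio_before, ratio_after, tolerance):
    """True if the ratio after enrichment is within `tolerance` (e.g., 0.05) of the ratio before."""
    return abs(ratio_after - ratio_before) <= tolerance * ratio_before

# Hypothetical calibration: enrichment inflates the ratio by 20%.
factor = correction_factor(ratio_before=2.0, ratio_after=2.4)
# Hypothetical unknown sample measured after enrichment.
sample_ratio = corrected_ratio(observed_ratio=0.6, factor=factor)
```

In this assumed case the BINDER would not satisfy a 10% tolerance, but the corrected ratio still recovers the relative amounts present in the digest.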
In some embodiments, a homogeneous cognate affinity capture reagent (BINDER, described below) is selected from a plurality of alternative BINDERS for its ability to bind the TARGET and STANDARD peptides equivalently (or with a consistent ratio shift). Alternatively, in some embodiments, TARGET and STANDARD peptides are selected from a plurality of alternatives to bind to a cognate BINDER equivalently. In some embodiments, TARGET peptide, STANDARD peptide, and cognate BINDER are each selected from a plurality of alternatives so as to maximize the property of equivalent (or consistent) TARGET and STANDARD peptide binding.
Conservation (or correctability, as described above) of the ratio between TARGET and STANDARD peptides through enrichment means that a measurement of the TARGET peptide to STANDARD ratio after enrichment provides an accurate measure of TARGET peptide amount, even if only a fraction of the total TARGET and STANDARD is captured in the enrichment step. For example, the enrichment process might capture 100%, or 90% or 10% or 1% or 0.1% or 0.01% or 0.001% of the TARGET and STANDARD peptides present in a sample digest, but in each case the ratio of TARGET peptide to STANDARD molecules would remain the same, and thus a measurement of the TARGET peptide to STANDARD ratio would provide the same answer, which would be equal to (or correctable to) the ratio present in the original standardized sample.
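The independence of the result from capture efficiency can be shown with a minimal calculation. The molecule numbers and the function `target_amount` are hypothetical, and the sketch is idealized: it assumes equivalent binding of TARGET and STANDARD and ignores the counting noise that would affect real measurements at very low capture fractions.

```python
def target_amount(target_count, standard_count, standard_added):
    """TARGET amount, in the units of `standard_added`, from counted molecules."""
    return target_count / standard_count * standard_added

# Hypothetical digest: 3,000,000 TARGET molecules and 1,000,000 spiked STANDARD molecules.
true_target = 3_000_000
standard_added = 1_000_000

estimates = []
for capture_fraction in (1.0, 0.1, 0.001):
    # Both peptides are captured with the same efficiency (idealized).
    target_count = true_target * capture_fraction
    standard_count = standard_added * capture_fraction
    estimates.append(target_amount(target_count, standard_count, standard_added))
```

Each entry of `estimates` equals the true TARGET amount, because the capture fraction cancels in the ratio.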
The proportion of a peptide captured by the enrichment step is therefore an adjustable feature of the invention, which can be used to capture more of one TARGET peptide & its cognate STANDARD than another. Such adjustments make it possible to use the enrichment step to capture most or all of a low abundance TARGET peptide, while capturing only a small fraction of the molecules of a high abundance TARGET peptide - with the result that the difference in absolute molar amounts of the two TARGET peptides can be significantly reduced by this differential enrichment. This use of the enrichment step to bring multiple TARGET peptides to similar abundances (while preserving the TARGET peptide to STANDARD ratios that encode TARGET peptide amount in the original sample) is referred to as “stoichiometric flattening” or “equalization”, and provides a means by which amounts of molecules with high and low abundances in the original sample can be measured using a measurement technology with limited dynamic range (e.g., single molecule counting methods). The flattening approach (differential enrichment of different TARGET/STANDARD cognate pairs) generates a “flattened enriched standardized sample digest sample”.
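The effect of stoichiometric flattening on dynamic range can be sketched numerically. The peptide names, molecule numbers and capture fractions below are hypothetical and chosen only to illustrate the principle; the TARGET to STANDARD ratio of each cognate pair is assumed to be preserved as described above.

```python
def fold_range(amounts):
    """Ratio between the most and least abundant entries."""
    return max(amounts.values()) / min(amounts.values())

# Hypothetical TARGET molecule numbers in one sample digest.
digest = {"high_abundance_peptide": 1e9, "low_abundance_peptide": 1e4}

# Hypothetical per-peptide capture fractions, set for example by BINDER amount.
capture_fraction = {"high_abundance_peptide": 1e-5, "low_abundance_peptide": 1.0}

enriched = {name: digest[name] * capture_fraction[name] for name in digest}

range_before = fold_range(digest)    # spread of abundances in the digest
range_after = fold_range(enriched)   # spread presented to the detector
```

With these assumed values a spread of five orders of magnitude in the digest is reduced to roughly equal molecule numbers after enrichment.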
6.5.4 BINDER affinity
In some embodiments the affinity of the BINDERS most useful for enrichment is in the range of 0.01 to 10 nanomolar, more particularly with a preferred half-off-time (reciprocal of the off-rate) of at least several minutes, or more preferably 10-15 minutes. Off-rate is particularly important since it governs the length of time that unbound materials, including non-TARGET peptides, can be washed away (e.g., using conventional manual and automated workflow steps such as magnetic bead manipulation in 96-well plates) while retaining the TARGET and STANDARD peptides on the BINDER. Higher affinity BINDERS are typically required to enrich lower abundance TARGETS, i.e., to capture peptides present in a digest at low concentration.
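The relationship between off-rate and permissible wash time can be sketched as follows. This sketch assumes simple first-order dissociation with no rebinding during the wash, and it uses the conventional half-life relation k_off = ln 2 / t½, which is an assumption of the sketch; the function names and the wash times are hypothetical.

```python
import math

def off_rate_from_half_life(half_life_min):
    """First-order off-rate (per minute) for a given dissociation half-life."""
    return math.log(2) / half_life_min

def fraction_retained(half_life_min, wash_min):
    """Fraction of bound peptide still on the BINDER after washing for `wash_min` minutes."""
    k_off = off_rate_from_half_life(half_life_min)
    return math.exp(-k_off * wash_min)

# Hypothetical comparison for a 5-minute wash.
slow_binder = fraction_retained(half_life_min=15, wash_min=5)
fast_binder = fraction_retained(half_life_min=2, wash_min=5)
```

Under these assumptions a BINDER with a 15-minute half-life retains most of its cargo through a 5-minute wash, whereas one with a 2-minute half-life loses most of it.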
In some embodiments, specific solution conditions, or changes in solution conditions, are employed in a sample preparation workflow to preferentially dissociate less-tightly-bound peptides while retaining the correct BINDER cognate TARGETS and STANDARDS. In some embodiments bound peptides are exposed to increasing chaotrope or denaturant concentrations, increasingly acidic or basic solution pH, increasing salt concentrations, and/or increasing temperature to dissociate less-tightly-bound peptides prior to final elution of the enriched TARGETS and STANDARDS.
In some embodiments (e.g., single molecule imaging technologies) in which labeled BINDERS are used to detect and identify specific TARGET peptide sequences, BINDER specificity may be more important than affinity, and BINDER selection methods correspondingly adapted.
6.5.5 Dissociation from BINDER
In some embodiments, a preferred property of a BINDER is the ability to release bound peptides rapidly at a desired point in a workflow, as a result of a change in solution conditions - for example as a result of a change in pH (e.g., pH 3.0 or pH 9.5), addition of a chaotrope (e.g., ammonium thiocyanate or KCl) or organic solvent (e.g., 50% acetonitrile in water), increase in temperature, or the application of an electrical field (as in electrophoresis).
In some embodiments, BINDERS are selected that tightly and specifically bind the cognate TARGET and STANDARD peptides from a digest, and then release these peptides only when in close proximity to a site at which sequence-sensitive single molecule detection can occur (e.g., a nanopore, or an immobilization site on a support), thus maintaining the peptides in concentrated form and reducing losses due to diffusion. Nanopore sequencing (53) typically relies on the presence of a concentrated salt solution (e.g., 0.4 M KCl) to provide sufficient charge carriers to create a measurable open channel current through the nanopore (typically 20-200 pA). In some embodiments BINDER reagents are selected that release their TARGET and STANDARD peptide cargo when exposed to such salt conditions, i.e., when the antibodies are placed in the solution present on the cis side of a nanopore sequencing device, or when exposed to a high concentration of salt in a salt gradient in which salt concentration increases closer to a nanopore. This feature of a selected BINDER (i.e., releasing bound peptides in a high-salt environment) is advantageous in that peptides can be retained on a physically manipulatable solid support (e.g., magnetic beads) until they are in a sequencing chamber itself (or in a region of that chamber nearest to a nanopore), thereby minimizing potential losses and dilution that could occur if peptides were eluted elsewhere and later transported into the sequencing chamber. In some embodiments, chaotropic anions such as SCN⁻ are incorporated into the solutions of a nanopore cis compartment (or both compartments) in addition to or in place of Cl⁻ anions conventionally used, in order to facilitate release of peptides from a BINDER. It will be evident to those skilled in the art that a range of chaotropic anions or cations can be used to effect peptide release from BINDER over a range of concentrations suitable for optimization in particular device configurations.
In some embodiments using one or more alternative influences to release bound peptides from a BINDER without a pH change, the potential deleterious effects of acid elution (as practiced in conventional affinity enrichment systems including SISCAPA, e.g., pH 2.0-3.5) on some protein nanopores, or on components of other single molecule detection systems, can be avoided. Acid elution is used in SISCAPA in order to avoid introduction of salts, since salts interfere significantly with detection by mass spectrometry. The use of high salt in nanopore sequencing at near-neutral pH contrasts with the use of acid elution and low salt in mass spectrometry-based detection systems, and this difference suggests that different BINDERS with different elution characteristics will be preferred in the respective peptide detection methods.
In some embodiments where enriched TARGET and STANDARD constructs are to be immobilized on a support, the constructs are delivered into proximity with the support while they are bound to the cognate BINDER (e.g., on easily manipulatable magnetic beads). BINDER-bound constructs are present at a very high effective local concentration, and may be conveniently moved from one environment to another without loss. This feature is an important advantage of the BINDER enrichment step when applied to low abundance peptides and their detection by single molecule counting.
6.5.6 BINDER immobilization
In some embodiments, a BINDER with specific affinity for the TARGET peptide and STANDARD may be immobilized on a solid support in order to facilitate separation of the antibody and its bound peptide and/or peptide construct cargo from a complex sample digest, to wash away unbound molecules, to concentrate bound peptides, and to deliver bound peptides to a site where they are available for sequencing. Typical solid supports used for this purpose include magnetic beads (allowing collection of beads from a liquid suspension by magnetic force) or a porous column (e.g., an affinity column) through which liquids may be pumped. In some embodiments, the BINDER is immobilized on commercially available protein G-derivatized magnetic beads (Dynabeads G; Thermo Fisher) and optionally crosslinked covalently with dimethyl pimelimidate (DMP) according to the manufacturer’s instructions. In an alternative preferred embodiment, the antibody is immobilized on tosyl-activated Dynabead magnetic beads. In a further alternative embodiment, the anti-peptide antibody can be immobilized on solid phase chromatography media (e.g., POROS G resin) packed in a column and crosslinked using DMP. Such a column can bind the TARGET peptide specifically from a peptide mixture (e.g., a tryptic digest of serum or plasma) and, following a wash step, release the TARGET peptide under elution conditions.
6.5.7 BINDER homogeneity
In some embodiments, e.g., those using a homogeneous cognate affinity capture reagent (e.g., a monoclonal antibody BINDER, wherein all or nearly all molecules have the same sequence), it is expected that the ratio of TARGET and STANDARD peptides is not affected by the degree of saturation of the BINDER binding sites by the peptides at equilibrium (particularly at low saturation). Inhomogeneous affinity capture reagents are difficult to characterize in detail, and can contain variants that bind one or the other of the TARGET and STANDARD peptides more strongly. Thus, saturation of one variant could be followed by binding to another, potentially lower affinity, variant that has different relative affinities for the TARGET and STANDARD peptides, resulting in a change in the bound ratio as a function of the amount bound. For this reason, homogeneous (typically clonal or sequence-defined) BINDERS are typically preferred: e.g., monoclonal antibodies or sequence-defined aptamers.
6.5.8 Chemical modification of peptides while bound to BINDER
In some embodiments, chemical or enzymatic reactions for the purpose of modifying a TARGET (or STANDARD) peptide are carried out in solution, and in some embodiments one or more reactions are carried out while a peptide or peptide construct is bound to a BINDER, which may or may not itself be bound to a solid support. In some embodiments, one or more reactions are carried out while a peptide is bound to a BINDER linked to a solid support, thus allowing the peptide to be contacted with reagents, and removed from contact, by physical movement of the support between liquids (e.g., by removal of magnetic beads carrying BINDER and bound peptides from liquid in one vessel and deposition of the beads in a different vessel where they are exposed to a different reagent), or equivalently by movement of liquids in contact with the support (e.g., by pumping one reagent and then a second reagent over a porous column, or magnetic bead mass, to which BINDER and its peptide cargo are bound). In addition to contact with one or more reagents required for execution of a sequence of reactions, manipulation of peptides on a support allows the peptide to be washed free of a reagent by exposure to a wash solvent prior to contact with a subsequent reagent. Movement of peptides between liquids by movement of a BINDER or support to which they are bound reduces or eliminates the need for purification or concentration of intermediate peptide forms created during a sequence of one or more chemical reactions. In some embodiments, peptides are bound to a solid support by means other than interaction with a specific BINDER, e.g., by binding of peptides to a generic support such as a reversed phase support (e.g., C18) or an ion exchange support.
In some embodiments, use is made of amino groups present at peptide amino termini and on lysine side chains for chemical linkage of a peptide to other molecules (e.g., oligo and other polymers that, with peptides, form constructs amenable to sequence-sensitive single molecule detection) while a peptide is bound to a BINDER. In order to eliminate competing side reactions with amino groups present in specific affinity reagents (e.g., lysines and N-terminal amino groups of anti-peptide antibody BINDERS) used in the invention, the invention provides for the optional elimination of some or all of these BINDER amino groups by chemical blockage (e.g., by reductive methylation, by PEGylation using commercially-available NHS-PEG or other reagents, by conversion of lysine residues to homoarginine by treatment with O-methylisourea, or by other chemical modifications known in the art), by protein engineering (e.g., by replacing some or all lysines in a recombinant antibody sequence with arginines or other amino acids), or by various other means. Specific affinity reagents to be used in such embodiments may be selected so as not to contain any lysine residues in the TARGET peptide binding site, since these residues would likely be blocked along with other lysines, potentially leading to a loss of binding activity. Non-protein BINDERS, such as DNA and RNA aptamers and other similar molecules, may contain no amino groups to begin with, eliminating the need to block these prior to processes aimed at modifying peptide amino groups. The elimination of BINDER amino groups that could participate in side reactions has the effect of avoiding waste of expensive reagents used in amino group modifications of TARGET and STANDARD peptides, including use in creating concatenated constructs of these.
In some embodiments, blockage (e.g., by PEGylation) of many or all of the amino groups on BINDERS, and on any other proteins present on the capturing support (e.g., Protein A or Protein G used to guide antibody immobilization on solid supports such as Dynabeads G magnetic beads), can also have the advantageous effect of rendering the BINDER more stable, and thus less liable to degradation by heat, by proteases, or by exposure to complex samples and sample digests. In the case of a Protein A or G coated magnetic bead, for example, it is advantageous in some embodiments to first react the antibody BINDER with the Protein A or G on the bead, then chemically cross-link the BINDER to the Protein A or G on the bead, then PEGylate some or all of the remaining protein amino groups on the bead. Such modifications can also alter the net charge on proteins and on beads carrying them towards greater negative charge overall, since typically about half the positive charges on a protein are arginine and half lysine (the latter of which would be blocked by blockage of amino groups). Since the amount of negative charge (largely attributable to glutamic and aspartic acids) would be unaffected by amino group blockage, the overall decrease in positive charges by ~50% will shift the net charge on the BINDERS, and on a bead coated with BINDERS, towards the negative. Nanopore sequencing devices are typically operated with a negative electrode in the cis compartment (where the input molecules to be sequenced are added) and a positive electrode in the trans compartment: this polarity induces an oligo, which is strongly negatively charged on account of its sugar-phosphate backbone, to migrate towards and through the pore to initiate sequencing. In some embodiments this polarity also serves to move a negatively charged bead towards the pore, contributing towards the goal of delivering peptide-oligo constructs in close proximity to the pore.
6.6 BARCODES
In some embodiments the methods used for single molecule detection have the capability to detect very large numbers of molecules (e.g., 10^10 in (54)), far exceeding the requirements for quantitative measurement of a modest number of peptides in one sample. In order to make effective use of the analytical capacity of such platforms, and the consequent reduction in analytical cost, some embodiments connect sample-specific labels (“barcodes”) to the TARGET and STANDARD peptides present in a sample digest, or enriched from a sample digest by a BINDER, allowing the TARGET and STANDARD peptides from multiple samples to be combined prior to peptide detection (i.e., multiplexed), and afterwards de-multiplexed to associate them with the correct original samples. DNA provides an ideal medium for implementation of such barcodes since, as essentially a digital medium, it is easy to synthesize, cut, ligate, copy, and detect by both sequencing and hybridization. Alternative barcode polymers can be employed, such as peptides and synthetic chemical polymers, although these may be significantly more difficult to generate, manipulate and detect than oligonucleotides.
As described above, many sample barcoding systems have been developed using sets of distinct DNA barcodes to identify nucleic acid molecules derived from different samples prior to sequencing, or to facilitate optical readout of individual nucleic acid molecules in imaging systems. In some embodiments, sample barcodes with identical or very similar base composition but distinguishable sequences are preferred in order to minimize differences in physical properties between constructs on account of barcode properties.
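The preference for barcodes of identical base composition but distinguishable sequence can be expressed as a simple check. The function name and the example sequences are hypothetical and are not barcodes disclosed in this specification.

```python
from collections import Counter

def same_composition_distinct_sequences(barcodes):
    """True if all barcodes share one base composition and no sequence repeats."""
    compositions = {tuple(sorted(Counter(barcode).items())) for barcode in barcodes}
    all_distinct = len(set(barcodes)) == len(barcodes)
    return len(compositions) == 1 and all_distinct

# Hypothetical barcode sets.
balanced_set = ["ACGTAC", "CATGCA", "GTACAC"]   # each has 2 A, 2 C, 1 G, 1 T
unbalanced_set = ["ACGTAC", "AAGTAC"]           # second barcode has 3 A, 1 C

balanced_ok = same_composition_distinct_sequences(balanced_set)
unbalanced_ok = same_composition_distinct_sequences(unbalanced_set)
```

A set passing this check differs only in base order, so differences in mass or charge between constructs attributable to the barcode are minimized.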
In some embodiments the identity of samples from which single molecule constructs according to the invention are derived is encoded using nucleic acid (e.g., DNA or another sequenceable polymer having multiple distinguishable subunits) barcodes. In some embodiments sample barcodes are appended or linked to TARGET and STANDARD constructs prior to enrichment by cognate BINDERS. In some embodiments sample barcodes are appended or linked to TARGET and STANDARD constructs after enrichment by cognate BINDERS, in which case smaller amounts of the DNA barcodes are required.
Figures 9 and 10 illustrate schematically a 2-level encoding scheme used in some embodiments. In each sample digest, a specific peptide (here labeled Peptide-A) is linked to a DNA sequence tag (labeled OLIGO-TARGET) identifying it as a sample-derived TARGET molecule. The cognate internal standard formed by linkage of a synthetic version of Peptide-A with a distinct DNA sequence tag (labeled OLIGO-STANDARD) is added to the digest, creating a standardized digest (standardized with respect to Peptide-A). Either before, or more efficiently after, enrichment of Peptide-A TARGET and STANDARD constructs using a BINDER, sample barcodes comprising a plurality of modules (Codes) are linked to these constructs using conventional methods that may include ligation to the TARGET/STANDARD tag, chemical linkage (e.g., using click chemistry), non-covalent means (e.g., biotin on one oligo and streptavidin on the other), or a variety of other linkage means known in the art. Alternatively, the sample barcodes can be linked to a site on the peptide different from the site at which the TARGET/STANDARD tag is connected.
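The 2-level encoding described above implies a straightforward de-multiplexing step after detection, sketched here. The read representation (a pair of sample barcode and TARGET/STANDARD tag), the sample names and the function names are all hypothetical; real detection platforms would yield decoded reads in their own formats.

```python
from collections import defaultdict

def demultiplex(reads):
    """Count TARGET and STANDARD molecules per sample barcode.

    Each read is a (sample_barcode, tag) pair, where tag is
    "TARGET" or "STANDARD".
    """
    counts = defaultdict(lambda: {"TARGET": 0, "STANDARD": 0})
    for sample_barcode, tag in reads:
        counts[sample_barcode][tag] += 1
    return dict(counts)

def target_to_standard_ratio(sample_counts):
    """TARGET/STANDARD count ratio for one sample."""
    return sample_counts["TARGET"] / sample_counts["STANDARD"]

# Hypothetical pooled reads from two samples.
pooled_reads = (
    [("sample_1", "TARGET")] * 30 + [("sample_1", "STANDARD")] * 10
    + [("sample_2", "TARGET")] * 5 + [("sample_2", "STANDARD")] * 10
)
per_sample = demultiplex(pooled_reads)
ratios = {name: target_to_standard_ratio(c) for name, c in per_sample.items()}
```

Because each construct carries both its sample barcode and its TARGET/STANDARD tag, the per-sample ratio can be recovered after pooling.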
The scheme for sample barcoding shown in Figure 9 provides a construct compatible with a variety of single molecule detection methods, as described below. In this example, barcode modules at positions 1, 2, 3 and 4 are used to encode “bits” in a 10-bit binary sample code. In such a code, if all bits were readable at once, 10 bits could conventionally encode 2^10 = 1,024 samples. Given that each DNA base is 1 of 4 alternatives (2 bits of information), 10 bits of information could theoretically be encoded in a short sequence of 5 bases. However, all the methods envisioned for reading the sample barcode in a single molecule detection system are subject to error, and avoiding sample misassignment errors is a high priority in many applications (e.g., clinical). A preferred approach is therefore to add redundancy to the sample code. In some embodiments this is done by providing a unique sequence module comprising multiple bases (e.g., 4 to 30 bases depending on the preferred readout method) corresponding to each of the bits in the desired sample code space (number of samples to be identified). As a further measure against sample assignment errors, the error detection and correction methods of Hamming can be used, and in the case of a 10-bit code, Hamming extended parity error detection involves the addition of 4 parity bits to the 10-bit code, resulting in a total of 14 bits of information. Such a 10+4 = 14-bit code is capable of detecting and correcting any 1-bit error, and detecting but not correcting 2-bit errors. To implement such an approach in a manner that economizes on the total length of DNA that must be “read” to obtain the sample code, the example of Figure 9 simplifies this coding scheme to use only those code values having 3 or 4 bits set to a value of 1, which reduces the sample coding space to 105 samples that can be identified with very high accuracy, but reduces the total number of DNA modules that need to be in any one sample code.
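The theoretical minimum encoding mentioned above (2 bits per base, so that a 10-bit sample code fits in 5 bases) can be sketched as follows. The mapping of bit pairs to bases (A=00, C=01, G=10, T=11) and the function names are arbitrary assumptions of this sketch; the redundancy that the specification prefers is deliberately omitted here.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def encode_sample_id(sample_id, n_bits=10):
    """Encode an integer sample code as DNA at 2 bits per base (n_bits must be even)."""
    if n_bits % 2 or not 0 <= sample_id < 2 ** n_bits:
        raise ValueError("sample_id out of range or odd bit count")
    n_bases = n_bits // 2
    # Most significant bit pair first.
    return "".join(
        BASES[(sample_id >> (2 * position)) & 0b11]
        for position in reversed(range(n_bases))
    )

def decode_sample_id(sequence):
    """Recover the integer sample code from its DNA encoding."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASES.index(base)
    return value

first_code = encode_sample_id(0)
last_code = encode_sample_id(1023)
```

With no redundancy, a single misread base maps silently to a different valid sample code, which is the weakness that the longer modules and parity bits described above are intended to address.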
Thus 4 modules are included in any sample code, selected from among 14 different DNA sequences selected using computational and experimental methods well known in the art for minimal likelihood of confusion during readout. A mistaken read on any one of these modules can be corrected by the coding scheme, and mistakes in 2 modules can be detected (but not corrected). Those skilled in the art will recognize that many alternate coding schemes exist, with greater or lesser numbers of bits, of larger or smaller numbers of identifiable samples, and of greater or lesser numbers of bases in each DNA module.
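Single-error correction with 10 data bits and 4 parity bits can be illustrated with a generic textbook Hamming construction. This sketch is not the specific coding scheme of Figure 9: it assumes parity bits at positions 1, 2, 4 and 8 of a 14-bit word, it demonstrates correction of a single flipped bit only, and all function names are hypothetical.

```python
from functools import reduce

PARITY_POSITIONS = (1, 2, 4, 8)   # 1-based positions, powers of two
DATA_POSITIONS = tuple(p for p in range(1, 15) if p not in PARITY_POSITIONS)

def syndrome(codeword):
    """XOR of the 1-based positions that hold a 1; zero for a valid codeword."""
    positions = (index + 1 for index, bit in enumerate(codeword) if bit)
    return reduce(lambda acc, pos: acc ^ pos, positions, 0)

def encode(data_bits):
    """Place 10 data bits and set the 4 parity bits so that the syndrome is zero."""
    if len(data_bits) != len(DATA_POSITIONS):
        raise ValueError("expected 10 data bits")
    codeword = [0] * 14
    for position, bit in zip(DATA_POSITIONS, data_bits):
        codeword[position - 1] = bit
    pending = syndrome(codeword)
    for position in PARITY_POSITIONS:
        if pending & position:
            codeword[position - 1] = 1
    return codeword

def decode(codeword):
    """Correct at most one flipped bit, then return the 10 data bits."""
    word = list(codeword)
    error_position = syndrome(word)
    if error_position > 14:
        raise ValueError("uncorrectable read")
    if error_position:
        word[error_position - 1] ^= 1
    return [word[position - 1] for position in DATA_POSITIONS]

sample_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical 10-bit sample code
codeword = encode(sample_bits)
```

In a module-based implementation each of the 14 bit positions would correspond to the presence or absence of one DNA module, so a misread module appears as a single flipped bit.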
6.6.1 DNA sample barcodes used with platforms that incorporate DNA sequencing.
Several approaches to peptide single molecule detection include the ability to read nucleic acid sequences interspersed with peptide sequence (e.g., current DNA sequencing nanopore platforms) or else together with peptide sequence that has been reverse-translated into DNA (e.g., reverse translation platforms). In some embodiments, modules of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more bases are used.
6.6.1.1 Nanopore sequencing
DNA sample barcodes and TARGET vs STANDARD tags linked to peptides can be read directly by passage through a suitable nanopore sequencing system. At the current state of the art represented by the commercial MinION device, the accuracy of individual basecalls can be greater than 99%, and therefore the accuracy with which one of a small set of sequences (designed to be distinct) can be recognized is high. In some embodiments, the sample barcode (e.g., a binary number identifying the sample, an alphanumeric code taken from a physical sample label, or any type of computer encodable sample identification) can be encoded directly (2 bits per base) or with redundancy in the form of multiple bases per code bit, additional parity bits (including error detection and correction), or any information representation scheme that can be encoded in DNA or another nanopore-readable polymer with 2 or more distinguishable units. The association of a code with an individual peptide molecule is accomplished through the covalent linkage between the two that is established in the peptide library preparation workflow of the invention.
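Recognition of one of a small set of designed sequences despite occasional basecall errors can be sketched as nearest-match assignment. The module names and sequences, the mismatch threshold and the function names are hypothetical; the sketch also assumes substitution errors only, whereas real nanopore reads can contain insertions and deletions that would require alignment.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_module(read, modules, max_mismatches=1):
    """Return the name of the closest module, or None if no confident match exists."""
    ranked = sorted((mismatches(read, seq), name) for name, seq in modules.items())
    best_distance, best_name = ranked[0]
    if best_distance > max_mismatches:
        return None                      # too many errors to call
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return None                      # ambiguous between two modules
    return best_name

# Hypothetical module set.
modules = {"code_1": "ACGTACGT", "code_2": "TTGGCCAA", "code_3": "GATCGATC"}

called = assign_module("ACGTACGA", modules)      # one substitution relative to code_1
rejected = assign_module("GGGGGGGG", modules)    # far from every module
```

Rejecting ambiguous or distant reads trades a small loss of counted molecules for a lower risk of sample misassignment.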
6.6.1.2 Reverse translation
Methods of single molecule sequencing by some form of reverse translation have been described (US 2021/0302431 and (34)), and these typically include the ability to copy a DNA sequence from an affinity reagent capable of recognizing a terminal amino acid (or amino acids) to a “root” oligo attached to, or in proximity to, the peptide being reverse-translated, which is extended with a DNA code identifying the amino acid at each degradation cycle. In some embodiments, such a root oligo linked to a peptide comprises a TARGET/STANDARD tag (identifying which version of a peptide it is attached to) and optionally a sample barcode. After the required number of decoding and degradation steps, the root oligo is prepared for conventional high-throughput DNA sequencing, generating a sequence comprising the peptide sequence (or a representation of it), its identity as a TARGET or STANDARD, and its sample of origin.
6.6.2 DNA sample barcodes used with affinity reagent imaging platforms
In some embodiments, e.g., those making use of single molecule imaging (30-32), DNA sample barcodes can be detected by sequential hybridization with labeled oligos complementary to the sample barcodes. Complementary oligos can be labeled with a variety of fluorescent or colored dyes, with quantum dots or other optically detectable nanoparticles, with enzymes capable of generating a localized signal (e.g., luminescence), or a variety of other compositions known in the art for the generation of a spatially localized externally detectable signal. In some embodiments, a set of barcode sequences is used that are designed to have high specificity (minimal cross-hybridization of one barcode with the probes complementary to the other barcodes). The lengths of the barcode sequence modules generally impact the specificity with which they are recognized by complementary probes, the kinetics with which they bind and the temperature at which they can be removed after being read (i.e., analogous to the “melting temperature”). In some embodiments, modules of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more bases are used.
Since imaging approaches typically do not reliably establish the order of individual barcodes present in a construct molecule (this being at or beyond the resolution limits of conventional imaging systems), probing a single molecule construct with a set of probes complementary to the set of barcodes yields a series of binary (i.e., the probe binds or does not bind) results that can be considered a binary code.
In some embodiments, a set of N distinct barcodes is used, where N is a number required to encode at least the number of samples to be pooled (i.e., 2^N > number of samples). For example, 127 different samples could be uniquely identified by detecting the presence or absence of 7 different barcodes (2^7 = 128, yielding 127 unique codes, excluding 0 - the absence of all barcodes). To make use of this scheme would require that peptide:oligo constructs include up to all 7 of the barcodes.
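The presence/absence scheme can be illustrated with a short sketch. It assumes, for illustration only, that samples are numbered from 1 and that barcode i corresponds to bit i of the sample number; the function names are hypothetical.

```python
def barcodes_needed(n_samples: int) -> int:
    """Smallest N with 2**N - 1 >= n_samples (the all-absent pattern is excluded)."""
    n = 1
    while 2 ** n - 1 < n_samples:
        n += 1
    return n


def barcodes_for_sample(sample_number: int, n_barcodes: int) -> list[int]:
    """Indices (0-based) of the barcodes to include in constructs of this sample."""
    if not 1 <= sample_number < 2 ** n_barcodes:
        raise ValueError("sample_number must be in 1 .. 2**n_barcodes - 1")
    return [i for i in range(n_barcodes) if (sample_number >> i) & 1]


def sample_from_probe_results(bound: list[bool]) -> int:
    """Convert per-barcode probe results (bound / not bound) into a sample number."""
    return sum(1 << i for i, is_bound in enumerate(bound) if is_bound)
```

With 7 barcodes, sample 5 would carry barcodes 0 and 2, and observing binding of only those two probes decodes back to sample 5.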
In some embodiments, a larger set of barcodes is used, but constructs need include only a limited number of these. For example, if 11 distinct barcodes are used, but only 4 or fewer of these are included in any given construct, several hundred different samples can be uniquely identified. Use of fewer barcodes in a construct is advantageous since it reduces the length and cost of the sample barcoding oligos required: individual distinct barcodes may be 20-30 bases long.
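The "several hundred" figure can be checked by counting the subsets of 1 to 4 barcodes drawn from 11. The sketch below is illustrative only; the function names and the choice to exclude the empty subset are assumptions.

```python
from itertools import combinations, islice
from math import comb


def n_sample_codes(n_barcodes: int, max_per_construct: int, min_per_construct: int = 1) -> int:
    """Number of distinct barcode subsets within the allowed size range."""
    return sum(comb(n_barcodes, k) for k in range(min_per_construct, max_per_construct + 1))


def assign_codes(n_samples: int, n_barcodes: int, max_per_construct: int) -> list[tuple[int, ...]]:
    """Assign each sample a distinct subset of barcode indices, smallest subsets first."""
    if n_samples > n_sample_codes(n_barcodes, max_per_construct):
        raise ValueError("not enough distinct codes for this many samples")
    subsets = (
        subset
        for k in range(1, max_per_construct + 1)
        for subset in combinations(range(n_barcodes), k)
    )
    return list(islice(subsets, n_samples))


print(n_sample_codes(11, 4))  # 11 + 55 + 165 + 330 = 561
```

Restricting constructs to exactly 4 of the 11 barcodes would still leave 330 distinct codes.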
In some embodiments, a further improvement in sample barcoding is provided by the use of Hamming codes. For example, in a coding scheme with 10 data bits and 4 parity bits (14 bits total, here corresponding to 14 distinct DNA barcodes), and using only 3 or 4 of these barcodes in any individual peptide construct, it is possible, using the features of the Hamming scheme, to identify and correct any single error in detection of any of the individual barcodes. Error detection is of great value in preventing mis-attribution of molecules to the wrong sample, which could result in erroneous quantitative results derived from errors in the counts of TARGET and STANDARD molecules in a sample digest.
Figure 10 illustrates the process of decoding an example construct in 16 cycles using a Hamming code: in each cycle one of 16 oligo probes complementary to one of the DNA codes is applied, detected when present, and removed. In this example the first 2 cycles involve probes that determine whether the peptide is a TARGET or a STANDARD (either one probe binds or the other). In the remaining 14 cycles, 14 probes complementary to each of the 14 DNA sequence modules described above are successively applied, detected when present, and removed. The result is a 14-bit binary number that is capable of identifying one of 105 different samples with single (1-bit) error correction and double (2-bit) error detection (referred to as “SECDED” in the art). Separately, but as part of the same decoding process, one or more recognition reagents capable of characterizing the peptide are applied in additional cycles to establish the identity of the peptide.
7 GENERAL METHOD OF ANALYSIS.
Some embodiments of the invention comprise a series of steps to transform a protein-containing sample into an enriched standardized digest sample, or a flattened enriched standardized digest sample, prior to sequence-sensitive single molecule detection and counting. The invention is equally applicable to protein samples from sources such as blood, blood plasma and blood serum, as well as other sources, such as tissue homogenates, animal, plant or microbial samples, other body fluids, environmental samples and the like.
An important feature of the invention is its generality, allowing the design of similar protocols and using similar reagents and equipment to prepare peptide libraries suitable for analysis of a wide range of different proteins in a variety of single molecule detection systems. To accomplish this the invention makes use of specific features of peptides generated by particular enzyme cleavages; features provided by multiple click chemistry pairs; requirements for specific peptide and oligo orientation; multiple levels of barcoding; and detailed control of the capture and enrichment of peptide:oligo constructs by specific affinity reagents.
In some embodiments, steps of the general method may be carried out in a different order than that outlined below. Nevertheless the invention requires that steps of sample digestion to peptides, generation of TARGET constructs from TARGET peptides, and addition of STANDARD constructs to the digest (thus creating a standardized digest) must precede enrichment of TARGET constructs and STANDARD constructs from the standardized digest using BINDERS.
7.1 DIGESTION OF SAMPLE.
In some embodiments a general approach for sequence-based protein quantitation involves digesting sample proteins (e.g., with trypsin) into peptides. In order to improve the completeness of digestion, disulfide bonds may be broken and proteins denatured to disrupt secondary and tertiary structure. Samples can be any kind of protein-containing sample without limitation, including body fluids, tissues, tissue lysates, tissue extracts, bacterial, fungal, animal and plant samples, recombinant proteins including protein drugs, food products, and the like.
In some embodiments, preparation of proteolytic peptides from a complex sample is carried out by a series of reagent addition steps which may include: denaturing a protein sample (e.g., with detergents such as deoxycholate or CHAPS, organic solvents, urea or guanidine HCl), reducing the disulfide bonds in the proteins (e.g., with tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol or mercaptoethanol), alkylating the cysteines (e.g., by addition of iodoacetamide, or iodoacetic acid, which react with the free -SH group of cysteine preventing reformation of disulfide bonds), quenching excess iodoacetamide by addition of more dithiothreitol or mercaptoethanol, and (after removal or dilution of the denaturant) addition of the selected proteolytic enzyme (e.g., trypsin), followed by incubation to allow digestion. Numerous variations of this process, some including additional steps and some eliminating individual steps, are known in the art. In some embodiments, following incubation, the action of trypsin can be terminated, either by addition of a chemical inhibitor (e.g., TLCK) or by denaturation (through heat or addition of denaturants, or both) or removal of the trypsin (if the trypsin is on a solid support). There are many specific protocols available for proteolytic digestion, including automated methods using only liquid addition steps (14). In some embodiments it has been shown that automated digestion of biological samples can be very reproducible, exhibiting minimal variations (e.g., CV < 2%) between replicate samples.
In some embodiments, a desired peptide can be liberated by proteolysis without the need for disulfide reduction and alkylation (e.g., peptides that do not contain cysteine residues, and are not sterically constrained by nearby disulfide bridges), and in some cases without denaturation (e.g., peptides exposed on the surfaces of a protein). A range of alternative proteolytic enzymes can be used instead of trypsin to produce peptides defined by specific cleavage sites (including GluC, Lys-C, Arg-C, chymotrypsin, papain, pepsin, V8 protease, and the like), and chemical agents can also be used (e.g., CNBr cleavage at methionine residues).
In some embodiments a simplified digestion protocol is used comprising addition of protease to a liquid protein-containing sample without prior denaturation, reduction of disulfides or blockage of resulting cysteines. In some embodiments, heat is used to improve protein digestion by partial denaturation of protein substrates before the trypsin (or other proteolytic enzyme) is denatured, often using a variable temperature profile such as a ramp from room temperature to a higher temperature (e.g., 70 °C for a plasma sample). Such protocols are not expected to result in complete digestion of most proteins, but they can reproducibly generate certain tryptic peptides from some regions (e.g., surface exposed sequence segments) of some proteins, and if these peptides satisfy the requirements of TARGET peptides in a given application, the abbreviated protocol allows substantial simplification of the sample preparation workflow.
In some embodiments digestion can be carried out by immobilized proteolytic enzymes such as trypsin. Trypsin has been immobilized in the art at very high concentrations (e.g., on derivatized porous nylon, PVDF or nitrocellulose membranes) and used to perform very rapid (e.g., < 1 minute) digestion of proteins.
Proteolytic digestion disrupts protein:protein interactions by largely if not completely eliminating tertiary structure when a large protein is reduced to short peptides free to diffuse apart. This conversion of a large complex protein molecule to a series of short peptides offers a significant improvement in protein quantitation, since it removes the primary sources of assay interferences observed with immunoassays (in which a protein:protein interaction that blocks an epitope used by an assay antibody can result in a false negative result, while false positives can result from bridging interactions involving protein components not expected to be involved in the assay). An example in which tryptic digestion overcomes such an interference is the SISCAPA assay for thyroglobulin (55).
An average-length human protein produces about 50 peptides upon tryptic digestion, from which an assay designer can choose one or more peptides suitable for specific applications. This feature expands the range of detection alternatives compared to intact protein detection. Since intact proteins are so diverse in their physical properties, and therefore difficult to measure in many circumstances, the ability to select a proteotypic peptide from a range of alternatives as a stoichiometric surrogate for the intact protein is a major advantage of the digestion approach. It is often observed that within every “bad” protein there is at least one “good” peptide for a given application.
7.1.1 Proteolytic production of peptides with either one or two amino groups.
In some embodiments it is advantageous for TARGET peptides to have a single n-terminal amino group (i.e., “single amino peptides”), and in this case digestion is preferably carried out using an enzymatic protocol in which most peptides of an appropriate length do not contain lysine (which contains a side chain amino group), and thus have a unique amino group at the n-terminus of the peptide chain. While approximately half of the peptides resulting from tryptic digestion have a lysine at the c-terminus, addition of Lys-N or similar enzymes that cleave n-terminal to a lysine residue can in many cases remove the c-terminal lysine from tryptic peptides, resulting in a larger proportion of “single amino” peptides of a useful length (i.e., the sum of the c-terminal arginine peptides produced by tryptic digestion and the set of lysine peptides from which the c-terminal lysines have been removed by the additional action of Lys-N). Another approach to decrease the proportion of double-amino peptides is to chemically convert the lysine epsilon-amino groups to homoarginine in a guanidination reaction with methylisourea (56). Alternatively, in embodiments in which it is preferred to have amino groups at both ends of the peptide (i.e., “double amino” peptides) digestion with Lys-C in place of trypsin typically leads to generation of peptides having both an n-terminal amino group and a c-terminal lysine with its side chain amino group.
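The distinction between single-amino and double-amino peptides can be illustrated with a simplified in silico digest. The sketch assumes complete cleavage c-terminal to every K or R (ignoring missed cleavages and the reduced cleavage before proline), counts only the n-terminal and lysine side chain amino groups, and uses a made-up protein sequence; all names are hypothetical.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Simplified trypsin: cleave c-terminal to every K or R."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def lys_c_digest(protein: str) -> list[str]:
    """Simplified Lys-C: cleave c-terminal to every K."""
    return re.findall(r"[^K]*K|[^K]+$", protein)


def amino_group_count(peptide: str) -> int:
    """One n-terminal amino group plus one side chain amino group per lysine."""
    return 1 + peptide.count("K")


def lys_n_trim(peptide: str) -> str:
    """Simplified Lys-N action on a tryptic peptide: remove a c-terminal lysine."""
    return peptide[:-1] if peptide.endswith("K") else peptide


protein = "MAGTRLLSKVEEPR"  # hypothetical sequence for illustration
for peptide in tryptic_digest(protein):
    trimmed = lys_n_trim(peptide)
    print(peptide, amino_group_count(peptide), trimmed, amino_group_count(trimmed))
```

In this toy example the arginine-terminated peptides already have a single amino group, and the lysine-terminated peptide becomes a single-amino peptide once its c-terminal lysine is removed.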
7.1.2 Sequential digestion steps and application to distinguish linkage sites
In some embodiments it is advantageous to link peptides to a specific type of molecule at or near the n-terminus, and link the peptide to a different type of molecule at or near the c-terminus - such an approach can be used to generate a construct in which the peptide is “in-line” between preceding and following polymeric components (such as oligonucleotides) useful in detection and identification of the construct. While a variety of chemical methods referred to elsewhere herein can be used to selectively couple molecules to the n-terminal amino group, a c-terminal carboxyl group, a c-terminal lysine side chain amino group, a fortuitously positioned cysteine sulfhydryl group, etc., the limited specificity and quantitative yield of these reactions can make it difficult to quantitatively and reproducibly couple the two peptide termini to different molecular additions. In some embodiments the invention provides an improved alternative method making use of sequential proteolytic cleavages and coupling procedures comprising the following steps: 1) cleavage of sample proteins c-terminal to lysine residues (e.g., using the enzyme Lys-C) to produce “Lys peptides”, a number of which may include internal Arg residues; 2) reaction of the lysine side chain amino groups and the exposed peptide n-terminal amino groups of these peptides, in one or more steps (which may, for example, include click chemistry ligations), with a first added molecule (e.g., a first oligonucleotide); 3) removal or depletion of any remaining uncoupled amount of this first added molecule (or intermediate chemical components); 4) cleavage of the Lys peptides at internal arginine residues when present in a second proteolytic step (e.g., by addition of enzymes such as trypsin, Arg-C, etc.), thereby exposing a fresh and unreacted n-terminal amino group in the c-terminal part of those peptides that have a c-terminal lysine and such an internal arginine residue (i.e., the part of the original Lys peptide that extends from the amino acid following the arginine to the c-terminus); and 5) reaction of these fresh n-terminal amino groups, in one or more steps, with a second added molecule (e.g., a second oligonucleotide using a click chemistry linkage).
These steps are illustrated in Figure 6 using a hypothetical protein sequence shown in Fig 6A containing arginine residues indicated by R, lysine residues indicated by K and other amino acids (all indicated here by X). Digestion with Lys-C produces a series of peptides shown in Fig 6B, each having an amino terminus and a carboxy terminus. Reaction of the free amino groups (the n-terminal amino group and the side chain amino group of lysine residues K) with an added group M1 produces a series of modified peptides shown in Fig 6C. Subsequent digestion with trypsin produces a set of peptides shown in Fig 6D. Reaction of the n-terminal amino groups exposed by this second digestion with added group M2 results in the peptides shown in Fig 6E. Two of these peptides (indicated by boxes and large asterisks) have the different groups M2 and M1 positioned, respectively, at the n-terminus and near the c-terminus (i.e., on the side chain amino group of the c-terminal lysine), separated by a sequence of amino acids long enough to have a high probability of being proteotypic for the target protein (e.g., unique to one protein in the human proteome): these peptides represent likely choices for use in the invention for quantitative measurement of the target protein.
In the case of K (Lys-C) cleavage followed by R (trypsin) cleavage as described above, not all proteins of interest necessarily comprise an appropriate proteotypic “R...K” sequence. Surprisingly, however, our in silico calculations on the known protein-encoding regions of the human genome indicate that 17,482 of the approximately 20,000 proteins in the human proteome contain at least one such peptide with a length of 7-31 amino acids (there being approximately 97,330 such peptides in the human proteome overall). Of these, approximately 13,470 peptides are 15-to-31 amino acids long, which is long enough to encompass multiple epitopes for BINDER recognition. In contrast to earlier peptide detection methods using mass spectrometry, in which shorter peptides are generally favored over longer peptides due to their greater MS signal strength and better separation by conventional reversed-phase chromatography, the single molecule methods used in some embodiments of the present invention are not typically biased against long peptides. As described herein, recognition of multiple epitopes in a peptide can provide increased specificity, as well as the possibility of leveraging avidity effects to improve BINDER capture efficiency.
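A minimal sketch of how candidate R...K peptides might be located in a single protein sequence is given below. It is not the in silico procedure used to obtain the figures above: it assumes complete, idealized cleavage (Lys-C after every K, then trypsin after every R), counts the c-terminal lysine in the peptide length, and does not test whether a peptide is proteotypic. The function names and the example sequence are hypothetical.

```python
import re


def lys_c_digest(protein: str) -> list[str]:
    """Simplified Lys-C: cleave c-terminal to every K."""
    return re.findall(r"[^K]*K|[^K]+$", protein)


def rk_peptides(protein: str, min_len: int = 7, max_len: int = 31) -> list[str]:
    """Return fragments that follow an arginine and end in a c-terminal lysine.

    Each Lys peptide containing an internal R yields one such fragment: the part
    from the residue after its last R up to and including the c-terminal K.
    """
    found = []
    for lys_peptide in lys_c_digest(protein):
        if not lys_peptide.endswith("K") or "R" not in lys_peptide:
            continue
        fragment = lys_peptide[lys_peptide.rfind("R") + 1:]
        if min_len <= len(fragment) <= max_len:
            found.append(fragment)
    return found


# Hypothetical sequence: two Lys peptides with an internal R, one without.
protein = "MSTR" + "A" * 6 + "K" + "GGR" + "L" * 8 + "K" + "WWK"
print(rk_peptides(protein))
```

Applying such a function across all sequences of a proteome, and then filtering for uniqueness, would be one way to estimate coverage of the kind reported above.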
Each of the two sequential linkages to ends of the peptide may, in some embodiments, involve a sequence of reactions, for example an initial reaction with an amine coupling reagent such as an NHS or sulfo-NHS conjugate of a click reagent (e.g., NHS-BCN, NHS-DBCO, NHS-TCO, NHS-tetrazine, NHS-azide, an NHS-alkyne, or the like), followed by reaction of the click group thus introduced with a corresponding click chemistry partner attached to the molecule to be added (e.g., reaction of a BCN-modified peptide with an azide-modified oligo tag or barcode). Other schemes for linkage of molecules to peptide amino groups are well-known in the art (e.g., direct reaction with an NHS adduct of the molecule to be added, etc.) and can be used to create the linkages described. In some embodiments, the same coupling chemistry is used in the first and second linkage steps, since the first linkage is complete and any component reagents can be removed prior to the exposure of the amino group used in the second linkage step. Alternatively, in some embodiments, the first linkage step is carried out using one or more reactive groups different from those used in the second step. For example, in some embodiments the first linkage step makes use of an NHS-BCN reagent to activate peptide amino groups which are then reacted with an azide-activated oligo to accomplish the first linkage. Following removal or depletion of the reagents involved in this first step and a second proteolytic digestion, a second linkage is carried out by activating the freshly-exposed n-terminal amino group with NHS-tetrazine and subsequent reaction with a TCO-activated oligo. Because of the general orthogonality of BCN-azide and tetrazine-TCO click reactions, these two steps are unlikely to cross-react even if some amount of reagent persists from the first modification step.
In some embodiments, the first step of the 2-step procedure is carried out to the point of activating the lysine amino group with a member of a first click pair (e.g., BCN) but without linkage to the first oligo, and a second activation step, following the second proteolytic step, is accomplished using a member of a second orthogonal click pair (e.g., tetrazine), after which both activated peptide groups can be reacted simultaneously with the respective orthogonal activated oligos (i.e., with an azide-activated oligo at the lysine site and a TCO-activated oligo at the n-terminus). In some embodiments that make use of sequential reactions (e.g., after the 2 successive proteolytic cleavages by Lys-C and trypsin as described above) to expose different peptide reactive sites (e.g., lysine and n-terminal amino groups), the overall yield of correctly modified products can be increased by removal of the modifying reagents (e.g., NHS-BCN and an azide-labeled oligo) used in the first step (e.g., coupling an oligo to the lysine amino group via BCN-azide click coupling) before executing the second cleavage to expose fresh reactive groups (e.g., n-terminal amino groups). This removal step decreases the probability that the peptide reactive groups exposed by the second cleavage (e.g., the amino terminal NH2 groups shown among the peptides of Figure 6D) will be modified in the same way as the amino groups exposed after the first digestion (e.g., addition of group M1), and instead react only with a second reagent or reagents that introduce a different addition (labeled M2 in Figure 6E).
In some embodiments, the reagents involved in adding group M1 to the peptides are removed by separation of peptides from the solution phase (e.g., by capturing the peptides on a suitable support such as a reversed phase or ion exchange support and then washing the soluble reagents away), by size exclusion separation to separate peptides from low molecular weight reagents (such as NHS-BCN, etc.), or by exposing the mixture to a solid support comprising a substantial content of free amino groups to which any un-reacted amino-modifying reagents can couple before removal of the support. A variety of magnetic beads, agarose particles and column packing materials having reactive amino groups are commercially available and can be used for this purpose.
Following such a 2-step linkage procedure, peptides whose sequences are bounded by a c-terminal lysine and an n-terminal amino acid that is immediately preceded in the protein sequence by an arginine will, with high likelihood, be modified with distinct added groups on the two termini, as desired. In the invention, specific peptides with these characteristics (i.e., a preceding arginine and c-terminal lysine) can be selected as TARGETS and efficiently incorporated in a predetermined orientation (i.e., n-term to c-term or vice versa) into constructs amenable to single molecule detection and counting. In the method shown schematically in Figures 12 and 13, an in-line construct is assembled that comprises, in order, an oligonucleotide in 5’ to 3’ orientation, a peptide in C-term to N-term orientation (opposite to the conventional method of writing a peptide sequence) and a further oligonucleotide in 5’ to 3’ orientation. When the final (3’) oligo is omitted from such a construct, the peptide’s n-terminus is exposed and available for sequential degradation by Edman (or alternative) chemistries used to read peptide sequence (e.g., Encodia or Quantum-Si technologies). Those knowledgeable in the art will recognize that the approach described allows the design and construction of hybrid molecules in which peptides, oligonucleotides and other polymers can be linked in a specified order and a specified orientation adapted to a variety of different single molecule detection technologies.
It is well-known in the art that improved proteolytic digestion can be achieved using a combination of Lys-C and trypsin (57), and that this combination can be used to advantage in sequence (58): addition of Lys-C first in a concentrated denaturant (e.g., 6M urea), followed by subsequent addition of trypsin after dilution of denaturant (e.g., dilution to 1.5M urea). However, the use of the two enzymes in sequence, with each enzyme cleavage step followed, respectively, by coupling of a different added molecule to available amino groups, is novel and provides an effective method for the assembly of oriented linear constructs. Similarly, the use of the two enzymes in sequence, with the first enzyme cleavage step followed by coupling of an added molecule to available amino groups, while leaving unmodified the n-terminal amino group created by the second enzyme cleavage step, is a novel and effective method for the generation of a peptide-oligo construct having a free, unmodified n-terminal amino group available for cyclical degradative sequencing.
In some embodiments useful for nanopore detection, a linear construct is produced comprising a leading oligonucleotide, a central peptide, and a trailing oligonucleotide according to the invention. The method of the invention allows each segment to be assembled in a specific orientation as required by a detector such as a nanopore regulated by a DNA motor; e.g., a leading oligo oriented 5’-to-3’, followed by a peptide that is oriented c-terminal-to-n-terminal, followed by an oligo oriented 5’-to-3’ (as shown in Figure 12). Likewise, for detection technologies that require peptides to be immobilized via the c-terminus while retaining an unmodified amino group available for sequential degradation (e.g., by Edman chemistry), the second amino group modification can be omitted (i.e., just using sequential first digestion-modification-second digestion) to produce tethered peptides with free n-termini as required by some degradative sequencing single molecule detectors. While a c-terminal lysine is a preferred in-line linkage site in multiple embodiments of the sequential cleavage method described here, other cleavage sites can be used in place of Arg cleavage (as expected above in a second cleavage using trypsin), since many proteolytic enzymes, as well as chemical agents such as CNBr, generate a fresh n-terminal amino group when they cleave a polypeptide. A wide range of such alternative cleavage specificities are available: for example, the enzymes AspN, GluC, chymotrypsin, elastase or even relatively non-specific proteinase K can be used in combination with Lys-C or equivalent enzymes to generate different sets of double-amino peptides and extend the applicability of the approach to proteins not well-covered by the R...K method.
A related alternative embodiment makes use of chemical linkages to peptide carboxyl groups instead of amino groups, and employs a series of steps to 1) cleave sample proteins n-terminal to Asp residues (e.g., using the enzyme Asp-N) to produce “Asp peptides” having an n-terminal Asp residue; 2) react the Asp side chain carboxyl groups and the exposed peptide c-terminal carboxyl groups of these peptides (and any internal Glu carboxyl side chains), in one or more steps (which may, for example, include click chemistry ligations), with a first added molecule (e.g., a first oligonucleotide); 3) remove or deplete any remaining uncoupled amount of this first added molecule (or intermediate chemical components); 4) cleave the Asp peptides resulting from the first cleavage at one or more selected internal residues (e.g., by addition of trypsin to cleave at K or R, Lys-C to cleave at Lys, Glu-C to cleave at Glu, etc.), thereby exposing a fresh and unreacted c-terminal carboxyl group in those peptides that have an n-terminal Asp and such an internal residue; and 5) react these fresh c-terminal carboxyl groups, in one or more steps, with a second added molecule (e.g., a second oligonucleotide using a click chemistry linkage). In this case, it is preferred to select as TARGET peptides those that lack an internal Glu residue since this would comprise a 3rd carboxyl group when AspN is used initially. A symmetrical situation would obtain if an enzyme with GluN specificity were used initially, and preferred TARGET peptides would be those lacking internal Asp residues.
In some embodiments, appropriate STANDARDS can be produced according to any of the 2-step methods described above by carrying out a similar set of steps using a synthetic peptide with an extended n-terminal sequence providing the same second-step cleavage site as the process applied in processing samples. In some embodiments the cognate STANDARD and TARGET constructs are distinguished by sequences incorporated into one or more oligos linked to the peptide. In some embodiments, one or more of the linkage steps in assembling STANDARD constructs makes use of a different click chemistry pair than that used in assembling TARGET constructs.
In some embodiments, multiple different 2-step peptide modification processes as described above are carried out in parallel on a sample, and their results combined to provide a collection of TARGET peptide constructs providing improved protein coverage or detection performance compared to a single 2-step procedure (e.g., the R...K method described initially).
7.2 GENERATION OF TARGET PEPTIDE CONSTRUCTS
TARGET peptide constructs are created by linkage of TARGET tags to TARGET peptides in a sample digest. In some embodiments, TARGET tags are linked to common chemical features of peptides, for example peptide n-terminal and/or lysine epsilon amino groups. The common occurrence of such features implies that it would be advantageous to convert a large fraction, and potentially all, peptides in a digest into constructs of the form of TARGET constructs, irrespective of whether each such construct is to be measured against a STANDARD construct. The feasibility of such an approach is related to the efficiency with which TARGET tags can be coupled to a range of peptides and the cost of reagents required to modify more than a few specific peptides. Enabling the use of inexpensive, efficient reagents and methods to link TARGET tags to all digest peptides is therefore one of the objects of the invention.
7.3 ADDITION OF STANDARD.
A TARGET peptide in a sample or set of samples is “standardized” by addition of a known quantity of its respective STANDARD (the STANDARD based on the sequence of the TARGET peptide with modification as disclosed above). The resulting “standardized sample digest” may be so standardized with respect to one TARGET peptide, or to multiple TARGETS (requiring multiple cognate STANDARDS). Multiple STANDARDS may be added together at one time, or at different times (e.g., after part of a standardized sample is analyzed, additional STANDARDS may be added to permit subsequent analysis of additional TARGET peptides). In some embodiments the added quantity of a STANDARD is known in absolute quantitative terms (e.g., in grams/sample or grams/liter, in moles/sample or moles/liter, or molecules per sample or molecules per liter), and in some embodiments the amount of STANDARD added is known to be the same as, or have a defined ratio to, the amount of STANDARD added to other samples (thus allowing multiple samples to be compared on a consistent scale, particularly useful when samples run in a batch are compared with one another, or in longitudinal studies measuring changes in amounts of biomarker proteins occurring between serial samples from an individual). A sample to which STANDARDS have been added corresponding to a set of TARGET peptides is considered a “standardized sample” with respect to those TARGET peptides. In some embodiments one or more STANDARDS are added to a protein sample before or during digestion of the sample proteins to peptides. In some embodiments, one or more STANDARDS are added after digestion but prior to enrichment. In some embodiments additional STANDARDS are added to a digest sample that has been previously analyzed according to the invention for an earlier set of TARGET peptides and STANDARDS, enabling cycles of measurement for successive panels of peptides in a sample.
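The way a known STANDARD quantity converts molecule counts into a TARGET amount can be sketched as a simple ratio calculation. The sketch assumes that TARGET and STANDARD constructs are recovered and detected with equal efficiency, so that the count ratio equals the molar ratio; the function name and the example numbers are hypothetical.

```python
def target_amount(target_count: int, standard_count: int, standard_amount: float) -> float:
    """Estimate the TARGET amount in the same units as standard_amount.

    Assumes equal recovery and detection efficiency for TARGET and STANDARD.
    """
    if standard_count <= 0:
        raise ValueError("at least one STANDARD molecule must be counted")
    return standard_amount * target_count / standard_count


# Example: 2.0 fmol of STANDARD added; 1,500 TARGET and 1,000 STANDARD molecules counted.
print(target_amount(1500, 1000, 2.0))  # 3.0 (fmol)
```

When only a consistent (rather than absolute) STANDARD amount is known, the same ratio still places multiple samples on a common relative scale.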
In some embodiments, the quantity of STANDARD added to a sample is chosen based on the amount of the respective TARGET peptide expected to be present in the sample(s). Specifically, the quantity of STANDARD may be based on the average or median amount of TARGET peptide observed or known to be present in similar samples, so that the ratio of TARGET peptide to STANDARD molecules falls in a range centered close to 1.0 (i.e., equal amounts). Depending on the variation in TARGET peptide amount in the samples, the ratio may for example range from 0.5 to 2, or 0.2 to 5, or 0.1 to 10, or 0.01 to 100. The benefit of arranging STANDARD amount based on TARGET peptide ranges is that it best avoids situations where the ratio is very large (e.g., 1,000: 1). Extensive investigations of the population ranges of clinical protein analytes (59) have shown that the observed range is protein-specific. High-abundance blood proteins such as albumin or hemoglobin usually vary by small amounts (much less than 2-fold), while acute phase proteins such as C-reactive protein (CRP) or serum amyloid A (SAA) can increase by 1000-fold in a serious infection. 
Since the present invention utilizes single molecule counting for purposes of quantitating the peptides (and therefore yields measurements whose precision is expected to depend on counting statistics in which precision is typically determined by the square root of the number of objects counted), a standardized sample with a 1,000:1 ratio of TARGET peptide to STANDARD (or vice versa) would require the detector to count 1,000 times as many of the higher abundance peptide molecules as lower abundance peptide molecules (i.e., a total of 1,000 + 1 = 1,001 times the minimum acceptable counts for each peptide to achieve desired precision based on established results in counting statistics and experimental data), thereby wasting counting capacity compared to a situation in which the peptides are present at closer to equal abundance (1 + 1 = 2 times the minimum acceptable counts would need to be counted). It will be clear to one skilled in the art that the optimal situation for efficient quantitation by counting TARGET and STANDARD peptide molecules is one in which the ratio is as close to 1:1 as practically possible. A person skilled in the art could design an assay according to the invention to measure hemoglobin using the population average value for the TARGET peptides as the STANDARD amount, while for CRP it could be preferable to set the STANDARD amount higher than the population average TARGET amount so as to better center the TARGET to STANDARD ratio for this highly inducible protein closer to 1.0.
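By way of illustration only, the counting-capacity argument above can be sketched in Python. The sketch assumes Poisson-distributed counts, so that the relative error of a count N is approximately 1/sqrt(N); the function names and the 5% precision figure are hypothetical and are not part of the disclosure.

```python
import math

def min_counts_for_cv(target_cv):
    """Smallest count N whose Poisson relative error 1/sqrt(N) is <= target_cv.

    The small epsilon guards against floating-point rounding just above an integer.
    """
    return math.ceil(1.0 / target_cv ** 2 - 1e-9)

def total_counts_needed(ratio, min_counts):
    """Total molecules to count so that the rarer peptide of a pair reaches min_counts.

    ratio is the abundance ratio of the two peptides (TARGET:STANDARD or the
    reverse); only its magnitude matters, so 1000 and 0.001 give the same answer.
    """
    fold = max(ratio, 1.0 / ratio)
    return (fold + 1.0) * min_counts

n_min = min_counts_for_cv(0.05)            # hypothetical 5% CV on a single count
print(n_min)
print(total_counts_needed(1.0, n_min))     # 1:1 pair, 2 x n_min
print(total_counts_needed(1000.0, n_min))  # 1,000:1 pair, 1,001 x n_min
```

Under these assumptions the 1,000:1 pair consumes roughly 500 times the counting capacity of the 1:1 pair for the same precision on the rarer peptide.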
For similar reasons, in some embodiments it is preferred that the amounts of molecules of different TARGET peptides measured together in a multiplex assay (each with its cognate STANDARD) should be as nearly equal as possible. This arrangement results in optimal precision achievable with a given total capacity for counting molecules according to counting statistics, and can be achieved by stoichiometric flattening during enrichment as described below.
In some embodiments two or more TARGET peptides are selected for a protein, yielding independent measurements of the protein’s amount that can be combined to deliver improved precision, or used to circumvent sequence (i.e., genetic) or post-translational variation in TARGET peptides in a population of samples. In general, unless a TARGET peptide sequence is repeated in a protein, different TARGET peptides from the same protein will be present in equal amounts after complete digestion (i.e., present in molar amounts equal to the molar amount of the parent protein). In some embodiments, multiple TARGETS are selected from a protein that exhibits highly variable amounts in relevant samples, and their respective cognate STANDARDS are added at different amounts so that the TARGET-to-STANDARD ratio of at least one of the TARGETS is close enough to 1:1 to be efficiently countable and thereby furnish an accurate ratio measurement. In some embodiments, for example, three TARGET peptides are selected and their respective STANDARDS added to the sample digest at 0.1x, 1.0x, and 10.0x the average amount of TARGET in an average sample, so that variation in the amount of target protein over a range of 10-fold above and below the expected amount (100-fold dynamic range) will be measured by at least one of the TARGET-to-STANDARD ratios close to 1:1. In some embodiments of this kind, the amounts of the respective BINDERS used to enrich the peptides are adjusted separately to bring the peptides into flat stoichiometry (near equivalence) before detection and counting.
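The three-level STANDARD arrangement described above can be sketched as follows. This is a minimal illustration, assuming amounts are expressed as multiples of the average TARGET amount and that "closest to 1:1" is judged on a logarithmic scale; the function name and the labels "low", "mid" and "high" are hypothetical.

```python
import math

def best_standard(target_amount, standard_amounts):
    """Pick the STANDARD whose TARGET:STANDARD ratio lies nearest 1:1 on a log scale.

    target_amount and the values of standard_amounts share the same units
    (here, multiples of the average TARGET amount). Returns (label, ratio).
    """
    label, amount = min(
        standard_amounts.items(),
        key=lambda item: abs(math.log10(target_amount / item[1])),
    )
    return label, target_amount / amount

# STANDARDS spiked at 0.1x, 1.0x and 10.0x the average TARGET amount.
standards = {"low": 0.1, "mid": 1.0, "high": 10.0}

for amount in (0.1, 1.0, 8.0):
    print(amount, best_standard(amount, standards))
```

With decade spacing of this kind, any TARGET amount between 0.1x and 10x the average falls within a factor of about 3.2 (the square root of 10) of at least one STANDARD.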
7.3.1 Differences between MS and single molecule requirements
It is worthwhile to note that detection of peptides by single molecule counting involves significantly different tradeoffs as compared to quantitation by mass spectrometry (MS). MS typically produces analog measurements of the amount of a molecule that passes through a set of mass filters, or reaches a detector after some mass or size-based separation process. Since different molecules are typically detected at different times, the amount of one peptide does not usually affect the detection of a different peptide in a significant way, apart from extreme cases in which a detector is saturated, or a total aggregate ion capacity is exceeded. The dynamic range of modern triple-quadrupole MS instruments approaches 100,000-fold for a given analyte (e.g., peptide), and the MS (or LC-MS) system can accommodate two different peptide molecules that differ in abundance by 1 million-fold and still produce some quantitative information on both (e.g., by selecting a low-efficiency detection mode such as an infrequent MRM fragment mass for the high abundance molecule and a very efficient detection mode for the low abundance molecule). The detection process is typically driven by a chromatographic separation of 2 to 60 minutes prior to introduction of separated peptides into the MS, and is essentially insensitive to the total number of molecules in the applied sample: the analytical run will “consume” all the applied sample and will occupy the same period of time regardless of the number of molecules being analyzed. The situation pertaining in single molecule detection and counting is very different: the number of molecules sequenced and counted depends directly on time.
Thus, for nanopores the number of molecules analyzed is a direct function of the time required to observe a typical molecule’s sequence (e.g., 0.25 sec for a 50bp equivalent length oligo:peptide construct at a typical rate of 200bp/sec through a single nanopore) multiplied by the number of pores operating in parallel: running the device twice as long will typically detect twice as many peptides. Hence nanopore detection methods are inherently limited by the number of molecules that can be sequenced per unit of time. As discussed below, the precision of ratios between TARGET and STANDARD peptides is largely determined by counting statistics, with more counts yielding higher precision. As a result, it is highly desirable in the present invention to avoid wasting capacity (in peptides or time) in counting more than a specified number of a given peptide, since this capacity could instead be used to count more molecules of a lower abundance peptide to improve its precision. In some embodiments this principle is applied to generate enriched peptide samples in which a) the number of added STANDARD molecules is close to the average number of TARGET peptide molecules in typical samples, and b) the sum of TARGET + STANDARD molecules is approximately the same for all proteins and peptides being measured in a multiplex panel. This principle, referred to herein as “Stoichiometric Flattening”, is described in greater detail below.
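The nanopore throughput relationship stated above (read time per molecule, number of pores, and run duration) can be sketched as a simple calculation. The default values are the example figures from the text; the occupancy parameter, representing the fraction of time a pore is actually reading a molecule, is an added assumption, and the function name is hypothetical.

```python
def molecules_read(run_seconds, n_pores, construct_length_bp=50.0,
                   rate_bp_per_s=200.0, occupancy=1.0):
    """Approximate number of constructs read in one run.

    construct_length_bp / rate_bp_per_s is the read time per molecule
    (50 / 200 = 0.25 s with the defaults). occupancy is an assumed fraction
    of time each pore spends reading; 1.0 is an idealized upper bound.
    """
    seconds_per_molecule = construct_length_bp / rate_bp_per_s
    return run_seconds * n_pores * occupancy / seconds_per_molecule

one_hour = molecules_read(3600, n_pores=1)
two_hours = molecules_read(7200, n_pores=1)
print(one_hour, two_hours)  # doubling the run time doubles the molecules read
```

The linear dependence on run time and pore number is the reason that counts spent on an over-abundant peptide translate directly into lost instrument time.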
Similarly, for degradative peptide sequencing methods, a series of sequential steps is applied in parallel to a large number of immobilized peptide molecules, with the number of molecules being determined by the physical scale of the device used (number of peptides that can be immobilized and resolved by the detector) and the number of such runs. Likewise, for single molecule imaging detection methods, a large but limited number of immobilized peptide molecules is detected in a run, with the number of molecules being determined by the physical scale of the device used (number of peptides that can be immobilized and resolved by the detector) and the number of such runs. The overall throughput in terms of the number of molecules analyzed per unit of time is determined by the number of molecules sequenced per run, the duration of a run, and the number of runs (for a batch method). Hence, as with nanopore sequencing, efficiency is maximized by sequencing similar numbers of each TARGET and STANDARD peptide, instead of allowing one or more high abundance peptides to occupy a large fraction of the capacity, reducing the numbers of molecules from lower abundance TARGETS and thus the precision of their measurement.
7.3.2 Comparison between samples
In some embodiments, a fixed amount of each STANDARD (e.g., an equal volume aliquot from the same STANDARD stock solution) is added to each of a multiplicity of samples, thereby establishing a shared reference basis that allows accurate comparison of the amounts of TARGET peptides between these samples. This approach enables relative comparison of protein amounts between samples, but does not directly provide absolute quantitative information (e.g., in mass or concentration units) without the use of external calibrators (see e.g., US provisional patent application 63/213,371 - entitled Calibration of Analytical Results in Dried Blood Samples, filed 6/22/21, incorporated herein in its entirety). In some embodiments, the amounts of STANDARD molecules added represent known quantitative amounts (i.e., numbers of molecules, mass, or concentration), in which case the absolute amounts of TARGET peptides can be estimated by multiplying the STANDARD amounts by the observed TARGET peptide to STANDARD ratios.
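The two modes of comparison described above (absolute estimation from a known STANDARD amount, and relative comparison between samples sharing the same STANDARD addition) reduce to simple arithmetic on the observed counts. The following is a minimal sketch; the function names and the example numbers are hypothetical.

```python
def target_to_standard_ratio(target_count, standard_count):
    """Observed TARGET-to-STANDARD ratio from single molecule counts."""
    if standard_count <= 0:
        raise ValueError("at least one STANDARD molecule must be counted")
    return target_count / standard_count

def absolute_target_amount(standard_amount, target_count, standard_count):
    """Estimate TARGET amount in the units of standard_amount (e.g., fmol/sample)."""
    return standard_amount * target_to_standard_ratio(target_count, standard_count)

def fold_change(ratio_sample_a, ratio_sample_b):
    """Relative TARGET amount of sample B versus sample A.

    Valid only if the same STANDARD amount was added to both samples.
    """
    return ratio_sample_b / ratio_sample_a

# Hypothetical counts: 300 TARGET and 600 STANDARD molecules, 2.0 fmol STANDARD added.
print(absolute_target_amount(2.0, 300, 600))
```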
In some embodiments, STANDARDS are generated from synthetic (e.g., recombinantly expressed) protein constructs whose digestion yields STANDARDS in relative ratios defined by their copy number in the construct’s sequence (see for example Patent Application US 2006/0154318). In some embodiments the STANDARDS are provided in physical forms enabling simplified manipulation and addition (as described in US9274124). In some embodiments the STANDARDS are added as peptides in solution.
7.3.3 Amount of standard added
In some embodiments, the amounts of each STANDARD added can be determined according to (or equal to) the average or baseline levels of corresponding TARGET peptides observed in a subject’s prior samples, thus providing STANDARD levels that are more or less equal to the expected TARGET peptide levels for that subject. This approach provides optimal precision and efficiency for the measurement of longitudinal samples from that subject, and represents an ideal case of personalized protein measurement with the potential to maximize precise detection of small protein changes from baseline levels over time.
7.4 ENRICHMENT OF TARGET PEPTIDES AND STANDARDS.
Once a sample digest has been “standardized” by addition of one or more of the appropriate STANDARDS in the desired amounts, the sample can be fractionated or purified to enrich analytes to a desired level, and/or to deplete unwanted components (the analytical “matrix” background). In some embodiments, peptide-oligo constructs according to the invention are purified by a process that reversibly captures peptides (e.g., reversed-phase adsorbents such as C18 resins) with the result that oligonucleotides and other non-peptide components not part of peptide-oligo constructs can be washed away and thus removed. In some embodiments, peptide-oligo constructs according to the invention are purified by a process that reversibly captures oligonucleotides (e.g., adsorbents such as Ampure resins) with the result that peptides, remaining proteins and other non-peptide components not part of peptide-oligo constructs can be washed away and thus removed. In some embodiments additional features of the peptide-oligo constructs are used to isolate them; for example, a biotin group engineered into the oligo, the peptide or the linkage between them can be captured by immobilized streptavidin as a means of cleaning up the desired constructs.
After any such construct cleanup steps, enrichment using specific affinity BINDERS is an important aspect of the invention. In some embodiments BINDERS are used to carry out specific affinity enrichment of the respective cognate TARGET peptide and STANDARD pairs. The TARGET peptide, its STANDARD and the BINDER designed to bind them collectively form a cognate set of molecules specialized for the measurement of a specific TARGET peptide and thus its parent protein.
7.4.1 Magnetic bead enrichment
In some embodiments, BINDERS are immobilized on magnetic beads and these beads are mixed with the standardized digest and incubated to allow binding of peptides to the BINDER. In some embodiments (e.g., using anti-peptide antibodies as BINDERS) the BINDERS can be bound to commercially-available Dynabeads G via Protein G’s affinity for the Fc domain of IgG, and optionally covalently linked to the beads using DMP crosslinking or other equivalent crosslinking methods forming bonds between the magnetic bead and the BINDER. In some embodiments BINDERS are bound to other types of magnetic particles such as Tosyl-activated beads, or otherwise chemically bound to particles that can be manipulated to allow exposure to and removal from a sample. Due to their specific affinity for the TARGET peptide sequences, the BINDERS bind the respective TARGET peptides and STANDARDS when placed in contact with them. In some embodiments, BINDERS are in contact with a standardized sample digest for a 30-minute incubation period with shaking to keep the beads suspended. In some embodiments, BINDERS are incubated with standardized digest for shorter periods (e.g., 1, 2, 5, 10, 15, 20 or 25 minutes) and in some embodiments, BINDERS are incubated with standardized digest for longer periods (e.g., 45, 60, 90, 120, or 180 minutes, or 4, 5, 6, 9, 12, 18 or 24 hours). Depending on the kinetic properties of specific BINDERS, the abundances of their cognate TARGET and STANDARD peptides, and the presence or amounts of any competing sample peptides (e.g., peptides with different but similar sequences to a TARGET peptide), persons skilled in the art will understand how to perform experiments to evaluate and select a suitable incubation time. After binding, the beads with attached BINDERS and their bound peptide cargo are separated from the digest.
To achieve this separation, the BINDER beads can be collected together using a magnet and either removed from the digest (for example using a KingFisher device provided by ThermoFisher); held in a vessel (for example by magnetic attraction to the side of a well of a 96-well plate) while the digest solution is removed to another container by a pipetting device (e.g., an Agilent Bravo, Beckman Coulter Biomek, Hamilton, Tecan or other liquid handling robot); held magnetically in a conventional pipette tip while surrounding liquid is expelled and replaced (e.g., as in the established “Magtration” technology); processed in a “magnetic bead trap” (60); or processed manually (e.g., by manipulation of vessels, magnets and handheld pipettes). The standardized digest remaining after separation from BINDERS may be preserved apart from the BINDERS and stored or subjected to additional processes to measure additional constituents at a later time.
In some embodiments, the beads with BINDERS and specifically bound peptide cargo are washed by addition to, and mixing with, aliquots of a wash solution, after which the beads are recollected and separated from the wash. In some embodiments the beads are washed 1, 2, 3, or 4 times with volumes of 50 to 400uL of wash solution, which may include buffers (e.g., PBS or Tris, typically at pH between 6.0 and 8.5 when antibody BINDERS are used), gentle detergents (e.g., CHAPS or deoxycholate), and/or low concentrations of an organic solvent (e.g., 5-20% acetonitrile added to help remove peptides bound to beads non-specifically). Persons skilled in the art will be able to evaluate and select wash solution compositions that are most effective at removing non-target digest peptides (and other components) while retaining TARGET peptides and STANDARDS on the BINDER. Since it is desired that the specifically-bound peptides remain attached to the beads during the wash procedure, in some embodiments the BINDERS are designed or selected to have half-off-times (the time period over which half the bound molecules become unbound, i.e., the dissociation half-life) longer than the time required to execute the sequence of wash steps (typically 10-15 minutes using current laboratory automation systems). In some embodiments, for example those in which the BINDERS are present at high local concentrations at some point(s) during the wash process, TARGET and STANDARD peptides that leak from the BINDER can be re-bound by the same or other BINDER sites before being lost. It will be understood by those skilled in the art that experimentation with specific BINDERS, TARGETS and STANDARDS, specific wash solution compositions and temperatures, specific wash volumes and vessel geometries, and specific sample digest matrices is required to optimize a) the enrichment of the TARGETS and STANDARDS and b) the removal of the other digest components.
As a general matter, the purer the TARGETS and STANDARDS are after enrichment, the better the invention will function to measure them precisely.
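The relationship between BINDER half-off-time and peptide retention during washing, discussed above, can be sketched under a simple first-order dissociation model. The model assumes no re-binding of released peptide (so it gives a pessimistic estimate where re-binding occurs) and is an illustrative assumption rather than a description of any particular BINDER; the function name and example half-off-times are hypothetical.

```python
def fraction_retained(wash_minutes, half_off_minutes):
    """Fraction of bound peptide still on the BINDER after a wash of given duration.

    Assumes first-order dissociation with no re-binding: half of the bound
    molecules are lost in each half-off-time.
    """
    return 0.5 ** (wash_minutes / half_off_minutes)

# A 15-minute wash sequence with hypothetical half-off-times of 15, 60 and 240 minutes.
for half_off in (15, 60, 240):
    print(half_off, round(fraction_retained(15, half_off), 3))
```

Because TARGET and STANDARD share the same binding epitope, losses of this kind are expected to affect both similarly, lowering total counts rather than the TARGET-to-STANDARD ratio.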
In some embodiments, the recovery of TARGETS and STANDARDS from a digest by enrichment using BINDERS is evaluated by successively contacting 2 or more separate aliquots of BINDER with the digest, and comparing the amounts eluted from the first and second BINDER aliquots. In cases where the number of BINDER binding sites is greater than the number of TARGET or STANDARD molecules, an effective BINDER will typically capture 80% or more of these peptides on the first capture step, and the second (and possible subsequent) capture steps will capture successively smaller amounts of the peptides. The amount of TARGET and STANDARD captured in the first capture, divided by the sum of the amounts of these peptides recovered in the first and second captures, provides a useful index of the overall recovery. In cases where the number of BINDER binding sites is less than the number of TARGET and STANDARD molecules, sequential BINDER captures will yield more nearly equal amounts of these peptides. In some embodiments, BINDERS immobilized on magnetic beads are contained within microfluidic systems capable of moving the beads between different liquid volumes to effect the steps of the invention. Microfluidic systems and technologies well known in the art allow use of reduced liquid volumes (and thereby less dilution of low-concentration analytes such as enriched peptides) and more complex, multi-step chemical processes with reduced losses as compared to conventional lab-scale (i.e., 5-500uL) liquid handling processes (for example in a magnetic bead trap device (60)).
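The sequential-capture recovery index described above can be computed as follows. The second function additionally assumes that BINDER sites are in excess and that each capture step removes the same fraction f of the peptide then present, in which case the index equals 1/(2 - f); this model and the function names are illustrative assumptions.

```python
def recovery_index(first_capture, second_capture):
    """Amount in the first capture divided by the sum of first and second captures."""
    total = first_capture + second_capture
    if total <= 0:
        raise ValueError("no peptide recovered in either capture")
    return first_capture / total

def per_step_capture_fraction(first_capture, second_capture):
    """Estimate the fraction f captured per step.

    Assumes excess BINDER and equal efficiency in both steps, so that
    first = f * total and second = f * (1 - f) * total, giving
    index = 1 / (2 - f) and therefore f = 2 - 1 / index.
    """
    return 2.0 - 1.0 / recovery_index(first_capture, second_capture)

# Hypothetical eluted amounts (arbitrary units) from two successive BINDER aliquots.
print(recovery_index(80.0, 16.0))
print(per_step_capture_fraction(80.0, 16.0))
```

An index near 0.5 (nearly equal amounts in both captures) would instead suggest that BINDER sites were limiting, as noted in the text.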
7.4.2 Column format enrichment
In some embodiments, BINDERS are immobilized on columns through which sample digest and wash solutions can be passed (3), typically using a liquid chromatography system.
7.4.3 Post-peptide-capture immobilization
In some embodiments BINDERS are contacted with a standardized digest in solution (i.e., BINDERS free in solution, not bound to a support), thereby maximizing freedom of diffusion and potentially providing faster binding to TARGET and STANDARD peptides or respective VEHICLE constructs. After binding, the BINDERS can be themselves captured on magnetic beads (e.g., protein G coated beads in the case of antibody BINDERS, streptavidin coated beads in the case of BINDERS that have been previously biotinylated, etc.) or on columns functionalized with equivalent capture functionalities.
7.4.4 Multiple enrichment cycles
In some embodiments, peptides captured by and eluted from BINDERS are subjected to one or more additional cycles of BINDER enrichment. When bound peptides are eluted from BINDERS by a change to specific elution solution conditions (e.g., pH 2.5 in the case of antibody BINDERs), reversal of these conditions (e.g., neutralization to pH near 7.0) can allow the peptides to bind to respective BINDERS once again. Similarly, if elution is carried out by exposure to a chaotropic salt, a detergent, or increased temperature, these conditions can be reversed (e.g., through dilution of a salt or detergent, dialysis, or cooling), restoring conditions in which binding to BINDER (typically fresh BINDER) can occur. After the initial BINDER enrichment process, the captured peptides are removed from the bulk sample digest, and thus the vast majority of non-target “matrix” peptides are no longer present. A second BINDER enrichment cycle thus begins with a much smaller amount of total peptide material, in which the TARGET and STANDARD peptides represent a much larger fraction of dissolved material (mainly peptides). In some embodiments, a second BINDER enrichment cycle is carried out using fresh BINDER (i.e., BINDER that has not previously been exposed to complex digest), and this additional cycle further depletes non-target matrix peptides, while recovering a large fraction of the TARGET and STANDARD peptides, and resulting in a purer sample of the TARGET and STANDARD peptides of interest. Increasing the fraction of peptides that are desired TARGET or STANDARD peptides directly improves the efficiency of single molecule detection aimed at measuring those peptides by decreasing the time and resources spent sequencing other peptides. 
In some embodiments, one or more additional enrichment cycles are carried out using a smaller amount of BINDER (e.g., a smaller volume of beads) than used in the initial enrichment cycle, resulting in an opportunity to reduce the volume in which peptide constructs are eluted and thereby increasing the concentration of TARGET and STANDARD peptide constructs introduced into a detector. The ability to deliver TARGET and STANDARD peptides in a small volume, or on a small number of beads, can improve the efficiency of single molecule detection. Providing peptides for single molecule detection in as concentrated a form as possible can be important in specific detection methods, for example in delivering peptides to the vicinity of a sequencing nanopore as described below. In some embodiments, a first BINDER capture step is carried out using BINDER immobilized on a large number of small (e.g., 1, or 2.8 or 5 micron diameter) magnetic beads, thus maximizing the dispersion of BINDER in the sample digest volume and decreasing the diffusion distance and time required for peptide capture, after which the captured peptides are eluted and recaptured by fresh BINDER immobilized on a smaller number of beads, with the result that the peptides captured in the second round are both purer (fewer non-TARGET peptide molecules) and more concentrated (e.g., when the beads are collected magnetically into a mass for removal from contact with the peptide-containing liquid). In some embodiments, recovery of TARGET and STANDARD peptides in a very small volume of beads allows the bound peptides to be exposed to very small volumes (equal to or slightly greater than the included volume of the bead mass) of reagents used in the preparation of peptides for linkage to VEHICLES as described below. 
In some embodiments a second capture makes use of BINDER bound to a small number of larger (e.g., 10-40 micron diameter) beads each having a greater BINDER capacity, such that each bead can be taken through the series of chemical steps to prepare it for and/or complete VEHICLE linkage to it in a separate container, as described below.
In some embodiments a first BINDER enrichment cycle is carried out to recover and purify the TARGET and STANDARD peptides from a mass of sample digest, and the same or similar peptide-specific BINDERS are used to identify these TARGET and STANDARD peptides in a single molecule detection system. The first BINDER enrichment cycle can thus serve to remove non-TARGET peptides present in the digest (thus minimizing analytical capacity wasted on irrelevant peptides), and optionally to improve the stoichiometric flatness of a series of different TARGETS to be detected and counted (as described in detail below). For example, a fluorescently-labeled BINDER can be used to detect its cognate TARGET and STANDARD peptides in an optical imaging system of the kind used in high-throughput DNA sequencing or in similar protein detection systems (e.g., US 2021/02397), and thereby used to count the numbers of such molecules. In some embodiments a second class of BINDER that specifically recognizes a unique tag present in the STANDARD but not in the TARGET peptide is used to separately distinguish the STANDARD peptide molecules from the TARGET molecules. Using these two separate detection steps applied to a population of single molecules (i.e., identification of the molecules in a TARGET + STANDARD pair, and separate identification of STANDARD peptides) allows separate identification and counting of the molecules of TARGET and STANDARD peptides, and thereby determination of the TARGET-to-STANDARD ratio. In some embodiments, this approach is used to identify all STANDARD molecules in one recognition step (using a BINDER specific to the STANDARD molecule tag), and each different TARGET + STANDARD pair is identified by a separate detection step using the TARGET-specific cognate BINDER.
In some embodiments, the tag distinguishing the STANDARD molecules can be an added amino acid sequence on either amino or carboxy terminus of the TARGET peptide sequence (for example the well-known FLAG peptide sequence used in recovering expressed proteins), a chemical group bound to the STANDARD (such as biotin), or any of a variety of distinctive chemical structures unlikely to be found in the group of TARGET peptides. In some embodiments this approach can be applied to count whole protein molecules instead of proteolytic peptides using peptide-sequence-specific BINDERS to identify their cognate linear sequence epitopes in intact target protein molecules. In such embodiments aimed at detecting protein molecules, a first BINDER enrichment cycle can use BINDERS specific for an intact protein as well as BINDERS specific for linear peptide epitopes, and STANDARDS can be versions of the intact protein with any of a variety of unique tags as for peptides.
7.4.5 Stoichiometric flattening
In some embodiments, a plurality of TARGET and STANDARD peptides is enriched by the corresponding plurality of BINDERS, and the relative amounts, kinetic properties or solution conditions of the BINDER enrichment are selected or adjusted so as to accomplish some degree of stoichiometric flattening; i.e., to diminish differences in the relative amounts of different TARGET + STANDARD peptide pairs. As described above, to obtain the benefit of stoichiometric flattening, it is necessary to standardize measurement of TARGET peptide constructs by incorporating STANDARD constructs that can act as internal standards before BINDER enrichment (to preserve information on quantitation), and to effect enrichment using peptide-specific BINDERS.
Stoichiometric flattening, described in greater detail below, is distinct from conventional sample preparation enrichment methods. In genome sequencing, gene stoichiometries are equal, or close to equal, to begin with and do not need to be adjusted. In studies of RNAs, which can be present at a range of relative abundances, enrichment can be used to focus on a specific set of RNAs (e.g., mRNAs), but sequence-specific internal standards are not used and thus differential enrichment of different sequences destroys relative abundance information. In proteomics studies aimed at broad proteome coverage, internal standards have found more use because of the practicality of using stable isotope labeled peptides and mass spectrometric detection; however, the requirement for individual enrichment BINDERS for a significant fraction of digest peptides remains an insurmountable problem given the number of proteins present in most samples (and the ~50-fold higher number of peptides therefore present in a sample digest). In the case of targeted methods such as SISCAPA, the use of mass spectrometric detection makes it possible to achieve some degree of stoichiometric flattening by choosing target peptides that have very different ionization properties; i.e., choosing peptides with extremely high MS detection performance to represent low abundance proteins (where sensitivity is a major challenge) and choosing peptides with much lower performance to represent higher abundance molecules. Such peptide choices are an important component of stoichiometric flattening in MS methods, in addition to the adjustment of relative BINDER enrichments as described above.
Stoichiometric flattening by choosing peptides with different detection efficiencies is not useful in single molecule counting methods, which generally count all molecules with equal efficiency. In the present invention, focused on single molecule detection and counting, in most embodiments there is little or no difference expected between the detection efficiencies of different peptides (in contrast to the situation in mass spectrometry where ionization efficiency, ion transport, molecular fragmentation, and detection efficiency are all peptide-specific and highly variable). While the near-equivalence of peptides as far as detection efficiency in single molecule methods is an important advantage in expanding peptide choice for any given protein target, it removes one of the major avenues available for stoichiometric flattening and restricts the method to adjustments in the BINDER capture step only.
7.4.6 Consequences of not flattening stoichiometry
A further important distinction between the methods based on mass spectrometry versus single molecule sequence-sensitive detectors is the impact of failure to flatten stoichiometry. In typical mass spectrometry protocols using liquid chromatography (LC-MS), a sample is analyzed using a method involving a specified chromatographic separation (with a specified duration, usually in the range of 1-60 minutes), and the mass spectrometer is presented with whatever peptide ions emerge from the end of the column, whether they are too few to register as a signal, or too many to be accurately measured, or in between. In other words, the sample analysis typically takes a pre-specified length of time whether the stoichiometry of TARGET peptides in the sample has been flattened or not. Using single molecule methods, for example using nanopore sequencing, each peptide molecule takes some time to be analyzed - during this time one pore is occupied with one molecule, or a fixed number of molecules requires a given block of time to analyze. While the throughput of such a process can be increased by providing multiple pores or molecule immobilization sites, or decreasing read time, there remains a direct relationship between the number of molecules detected and the time devoted to the analysis of that sample. It is therefore of paramount importance when using single molecule methods for quantitation according to the present invention that the system not spend unnecessary time sequencing a) peptide molecules that are not TARGET or STANDARD peptides (i.e., do not contribute to the results desired), or b) peptides that have already been sequenced in sufficient numbers to provide a TARGET-to-STANDARD ratio with the desired precision (e.g., %CV based on counting statistics). Removing peptides defined by (a) above is a matter of enriching TARGET peptides and depleting the rest.
However, minimizing the counting of peptides that are surplus to statistical requirements (those defined by (b) above) is a key benefit of stoichiometric flattening. It is thus key to the practicality of single molecule detection methods for quantitation of selected TARGET peptide targets, and can reduce the total number of molecules that need to be counted in practical biomarker studies by large factors (e.g., 16,000-fold improvements in efficiency in measuring components of dried blood spots, as described in more detail below - “Stoichiometric Flattening”).
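The effect of flattening on counting effort can be illustrated with a rough calculation. The following Python sketch assumes that molecules are read in proportion to their abundance in the enriched sample; the peptide names, relative abundances and the minimum of 400 counts are hypothetical values chosen only for illustration and are not taken from any measurement.

```python
import math

def reads_required(abundances, min_counts):
    """Expected total number of molecules that must be read so that the least
    abundant peptide reaches min_counts, assuming reads are drawn in
    proportion to abundance (an assumption of this sketch)."""
    lowest = min(abundances.values())
    total = sum(abundances.values())
    return math.ceil(min_counts * total / lowest)

# Hypothetical relative abundances after enrichment (arbitrary units).
unflattened = {"peptide_A": 10000.0, "peptide_B": 100.0, "peptide_C": 1.0}
# Idealized flattening: every peptide delivered in equal amounts.
flattened = {name: 1.0 for name in unflattened}

n_raw = reads_required(unflattened, min_counts=400)
n_flat = reads_required(flattened, min_counts=400)
print(n_raw, n_flat, round(n_raw / n_flat))
```

Under these assumptions the unflattened sample needs several thousand times more reads than the flattened one, because nearly all reads are spent on the most abundant peptide.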
7.5 PREPARATION OF PEPTIDES FOR SINGLE MOLECULE SEQUENCE-SPECIFIC DETECTION.
In some embodiments the detection of BINDER-enriched TARGET and STANDARD peptides can be facilitated by certain chemical modifications, including covalent linkage to polymeric molecules on one or both ends (i.e., on or near peptide n- or c-termini), or linkage to a support, surface or bead, resulting in constructs with improved uptake and sequence readout by a sequence-sensitive detector, and/or incorporating additional information beyond the peptide itself in the form of detectable polymer sequences (e.g., DNA sequence tags).
7.5.1 Chemical modification of peptides on BINDER
In some embodiments, peptides are chemically modified while bound non-covalently to a BINDER (e.g., a BINDER that is used to enrich them from the peptide sample). Modification of the peptides while thus non-covalently anchored to a BINDER (which may itself be bound to a solid support) facilitates exchange of reagents between steps of a multi-step series of chemical modifications, avoids the necessity for other more cumbersome purification methods between steps (e.g., to separate modified peptides from reagents and unmodified peptides), and allows the peptides to be concentrated when necessary (e.g., by gathering magnetic beads bearing the BINDER into a solid mass with minimal included liquid). These advantages are very useful when a single modification step is required, and progressively more valuable as more modification steps are needed.
In some embodiments a series of chemical steps is used to prepare peptides for analysis, and it may be necessary to remove the chemical reagents required for one step, and in some cases wash them away, before adding reagents for the next step. When the off-rate of the BINDER is low, with a half-off-time for example longer than 10-15 minutes (as is typical for antibody BINDERS developed for use in SISCAPA), one or more rapid chemical reactions can be carried out before peptide molecules dissociate from the BINDER. In some embodiments involving more time-consuming chemical modification processes (e.g., requiring an incubation period of 1-60 minutes for a reaction to progress towards completion), BINDERS can be concentrated (e.g., by collecting magnetic beads into a mass or small volume, or using a column format bearing a high density of immobilized BINDERs) during steps of the process, and during these periods any peptide that dissociates from a BINDER is likely to quickly rebind to another BINDER site given the high local BINDER concentration. This kinetic effect effectively prolongs the time available for chemical modification of BINDER-bound peptides.
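The relationship between half-off-time and the time available for a reaction step can be sketched as follows. This Python example assumes simple first-order dissociation with no rebinding, which is a worst-case simplification given the rebinding effect described above; the function name and the example times are hypothetical.

```python
def fraction_still_bound(elapsed_min, half_off_time_min):
    """Fraction of peptide still on the BINDER after elapsed_min minutes,
    assuming first-order dissociation and no rebinding (a simplification)."""
    return 0.5 ** (elapsed_min / half_off_time_min)

# Example: a 2-minute reaction step with a 15-minute half-off-time.
print(round(fraction_still_bound(2, 15), 3))
# Example: a 60-minute incubation with the same half-off-time.
print(round(fraction_still_bound(60, 15), 4))
```

In this simplified model a short step loses under a tenth of the bound peptide, while an hour-long incubation would lose most of it unless rebinding at high local BINDER concentration retains the peptide.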
Certain limitations apply to this approach in some embodiments. For example, if the BINDER is an antibody, then solution conditions that would denature the antibody (e.g., strong detergents such as SDS at high concentrations, extremes of pH, or high temperatures) or cause the peptide to be released from the antibody (e.g., pH below 3.5 or presence of 2M NH4SCN) can be problematic while carrying out the desired peptide modification. However, means are known in the art for carrying out a wide range of desirable modifications to TARGET and STANDARD peptides under solution conditions compatible with retention of the peptides on a cognate BINDER.
In some embodiments the BINDERS themselves are modified prior to use in capturing TARGET and STANDARD peptides in order to prevent or diminish their reaction with reagents intended to react with the peptide cargo. Thus, in some embodiments, some or all of any free amino groups on an antibody BINDER can be blocked, for example by PEGylation using commercial NHS-PEG reagents. In some embodiments using DNA or RNA aptamers as BINDERS, modifications may be required to prevent hybridization between the aptamers and sequences being attached or ligated to peptides on a BINDER.
In some embodiments, the peptide modifications can be carried out while the peptides are bound to a support by a general but less-specific mechanism (e.g., to a reversed-phase support such as C18 particles), or free in solution.
7.5.2 Amino group linkage
Among the most useful reaction sites on a peptide is an amino group. For peptides generated by digestion with trypsin (the most commonly used proteolytic enzyme), all correctly processed peptides (except a protein’s c-terminal peptide) should have a c-terminal lysine or arginine residue. Since lysine is the only amino acid with a side-chain primary amino group (reaction with which would allow two attachment sites on a peptide), some embodiments make use of TARGET peptides from the group of tryptic peptides with c-terminal lysine when preparing a construct with polymer additions at both ends (a “Double amino” peptide), or selecting TARGET peptides from the group of tryptic peptides with c-terminal arginine (which peptides will have a single n-terminal amino group available to react with a linker; i.e., a “Single amino” peptide) when an extension on only one end (the n-terminus) of the peptide is desired.
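The division of tryptic peptides into “Double amino” and “Single amino” groups can be sketched in a few lines of Python. The digestion rule used here (cleavage after every lysine or arginine, with no missed cleavages and no proline exception) is a simplifying assumption, and the protein sequence is a made-up example rather than a real protein.

```python
import re

def tryptic_peptides(protein):
    """Simplified in-silico tryptic digest: cleave after every K or R.
    Ignores missed cleavages and the proline rule (assumptions of this sketch)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)

def amino_class(peptide):
    """Classify a fully cleaved tryptic peptide by its c-terminal residue."""
    if peptide.endswith("K"):
        return "double amino"   # n-terminal amine plus lysine side-chain amine
    if peptide.endswith("R"):
        return "single amino"   # n-terminal amine only
    return "other"              # e.g., the protein's c-terminal peptide

# Hypothetical sequence used only for illustration.
protein = "MAGSKLVDERTTPYKAAF"
for pep in tryptic_peptides(protein):
    print(pep, amino_class(pep))
```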
In some embodiments, an advantage of linkage through amino groups, particularly in the case of c-terminal lysine peptides modified by a linkage chemistry that results in a decrease in the peptide’s net positive charge, is that the peptide subsequently has little or no positive charge (e.g., if it contains no His residues), and at least one (the c-terminal carboxyl) and perhaps more negative charges (if the peptide contains Asp or Glu amino acids). Peptides with a net negative charge have the same charge polarity as nucleic acids (negative, on account of the phosphate groups), facilitating the movement of both types of polymers through a pore using the same polarity electric field.
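The charge argument above can be made concrete with a crude residue count. The Python sketch below is an approximation under stated assumptions: integer charges near neutral pH, Arg counted as +1, Lys and the n-terminal amine counted as +1 unless blocked by the linkage chemistry, Asp, Glu and the c-terminal carboxyl counted as -1, and His counted as uncharged. The peptide sequences are hypothetical.

```python
def net_charge(peptide, amines_blocked=False):
    """Crude integer net charge near neutral pH (assumptions in the lead-in).
    amines_blocked=True models a linkage that removes the positive charge of
    the n-terminal amine and of lysine side chains."""
    positive = peptide.count("R")
    if not amines_blocked:
        positive += peptide.count("K") + 1   # lysine side chains + n-terminus
    negative = peptide.count("D") + peptide.count("E") + 1  # + c-terminus
    return positive - negative

for pep in ("TTPYK", "LVDEGAK"):
    print(pep, net_charge(pep), net_charge(pep, amines_blocked=True))
```

In this simplified model both example peptides become net negative once their amino groups are blocked, matching the polarity of an attached oligonucleotide.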
7.5.3 Carboxyl group linkage
In some embodiments linkage through peptide carboxyl groups can be used (61) but this approach has limited ability to distinguish between c-terminal carboxyls and side chain carboxyls of aspartic and glutamic acids, and thus could present additional constraints on peptide selection (i.e., de-selection of Asp and Glu containing TARGET peptides) or give rise to multiple constructs due to side reactions. In some embodiments, TARGET peptides devoid of aspartic and glutamic residues, and hence having a unique carboxyl group at the peptide c-terminus, are used with carboxyl coupling chemistries well-known in the art to link peptides through the c-terminus.
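The de-selection of Asp- and Glu-containing peptides described above amounts to a simple sequence filter. The following Python sketch uses hypothetical candidate sequences and a hypothetical function name.

```python
def has_unique_carboxyl(peptide):
    """True when the peptide contains no Asp (D) or Glu (E), so that the
    c-terminal carboxyl is its only carboxyl group."""
    return not any(residue in "DE" for residue in peptide)

# Hypothetical candidate TARGET peptides.
candidates = ["TTPYK", "LVDER", "GLSAVNR"]
selected = [pep for pep in candidates if has_unique_carboxyl(pep)]
print(selected)
```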
7.5.4 Other linkage sites
Other linkage sites and reaction chemistries can be employed. Linkage through a cysteine sulfhydryl group is frequently favored when a peptide’s sequence can be freely designed - however the occurrence of a cysteine residue at the n-terminus or c-terminus of a proteolytic peptide in a natural protein is infrequent, thus representing a limiting constraint on TARGET peptide selection.
Chemistries are known in the art for specific chemical modification of, and/or linkage to, histidine, tyrosine, tryptophan and other amino acids, and these can also be used in the invention.
7.5.5 Peptide:Oligo in-line constructs using sequential amino linkages
Figure 14 illustrates an embodiment for modification of enriched TARGET and STANDARD molecules (either of these being labeled a “Peptide” in the Figure) by linkage of an n-terminal amino group to a single-stranded oligonucleotide Leader (labeled Oligo 1) and an epsilon-amino group of a c-terminal lysine to a single-stranded oligonucleotide Trailer (labeled Oligo 2). Such a hybrid molecule can be considered an “in-line” peptide-oligonucleotide construct; i.e., one in which the construct forms a continuous backbone with “side-groups” consisting primarily of bases and amino acid side chains. In the embodiment shown in Figure 14, the linkages are carried out in a series of steps using “click” chemistry, a bio-orthogonal reaction chemistry (25, 62, 63). First (Step 1 in Figure 14), a peptide’s n-terminal amino group is reacted with azidoacetic anhydride (AAA) under conditions of pH (e.g., pH 5.5 to 7.0, preferably 6.7) that effectively prevent or reduce reaction with a lysine’s epsilon-amino group (25, 64), resulting in the introduction of an azide functionality at the peptide’s n-terminus (shown in Step 2 of Figure 14). After removal of the AAA (e.g., by washing the beads carrying the BINDER and their peptide cargo), this azide group is then allowed to react with an amount of oligonucleotide Oligo 1 (Step 3 in Figure 14), which has been prepared with an aza-dibenzocyclooctyne (ADIBO) functionality on the 3’ end, forming a “click” chemistry linkage of the Oligo 1 to the peptide n-terminus (25) (Step 4 in Figure 14). In a next step, after removal of unreacted Oligo 1, the epsilon-amino group of the peptide’s c-terminal lysine is allowed to react with an azidoacetic acid NHS ester (Step 5 in Figure 14), thereby introducing an azide functionality on the peptide’s c-terminal lysine side chain (Step 6 in Figure 14).
After removal of the unreacted NHS ester (e.g., by washing the beads carrying the BINDER and their peptide cargo), this azide group is then allowed to react with an amount of oligonucleotide Oligo 2 (Step 7 in Figure 14), which has been prepared with an aza-dibenzocyclooctyne (ADIBO) functionality on its 5’ end, forming a “click” chemistry linkage of the Oligo 2 to the peptide’s c-terminal amino acid (Step 8 in Figure 14). The result is a peptide-oligonucleotide construct comprising a peptide linked to an oligonucleotide on each end. Carrying out the additions step-wise as shown, using the two Oligos, each prepared with a “click” linkage site on one end, allows the construct to be generated with specified oligo polarities (i.e., 5’ to 3’ or vice versa) in each case, as may be necessary to achieve the desired interaction with DNA motors, pores, etc. The Oligos can each be single-stranded, or they can be rendered duplexes by hybridization with complementary sequences over part or all of their length(s).
In some embodiments the locations of the “click” reactive pairs can be inverted (e.g., modifying one or both of the peptide ends with ADIBO, and the Oligos with azide). Linkage at the peptide c-terminus can alternatively be introduced through modification of the peptide c-terminal carboxyl group (e.g., in a peptide with a c-terminal arginine, in which case the n-terminal amino group is the only amino group in the peptide) instead of a lysine epsilon-amino group.
In some embodiments some or all of the steps of the double ligation scheme shown in Figure 15 are carried out with peptides bound to a BINDER on magnetic beads. Using peptides longer than the groove of a typical BINDER binding site (typically 4-8 amino acids), and BINDER that binds an epitope that does not include either peptide terminus, leaves both peptide termini available for reaction as shown. Magnetic beads carrying the BINDER and bound peptides can be exposed stepwise to the sequence of reagents as shown by moving the beads from one reagent solution to the next, with optional wash steps in between as required, or alternatively the beads can be gathered to the side of a vessel using a magnet and the sequence of reagent solutions added and removed, or manipulated in a microfluidic device.
Many alternative chemical reaction schemes are known in the art that can be used in place of the specific reagents and steps shown in Figure 14, while generating a construct with similarly useful features. A variety of reagents exist that are capable of introducing “click” chemistry functional groups into peptides and oligonucleotides (17). Examples include azide and tetrazine functionalities that are capable of specific, bio-orthogonal reactions with alkyne functionalities, some requiring Cu(I) catalysis (which is less preferred in some embodiments), or strain-promoted alkyne cycloaddition (SPAAC) reactions with bicyclononyne (BCN) or cyclooctyne and derivatives such as dibenzocyclooctyne (DIBO) or Aza-dibenzocyclooctyne (ADIBO shown in Figure 14).
In some embodiments one or both of the oligos linked to the peptide is part of a multimolecular construct designed to facilitate nanopore sequencing. Figure 15 illustrates a modification of the embodiment of Figure 14 in which the oligo joined to the N-terminal amino group of a peptide (Oligo 1) is part of a duplex having a site at which a “motor protein” can be (or in some embodiments already is) bound. The overall construct shown in Figure 15 provides a leading oligo motor construct capable of controlling movement of the peptide through the pore (providing a stepwise ratchet motion as used, for example, in the commercially-available Oxford Nanopore sequencing platform “Y-adapter”; https://nanoporetech.com/sites/default/files/s3/literature/product-brochure.pdf), followed by a peptide to be sequenced, and a trailing oligo 2. A tether molecule that hybridizes to some sequence in the construct may be used to associate the construct with the membrane in which the nanopore is located, thereby increasing the Leader’s probability of entering the nanopore.
In some embodiments, alternative chemistries can be used to join the peptide and oligos, such as use of an NHS-functionalized oligo that can react directly with a lysine or n-terminal amino group.
Similar peptide-oligonucleotide constructs have been described, for example in WO 2021/111125, using different “click” reagent combinations and using alternative means of purifying desired products (e.g., purifying via the oligo part of the construct using Agencourt AMPure XP beads). The referenced disclosure showed that oligo-peptide-oligo constructs assembled by click chemistry can be processed by existing nanopores and DNA motors to generate reproducible ion current traces indicative of peptide sequence. However, these disclosed inventions do not include the use of BINDERS to carry and/or move TARGET peptides during one or more steps of oligo-peptide construct assembly, washing and purification, and do not include STANDARDS or use of BINDERS for enrichment or stoichiometric flattening.
7.5.6 2-step digestion to distinguish amino groups
As described above and illustrated in Figure 6, a sequential enzymatic approach is used in some embodiments to derivatize peptides with two different added molecules, resulting in an ordered construct. Using this approach, a peptide can be linked to a specific oligo on the n-terminus and a different oligo on the c-terminal lysine residue. This approach is not completely general since it requires a specific order relationship between an arginine immediately preceding the peptide sequence and the peptide’s c-terminal lysine. Such peptides are common but not universal in proteins, and when added to a requirement such as uniqueness in the human proteome, it is probable that only a subset of proteins can be represented by proteotypic peptides having such a structure.
7.5.7 Delivery of peptides to a site of detection.
The potential of single molecule detection to improve assay sensitivity is enormous provided that the desired peptide molecules can be efficiently presented to the sequencing machinery - i.e., to occupy a large fraction of the available peptide sequencing capacity. As is well known in peptide analysis by mass spectrometry, peptides at very low concentration (i.e., at low single molecule levels) easily become “lost” through escape into dilute solutions, binding to surfaces, etc., and therefore detection can become more challenging as target abundance or concentration is reduced.
Some embodiments therefore make use of the localization of TARGET and STANDARD peptide constructs on BINDERS, which themselves can be bound to particles such as magnetic beads, to transport the peptides (e.g., as peptides, or as part of peptide-oligo constructs as disclosed above) into close proximity to the site of detection (e.g., a nanopore, a surface on which they may be imaged, or a well in which they may be subjected to degradative sequencing). Recent progress in this area is disclosed in patent applications US 2020/0284783 and US 2021/0147904.
Localization of TARGET and STANDARD peptide constructs on BINDERS produces a remarkable concentration effect. In some embodiments the BINDERS are attached to spherical magnetic particles, which can be gathered together into a compact mass by magnetic forces. In such a mass of spherical particles, the particles occupy about 74% of the total volume. Thus elution of constructs from binders on beads in such a compact mass releases the constructs into about 26% of the volume of the mass, and given the sub-microliter volume of this mass in many practical embodiments, the constructs can be present in volumes of interstitial liquid in the range of tens to hundreds of nanoliters.
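The volume arithmetic above can be sketched as follows. The 74% packing figure is the one stated in the text; the 500 nL bead-mass volume and the 100 microliter starting volume are hypothetical values chosen only to illustrate the calculation.

```python
PACKED_FRACTION = 0.74  # fraction of the compact mass occupied by beads (from the text)

def interstitial_volume_nl(bead_mass_volume_nl, packed_fraction=PACKED_FRACTION):
    """Liquid volume between the beads of a compact mass, in nanoliters."""
    return bead_mass_volume_nl * (1.0 - packed_fraction)

def concentration_factor(starting_volume_nl, bead_mass_volume_nl):
    """Fold concentration if everything captured from starting_volume_nl is
    released into the interstitial liquid (assumes complete capture and release)."""
    return starting_volume_nl / interstitial_volume_nl(bead_mass_volume_nl)

# Hypothetical example: 0.5 microliter bead mass, 100 microliter starting volume.
print(round(interstitial_volume_nl(500.0), 1))           # nanoliters
print(round(concentration_factor(100_000.0, 500.0)))     # fold
```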
In some embodiments, the immobilization of TARGET peptides, STANDARDS and their derivatives and constructs by non-covalent binding to BINDERS that are immobilized on supports such as magnetic beads, provide improved methods for delivery of these molecules to sequencing machinery. In some embodiments, magnetic beads carrying BINDERS and their peptide construct cargoes (including peptide-oligonucleotide constructs) are added directly to the cis chamber of a nanopore sequencing device, where the beads sink under the influence of gravity and come to lie on the membrane in which the nanopore is located. As described in US 2021/0147904, this presentation of sequenceable polymers adjacent to the membrane improves capture by the pore by orders of magnitude. In some embodiments, the salt solution of the cis compartment acts to slowly release the peptides from the BINDERS, for example with an off-rate equivalent to elution over a period of 15 to 180 minutes. Eluted peptide constructs are then captured by the membrane through a hydrophobic (e.g., cholesterol) tether as described in the commercial Oxford Nanopore device. Upon capture by the membrane, the peptide constructs diffuse in 2-dimensions on the membrane and are efficiently presented to the pore for capture and threading into the pore for sequencing.
In some embodiments, the BINDERS are attached to magnetic beads (or other particles) by a cleavable link such as a disulfide-containing linker. Exposure to a disulfide reducer (e.g., TCEP or mercaptoethanol) can thus release the BINDERS and their cargo from the beads into solution. By further providing the BINDER with a hydrophobic tether on a linker, the BINDER can be captured by the membrane and then diffuse in 2-dimensions, eventually bringing bound peptide constructs into close proximity to the nanopore.
In some embodiments, the BINDERS are released free into solution and the peptide constructs are not eluted from the BINDERS under the conditions of the cis chamber (e.g., 0.4 M KCl), so that peptide constructs, still bound by BINDERS, are captured by and threaded into the nanopore. In some embodiments the force with which the electric field acts on the construct to pull it through the nanopore causes the construct to be pulled free of the BINDER while the speed of this motion is regulated by a DNA motor acting on the construct’s oligonucleotide component. Use of the pore forces to strip peptide constructs off of the cognate BINDER allows the BINDER to be used as a “chaperone” for the peptide construct throughout the journey from the peptide’s capture from a sample digest, through its modification to create a sequenceable construct, all the way to threading and delivery into a nanopore for sequencing.
In some embodiments, tethered versions of the BINDERS are added into the cis chamber and allowed to contact and be retained by the membrane, forming a dense surface of BINDER binding sites on the membrane within which the nanopore lies. This surface, comprising a dense plane of binding sites for TARGET and STANDARD peptide molecules, is able to capture these molecules from the contents of the cis compartment and thus provide an increased local concentration of constructs in the plane of the pore, and with the ability to diffuse in the membrane plane so as to deliver constructs for threading into the pore.
In some embodiments, peptide:oligonucleotide constructs can be propelled back and forth through a nanopore by reversal of the transmembrane electric potential (a process described as “flossing”) to repeatedly read and re-read a sequence to provide greater accuracy (65). In some embodiments of the present invention, this flossing approach is used to read selected peptides multiple times under computer control. This approach is particularly useful in confirming the sequences of peptides present at low abundance; i.e., when few copies of the peptide have been encountered and the potential for quantitative error is large. Thus, if a low frequency peptide is detected on its first pass through a nanopore, the nanopore control system can act on the observation of low frequency in real-time and implement a multi-read flossing protocol to verify the identity as a rare sequence. Achieving certain identification of a low frequency peptide sequence is more consequential than for high frequency peptides.
7.5.8 Inclusion of separative steps in addition to specific affinity capture and enrichment in a workflow.
In some embodiments additional separative steps are added to the workflow to improve performance. In some embodiments proteolytic peptides in a sample digest are captured on a solid support (e.g., C18-coated magnetic beads), thus allowing non-peptide sample components to be removed, allowing the immobilized peptides to be reacted with chemical reagents (e.g., for click chemistry derivatization as described below), and excess reagents to be removed from the peptides prior to their elution (e.g., using 50% acetonitrile) for use in subsequent steps. Likewise a conventional magnetic bead-based DNA “cleanup” (e.g., using commercially-available Ampure beads (Beckman Coulter)) can be carried out after assembly of oligo:peptide constructs generated according to the invention in order to remove any excess reagents, short oligos, un-linked peptides, and/or enzymes prior to delivery of peptide constructs to a single molecule sequencer (e.g., a nanopore). Modern laboratory robotics provides means to automate workflows involving multiple such affinity separative steps as well as precise liquid additions.
7.6 USE OF THE INVENTION WITH SINGLE MOLECULE DETECTION BY NANOPORES.
In some embodiments a TARGET or STANDARD peptide is linked to a polymer at one or more sites (e.g., at one or both ends) by stable linkages (e.g., by covalent bonds or very stable non-covalent bonds). In some embodiments these linkages are made between specific sites at, or near, the peptide’s terminus (or termini) and one or more polymer molecules (e.g., nucleic acids including DNA or RNA, chemical variants of these including phosphorothioate backbones, “locked” nucleic acids (“LNA”), peptides including polyglutamic or polyaspartic acids, and the like). In many preferred embodiments, an object of the invention is to cause a peptide to pass through a nanopore in an extended conformation, allowing the sequence of amino acids to be “read” by measurement of current flowing through the pore or other equivalent means. In other embodiments, peptides of interest may be immobilized and subjected to a series of binding interactions or covalent modifications (e.g., stepwise degradation), and in such embodiments the linkage of peptides to a surface by a single unique site (e.g., a unique amino group such as an n-terminal amino group) may be preferred.
7.6.1 Peptide selection for pore signature and ligation properties.
Some embodiments make use of monitor peptides selected from among a target protein’s proteolytic fragments based on features including A) their ability to generate a distinct sequence or fingerprint in a single-molecule sequence-sensitive detector (e.g., a “squiggle” or ion current signature over time in a nanopore sequencing system) compared to other peptides (particularly other peptides that may be used with the selected peptide in a multiplex panel assay); B) their content of reactive groups (e.g., amino-terminal and lysine side-chain (epsilon) amino groups, carboxy-terminal and side-chain (aspartic and glutamic acid) carboxyl groups, cysteine sulfhydryls, etc.) of potential use in labeling or ligation reactions; C) their uniqueness to a target protein (i.e., they are typically required to be “proteotypic” so as to act as a surrogate of the pre-specified target protein exclusively); and D) other properties desirable in a peptide analyte (e.g., solubility, chemical stability, etc.). Proteolysis using trypsin (i.e., cleaving polypeptides at lysine and arginine) is common in peptide analysis for several reasons including low cost of the enzyme and its ability to generate peptides having a positive charge at the c-terminus (a useful feature in mass spectrometric analysis). In the context of the present invention, selection from a tryptic digest of TARGET peptides with a c-terminal lysine ensures the presence of amino groups at both peptide termini, while selection of c-terminal arginine peptides ensures the presence of only one amino group (at the n-terminus). In the present invention alternative enzymes can be used as well. The enzyme LysC cuts only at lysines (not arginine) and therefore on average generates longer peptides than trypsin, and each of these (apart from a protein’s c-terminal peptide) will have a lysine (and its epsilon-amino group) at the c-terminus (features that can be exploited for linking purposes in the invention).
TARGET peptide selection criteria in the present invention are therefore significantly different from selection criteria for mass spectrometry.
7.6.2 Double ligation strategy to attach Leader and Trailer.
Some embodiments apply a strategy to connect polymeric molecules (e.g., nucleic acids, polypeptides, and other polymers) to either or both ends of a peptide to form a heteropolymer construct, including ligation of a polymer to the n-terminal amino group and another (or another molecule of the same) polymer to a site near the c-terminus (e.g., through the c-terminal carboxyl group or to the epsilon-amino group of a c-terminal lysine residue). One such ligated polymer (the Leader, which may be highly charged to facilitate movement by an electrophoretic process) can be used to initiate threading of the construct into a sequencing nanopore and another (the Trailer, at the opposite end of the peptide) can be used to assist in ratcheting the peptide through the ion current sensing region of a nanopore. While constructs joining a peptide with an oligonucleotide on one end are known in the art, the attachment of polymers to both ends is novel.
7.6.3 Concatenation of peptide analytes and other polymers in an alternating pattern to provide multi-peptide strings for analysis.
Some embodiments involve preparation of macromolecules comprising multiple TARGET and/or STANDARD peptides joined together with intervening polymers (e.g., DNA) that can participate in molecular ratchet mechanisms to control (e.g., slow down) movement of the concatenated construct through a nanopore and improve accuracy during the readout of sequence information from peptides. Intervening polymer segments (e.g., DNA) can comprise sequences that identify samples or other data associated with the peptides. Concatamers can be assembled using means including “click” chemistry or DNA or peptide ligases, and can comprise any mixture of peptides, oligonucleotides, or other polymers. Since concatenated peptides (with or without intervening linkers) can follow one another rapidly through a nanopore, without the need to wait for each new peptide construct to anchor to a membrane or thread through a pore, the rate at which peptide molecules can be identified and counted can be much higher than with individual single peptide constructs, thereby increasing the rate at which precise quantitative results can be obtained.
7.6.4 Simplified sequencing of peptide using a “Rope-tow” construct.
Some embodiments make use of a continuous polymer backbone to which a peptide can be linked through a single site and “dragged” through a nanopore by movement of the backbone, which may include single or double-stranded regions that interact with DNA motors controlling this movement in a ratchet-like manner.
7.6.5 Adaptable precision by continued sequencing.
The invention makes use of real-time peptide sequence evaluation and counting (e.g., as provided in commercial DNA nanopore sequencers) to update counts of individual TARGET peptides and their cognate STANDARD peptides, and thereby estimate the precision with which each has been measured up to this point based on counting statistics and other statistical methods. The availability of updated precision estimates allows the analytical device (e.g., a sequence-sensitive single molecule detector such as a nanopore sequencer) to terminate an analytical run when pre-defined precision criteria are met (e.g., when the coefficient of variation (CV) of one or more specific peptide counts, estimated for example as the square root of the number of peptides counted divided by the number of counts, is less than a target such as 5%). This approach allows the analytical system to avoid wasting time counting peptides beyond the number providing the required analytical precision by terminating a run, or to continue counting beyond the expected run duration when more counts are needed to achieve a precision target for one or more peptides. Precision targets can be different for different TARGET peptides. In a typical case, a minimum number of counts can be specified for a given TARGET peptide and STANDARD pair, such that the lower-abundance member of the pair must achieve this minimum number to satisfy a precision target for the ratio between the two (which represents the assay’s quantitative result).
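The stopping rule described above can be sketched in Python as follows. The sketch assumes pure counting (Poisson) statistics, so that the CV of a count N is sqrt(N)/N = 1/sqrt(N), and applies the precision target to the lower count of each TARGET and STANDARD pair as a simplification of the precision of their ratio. The function names, pair names and counts are hypothetical.

```python
import math

def counting_cv(count):
    """CV from counting statistics: sqrt(N) / N = 1 / sqrt(N)."""
    return math.inf if count == 0 else 1.0 / math.sqrt(count)

def min_counts_for_cv(target_cv):
    """Smallest count whose counting CV does not exceed target_cv.
    The rounding guards against floating-point noise in 1 / cv**2."""
    return math.ceil(round(1.0 / target_cv ** 2, 9))

def run_complete(counts, cv_targets):
    """counts maps pair name -> (TARGET count, STANDARD count);
    cv_targets maps pair name -> required CV for that pair.
    The run may stop when the lower count of every pair meets its target."""
    return all(counting_cv(min(counts[name])) <= cv_targets[name]
               for name in cv_targets)

print(min_counts_for_cv(0.05))   # counts needed for a 5% CV
counts = {"pair_1": (400, 5200), "pair_2": (150, 900)}
targets = {"pair_1": 0.05, "pair_2": 0.10}
print(run_complete(counts, targets))
```

Under these assumptions a 5% CV corresponds to at least 400 counts, and a 10% CV to at least 100 counts, for the less abundant member of a pair.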
7.6.6 Adaptable focus on some peptides by rejection of over-abundant leaders.
The invention also makes use of real-time peptide sequence evaluation and counting (e.g., as provided in commercial DNA nanopore sequencers) to stop sequencing of peptides whose precision targets have already been met and eject these molecules from a sequencing pore in order to allow entry of a different molecule that may contain peptides whose precision targets remain unmet. The analytical system is thereby able to focus its attention on peptides in need of more counts (e.g., low abundance peptides) at the expense of peptides whose count targets have already been met (e.g., higher abundance peptides). This increase in efficiency increases throughput and lowers cost of peptide counting.
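A minimal sketch of such a rejection decision is given below in Python. It assumes that each molecule is identified early enough in its read to be ejected, that each peptide has a pre-set minimum count, and that molecules not belonging to the panel are also ejected; all of these, together with the function and peptide names, are assumptions made for illustration and do not describe the control interface of any actual sequencer.

```python
def should_eject(peptide_id, counts, min_counts):
    """Decide whether the molecule currently in the pore should be ejected."""
    target = min_counts.get(peptide_id)
    if target is None:
        return True  # assumption: molecules outside the panel add nothing
    return counts.get(peptide_id, 0) >= target

def process_read(peptide_id, counts, min_counts):
    """Either eject the molecule or finish reading it and update its count."""
    if should_eject(peptide_id, counts, min_counts):
        return "eject"
    counts[peptide_id] = counts.get(peptide_id, 0) + 1
    return "count"

# Hypothetical state part-way through a run.
counts = {"high_abundance": 400, "low_abundance": 12}
min_counts = {"high_abundance": 400, "low_abundance": 400}
for read in ("high_abundance", "low_abundance", "unrelated"):
    print(read, process_read(read, counts, min_counts))
```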
7.6.7 Peptide:Oligo loop insertion constructs using amino linkages
In some embodiments a peptide is inserted into a polymer chain by reacting two chemical groups at or near the peptide termini with adjacent or nearby residues of a polymer such as a nucleic acid, followed by cleavage of the polymer between these residues. In Figure 16, a peptide 71 having reactive groups (here amino groups in a double amino peptide) on both ends is activated by addition of groups 72 (for example BCN click groups, indicated in the Figure as X, that can be added to the peptide’s amino groups through reaction with an NHS derivative of BCN) in Figure 16B. The resulting activated peptide is reacted with oligo 62 (Figure 16C) having complementary click groups 74 (e.g., azide, indicated in the Figure as Y) attached to adjacent bases (here shown as a dinucleotide TT). In this example T was selected as the attachment base because of the commercial availability of synthetic oligonucleotides having an azide attached to one or more T residues (e.g., from Integrated DNA Technologies, Inc.); however alternative attachment means involving other bases or non-base components capable of being incorporated into synthetic oligos can be used. The two attachment sites can be adjacent bases as shown in Figure 16, or they can be separated by one or a few intervening bases (e.g., as shown in Figure 17). The result upon reaction between X and Y groups to form covalent linkages 75 is a peptide loop conjugate shown in Figure 16D.
7.6.7.1 Enzymatic cleavage for loop release
In some embodiments, the linkage sites on the oligo 62 are designed such that the surrounding sequence (when hybridized to the complementary strand 65) is recognized by a restriction endonuclease (e.g., PacI) capable of cutting the oligonucleotide backbone between the two bases to which the peptide is linked (the enzyme PacI cuts between the second pair of T residues 76 in the sequence TTAATTAA, as shown in Figure 16E). Following the cutting of both strands by the nuclease, the result is a nanopore-sequenceable construct 77 comprising the peptide with leading and trailing oligo segments, either or both of which can comprise oligo sequence tags identifying the peptide as either a STANDARD or TARGET. In this simple version of the “loop ligation” method, which uses identical coupling groups on both ends of a peptide for simplicity of preparation, peptides may be inserted in both orientations, i.e., n-term first (shown as 77 in Figure 16F), or c-term first (shown as 78 in Figure 16G). Suitable data analysis algorithms are constructed to recognize a specific peptide’s squiggles in either orientation. Reaction of the activated peptide with the oligo bearing linkage sites can be carried out before or after the oligo is hybridized with the complementary strand to allow site-specific cleavage between the adjacent peptide-linked bases. Linkage of the peptide to oligo can be carried out after peptides are enriched by BINDERS (provided that STANDARD peptides are identified by a structural difference from cognate TARGET peptides, which difference can be sensed by a nanopore), or before BINDER enrichment (in which case the oligo sequences 62 or 63 can be used as tags indicating whether a peptide is a STANDARD or a TARGET, and the BINDER is used to enrich assembled peptide-oligo constructs). The loop method can provide a simpler method of inserting a peptide into an oligo sequence than other methods requiring stepwise linkage of linear peptide and oligo molecules.
7.6.7.2 Chemical or photolytic cleavage for loop release
In some embodiments, linearization of a peptide loop construct is accomplished using chemical rather than enzymatic means. Figure 17 shows an example similar to that of Figure 16, with the difference that the nucleic acid bases providing oligo attachment sites 74 (here shown as T residues) are separated by an intervening chemical structure (shown here as residue labeled “Z”). In some embodiments, this intervening structure is a base linked to either or both adjacent T’s by a phosphorothioate linkage. After linkage of the peptide to the oligo (e.g., by click chemistry as described above), the backbone of oligo 62 can be cleaved at the Z position using chemical means (e.g., using iodine, aqueous silver nitrate or mercuric chloride (66) or chloride assisted by myeloperoxidase (67)). In some embodiments the intervening structure comprises a photocleavable spacer or linker (e.g., the linker designated /iSpPC/ available from Integrated DNA Technologies, or linker 26-6888 available from GeneLink) that can be cleaved by exposure to near UV light (e.g., 300-350nm wavelength). Photochemical cleavage provides an extremely efficient way to effect the linearization of a peptide:oligo loop construct. Those skilled in the art will understand that a variety of specific chemical and enzymatic cleavage mechanisms can be used to selectively cleave an oligo VEHICLE after insertion of a peptide loop to yield a linear oligo:peptide:oligo construct. Those skilled in the art will understand that the spacing between two peptide attachment sites 74 can be as short as no intervening bases, or as long as 2, 3, 4, 5, 6 or up to 10, 20 or 30 bases. 
An advantage of a short distance between sites 74 is the increased rate of formation of the second linkage after the first has formed: formation of the first linkage transforms the reaction from a bi-molecular (diffusion-limited) reaction between the peptide and oligo into a uni-molecular reaction that is likely to be more rapid, thereby increasing the probability that both nearby linkage sites are connected to opposite ends of the same peptide molecule (rather than two separate peptide molecules).
7.6.7.3 Loop assembly without a disruptible linkage
In some embodiments, a peptide loop construct is assembled by reacting a peptide having reactive groups on both ends (e.g., Figure 16B) with a double-stranded oligo similar to that shown in Figure 16C except that the adjacent modified T bases comprising reactive groups 74 (indicated as “Y” groups) are not linked by a sugar-phosphate bond (i.e., the DNA backbone is interrupted between them), causing oligos 62 and 63 to be separate molecules. Oligos 62 and 63 are held in place by their hybridization with the complementary continuous oligo 64 + 65, thereby holding the reactive sites 74 in close proximity, where they react preferentially with the activated groups 72 on either end of a peptide molecule, creating a continuous oligo-peptide-oligo construct amenable to single molecule detection using nanopores or other methods. A variety of alternative chemical linkages can be used to connect the peptide and oligos, including click chemical linkages as described elsewhere herein.
In some embodiments, for example those in which a 2-step digestion enables placement of 2 different, non-cross-reacting chemical groups at or near the ends of a peptide (e.g., members of 2 families of click groups as described elsewhere herein), the appropriate corresponding reactive groups can be placed on oligos 62 and 63 to enable peptide insertion with defined polarity (e.g., peptide inserted n-term to c-term in a 5’ to 3’ oligo VEHICLE). For example, the peptide n-term amino and c-term lysine epsilon amino groups can be labeled with BCN and TCO respectively, and the 5’ and 3’ T residues of the adjacent pair labeled respectively with azide and tetrazine. In this case, the peptide n-term reacts with the 5’ T (BCN + azide) and the peptide c-term reacts with the 3’T (TCO + tetrazine) creating a construct with all oligo and peptide components in a unique defined order.
7.6.7.4 Oligonucleotide tags to identify STANDARDS
In some embodiments oligonucleotide sequences into which a peptide is inserted comprise encoded information (“tags”) identifying a peptide as a TARGET or a STANDARD (the STANDARD constructs being assembled in a separate process from the sample-derived TARGET constructs). In the embodiments of Figures 16 and 17, unique 16-base tags are provided, which can be incorporated either 5’ to the peptide attachment site (62) or 3’ to it (63), or both (as shown). The tags must be long enough to provide reliable identification when read by a nanopore or other single molecule reader - the examples of Figures 16, 17 show 16-base tags on both 5’ and 3’ ends, while the examples of Figures 18, 19 and 20 show the use of 8-base tags on the 3’ end to distinguish TARGET and STANDARD peptides (Figure 18). If the accuracy of sequence recognition is very high, a short tag sequence of a few bases can suffice; however, for increased accuracy and reliability, a tag of at least 4, or 5, 6, 7, 8, or more bases can be used. In cases where the construct is sequenced by passage through a nanopore from 5’ to 3’, it can be advantageous to place an oligonucleotide sequence identifying a peptide as TARGET or STANDARD in the oligo on the 3’ side of the peptide (i.e., the portion of the oligo that follows the peptide through the nanopore), since the oligo component on the 5’ side may not be read effectively when the adjacent peptide slips through the DNA motor without ratcheting. Figure 18 shows successive steps in the assembly of TARGET constructs (Figure 18 A, B, C and D), and cognate STANDARDS (Figure 18 E, F, G, and H), using two different 8-base oligo tag sequences 63 and 67 to distinguish them (in this case the 5’ oligo 62 is the same for TARGET and STANDARD constructs, as it is unlikely to be readable during 5’-to-3’ transit through a nanopore).
Because the TARGET and STANDARD peptide molecules are distinguished by oligo tags 63 and 67, joined with the respective peptides in separate processes (the STANDARD constructs being assembled separately from sample digest preparation), the TARGET and STANDARD peptides are of identical chemical structure, thereby ensuring their equivalent binding by the cognate BINDER.
7.6.7.5 Stepwise assembly of peptide:oligo constructs
In some embodiments it is desirable to prepare peptide:oligo constructs of small size and then join these to larger oligos to provide sufficient oligo length upstream (i.e., 3’-wards in the case of a 5’-to-3’ nanopore reading system) to allow the pore to “read” a significant length of the peptide. In embodiments that make use of an oligo tag to distinguish TARGET and STANDARD peptides, it is typically necessary to form peptide:oligo constructs of all the peptide molecules in a sample digest in order to ensure that sample peptides are counted accurately in comparison with known added numbers of STANDARD molecules. Short oligos are more economical for such bulk peptide derivatization applications. In addition, short oligos exhibit more rapid diffusion, thereby improving the reaction rate between oligos and peptides. In the examples of Figure 18, the oligo CCTGAACCTZTTATCCAGT has a molecular weight of approximately 5,700 daltons, and the complete oligo:peptide construct including the peptide ESDTSYVSLK is approximately 6,800 daltons. A construct of this size diffuses significantly faster than a construct comprising an oligo as shown in Figure 16 (12,700 daltons of oligo and approximately 1,100 daltons of peptide, for a total of 13,800 daltons). However, the shorter oligos do not provide a sufficient number of bases 3’ to the peptide to enable reading both the peptide and the tag 3’ to it (the DNA motor must engage a sufficient number of bases upstream of the peptide and tag to provide effective ratcheting during the read). In some embodiments, the need for additional bases 3’ to the peptide and tag is satisfied by extension of the construct with more bases. Figure 19 shows an embodiment to achieve this extension in which oligo:peptide:oligo constructs 91 (a TARGET with oligo tag 63 shown in Figure 19A) and 92 (a STANDARD with oligo tag 67 shown in Figure 19E) prepared according to the steps of Figure 18 are annealed with complementary strands 93 and 94 respectively (Figure 19B and F). Complementary oligos 93 and 94 also hybridize with extension oligo 95 (Figure 19C and G), allowing oligos 95 to be ligated with oligos 91 and 92 to produce longer oligos 96 (an extended TARGET construct) and 97 (an extended STANDARD construct) shown in Figure 19D and H. The end structure of these constructs (5’ phosphate and 3’ T overhang on the upper strand, 5’ A overhang on the lower strand) enables ligation of the constructs into longer oligos and their linkage to sequencing Y-adapters by conventional enzymatic ligation in preparation for sequencing (Figure 19I).
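By way of non-limiting illustration, the construct masses quoted above can be reproduced with simple arithmetic. The relative diffusion estimate in the sketch assumes a power-law dependence of the diffusion coefficient on mass, with an exponent `nu` that is purely an assumption of this example (about 1/3 for a compact sphere, larger for extended chains) and is not taken from the present disclosure.

```python
# Approximate masses, in daltons, as quoted in the text.
SHORT_OLIGO_DA = 5_700    # oligo of Figure 18
LONG_OLIGO_DA = 12_700    # oligo of Figure 16
PEPTIDE_DA = 1_100        # peptide ESDTSYVSLK

def construct_mass(oligo_da, peptide_da=PEPTIDE_DA):
    """Mass of an oligo:peptide construct as the sum of its parts."""
    return oligo_da + peptide_da

def relative_diffusion(small_da, large_da, nu=1.0 / 3.0):
    """Ratio D_small / D_large, assuming D scales as mass ** (-nu)."""
    return (large_da / small_da) ** nu

short_construct = construct_mass(SHORT_OLIGO_DA)   # 6,800 Da
long_construct = construct_mass(LONG_OLIGO_DA)     # 13,800 Da
speedup = relative_diffusion(short_construct, long_construct)
```

Under these assumptions the short construct has about half the mass of the long construct, and `speedup` is greater than 1 for any positive `nu`.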
In some embodiments peptide:oligo constructs are enriched by capture on specific BINDERS, and the kinetics of such capture can be improved by using relatively small constructs capable of rapid diffusion, and having less propensity to form large aggregates. In the example shown in Figure 20, constructs 91 (TARGET) and 92 (STANDARD) are captured by interaction of the peptide components with BINDERS 98 (which may be sequence-specific anti-peptide antibodies, for example) attached to magnetic beads 51. The capture reaction can be carried out either before (Figure 20A) or after (Figure 20B) cleavage of the oligo to “linearize” a peptide loop insertion construct (Figures 16, 17 and 18). The capture can be carried out with the sequenceable construct alone or hybridized to a complementary oligo. Those with skill in the art will understand that capture of smaller constructs is likely to proceed more quickly, with lowered potential for non-specific interactions. In some embodiments, extension of peptide:oligo constructs (as in Figure 19) and/or addition of complementary strands is carried out after the BINDER affinity enrichment step in order to take advantage of capturing smaller constructs. Nevertheless, extension of peptide:oligo constructs (as in Figure 19) and/or addition of complementary strands can also be carried out prior to the BINDER enrichment step.
7.6.7.6 Reduction of loop insertion failures
In some embodiments the lengths of oligo segments in the constructs are optimized to minimize potential for failure to generate effective loop constructs. As shown in Figure 21, several scenarios can occur when a peptide connects with an oligo having two attachment sites. Figure 21A shows an effective assembly in which both ends of the peptide connect to nearby sites on the oligo (following the processes shown in Figures 16, 17 and 18) generating a small construct capable of hybridizing with a complementary strand (Figure 21D) and thereafter forming a full-length sequenceable construct (e.g., through the extension steps shown in Figure 19). Figures 21B and C, in contrast, show cases in which only one end of a peptide successfully links to a site on the oligo, and in these cases the resulting constructs contain only a 5’ or a 3’ oligo segment, or in which two different peptide molecules react with nearby oligo linkage sites (Figure 21D). By limiting the length of these 5’ and 3’ segments, e.g., to 8 or 9 bases as shown, complexes shown in Figure 21E, F and G have low melting temperatures (e.g., below 20 °C, as in the examples shown), and therefore are unlikely to form in appreciable quantities at room temperature and above, compared to the construct in Figure 21E which is much more stable and thus capable of participating efficiently in the extension steps of Figure 19. The optimization of oligo segment lengths and sequences, and of temperature during relevant stages of a workflow, to disfavor the extension of incomplete constructs, while enabling extension of complete constructs, improves the efficiency of generation of sequenceable constructs.
7.6.8 Reading peptide:oligo constructs in nanopores
In some embodiments, the polymers attached to the peptides can fulfill several important functions in the process of sequence-sensitive peptide detection according to the invention, as exemplified in Figure 22, including: A) providing a highly charged “guide thread” that rapidly threads through a sequencing nanopore (ahead of the peptide) as a result of a voltage potential across the membrane in which a nanopore is embedded; B) working together with a protein nanomachine (e.g., a DNA motor such as a helicase) adjacent to the sequencing pore to provide a molecular “ratchet” that moves the peptide (or allows it to move under the influence of a cross-membrane electric potential) through the pore in discrete steps at a controlled rate (i.e., slow enough to allow accurate measurements of ion current through the pore for each step, despite being stochastic in nature); and C) providing a pore-sequenceable “sequence tag” or “barcode” whose nucleotide (or amino acid) sequence is read, in whole or in part, during transit through the sequencing pore (either before or after the TARGET peptide) and that identifies the attached peptide as having certain characteristics. In some embodiments, a sequence tag (e.g., comprising DNA) identifies a construct containing a TARGET peptide (and/or its associated STANDARD) as coming from a specific sample among a multiplicity of samples whose enriched peptides have been pooled (“multiplexed”) together after addition of sample-specific tags and prior to sequencing together in one sequencing run. Such a sample “barcode” can be included in a conventional sequencing adapter or provided as a separate polymer module to be linked with all of the constructs derived from a specific sample, as is commonly done using commercially available kits (for example https://store.nanoporetech.com/us/native-barcoding-expansion-l-12.html).
This use of a tag for identification of a sample in a pool is well-known in the art as a means for multiplexing two or more samples to be combined in a pool for DNA sequencing: the tag allows sequences of the molecules from each sample to be separated after bulk sequencing. Sets of such tags are commercially available for the Oxford Nanopore system. Additional characteristics that may be encoded in the DNA Leader or Trailer tag include identification as a member of a specific TARGET or STANDARD peptide subset in cases in which subsets of TARGET peptides are separately extracted from a sample digest. Functions A and B above (guide thread and motor engagement) can potentially be fulfilled by a homopolymer (e.g., a DNA homopolymer of one of the 4 bases, or a peptide homopolymer of glutamic or aspartic acid), while function C requires a polymer (like DNA or peptide) made of multiple different monomers that can be distinguished by a nanopore sequencing system. In some embodiments sample barcodes are used together with TARGET vs STANDARD differentiating barcode tags (the former required in principle only once per sequenceable construct molecule, while the latter are required in association with each peptide molecule in a construct to identify its source). Barcodes provide an efficient way to record and recover information about the nature and source of individual peptide molecules in the invention, and thus exploit an advantageous feature of technologies such as nanopores that are capable of “reading” both oligonucleotide and polypeptide polymers.
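By way of non-limiting illustration, the combined use of sample barcodes and TARGET vs STANDARD tags can be sketched as a simple tally. The sketch assumes that each read has already been decoded into a sample barcode, a tag and a peptide identity. The barcode and tag sequences, sample names and the peptide label below are hypothetical placeholders, not sequences from the Figures.

```python
from collections import Counter

# Hypothetical lookup tables; real barcode and tag sequences would differ.
SAMPLE_BARCODES = {"ACGTACGT": "sample_1", "TGCATGCA": "sample_2"}
ROLE_TAGS = {"GGATCCAA": "TARGET", "CCTAGGTT": "STANDARD"}

def tally(reads):
    """Count molecules per (sample, peptide, role) from decoded reads.

    reads -- iterable of (sample_barcode, role_tag, peptide_id) tuples
    Reads whose barcode or tag is not recognized are skipped.
    """
    counts = Counter()
    for sample_barcode, role_tag, peptide_id in reads:
        sample = SAMPLE_BARCODES.get(sample_barcode)
        role = ROLE_TAGS.get(role_tag)
        if sample is None or role is None:
            continue
        counts[(sample, peptide_id, role)] += 1
    return counts

def target_to_standard_ratio(counts, sample, peptide_id):
    """Ratio of TARGET to STANDARD counts, or None if no STANDARD was seen."""
    standard = counts[(sample, peptide_id, "STANDARD")]
    if standard == 0:
        return None
    return counts[(sample, peptide_id, "TARGET")] / standard

reads = [
    ("ACGTACGT", "GGATCCAA", "pep_A"),
    ("ACGTACGT", "GGATCCAA", "pep_A"),
    ("ACGTACGT", "CCTAGGTT", "pep_A"),
    ("TGCATGCA", "CCTAGGTT", "pep_A"),
    ("NNNNNNNN", "GGATCCAA", "pep_A"),   # unrecognized barcode, skipped
]
counts = tally(reads)
ratio = target_to_standard_ratio(counts, "sample_1", "pep_A")
```

Multiplying such a ratio by the known number of STANDARD molecules added to a sample gives an estimate of the TARGET amount in that sample.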
In some embodiments shown in Figure 22A, Leader polymer 1 (labeled Oligo 1), linked to the n-terminal residue of peptide 3, threads through nanopore 5 in membrane 4 (shown in cross-section) as a result of its negative charge (i.e., the oligo’s phosphate backbone) and the application of an electric potential between the “cis” side 9 of the membrane 4 and the “trans” side 10, in this case with the “trans” side positive. The peptide 3 follows Leader 1 into the pore. A second oligo 2 (Oligo 2), attached to the c-terminal residue of the peptide 3, is part of a complex governing the movement of the peptide through the pore. As shown schematically in Figure 22, this complex includes a protein nanomachine 7 (e.g., a DNA motor) interacting with oligo 2, another oligo 6 with a sequence partially complementary to oligo 2, and another oligo 8 forming a “tether” promoting an association between the membrane 4 and the construct. Tethering a construct close to the membrane (e.g., by providing a cholesterol functionality that adheres to the membrane while allowing free diffusion of the complex in the plane of the membrane while seeking an open pore) is known to increase the rate at which construct molecules thread into the nanopore by more than 1,000-fold.
In some embodiments, the current passing through nanopore 5 in membrane 4 between the cis chamber 9 and the trans chamber 10 (Figure 22) is measured (typically in picoamps given an electrolyte concentration in chambers 9 and 10 of 100-500mM salt). This current changes as amino acids or nucleic acid bases transit the narrow throat of nanopore 5, with the speed of transit regulated by nanomachine 7. A variety of DNA motor nanomachines 7 are known in the art and can be used, including helicases, polymerases, etc. In the embodiment shown in Figure 22, the nanopore is protein MspA or a derivative thereof, and nanomachine 7 is a processive enzyme motor protein such as a helicase, capable of regulating the passage of DNA through the nanopore 5. The motor is pre-positioned on adapter Oligo 2, where specialized bases prevent it from contacting the rest of the DNA before loading into the nanopore. This scheme is commercially available (Oxford Nanopore Technologies).
The DNA motor that engages with the oligo and regulates its passage through the nanopore is offset from the region of the nanopore (the “throat”) where the bases and/or amino acids modulate the through-pore current (i.e., where the polymer is “read”). This offset provides the means by which peptides can be sequenced, by allowing the motor to engage oligo bases “above” the peptide while the peptide is in the throat being read. However, since the DNA motor cannot engage with peptide to achieve ratchet motion, regions “below” the peptide by the offset distance cannot typically be read, but instead move rapidly through the nanopore until bases are once again engaged by the DNA motor. This offset and its effect on the acquisition of sequence information from peptide-oligo constructs is represented graphically in Figure 22. Construct 81 (a typical rope-tow construct) produces sequence information when moving through a nanopore whose offset between motor and throat 82 is approximately 8 bases. Thus sequence is obtained in this example for regions of the construct about 8 bases 5’-ward of each base in the oligonucleotide sections 89, but not for the non-base-containing (e.g., abasic) sections 87. Thus oligonucleotide section 85 can be read, as can peptide section 86, but region 83 produces no sequence information due to lack of DNA motor engagement. In some embodiments using peptide oligo constructs, whether in-line or rope-tow, this effect limits the length of peptide sequence that is observable using nanopore sequencing. For similar reasons, in rope-tow constructs, this offset limits the readable portion of a peptide linked near the 5’ end of an abasic stretch to a fixed length of amino acid chain measured from the 3’ end of the abasic stretch (i.e., the beginning of the oligo sequence that can engage the DNA motor).
In some embodiments it is therefore preferred that peptides have a constant length (i.e., number of amino acids) that is the maximum length which will fit within the abasic stretch without overlapping base-containing stretches: this maximizes the portion of peptide sequence that falls within the offset 5’-ward of the following base-containing oligo section. Those skilled in the art will recognize that a section of peptide sequence of 8-12 amino acids should be readable using current nanopore systems (in future extendable to greater lengths by providing longer pores), and that the length of an abasic region parallel to a linked peptide molecule can be optimized so as to match (and possibly slightly exceed) the linear extended length of a peptide of 8, or 9, 10, 11 or 12 amino acids. This range of peptide lengths occurs very commonly in tryptic peptides of human and other proteins, indicating the likelihood that a proteotypic peptide of pre-specified length can be found for almost all target proteins.
A variety of alternative pores can be used in the invention. Nanopores such as aerolysin, α-hemolysin, fragaceatoxin C (FraC), and MspA can be used. Motors working on DNA (e.g., Oligo 2 as shown) are known that can control movement of the construct towards the positively polarized trans side by “paying out” the oligo in steps while consuming chemical energy (e.g., ATP in the Oxford Nanopore system), as are motors such as phi29 DNAP (23) or helicase Hel308 (65) that “pull” an Oligo up against the electrophoretic force using ATP. A variety of schemes for controlling the upward (cis-wards) or downward (trans-wards) motion of a polymer through a nanopore have been described (referred to colloquially as “inny” and “outy” methods by Oxford Nanopore). Non-enzymatic methods can also be employed, such as “unzipping” of a DNA duplex under the influence of an electrophoretic force, and variation of the cross-membrane potential to regulate transit speed (68). Peptides can take the place of nucleic acids (e.g., replacing Oligo 2), in which case the motor function during readout as a construct transits a nanopore can be implemented using an “unfoldase” (69), or using ClpX on the “trans” side of the membrane. An alternative nanopore technology using “Field Effect Nanopore Transistor” has been described in US 9,341,592 B2 assigned to iNanoBio. A variety of non-biological nanopore technologies have been disclosed, many using semiconductor technology to create thin inorganic membranes (e.g., of Si3N4, SiO2, graphene, or MoS2) with small holes that function as nanopores, and in some cases enabling use of quantum mechanical tunneling currents across the lumen of the hole in addition to measurements of current through the hole as signals indicative of the transiting polymer sequence.
In the embodiment shown in Figure 22B, the construct moves through the pore generating sequence information (as the timeline of pore current) from the peptide, and optionally Trailer Oligo 2 (e.g., containing a sequence tag), until the end of the construct is reached, at which point the construct is released by protein nanomachine 7 and the construct completes its movement through the pore and floats free into “trans” compartment 10. At this point another construct molecule can enter the pore from the “cis” side for sequencing, repeating the cycle.
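By way of non-limiting illustration, the effect of the motor-to-throat offset can be sketched with a simplified positional model. The model assumes that a construct is a string of equally spaced backbone units listed 5’ to 3’, and that a unit in the throat is read under ratchet control only when the unit lying the offset distance 3’-ward of it bears a base that the motor can engage. The equal spacing, the unit symbols and the function name are assumptions of this sketch.

```python
def controlled_positions(units, offset=8):
    """Flag backbone units that transit the throat under motor control.

    units  -- string listed 5' to 3': "B" = base-bearing unit,
              "-" = unit without a base (e.g., abasic)
    offset -- distance, in backbone units, between motor and throat
    Unit i is flagged True when unit i + offset exists and bears a base.
    """
    n = len(units)
    return [i + offset < n and units[i + offset] == "B" for i in range(n)]

# Hypothetical construct: 10 bases, a 12-unit abasic stretch, 10 bases.
construct = "B" * 10 + "-" * 12 + "B" * 10
flags = controlled_positions(construct, offset=8)
readable = [i for i, ok in enumerate(flags) if ok]
# readable == [0, 1, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
```

In this sketch only the last 8 units of the abasic stretch (indices 14 to 21) are read under motor control, which is consistent with the preference for peptides whose length fits within the offset measured from the 3’ end of the abasic stretch.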
Because of the limited length of the channel in a typical nanopore 5, sequence-related information can only be obtained from about 10-25 amino acids of the peptide 3 nearest to the linkage to the Trailer oligo 2 (i.e., at the c-terminal end of peptide 3 in Figure 22A). In some further embodiments, a second set of constructs is generated similar to those described above, but in which the peptide is linked in the opposite orientation by swapping the linkages to the two oligos in Figure 15. Such constructs (Figure 22C and D) allow sensing of sequence-related information from about 10-25 amino acids nearest to the n-terminal end of peptide 3. Combining results from the two reads, analogous to sequencing complementary strands of DNA, provides greater coverage of the peptide sequence as well as overlapping reads of the middle region of short peptides.
In some embodiments, means known in the art are applied to extend the length of a nanopore (and lengthen Oligo 2 if necessary) so as to be able to read longer amino acid sequences. These include stacking a spacer protein above the entrance to a nanopore as described, for example in WO 2021/111125, or construction of multi-component stacks (70).
It will be clear to those skilled in the art that numerous improvements in both nanopores and “motors” have been, and will be, devised, any of which could contribute to improvements in the performance of the present invention.
An example of the current state of the art of nanopore detection and its use to characterize peptides is described in patent application WO2021111125A1.
7.6.9 Parallel polymer “Rope-tow” constructs.
In some embodiments, a preferred alternative to the foregoing elaborate chemistry required to prepare “in-line” peptide-polymer constructs (e.g., with peptides and oligo segments alternating in a continuous linear polymer) is implemented by linking peptides by only one end to a continuous polymer VEHICLE (e.g., an oligonucleotide) having a plurality of available linkage sites, thus forming a long continuous polymer chain with multiple peptide branches. This VEHICLE construct is reminiscent of a rope-tow ski lift in which skiers can grasp any of a multiplicity of handles on a continuously moving rope to be pulled uphill. In this case the continuous polymer serves as the rope, and peptides are attached to the rope via linkers.
The motion of such a hybrid molecule traversing a nanopore can be regulated (e.g., to produce the desired ratchet motion) by interaction of the continuous polymer chain with a suitable motor (e.g., a continuous oligonucleotide interacting with a DNA motor). Short complementary oligos can be hybridized to the long continuous oligo as necessary to facilitate its interaction with a motor such as a DNA helicase. In the case of a typical oligonucleotide polymer, which has a uniform negative charge along its length due to the phosphates in its backbone, it is preferred that the attached peptides have a net positive or zero charge (i.e., experience an electrical force opposite to that of the polymer, or else no force, under the influence of the electric potential across the membrane) so that the polymer pulls them through the pore by the peptide end attached to the polymer, thus reducing the chance that the peptide’s free end will move forward and “bunch up” in the pore, potentially clogging it.
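By way of non-limiting illustration, the net charge preference described above can be screened with a crude integer estimate. The sketch assumes near-neutral pH, counts lysine and arginine as +1 and aspartic and glutamic acid as -1, ignores histidine and any attached chemical groups, and treats a terminus used for linkage as uncharged. All of these are simplifying assumptions of this example.

```python
POSITIVE = set("KR")   # lysine, arginine
NEGATIVE = set("DE")   # aspartic acid, glutamic acid

def net_charge(sequence, free_n_term=True, free_c_term=True):
    """Crude integer net charge of a peptide at near-neutral pH.

    A free n-terminal amine contributes +1 and a free c-terminal
    carboxyl contributes -1; a terminus used for linkage is assumed
    here to carry no charge.
    """
    charge = sum(aa in POSITIVE for aa in sequence)
    charge -= sum(aa in NEGATIVE for aa in sequence)
    if free_n_term:
        charge += 1
    if free_c_term:
        charge -= 1
    return charge

free_peptide = net_charge("ESDTSYVSLK")                       # -1
n_linked = net_charge("ESDTSYVSLK", free_n_term=False)        # -2
```

A screen of this kind could be used to flag candidate peptides whose estimated charge is negative, for which alternative peptides or charge-modifying derivatization might be considered.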
Peptides pulled through the nanopore transit alongside the continuous polymer, so that the nanopore throat is occluded by both together. In order to maximize the information content of the nanopore current signal for recognizing the peptide sequence, it is preferred that the region of polymer lying alongside a peptide is “featureless” allowing the peptide sequence signal to be recognized with minimal interference from the polymer background. This featurelessness can be achieved through the use of a homopolymer stretch alongside the peptide, for example an “abasic” stretch completely devoid of bases in the case of an oligonucleotide polymer. In some embodiments, the unit-spacing of the extended form oligo backbone (i.e., base pair spacing) is different from the amino acid spacing of the extended peptide.
In some embodiments a rope-tow VEHICLE can be created comprising i) an abasic stretch, ii) a reactive linker group within or adjacent to an abasic stretch and which is capable of making a covalent linkage to a peptide end, and iii) a stretch adjacent to the abasic stretch that is capable of engaging with a DNA motor (i.e., a stretch comprising bases, either single or double-stranded) to regulate movement of the oligo through a nanopore. Repeats of this configuration provide a long polymer VEHICLE with a plurality of peptide attachment sites. In some embodiments, rope-tow constructs are formed so as to be capable of assembling into longer concatamers, through enzymatic ligation (e.g., via DNA ligase), transposase recombination, CRISPR insertion, or chemical coupling (e.g., using click chemistry).
In some embodiments, a rope-tow construct has the advantage that it requires only a single linkage to a peptide, usually at either the peptide’s n- or c-terminus, instead of two sites as required when a peptide is inserted into an oligo “in-line” with the oligo backbone (the arrangement described in the prior art). In some embodiments, the oligo backbone through the abasic stretch provides a uniformly distributed negative charge (e.g., the canonical sugar-phosphate backbone of DNA) that largely masks the net charge and charge distribution of an attached peptide, and allows the electric potential applied between cis and trans sides of a nanopore to exert a near-uniform force on the construct irrespective of the peptide’s composition.
In some embodiments the abasic stretch has a length equal to or greater than the length of selected TARGET and STANDARD peptides that can be linked to the linkage site, such that a peptide in an extended configuration can lie “alongside”, and parallel to, the extended oligo backbone without overlapping nucleic acid bases (i.e., such that the cross-sectional area of the linked peptide:oligo construct along the peptide is that of the peptide plus the backbone). In some embodiments one or more additional dsDNA segments are included in the oligo that contain sequence information useful to identify and establish registration of ionic signatures in a nanopore, to identify a construct as associated with a particular sample in a pool, or analyte in a panel (e.g., by identifying specific STANDARDS), or for quality control. In some embodiments, the oligo comprises one or more regions to which a DNA motor can bind, or where one can be pre-loaded (which regions may not be limited to natural DNA bases, abasic structures, or to a conventional sugar:phosphate backbone). In some embodiments the oligo comprises regions of DNA or RNA that can interact with a DNA motor to regulate passage of the oligo through a nanopore in a ratchet motion.
In some embodiments, multiple peptides are linked to multiple abasic sites on a prepared rope-tow oligo to form a peptide:oligo concatamer. In some embodiments, such an extended rope-tow “template” oligo VEHICLE, having a plurality of empty linkage sites and abasic stretches, is present in the cis compartment of a nanopore and reacts with TARGET and STANDARD peptides (introduced into the compartment in solution, or bound to BINDER from which they dissociate under conditions prevailing in the compartment, or on other solid supports) to form rope-tow peptide:oligo constructs in the vicinity of a nanopore. In some embodiments, an empty extended “template” rope-tow oligo VEHICLE is bound to the nanopore’s membrane by a tether. In some embodiments, the empty rope-tow oligo continues to react with and accumulate attached peptides after the commencement of a sequencing run.
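By way of non-limiting illustration, the minimum number of abasic units needed to span an extended peptide can be estimated from the two unit spacings. The spacing values used below (about 0.35 nm per amino acid residue and about 0.65 nm per backbone unit) are approximate assumptions of this sketch, not values stated in the present disclosure, and are exposed as parameters so that other values can be substituted.

```python
import math

def min_abasic_units(n_residues, nm_per_residue=0.35, nm_per_unit=0.65):
    """Smallest number of abasic backbone units at least as long as the peptide.

    nm_per_residue -- assumed rise per amino acid in an extended peptide
    nm_per_unit    -- assumed spacing per unit of an extended oligo backbone
    """
    peptide_length_nm = n_residues * nm_per_residue
    return math.ceil(peptide_length_nm / nm_per_unit)

# Peptide lengths of 8 to 12 amino acids, as discussed for current nanopores.
units_needed = {n: min_abasic_units(n) for n in range(8, 13)}
```

An abasic stretch slightly longer than this minimum would allow the peptide to lie alongside the backbone without overlapping the adjacent base-containing sections.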
In some embodiments, a rope-tow oligo VEHICLE allows peptides functionalized with one member of a “click” reagent pair to react directly with a “click” site (comprising the other member of a “click” pair) on a pre-synthesized VEHICLE, which can be of any convenient length, and contain any number of abasic stretches, thus creating a continuous polymer capable of threading and being read continuously by a nanopore having a processive DNA motor. The peptides are attached to the VEHICLE backbone in an orientation such that they are dragged through the pore alongside an abasic stretch of sugar-phosphate backbone. The nanopore current trace describing the peptide sequence thus reflects the combined areas and chemical properties of backbone and amino acids passing through the reading region (i.e., the “throat” of the nanopore) as the two parallel polymer segments are ratcheted through the pore by the interaction of a DNA motor at the entrance to the pore with an oligo region following the abasic stretch. Those skilled in the art will understand that numerous permutations of this concept can be implemented with alternative oligonucleotide sequences, backbone chemistries (e.g., PNA, etc.), base-free regions (e.g., positions with side groups smaller than normal bases or abasic) of various lengths designed to accommodate peptides of various lengths, and chemical connecting groups (e.g., various click combinations, amino-reactive groups, etc.). In some embodiments, tryptic peptides ending in arginine are preferred, as these have a single amino group (the n-terminal amine), which conveniently provides a single specific site for attachment of a click group for facile linkage to a “rope-tow” oligo. 
In some embodiments tryptic peptides ending in lysine are connected by one of the peptide’s two amino groups following blockage of the other amino group (e.g., blockage of the n-terminal amino group by reaction at near-neutral pH), thus allowing a peptide to be coupled in the opposite orientation (c-term-first into the pore) compared to linkage via the n-terminal group, and providing the ability to “read” peptide sequences in both directions. In some embodiments, aspartic and glutamic acid-free peptides are linked to the oligo via the unique carboxyl group at the peptide’s c-terminus. In some embodiments, peptides are attached to the oligo via a site near the 5’ end of an extended abasic region so that as the oligo is drawn 5’-first through a nanopore, the peptide (whether n-terminal first, via an n-terminal linkage, or c-terminal first via a c-terminal linkage) is pulled through the pore lying alongside the oligo backbone.
In some embodiments, as illustrated in Figure 23A, a method of concatenating peptides for nanopore sequencing is provided in which a continuous polymer is prepared comprising single-stranded oligonucleotide segments 23 with stretches of abasic sites 21 (positions in which there is no base attached to the continuous sugar-phosphate backbone) and through which the chain therefore has a diminished cross-sectional area on account of the lack of bases. In some embodiments, the length of the abasic stretch is designed to be longer than any of the peptides to be linked to it (including the length of any chemical linkers). In some embodiments a nucleic acid residue 22 preceding (e.g., 5’ to) and adjacent to one or more abasic regions 21 is provided (during synthesis or through subsequent modification) that includes a reactive chemical linking group (e.g., one of a pair of click-chemistry groups, or a reactive group such as an amino group where a click functional group can be installed), capable of combining with the other of the pair of click groups that is attached to one terminus (e.g., the n-terminus) of a peptide molecule 3. The provision of abasic stretches provides a length of backbone devoid of bases, whose diminished cross-sectional area allows a peptide chain attached at the leading (typically 5’) end of the stretch to be pulled through a nanopore parallel to the abasic backbone. The nanopore occlusion in the area of the abasic region is thus due to the peptide chain plus the parallel oligonucleotide backbone, and through this stretch the amino acid sequence can be read from the changes in nanopore ion current during transit under control of the DNA motor interacting with the DNA sequence to the 3’ end of the abasic region.
In some embodiments the polymer chain is synthesized chemically or ligated together from chemically-synthesized units, for example using oligonucleotide synthesis and incorporating DNA, RNA, modified DNA and abasic synthons (such abasic sequences can be obtained commercially from, e.g., Integrated DNA Technologies as dSpacer, rSpacer or Abasic II residues). In some embodiments, alternatives to the common DNA or RNA backbones are used, such as peptide nucleic acid or phosphorothioate backbones, or any of a variety of linear polymers that can be joined to oligonucleotide backbones to form a continuous molecule. It is understood by those skilled in the art that the polymer VEHICLE constructs described herein as comprising “oligos” or DNA can alternatively be formed of other polymers, other backbones and a variety of natural and modified bases or side-groups. It is likewise understood by those skilled in the art that the linkage groups described for coupling peptides to rope-tow constructs can be any of a variety of coupling chemistries including “click” chemistry, amine-reactive chemistries (e.g., NHS esters), carboxyl-reactive chemistries, etc. Reactive sites may be created in synthetic oligos by a variety of means, for example by including an amino-modified version of an internal base such as 5' Amino Modifier C6 dT or a 3’ Amino Modifier (both available commercially e.g., through Integrated DNA Technologies, Inc. custom DNA synthesis service). These amino groups can be converted to NHS derivatives as part of oligo manufacture, and can be further converted to click chemistry groups where required.
It is likewise understood by those skilled in the art that numerous alternative ratcheting nanomachines can be used to regulate movement of polymers, including oligos, through a nanopore in place of a DNA motor, and that numerous alternative biological (protein and DNA-based) and inorganic (e.g., solid-state) nanopores have been described that are capable of reading polymer sequences.
In some embodiments, the abasic stretches as described can be any structure that preserves the continuity of the backbone and preferably has a smaller cross-sectional area than canonical single-stranded DNA or RNA. In some embodiments the backbone in the abasic stretches comprises negative charges, mimicking the uniform negative charge distribution of a sugar-phosphate backbone.
The modified residues comprising linkage sites can be any of a variety of residues comprising a linker group, and the linker may be attached to a base, to a sugar, or to a phosphate group.
In some embodiments the linkage site is preceded (i.e., in the 5’ direction) by one or more abasic sites. Such preceding abasic sites can be provided to generate a high-current (almost open-pore) start signal preceding the current profile attributable to a linked peptide and parallel polymer backbone.
In some embodiments, as shown in Figure 23A, a complementary oligonucleotide 24 is generated that hybridizes with oligo 23, except through the abasic regions (where there are no bases on 23 with which to hybridize), and in the region of the 5’ terminus of oligo 23 (where the 5’ region comprises a leader sequence, or non-oligo charged polymer, that threads through the nanopore initially and may comprise a site for binding a DNA motor 7). In some embodiments only a fraction of the corresponding residues hybridize. In some embodiments, the complementary oligo 24 is interrupted, comprising only segments that hybridize with the non-abasic (“natural”) segments of oligo 23, leaving the abasic stretches single-stranded. In some embodiments the number of complementary strand bases aligned with abasic stretches is not the same as the number of abasic sugar-phosphates, such that one strand is longer than the other in the abasic region, leading to a kink in the duplex at abasic regions, increased exposure of the region to the environment and lesser steric hindrance in the reaction of the oligo linkage site with that of the peptide.
In some embodiments the length of abasic regions can be set to accommodate the length of peptides that are intended to be read (e.g., the TARGET and STANDARD peptides), so that the abasic regions are at least as long as the extended length of the peptides plus any linking groups, thereby ensuring that a full-sized nucleic acid base and an amino acid do not transit the throat of the nanopore together, potentially clogging it. Abasic regions are typically 6 to 50 backbone (sugar-phosphate) units long.
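The sizing rule above (abasic region at least as long as the extended peptide plus any linking groups) can be illustrated with a minimal sketch. The two spacing constants are assumed round values chosen for illustration and are not taken from this disclosure; the function name is hypothetical. Lengths are kept as integer picometres so that the rounding is exact.

```python
# Assumed geometric constants (illustrative only, not from the disclosure).
PM_PER_RESIDUE = 360        # approx. contour length per amino acid in an extended chain
PM_PER_BACKBONE_UNIT = 600  # approx. spacing per sugar-phosphate unit in stretched ssDNA


def abasic_units_needed(peptide_residues: int, linker_pm: int = 0) -> int:
    """Smallest number of abasic backbone units that spans peptide plus linker."""
    if peptide_residues <= 0 or linker_pm < 0:
        raise ValueError("peptide length must be positive and linker length non-negative")
    required_pm = peptide_residues * PM_PER_RESIDUE + linker_pm
    # Integer ceiling division: round up to a whole backbone unit.
    return -(-required_pm // PM_PER_BACKBONE_UNIT)


if __name__ == "__main__":
    for n in (8, 10, 20):
        print(n, "residues ->", abasic_units_needed(n, linker_pm=1000), "abasic units")
```

Under these assumed constants, tryptic peptides of roughly 10 to 20 residues call for abasic stretches well inside the 6 to 50 unit range stated above.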
In some embodiments, nanopores with throats of larger dimensions can be used to accommodate oligo and attached peptides in parallel (i.e., roughly double the throat area of nanopores currently used for DNA sequencing), in which case the stretches of abasic sites alongside the peptides in Figure 23 can be replaced with canonical DNA with backbone present with normal nucleotides attached. In such a case it is preferred that the stretch of oligo alongside the peptide is a homopolymer, thus providing a consistent background against which the variations of nanopore current due to different amino acids of the peptide can be detected.
In some embodiments, peptides are prepared for conjugation with rope-tow oligo VEHICLES by functionalizing the n-terminal amino group with a reactive moiety such as a click chemistry reagent suitable for joining to the modified oligo attachment site. In some embodiments, TARGET peptides ending in Arginine are preferred since they have only a single amino group (the n-terminus) and therefore are derivatized at only one site by reaction with an amino selective reagent.
In some embodiments, a peptide amino group is derivatized while the peptide is localized on a BINDER, and later released into solution to react with a rope-tow oligo VEHICLE.
In some embodiments, a peptide amino group is derivatized while the peptide is localized on a BINDER, and subsequently reacted with a solution of rope-tow oligo molecules, after which any unreacted rope-tow oligo is washed away prior to elution from the BINDER. This approach has the advantage that a large majority of the rope-tow molecules will have attached peptides (i.e., most or all attachment sites will be “loaded” with peptides), and subsequent concatenation of these loaded rope-tow oligos will generate a fully loaded construct capable of yielding a plurality of TARGET and STANDARD peptide counts in one nanopore read.
In some embodiments peptide amino groups are not modified but rather react directly with an oligo whose reactive sites can react directly with an amino group (e.g., the oligo VEHICLE has NHS linkage sites). In some embodiments peptides are eluted from BINDERS using a competing (i.e., “displacer”) peptide of the same or similar sequence to the TARGET and STANDARD peptides and introduced, when elution is required, at higher concentration than the TARGET and STANDARD peptides or BINDER binding sites. After a duration of one or a few BINDER half-off-times, the BINDER is saturated with the displacer peptide and the TARGET and STANDARD peptides will be free in solution. Displacer peptides can be modified so as to be unable to participate in linkage reactions taking place after elution of TARGET and STANDARD peptides, e.g., by blocking the n-terminal amino group and/or the c-terminal carboxyl, and any lysine amino group. In some embodiments that involve a linkage reaction with the n-terminal amino group of bound peptides, a displacer peptide of the same sequence as the TARGET (or STANDARD), but with the n-terminal amino group acetylated during or after synthesis, can be used to displace and thus release the bound peptides without interfering in the amino group chemistry. An advantage of eluting TARGET and STANDARD peptides using a displacer peptide is that no other solution conditions need be changed, reducing the likelihood of eluting non-specifically bound materials from the BINDERS, carrier beads or other supports. In the case of nanopore sequencing, use of a displacer peptide with a net positive charge results in no displacer peptide migrating towards or into the nanopore.
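The statement that elution is largely complete after "one or a few" half-off-times can be made concrete with a simple first-order model. The sketch below assumes single-exponential dissociation and assumes that the excess displacer fully prevents rebinding of released TARGET and STANDARD peptides; both are simplifying assumptions, and the function name is hypothetical.

```python
def fraction_released(elapsed: float, half_off_time: float) -> float:
    """Fraction of initially bound peptide released after `elapsed` time.

    Assumes first-order dissociation with no rebinding (excess displacer
    occupies every freed BINDER site). Both times use the same unit.
    """
    if half_off_time <= 0 or elapsed < 0:
        raise ValueError("half_off_time must be positive and elapsed non-negative")
    return 1.0 - 0.5 ** (elapsed / half_off_time)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} half-off-times: {fraction_released(n, 1.0):.3f} released")
```

Under this model three half-off-times release 87.5% of the bound peptide, so the choice of incubation time trades recovery against assay duration.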
In some embodiments, not all reactive sites on the rope-tow oligo VEHICLE react with peptides, leaving empty peptide-accommodating sites. Empty sites are easily recognized in nanopore ion current traces by their short duration and lack of major current modulation.
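A minimal sketch of how an empty site might be flagged from the two features named above (short duration and little current modulation) follows. The threshold values and the function name are hypothetical placeholders; real values would have to be calibrated for a given pore, motor and buffer.

```python
from statistics import pstdev


def is_empty_site(currents_pa, dwell_ms, max_dwell_ms=5.0, max_sd_pa=2.0):
    """Flag an abasic-stretch event as an empty (unloaded) linkage site.

    currents_pa : current samples (pA) recorded while the stretch transits
    dwell_ms    : duration of the event in milliseconds
    Thresholds are illustrative assumptions, not measured values.
    """
    short = dwell_ms <= max_dwell_ms
    flat = pstdev(currents_pa) <= max_sd_pa  # little current modulation
    return short and flat


if __name__ == "__main__":
    print(is_empty_site([100.0, 100.5, 99.5], dwell_ms=2.0))           # flat and brief
    print(is_empty_site([100.0, 60.0, 85.0, 40.0], dwell_ms=30.0))     # long and modulated
```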
In some embodiments, illustrated in Figure 23B, a rope-tow oligo (with peptides attached) and its complementary strand can be joined to a prepared adapter 25 comprising the components required to present the oligo to a sequencing nanopore, start threading into the pore and regulate movement into the pore (e.g., using a DNA motor). In this case the adapter is modeled after a commercially available “Y-adapter” provided by Oxford Nanopore, which can be ligated to a DNA duplex (e.g., aligned by a T-overhang) via a simple kit procedure according to the manufacturer.
In some embodiments, DNA motors are loaded at periodic sites along the rope-tow oligo between the peptide attachment sites in order to provide means to continue ratcheting the rope-tow VEHICLE through a nanopore if a motor falls off when transiting a peptide-loaded or abasic site.
In some embodiments rope-tow constructs are prepared by joining a peptide to an oligo having a single abasic stretch (i.e., with a capacity of a single peptide), and these short rope-tow constructs are subsequently assembled into concatamers using linkage methods described above and illustrated in Figures 23 and 24 (e.g., click chemistry or enzymatic ligation). In some embodiments, such short, single attachment site oligos are reacted with peptides while the peptides remain on the BINDER, allowing unreacted oligos to be washed away prior to elution of peptides from the BINDER, and resulting in subsequent concatamerization of only “loaded” oligos, thus avoiding empty abasic sites and wasted sequence reads.
In some embodiments, double-stranded oligo-peptide rope-tow VEHICLE constructs are introduced into prepared double-stranded sequenceable constructs (e.g., the products of well-known commercially-available library preparation kits and methods used with the Oxford Nanopore system) by recombinant processes (e.g., by use of transposases, CRISPR mechanisms, “tagmentation”, hybridization and repair, and the like) to form sequenceable concatamers.
In some embodiments, a bead functionalized with a single type of BINDER is used to capture and transport molecules of a single TARGET + STANDARD pair to the vicinity of a nanopore, where the peptides are eluted by one of the methods described above and allowed to combine (e.g., via “click” linkages as described above) with a VEHICLE construct molecule (e.g., a “rope-tow” construct) pre-positioned at (i.e., already threaded into or directly available to) the nanopore. In such an embodiment, an incubation period can optionally be included to allow eluted peptide molecules to couple with the VEHICLE prior to the start of motion through the nanopore. Motion of the construct through the pore can be initiated by an increase in the trans-membrane voltage pulling the typically negatively-charged construct through the nanopore, at which point a ratchet mechanism (e.g., a DNA motor) begins to feed the construct through the nanopore for reading. In some embodiments each nanopore is prepared with a single VEHICLE in place, and is used to sequence only that VEHICLE molecule - in such an embodiment it can be advantageous to use a long VEHICLE with a large number of peptide binding sites, for example a VEHICLE of 100 kb equivalent length with abasic stretches comprising peptide linkage sites every 100 b and therefore able to accommodate 1,000 peptide molecules (a number sufficient to provide a precise TARGET-to-STANDARD ratio, and thus protein amount, when the TARGET and STANDARD are in approximately equal amounts). In some embodiments, such pre-positioned VEHICLES associated to nanopores are combined with mixtures of different TARGET + STANDARD pairs.
In some embodiments a bi-functional support is used in which a BINDER 52 specific for a TARGET 48 + STANDARD 49 peptide pair is immobilized on a support (e.g., magnetic beads 51) that also carries immobilized VEHICLE molecules comprising a tag (e.g., a sequenceable DNA tag 55) that is assigned to the TARGET. An example using specific sequences is shown in Figure 25A to illustrate the concepts, while not limiting the scope of sequences, binders and chemistries that can be used. An object of this arrangement is to capture a TARGET + STANDARD peptide pair via the cognate BINDER, separate these peptides from other peptides, and subsequently allow the captured peptide molecules to react with the VEHICLE comprising the TARGET’s tag (which can be considered a VEHICLE cognate to the TARGET sequence), producing a construct whose sequenceable (e.g., DNA) tag identifies the TARGET and STANDARD peptides expected to be attached. This approach provides additional information (the identity of the TARGET) that improves the reliability of the identification of the TARGET and STANDARD peptides linked to the VEHICLE in nanopore current traces - essentially transmitting the identity of the BINDER to the nanopore sequencer via this “labeled construct”. In some embodiments, a bi-functional support (which may, for example, be one or more magnetic particles) carries multiple molecules of a BINDER and multiple molecules of a VEHICLE (e.g., a rope-tow construct 45) incorporating a sequence tag 55 indicative of the BINDER (and thus TARGET) identity. Figure 25A shows a rope-tow VEHICLE comprising a nanopore sequencing adapter 41 (including a DNA motor 50) followed by two tandem copies of a rope-tow construct, each of which comprises a linkage site 47 to which a peptide (48 or 49) is covalently linked by linkage 46.
The construct can include tens, hundreds or thousands of repeats of a rope-tow construct, providing the capability to attach tens, hundreds or thousands of peptide molecules to a single VEHICLE molecule. Figure 25B shows a magnetic bead 51 to which are attached multiple copies of the VEHICLE construct 53 (e.g., the construct shown in Figure 25A) and multiple copies of BINDER (52) with bound TARGET (48) and STANDARD (49) peptides. This configuration results from exposure of the bifunctional bead to a standardized sample digest containing TARGET and STANDARD peptides, allowing the BINDERS to capture these peptides specifically, and subsequent removal of the beads from the digest and washing unbound peptides away. Figure 25C shows the configuration of the bifunctional bead after the peptides are eluted from the BINDERS and allowed to react with the linkage sites 47 on the VEHICLE constructs 53. The VEHICLE constructs are subsequently released from the beads using any of a variety of well-known chemical (e.g., reduction of an S-S bond) or enzymatic (e.g., cleavage at a restriction endonuclease site) means and delivered to a sequencing nanopore.
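The arithmetic behind the 100 kb example, and a rough sense of why about 1,000 counted peptides gives a precise ratio, can be sketched as follows. The precision estimate assumes independent Poisson counting of TARGET and STANDARD molecules and a fully loaded VEHICLE; this statistical model and the function names are illustrative assumptions rather than part of the disclosure.

```python
import math


def linkage_sites(vehicle_length_b: int, site_spacing_b: int) -> int:
    """Number of peptide linkage sites on a VEHICLE of the given length."""
    if vehicle_length_b <= 0 or site_spacing_b <= 0:
        raise ValueError("lengths must be positive")
    return vehicle_length_b // site_spacing_b


def ratio_cv(target_count: int, standard_count: int) -> float:
    """Approximate coefficient of variation of the TARGET/STANDARD count ratio.

    Assumes independent Poisson counts (first-order error propagation).
    """
    if target_count <= 0 or standard_count <= 0:
        raise ValueError("counts must be positive")
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


if __name__ == "__main__":
    sites = linkage_sites(100_000, 100)
    print("sites:", sites)
    # Equal amounts of TARGET and STANDARD on a fully loaded VEHICLE.
    print(f"ratio CV: {ratio_cv(sites // 2, sites // 2):.1%}")
```

With 500 TARGET and 500 STANDARD molecules counted, the estimated counting CV of the ratio is about 6%; the CV worsens as the ratio departs from 1:1 because the rarer species contributes fewer counts.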
In some embodiments, including those shown in Figures 3, 4 and 26, the peptides can be chemically modified while bound to the BINDERS in order to introduce a linkage group capable of combining with the VEHICLE linkage sites (e.g., site 47). In some embodiments either the peptide linker group or the VEHICLE linker group, or both, are present while TARGET and STANDARD peptides are bound to the BINDERS in a form that is not reactive with the counterpart group, thereby reducing or eliminating premature reaction of free reagents with either of the linking groups. After the peptide and VEHICLE linking groups are in place and any reagents used to introduce them are removed (e.g., by washing the beads), an activation step is carried out to convert the inactive linker forms to active forms capable of reacting with counterpart groups to form linkages 46 between peptides and VEHICLES. In some embodiments the linkages of peptides to VEHICLES result from reactions between a pair of “click” chemistry groups i.e., one on the peptide (e.g., on its n-terminal amino group) and one on the VEHICLE (e.g., at linker site 47), at least one of which was prepared initially in an unreactive form and converted afterwards to an active form able to react to form linkage 46. Introduction of “click” chemistry precursor groups into peptides and nucleic acids, and their subsequent conversion to reactive “click” groups is well known in the art, for example using chemistries described as “single-pot click chemistry” (71). Subsequent elution of the peptides (now having active linkers) from the BINDERS results in rapid reaction of the peptides with the VEHICLE at high efficiency due to the very close spatial proximity of the BINDERS to the VEHICLES on a bead. 
In some embodiments the yield of peptides linked to VEHICLES is further improved by carrying out the elution step (and subsequent reactions to form linkages 46) after the bead has been placed in a nanowell, thus restricting diffusion of eluted peptides away from the bead and VEHICLES to a very small volume. In some embodiments the VEHICLES are released from the beads before, during or after elution of the peptides from the BINDERS.
In some embodiments in which reactive groups are introduced into peptides while the peptides are bound by BINDERS (e.g., by reaction with peptide amino groups), the BINDERS are either selected so as to contain no reactive amino groups (e.g., nucleic acid aptamers contain no free amino groups), or else the amino groups of the BINDERS are blocked to avoid creation of active linkage-capable groups on the BINDERS that could compete for reaction with the VEHICLE reactive sites.
In some embodiments, oligo:peptide constructs of the invention are purified before sequencing by binding to and elution from a support designed to capture and enrich nucleic acids from a sample (e.g., Agencourt AMPure XP beads (Beckman Coulter)).
7.6.10 Concatamer construct formats
In some embodiments, peptide constructs are concatenated to optimize detection performance, specifically throughput. Presentation of peptides as concatamers allows a nanopore to continuously read molecules, and avoids delays that can arise if a pore must wait for each of a series of short constructs to approach and thread the nanopore to be read. In some embodiments, as shown in Figure 27, the ligation approach can be modified to enable assembly of long strings of covalently linked molecules, in this case with polymer linkers (e.g., DNA) between peptide molecules in an alternating pattern. In the case shown in Figure 27, the DNA linker is modified compared to the Oligos of Figure 15 by providing ADIBO “click” functionality on one end (the 3’ end in this example) and an amine reactive NHS functionality on the other (5’) end. When the individual peptides having a c-terminal lysine residue (with a free epsilon amino group) are modified to introduce azide functionality on the n-terminus (as shown in Step 3 of Figure 28) while on the BINDER, and then, after removal of the unreacted AAAH, released from the BINDER into solution and exposed to the modified oligos (here labeled “Linker”), the result is rapid formation of “click” links to form extended chains of peptide+linker repeats. The arrangement shown in Figure 27 preserves a 5’ to 3’ polarity for the DNA linking segments as required in typical nanopore sequencing systems. In the resulting concatamers (Figure 28A), specific sites on the linkers interact with molecular motors (Figure 28B; either pre-positioned on the oligos or added later in solution) to provide a ratchet movement of peptides through the pore. 
In the case of molecular motors like nucleic acid helicases, polymerases and the like, the motor might be expected to fall off the concatamer when it encounters the peptide segment (thus requiring a new motor assembly on each DNA linker as shown in Figure 28B) - however as disclosed in WO 2021/111125 there are existing motor proteins that can slide over a peptide and engage a subsequent DNA segment.
In some further embodiments shown in Figures 28C and 28D, “click” chemistry is employed to assemble concatamers without intervening DNA linkers. In this case one end of the peptides (e.g., the n-terminus) is derivatized with one of a pair of “click” reagents (e.g., azide functionality introduced using azidoacetic anhydride), while the other end (e.g., the epsilon amino group of a c-terminal lysine residue) is derivatized with Aza-dibenzocyclooctyne (ADIBO). When the peptides are released from the BINDER into solution, the peptides react with one another to form extended concatamers. In this embodiment, alternative molecular motors capable of stepwise, ratchet-like processing of polypeptide chains are used (70) instead of the enzymes used with nucleic acids.
In some embodiments, constructs are concatenated by hybridization as shown in Figure 24. Here the constructs of Figure 15 are joined in series by hybridization with an oligo (labeled Oligo 3) that has sequence regions complementary with both Oligo 1 and Oligo 2. After hybridization has occurred, an enzymatic ligation is carried out (at the site indicated by the black triangles) to covalently join the successive constructs into a continuous single-stranded molecule comprising repeats of Oligo 1, peptide and Oligo 2. In some embodiments motor proteins are pre-positioned on the constructs prior to ligation as shown. In some embodiments a small proportion of the Oligo 1 Linkers of Figure 28 are specialized as “guide threads” optimized to engage and enter sequencing pores most efficiently (e.g., by having optimal means for engaging tethers to bring the concatamer into contact with the membrane and allow its diffusion to a pore). Such Linkers have “click” linkage sites on only one end, and so are incorporated only at one end of a concatamer (typically the 5’ end). When the enrichment on BINDER is carried out to enrich one TARGET peptide and its STANDARD, then the resulting extended chains will be composed of this TARGET peptide and STANDARD molecules in the same ratio as present in the original sample digest (provided that the reactivities of the TARGET and STANDARD with the other polymer components of the concatamer are equal, or nearly so - this equivalence being a criterion for selection of the STANDARD structure with respect to the TARGET peptide).
However, when multiple TARGET peptides and their respective STANDARDS (e.g., forming a protein biomarker panel) are enriched using the panel of respective BINDERS together, and then eluted into solution, the concatamers that form in solution comprising these peptides will be composed of random mixtures of the different TARGET peptides and their STANDARDS, and each nanopore read will include a variety of TARGET peptides (and cognate STANDARDS).
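When a single read contains a random mixture of panel members, the per-peptide calls can simply be tallied by TARGET identity before ratios are formed. The sketch below assumes that each peptide in a read has already been classified as a (name, kind) pair and that the amount of each STANDARD added to the sample is known; the data layout and function name are hypothetical.

```python
from collections import Counter


def target_amounts(calls, standard_amounts):
    """Estimate TARGET amounts from mixed concatamer reads.

    calls            : iterable of (name, kind) tuples, kind is "TARGET" or "STANDARD"
    standard_amounts : dict mapping name -> known amount of STANDARD added
    Returns name -> (TARGET count / STANDARD count) * STANDARD amount.
    Panel members with no STANDARD counts are omitted (ratio undefined).
    """
    counts = Counter(calls)
    amounts = {}
    for name, std_amount in standard_amounts.items():
        n_standard = counts[(name, "STANDARD")]
        if n_standard == 0:
            continue
        amounts[name] = counts[(name, "TARGET")] / n_standard * std_amount
    return amounts


if __name__ == "__main__":
    reads = ([("A", "TARGET")] * 3 + [("A", "STANDARD")] * 6
             + [("B", "TARGET")] * 4 + [("B", "STANDARD")] * 2)
    print(target_amounts(reads, {"A": 10.0, "B": 5.0}))
```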
7.6.10.1 Single TARGET concatamers
If each concatamer molecule comprises only one type of TARGET and its STANDARD, then recognition and classification of the peptides using a nanopore current trace would be simplified, based on the a priori expectation that the peptide sequences read would all be variations of a single TARGET sequence. Thus in some embodiments, the constructs described above are concatenated so as to join into a given chain only molecules of a given TARGET peptide and its STANDARD (i.e., each chain being homogenous with respect to the TARGET to be read out), and will generate counts of only these two peptides. In some embodiments this objective is achieved by concatenating together constructs bound to an individual bead that is functionalized (or coated) with molecules of a single type (specificity) of BINDER. In a multiplex assay, this can be achieved by fixing each BINDER to beads separately (such that each bead has copies of only one BINDER on it) and subsequently pooling the beads to capture the various TARGET and STANDARD peptides. In some embodiments, after peptides are captured by BINDER on beads (each bead carrying bound peptide molecules of only one TARGET peptide and STANDARD pair), the beads can be distributed into very small (e.g., femtoliter) containers (such as those used in Illumina DNA sequencing technology) or droplets (as commonly used in digital PCR and associated microfluidic methods), effectively isolating each bead in a separate container. The addition of linking groups to the peptides can be carried out prior to distribution of beads into individual containers (i.e., while the peptides remain on the BINDER on the beads), or after the beads are distributed to containers. In the containers, the peptides are eluted from the BINDER (by exposure to eluting conditions or reagents, including displacer peptides), combined with the derivatized Oligos, and allowed to concatenate, forming sequenceable concatamer constructs as described in the invention.
Since the peptides in each tiny container arise from one bead, and this bead bore only a single specificity of BINDER, the container’s peptide contents can be joined into concatamers containing only that TARGET peptide and its STANDARD. Concatamers comprising copies of only one TARGET peptide and its STANDARD are advantageous for two reasons: a) the probability of correctly recognizing the peptide sequence can be increased because multiple copies of the same (or similar STANDARD) sequence are detected successively in the current trace (“squiggle”) from a single nanopore and these jointly used to form a consensus assignment (e.g., by machine learning algorithms well-known in the nanopore sequencing art) to one of a set of pre-selected TARGET peptide sequences, and b) if a peptide sequence can be determined early in the processing of a long concatamer through the nanopore (i.e., in the first few of a large number of concatenated peptides), and that sequence has already been detected enough times to achieve the required assay sensitivity and/or precision, the concatamer can be ejected from the nanopore to enable the nanopore to begin reading a different sequenceable construct (a concept referred to as “computational enrichment of target sequences”).
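The two advantages described above, consensus assignment and early ejection, can be sketched with simple decision logic. A plain majority vote stands in here for the machine learning classifiers mentioned in the text; the vote rule, thresholds and function names are all illustrative assumptions.

```python
from collections import Counter


def consensus_target(calls, min_calls=3):
    """Majority-vote TARGET identity from the first peptides read on a concatamer.

    calls : list of per-peptide TARGET names (TARGET and STANDARD share a name)
    Returns the name only if at least `min_calls` peptides were read and
    more than half of them agree; otherwise None.
    """
    if len(calls) < min_calls:
        return None
    name, votes = Counter(calls).most_common(1)[0]
    return name if votes > len(calls) / 2 else None


def should_eject(calls, counts_so_far, required_count, min_calls=3):
    """Eject the concatamer if its TARGET has already been counted often enough."""
    name = consensus_target(calls, min_calls)
    return name is not None and counts_so_far.get(name, 0) >= required_count


if __name__ == "__main__":
    print(consensus_target(["A", "A", "B"]))
    print(should_eject(["A", "A", "A"], {"A": 500}, required_count=400))
```

Ejection decisions of this kind would be made per pore during the run, so that pores spend their time on panel members that have not yet reached the required count.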
In some embodiments the tiny container into which a single bead (with a single specificity of BINDER) is directed also comprises one or more sequencing nanopores, which are thereby devoted to sequencing a single TARGET and STANDARD pair. In some embodiments, microfluidic means are employed to control the movement of such beads into separate pore containers and optionally deliver one or more successive reagents into the container. In some embodiments each bead described above, bearing a single BINDER specificity and carrying bound molecules of a single TARGET peptide and STANDARD pair, is placed in the region of a single nanopore. The nanopore region can be a container in which one nanopore is present, and may be electrically isolated or have liquid or electrical connection to other nanopores, but with little or no diffusion between nanopore regions. The BINDER-bound peptides are eluted from the bead in the nanopore vicinity, and, due to their physical proximity to the nanopore and the nanopore’s isolation from other beads, the bead’s peptide cargo is detected, recognized as a specific TARGET peptide or STANDARD, and counted by passage of the constructs containing them through the nanopore.
In some embodiments an alternative approach can be implemented in which each BINDER is used separately to extract its TARGET peptides and STANDARD from a digest (a process that can be implemented as a sequence of separate BINDER captures), and then these peptide cargoes are processed separately to form similar homogeneous concatamers. In such embodiments, the BINDERS can be immobilized on physically separate supports (e.g., on separate porous affinity supports including column chromatography beads, or as separate zones on a porous membrane such as nitrocellulose as used in conventional lateral flow immunoassays), or the different BINDERS can be exposed to a sample digest sequentially, one at a time, to produce separate captures.
In some embodiments beads (e.g., Pierce™ Protein A/G Magnetic Agarose Beads, diameter 10-40 microns) having a BINDER capacity larger than typical magnetic beads (e.g., Dynabeads, e.g., 2.8-micron diameter) are used in order to collect more molecules for presentation to a single nanopore. Methods for placing beads near nanopores to increase the rate and/or probability of sequenceable constructs entering the pore have been described in the art (72, 73) but they fail to encompass the current purpose of presenting one (or a small subset) of homologous constructs on one or a small number of beads to a given pore.
In some embodiments, microfluidic means known in the art are used to distribute single beads to individual nanopores.
It will be apparent to those skilled in the art that some optimization of concatamer assembly chemistry and conditions will be useful in order to minimize production of circular concatamers, which, because they have no free end to thread through a nanopore, are unlikely to be detected in a nanopore sequencer.
7.6.11 Rope-Tow ligation and concatenation using splint oligos.
Nanopore sequencing adapters can be ligated to one or a series of peptide-oligonucleotide constructs in tandem using a commercial ligase (e.g., T4 DNA ligase) capable of joining a 5’ phosphate of one oligo with a 3’ hydroxyl of another. The linkage can be facilitated by providing a single base sticky end, for example the T/A overhang at sites 45 shown in Figure 7B. Increased efficiency and specificity of ligation can be achieved using extended overlap regions in which “splint” oligos are provided having a 5’ end complementary to the 5’ end of a VEHICLE or sequencing adapter, and a 3’ end complementary to the 3’ end of another VEHICLE (shown in Figure 29). Annealing of two peptide-oligo constructs (such as the rope-tow constructs of Figure 26) with a splint such as 56 in Figure 29C allows formation of a head-to-tail chain of peptide-oligo constructs amenable to ligation by T4 ligase into a continuous long nanopore sequenceable molecule. Ligation of this chain with an appropriate sequencing adapter, optionally using a short splint 57 (Figure 29B), renders this molecule ready for entry into a nanopore.
7.6.12 Ligation of Double-Tag Rope-Tow Constructs
In some embodiments double-tag Rope-tow constructs according to the invention (exemplified by Figure 26) are directly suitable for ligation to a sequencing adapter (Figures 30A and B), and head-to-tail ligation in series (Figures 30C and D). In some embodiments each Rope-tow construct (the “right-hand” construct being ligated) has a 5’-phosphate and is hybridized to a complementary strand having a projecting 3’ A that base pairs with a projecting 3’ T on the “left-hand” construct being ligated. Such a configuration can be acted upon by a ligase (e.g., T4 ligase) to form a standard phosphate linkage between the two constructs.
7.6.13 Dynamic selection of concatamer molecules for detection at the nanopore level.
In some embodiments, a multiplex panel of proteins can be measured in the sample, and the different TARGET peptides (and their STANDARDS) can be enriched separately, for example using different populations of magnetic beads for each different TARGET peptide or by placement of loaded beads in separate tiny containers, with the chemical modifications and reaction with linkers to form chains also carried out separately for each TARGET peptide and STANDARD, as described above. Then each concatenated chain will contain only one type of TARGET peptide sequence and STANDARD. Pooling these separately processed concatamers will create a sample comprised of multiple peptides, but in which each concatamer molecule chain will contain only one predominant TARGET peptide and its STANDARD.
In some embodiments using this “individual enrichment and pool” approach, the first peptide sequences read by a nanopore from a concatamer molecule will identify the type of TARGET peptide (and/or STANDARD) comprising the whole molecule, making it possible for the sequencing system software to decide whether or not further counts of that TARGET peptide or its STANDARD are required, i.e., whether the minimum molecule counts for the peptide and respective STANDARD required to achieve the desired measurement precision have already been achieved or not (whether from this pore or others). If the desired minimum counts have already been collected, a reverse voltage can be applied to the pore and the concatamer ejected without spending further time reading, thus allowing another different concatamer to thread the pore and begin sequencing. This approach has been referred to as “computational enrichment of target sequences” (74).
If the desired minimum counts have not been achieved, then the current concatamer continues to be read, gradually adding counts towards the minimums. This approach has been implemented in devices for nanopore DNA sequencing, and shown to decrease repeated re-sequencing of the same sequences and to improve coverage of rarer sequences in a given total sequence output. In some embodiments of the present invention, this ability to reject already-well-measured peptides improves throughput substantially, and the longer the concatamers are (i.e., the more peptides they have attached), the greater the benefit.
In the case that peptides are prepared and delivered to a nanopore individually (i.e., not in concatamers) there is less motivation to eject a sequence previously measured many times, since the total transit time for a single peptide construct is likely to be short. Hence computational enrichment offers little benefit towards the goal of stoichiometric flattening of a panel of TARGETS unless long constructs each having multiple molecules of only a single type of TARGET and STANDARD are used.
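The read-or-eject decision described above can be sketched as follows. The class and method names are hypothetical and do not correspond to any sequencer vendor's API. The sketch also assumes a single minimum count applied to both the TARGET and its STANDARD, and assumes that concatamers identified as non-panel sequences are ejected.

```python
class EnrichmentController:
    """Tracks molecule counts and decides whether a concatamer is worth reading."""

    FORMS = ("TARGET", "STANDARD")

    def __init__(self, min_counts):
        # min_counts: {peptide_name: minimum count required for each form}
        self.min_counts = dict(min_counts)
        self.counts = {(name, form): 0
                       for name in self.min_counts
                       for form in self.FORMS}

    def record(self, name, form):
        """Add one counted molecule (from this pore or any other pore)."""
        self.counts[(name, form)] += 1

    def needs_more(self, name):
        """True while the TARGET or its STANDARD is below the minimum."""
        need = self.min_counts[name]
        return any(self.counts[(name, form)] < need for form in self.FORMS)

    def decide(self, name):
        """Return 'read' to keep sequencing this concatamer, or 'eject'."""
        if name not in self.min_counts:
            return "eject"  # assumed policy for sequences outside the panel
        return "read" if self.needs_more(name) else "eject"
```

In such a scheme `decide` would be called once the first peptides of a concatamer have been identified, and an `'eject'` result would correspond to applying the reverse voltage to the pore.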
7.6.14 Additional motive forces propelling peptide molecules through nanopores
In some embodiments, particularly those making use of long TARGET peptides, or peptides without attached nucleic acids, peptide movement through a nanopore can be facilitated by addition of charged molecules that bind to, but do not covalently react with, peptides (e.g., sodium dodecyl sulfate, and similar charged detergent molecules).
7.6.15 Data analysis.
Numerous algorithms have been developed to detect transitions between different through-pore ion current levels as molecules transit a nanopore, taking account of the variable timing between these transitions, and methods have been developed to classify sequences of these current transitions (i.e., sequence-sensitive signatures) using machine learning and other advanced computational techniques (e.g., commercial offerings by Oxford Nanopore). These methods have proven to be very accurate (approaching 99% accuracy) when used in “base calling” oligonucleotide sequences, and the same or similar computational methods can be applied to the determination of peptide sequences as well.
Given that DNA sequence determination involves discriminating among 4 bases, while perfect protein sequencing requires discriminating 20 amino acids, perfect peptide sequencing by nanopore reads is far more difficult than DNA sequencing. This is particularly evident when one takes account of the fact that in existing nanopores more than one residue typically affects the overall pore current - in many cases nanopores are sensing a “k-mer” of 3, 4, 5 or 6 consecutive residues at a time, which requires creation of extensively trained machine learning methods to de-convolve current signatures into residue sequences.
However, in the context of the present invention, which uses single molecule methods to count molecules of a limited variety (i.e., set of TARGET and STANDARD sequences, which in many cases will total 4 to 50 peptides) the primary requirement is to accurately classify a peptide’s sequence as either i) confidently recognized as one of a limited set whose nanopore signatures have been extensively characterized before (i.e., TARGETS and STANDARDS), and for which a machine learning method has been optimized, or ii) a molecule whose sequence has not been confidently recognized. In some embodiments, such recognition and classification of ion current signatures is used to count the confidently recognized TARGET and STANDARD peptides and eliminate the signatures that are not confidently recognized. A similar approach has been used to distinguish limited sets of DNA barcodes used to tag DNA libraries from different samples that are then pooled together for analysis. In such strategies, DNA reads that are not assigned to a barcode with sufficient certainty can be discarded, improving the overall quality of results. In some embodiments, a machine learning system is trained to recognize and classify the ion current signatures of a set of TARGET peptides and their STANDARDS using large numbers (e.g., thousands to millions) of “reads” (or “traces” or “signatures” or “squiggles”) of known peptide sequences transiting sequencing nanopores. Recognition based directly on machine learning evaluation of the ion current traces (i.e., current measurements over time, typically generating 100-1,000 current measurements during transit of a peptide) is generally more reliable than recognition based on amino acid sequences deduced from the traces, and therefore represents the preferred method of peptide recognition. 
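The classify-or-discard counting described above can be sketched as below. The sketch assumes that a trained classifier (not shown) has already produced, for each read, a probability for each TARGET and STANDARD in the panel. The function name and the 0.95 threshold are illustrative assumptions.

```python
from collections import Counter

def count_confident_reads(read_probs, threshold=0.95):
    """Count reads confidently assigned to a panel peptide; discard the rest.

    read_probs: iterable of dicts, one per read, mapping a panel peptide label
    (e.g. 'pepA/TARGET') to the classifier's probability for that read.
    Returns (counts, n_rejected).
    """
    counts = Counter()
    rejected = 0
    for probs in read_probs:
        # Take the single most probable label for this read.
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            counts[label] += 1
        else:
            rejected += 1  # not confidently recognized, so not counted
    return counts, rejected
```

Raising the threshold trades a higher rejection rate for fewer misassigned reads, mirroring the way unassigned DNA barcode reads are discarded in pooled-library sequencing.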
This training can be accomplished using libraries of nanopore current signatures generated by constructs made from pure synthetic peptides having the TARGET and STANDARD peptide sequences. Large training sets of pure TARGET and STANDARD peptide constructs are used to select optimal recognition algorithms (e.g., machine learning methods including convolutional neural nets, etc.) and iteratively improve the classification accuracy of these methods to provide accurate counts of the various peptide sequences.
Since the recognition accuracy of specific peptides can be dependent on the type of nanopore used, in some embodiments the type of pore used is selected based on recognition performance of machine learning systems trained with a specific set of TARGET and STANDARD peptides on the various candidate pores. In some embodiments multiple types of nanopores are used in a system, allowing recognition of specific TARGET and STANDARD peptides by a type of nanopore best able to accurately recognize them. In some embodiments, novel nanopores are designed and tested to optimize performance in recognizing specific sets of TARGET and STANDARD peptides.
In some embodiments, the accuracy of counting TARGET and STANDARD peptides is further improved by “counter-training” a machine learning system to reject peptide sequences other than TARGET and STANDARD peptides that may be present as low-abundance contaminants after enrichment of the TARGET and STANDARD peptides from digests of complex biological samples. In some embodiments a library of peptide sequences coded for by the relevant genome and sharing partial sequence or specific sequence motifs with members of a set of TARGET and STANDARD peptides is created and used to counter-train a peptide recognition system to avoid mistaking these sequences for authentic TARGET and STANDARD peptides. The results of such training can be expected to improve peptide recognition with time and the accumulated learning from increasing sample numbers, providing the potential to retrospectively improve the precision of past assay results by reanalysis with updated software.
In some embodiments a plurality of candidate TARGET peptide sequences (derived from one or more selected target proteins) are prepared as constructs for nanopore sequencing, and libraries of nanopore current reads collected using these molecules. This data is used to determine the accuracy with which specific peptide read signatures can be distinguished from other sequences, and this information used in the selection of a set of most accurately classifiable TARGET peptide sequences to represent the target proteins in subsequent routine analyses. Specific affinity reagents can then be generated to bind epitopes in the middle region of these sequences, providing optimal analytical performance.
In some embodiments, a plurality of TARGET peptide sequences derived from a panel of target proteins are prepared as constructs for nanopore sequencing, and libraries of nanopore current read signatures collected using these molecules. Classification accuracy data derived from these signatures is used to select a set of most accurately classifiable TARGET peptide sequences spanning the set of desired protein panel members.
In some embodiments, a plurality of candidate STANDARD sequences cognate to one or more TARGET peptide sequences is included in a set of constructs used to generate libraries of nanopore current signatures, and STANDARD sequences are selected for each TARGET peptide so as to provide a set of most accurately classifiable STANDARDS that minimize errors in classifying a TARGET peptide’s STANDARD in relation to other TARGET peptides and STANDARDS.
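The selection of most accurately classifiable sequences can be illustrated with a simple sketch. It assumes that a confusion table has been built from reads of pure synthetic constructs, and it uses per-candidate recall as the only accuracy measure. This is a simplification, since a fuller treatment would also weigh how often other sequences are mistaken for each candidate. All names are hypothetical.

```python
def select_most_classifiable(confusion, protein_of):
    """Pick, for each target protein, the candidate peptide read most accurately.

    confusion[true_peptide][called_peptide] = number of reads of the known
    true_peptide construct that the classifier assigned to called_peptide.
    protein_of[candidate] = the target protein the candidate is derived from.
    Returns {protein: (best_candidate, accuracy)}.
    """
    best = {}
    for cand, row in confusion.items():
        total = sum(row.values())
        # Fraction of this candidate's reads that were assigned correctly.
        accuracy = row.get(cand, 0) / total if total else 0.0
        protein = protein_of[cand]
        if protein not in best or accuracy > best[protein][1]:
            best[protein] = (cand, accuracy)
    return best
```

The same ranking could be applied to candidate STANDARD sequences for a given TARGET by grouping on the TARGET rather than on the protein.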
In some embodiments, peptide:oligo constructs (e.g., rope-tow constructs) are constructed with recognizable ion current signals (e.g., a high current associated with an abasic stretch) either before or after the peptide, or both before and after. The presence of such “punctuation” in an ion trace can substantially improve peptide sequence classification.
Use of the methods described above for selection of most-accurately classifiable TARGET and STANDARD peptide sequences provides information about each selected peptide and its likelihood of misclassification within a panel of TARGET and STANDARD peptides. In some embodiments additional signature and classification accuracy data is generated by analysis of sets of relevant biological samples (e.g., plasma or dried blood spot samples) and versions of these into which selected TARGET and STANDARD peptides have been spiked at known levels. STANDARD sequences are unlikely to exist among proteolytic fragments of naturally-occurring proteins (a supposition that is easily tested by bioinformatics analysis of the relevant genome sequences, allowing any naturally-occurring sequences to be rejected as STANDARD candidates) and therefore detection of apparent STANDARD signatures in digests of natural samples that have not been spiked with STANDARD provides a direct estimate of the “false positive” detection rate for STANDARDS. Comparisons of molecule counts among a set of STANDARDS spiked into sample digests at the same (or different but known) levels provides a means of estimating STANDARD “false negative” detection rates (e.g., any STANDARD showing fewer counts than other STANDARDS spiked at the same level is likely to be affected by false negative detection errors).
Since TARGET peptides are likely to be detectable in digests of natural samples from the relevant species, false positive and negative detection rates can be estimated by comparing TARGET and STANDARD peptide detection rates in samples spiked with equal amounts of TARGET and STANDARD peptides: any excess of TARGET peptide counts over STANDARD counts provides an estimate of the TARGET peptide false positive rate, and any deficit of TARGET peptide counts compared to STANDARD counts provides an estimate of the TARGET peptide false negative rate (in each of these cases taking into account the independently determined false positive and negative detection rates of the STANDARDS).
In some embodiments using chemical and/or affinity-based methods of single molecule sequencing and counting instead of nanopore sequencing, alternative indices of sequence error can be used, e.g., an experimentally determined confusion matrix among amino acids, and/or an experimentally determined confusion matrix among the selected TARGET and STANDARD peptides. In such systems, if two or more peptides are found to be confused with one another, the detection approach can be modified, e.g., by extending the sequence acquisition to more residues (e.g., when using sequential degradative readouts), by alteration of the sequence or modification of a STANDARD involved in a confusion uncertainty, by selection of an alternate TARGET sequence from a target protein, or by other means known in the art.
7.7 USE OF THE INVENTION WITH SINGLE MOLECULE IMAGING AND COUNTING TECHNOLOGIES.
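The spike-based error estimates described in the two preceding paragraphs can be sketched as below. Both functions are illustrative simplifications with hypothetical names. The first treats the best-recovered STANDARD as the reference for STANDARDS spiked at the same level. The second omits the correction for the STANDARD's own false positive and false negative rates that the text calls for.

```python
def standard_false_negative_fractions(standard_counts):
    """Estimate false-negative fractions for STANDARDS spiked at the same level.

    standard_counts: {standard_name: molecule count}. The STANDARD with the
    highest count is taken as the reference (an assumption of this sketch).
    """
    top = max(standard_counts.values())
    return {name: 1 - count / top for name, count in standard_counts.items()}

def target_error_estimate(target_count, standard_count):
    """Compare TARGET and STANDARD counts after an equal-amount spike.

    An excess of TARGET counts is reported as a false-positive fraction and a
    deficit as a false-negative fraction, both relative to the STANDARD count.
    """
    diff = (target_count - standard_count) / standard_count
    return {"false_positive": max(diff, 0.0),
            "false_negative": max(-diff, 0.0)}
```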
Advances in fluorescence microscopy make it possible to reliably detect single molecules of a fluorescent dye and acquire images of large numbers of such molecules and their spatial distribution (38, 75). TARGET and STANDARD constructs prepared according to the invention can be immobilized (e.g., on glass or quartz slides) and, after staining with fluorescently labeled reagents (e.g., BINDERS for peptides and complementary oligos for flags and barcodes), imaged using this technology to count molecules.
7.7.1 Peptide detection by imaging BINDERS interacting with immobilized TARGET or STANDARD peptides and constructs immobilized on a surface
In some embodiments, peptide molecules are immobilized (e.g., on a surface) and their identities (e.g., as TARGET or STANDARD peptides) determined by optically detecting the binding (or lack of binding) of a series of one or more specific and/or possibly promiscuous affinity reagents with optically-detectable labels (e.g., BINDERS and oligonucleotides complementary to barcode sequences that are labeled with fluorescent dyes or proteins) applied to the surface one after another (e.g., in a flowcell) with the option of removing each affinity reagent before application of the next, and using recognition techniques (including machine learning) to decipher peptide identity based on the pattern of affinity reagents that do, or do not, bind detectably. Such a system, for example that described in US Patent Application 16/659,132, can also be used to count TARGET and STANDARD peptide molecules of the invention (or to count intact target protein molecules in the event that BINDERS bind to linear epitopes).
A variety of linkage chemistries can be employed to connect a peptide construct to an imageable surface, including through direct reaction with peptide amino groups (e.g., using NHS esters), with carboxyl groups (e.g., using carbodiimide chemistries), with cysteine sulfhydryl groups, and with biotin, click chemistry and other groups that have previously been introduced into a peptide construct. Chemistries such as click chemistry involve modification of a site or sites on the peptide as well as providing a connecting site on the surface. In some embodiments the required modification(s) of the peptide are carried out while the TARGET and STANDARD peptides are bound to the BINDER (e.g., during the capture stage of the enrichment process).
In some embodiments, for example those making use of ‘structured nucleic acid particles’ (SNAPs; (54)) or similar methods as a means of densely arraying single molecules on a solid support, TARGET and STANDARD peptides or peptide:oligo constructs comprising a click attachment group (such as methyltetrazine) can be eluted from BINDERS (after enrichment) in the presence of a concentrated suspension of SNAPs comprising a TCO click attachment group, resulting in the covalent coupling of one peptide to each SNAP, after which large numbers (e.g., 10 billion) of SNAPs can be arrayed for affinity reagent imaging in a suitable optical detection system. Since click chemical linkages are relatively insensitive to pH, the elution of peptides (and associated VEHICLES) from BINDERS under acidic conditions (e.g., pH 3.0) can occur before, at the same time as, or after the peptides couple to the SNAPs. In some embodiments, this elution and coupling can take place in a very small volume, e.g., within the interstitial volume of a packed mass of magnetic beads on which the BINDERS are immobilized (i.e., in 0.1 to a few microliters of liquid). By thus efficiently transferring peptides and constructs from one immobilized BINDER to another class of object (e.g., a SNAP) that is designed to efficiently convey them into position for reading, a high proportion of TARGET molecules can be recovered, and the sensitivity of the analytical system maximized. In some embodiments, constructs bound to BINDERS on magnetic beads are reacted with SNAPs, and the magnetic beads carrying the SNAP:construct complexes are moved into close proximity with an imageable surface before release (i.e., elution) of the complexes from the BINDERS on the beads, after which they need only migrate a very short distance by diffusion to reach and bind to the imageable surface. This approach significantly diminishes losses of molecules in the workflow, and thereby maximizes detection sensitivity.
In some embodiments peptide detection is accomplished using BINDERS modified to comprise detectable labels (e.g., fluorescent dyes or proteins such as GFP, nanoparticles comprising fluorescent dyes, enzymes that generate optically detectable products, and the like) to visualize TARGET and STANDARD peptides on a support. In some embodiments, specific nucleic acid sequences are detected by means of hybridizing complementary probes comprising optically detectable labels, for example labels like those used in optical genome mapping (76). Well-known methods of optical detection using microscopic systems are able to detect individual bound labels and associate the resulting optical signals with discrete locations on a surface, thereby allowing a sequence of binding events to be constructed for each bound analyte molecule (e.g., TARGETS and STANDARDS).
In such a system, a specific STANDARD label functionality (e.g., biotin, a fluorescent label, a unique short peptide segment, or a unique oligonucleotide sequence) can be incorporated into the structure of STANDARDS in order to facilitate their identification and discrimination from TARGET peptides using a reagent capable of specifically binding to the label or direct optical detection (e.g., of a fluorescent label).
The use of multiple detection stages, with removal of detection labels between stages, thus allows separate detection and identification of specific features of single peptides or peptide:oligo construct molecules, including: a) identification as STANDARDS or TARGETS (through detection of the STANDARD label); b) identification as a molecule bound by a first BINDER through detection of a BINDER ID code (and so forth for multiple BINDERS); c) identification as a molecule derived from processing of a first specific sample (i.e., a sample within a pool of samples) through detection of a sample barcode (and so forth for multiple samples); d) identification as members of a first TARGET + STANDARD cognate pair through detection of one or more peptide sequence-specific detection reagents, any of which may be a BINDER (and so forth for multiple distinct peptides).
Figure 31 schematically illustrates the use of such a multi-step detection approach to characterize a standardized sample digest, focusing on a region where 96 peptide:oligo construct molecules prepared according to the invention and arrayed on a surface are probed, and the results decoded to provide a quantitative estimate of the amount of a TARGET molecule. In Fig 31A and B, two different BINDERS are used and optically detected (shown in green) where present. These signals establish the array sites having each of the two TARGET peptides. In Fig 31C and D, BINDERS (or oligos complementary to construct DNA sequences identifying TARGET and STANDARD molecules) are separately applied and imaged to determine which constructs are TARGET and STANDARD molecules (in this case irrespective of which peptide they represent). Finally, Fig 31E and F show detection results of separately applying oligos complementary to construct DNA barcode sequences identifying molecules recovered from two different sample digests (Samples 1 and 2). Using the digital information provided by the binary “optically detected or undetected” signals recorded for each arrayed molecule during these 6 detection cycles, the number of molecules of each TARGET and STANDARD version of each peptide in each sample can be directly tabulated, and the ratio of TARGET to STANDARD counts computed. This ratio, multiplied by the known amount (in relative or absolute terms) of the STANDARD added during standardization of the sample digest, provides a measure of TARGET abundance.
Figure 11 shows schematically a series of 6 such sequential detection steps or cycles, each using different binders to identify, or help identify, a specific peptide sequence. In cycle 1, peptide A is recognized first by an anti-peptide antibody BINDER specific for an internal peptide epitope. In cycle 2, two different BINDERS specific for short trimer amino acid sequences present in the peptide are used to support peptide identification. In cycle 3, an antibody specific for the c-terminal amino acid (or amino acids) is used to further support peptide identification. In cycles 4, 5 and 6, additional anti-peptide antibody BINDERS specific for other peptide sequences are used to identify these molecules coupled to a support. Each of the bound peptides is part of a larger peptide:oligo construct comprising DNA sample barcodes (Codes 1, 5, 7 and 11 in the Figure) and a DNA barcode identifying each molecule as either a TARGET or STANDARD (shown as the OLIGO-EITHER barcode).
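The decoding of the six binary detection cycles can be sketched as follows. The cycle order, the labels `pepA`, `pepB`, `sample1` and `sample2`, and the rule of discarding sites with ambiguous patterns are assumptions made for illustration and are not taken from Figure 31 itself.

```python
from collections import Counter

# Assumed cycle order: two peptide BINDERS, TARGET and STANDARD codes,
# then two sample barcodes.
CYCLES = ("pepA", "pepB", "TARGET", "STANDARD", "sample1", "sample2")

def decode_site(bits):
    """Decode one array site's six 0/1 signals into (sample, peptide, form).

    Returns None when the pattern is ambiguous or incomplete.
    """
    on = {name for name, bit in zip(CYCLES, bits) if bit}
    peptide = on & {"pepA", "pepB"}
    form = on & {"TARGET", "STANDARD"}
    sample = on & {"sample1", "sample2"}
    if len(peptide) == len(form) == len(sample) == 1:
        return sample.pop(), peptide.pop(), form.pop()
    return None

def tabulate(sites):
    """Count decodable molecules per (sample, peptide, form)."""
    return Counter(d for d in map(decode_site, sites) if d is not None)

def target_amount(counts, sample, peptide, standard_amount):
    """TARGET/STANDARD count ratio multiplied by the known STANDARD amount."""
    t = counts[(sample, peptide, "TARGET")]
    s = counts[(sample, peptide, "STANDARD")]
    return None if s == 0 else t / s * standard_amount
```

With 3 TARGET and 2 STANDARD molecules of one peptide counted in a sample to which 10 units of STANDARD were added, this sketch reports 15 units of TARGET.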
In some embodiments, for example those using BINDERS as a means of identifying immobilized TARGET (and cognate STANDARD) peptides (e.g., imaging methods), there may be a need to confirm these identities by generating additional information on the peptides. A variety of means may be employed to probe specific features of peptides, making use of the fact that their sequences are known a priori (i.e., established during initial TARGET peptide selection).
Figure 32 shows additional methods of confirming or improving the specificity of BINDER interactions with peptides by detecting the effect of a change in peptide structure on the binding. In cycles 1, 2, and 3, respectively, antibody BINDERS to an internal epitope, to 2 short trimer epitopes, and to a c-terminal epitope are applied and read as in the example of Figure 11. A proteolytic enzyme capable of cleaving a specific site in some of the peptides is then applied to the immobilized peptide constructs, resulting in release of a c-terminal fragment from the peptide shown (Peptide-A). After Peptide-A is thus shortened, the same sequence of BINDERS is applied as in cycles 1-3, and the loss of some epitopes is revealed by the continued binding of antibody A (as before) together with the absence of one anti-trimer signal and of the anti-C-terminal BINDER signal. Such results confirm the presence of a specific proteolytic enzyme cleavage site within the peptide, and the binding of antibody A and one of the 2 anti-trimer antibodies to sites in the n-terminal portion of the peptide molecule. Similarly, if the peptide is linked to the remainder of the construct by its c-terminal end (e.g., through linkage to the amino group of a c-terminal lysine), then a cleavage within the peptide will result in release of an n-terminal fragment, and loss of epitopes involving this fragment. In essence the addition of the proteolytic step enables “mapping” epitope locations within the peptide, further strengthening the identification.
In some embodiments peptide identification is strengthened by observing the effects of one or more peptide alterations, including peptide cleavage (as shown in Fig 32), chemical or enzymatic removal of one or two terminal amino acids (e.g., using Edman degradation), enzymatic or chemical removal of a phosphate group from Ser, Thr or Tyr, chemical modification of one or more amino acids (e.g., alkylation of a free cysteine, etc.), or any of a very large repertoire of amino acid modifications known in proteomics and protein chemistry.
In some embodiments, the binding of specific BINDERS is further characterized by observation of the effects of altered solution conditions on the binding to individual peptide molecules. A change from near-neutral to acidic (or basic) pH can result in the dissociation of some BINDERS (but not others) from some peptide epitopes. Similarly, a change from near-neutral to acidic (or basic) pH can result in the dissociation of some BINDERS from less-preferred peptide epitopes (e.g., sequences similar to but not the same as the cognate TARGET sequence) while remaining bound to the true cognate sequences. Likewise, the introduction of a chaotropic agent (such as NH4SCN), an organic solvent (e.g., acetonitrile), or a detergent can reduce binding of some BINDERS to some targets while allowing other, stronger interactions to persist. Likewise, a change in temperature can differentially affect various BINDER interactions. Similarly, changes (particularly in temperature) may be employed that affect interactions between oligonucleotide components of a construct and complementary probes used to read sample barcodes, BINDER barcodes, or TARGET/STANDARD codes. Any of the changes employed for detection of differential effects can be applied stepwise (i.e., as an abrupt change), or as a gradient of change over time - in which case the degree of change (determined as a function of time across a gradient) at which a BINDER interaction is affected can serve as a highly specific indicator of the affinity and/or specificity of the interaction, and hence its contribution to a correct identification.
In some embodiments, peptide molecules or their constructs are positioned at points on a predetermined lattice of locations on a planar support (e.g., like the system described in Patent Application 16/659,132). In some embodiments, peptide molecules or their constructs are positioned through hybridization of construct DNA sequences to complementary sequences in extended nucleic acid molecules produced by techniques such as “optical genome mapping” (OGM: e.g., US Patent 9,536,041). Such OGM implementations can use naturally-occurring DNA molecules, or DNA molecules designed to comprise tens, hundreds, thousands, or tens of thousands of repeating complementary sequences at appropriate intervals (e.g., 0.1, 0.5, or 1.0 microns separation) along the length of the molecules. Long DNA molecules linearized by OGM methods can be transferred to, and immobilized on, a planar support having appropriate reactive groups, thereby creating a regular array of complementary sites on a surface within a flowcell for optical imaging during application and removal of a series of optically labeled peptide BINDERS and oligonucleotide probes useful in characterizing bound peptide:oligo constructs. In some embodiments, peptide molecules and their constructs are positioned randomly, but at spacings that are typically optically resolvable, on a planar support through binding to sites previously established on the support, e.g., by coating the support with BINDER molecules, by coating the support with molecules having an affinity for some chemical feature of a peptide construct (including oligonucleotides complementary to components of a peptide construct, biotin labels, or the like), or with chemically reactive sites such as click chemistry groups capable of reacting with click groups on peptide constructs, etc.
In some embodiments comprising detection of molecules by binding of multiple BINDERS labeled with different fluorophores, an optical detection system is used capable of simultaneously and separately detecting these labels based on differences in their excitation and/or emission wavelengths (i.e., multicolor imaging). The use of multiple labels with separate detection wavelengths allows BINDERS to be multiplexed, thereby decreasing the number of binding and elution cycles required to observe a given set of BINDERS. These and other means described below of identifying (e.g., by amino acid sequence, partial amino acid sequence, or presence of sequence-related features detected by binding reactions) individual TARGET or STANDARD peptide molecules (or derivatives of these that preserve their individual identities) can be used to recognize and count the numbers of TARGET and STANDARD peptides in a sample that is standardized (i.e., by STANDARD addition) and enriched according to the invention.
In some embodiments a plurality of BINDERS, each specific for a different epitope on a peptide, are used to increase recognition specificity by increasing the number of amino acids involved in interactions with (i.e., “recognized by”) the BINDERS. This effectively increases the peptide sequence coverage of the BINDER(s). In some embodiments, 2 or more such single-epitope BINDERS are stably joined to form a single molecule, and the well-known “avidity effect” results in a much higher overall affinity for the peptide than would be seen with any of the BINDERS individually. In some embodiments, multiple single-epitope BINDERS with distinct optical (e.g., fluorescent) labels are used together, and peptides that bind the set of BINDERS cognate to the peptide’s epitopes are identified as those exhibiting the correct label emissions. In some embodiments, multiple single-epitope BINDERS are labeled with distinct fluorophores such that one BINDER is labeled with a fluorophore acting as a FRET donor and another BINDER is labeled with a fluorophore acting as a FRET acceptor. When such BINDERS bind to adjacent epitopes on a peptide, the proximity of the donor and acceptor fluorophores enables detection of this inter-epitope proximity relationship through detection of emission by the acceptor when the donor is illuminated at its excitation wavelength (i.e., a FRET signal is generated).
In some embodiments, BINDERS are used whose binding to a cognate peptide epitope is characterized by a rapid off-rate (e.g., in the range of 20 msec to 60 sec half-off-times). The optical signal from such a BINDER will appear and disappear as it repeatedly binds to, dissociates from, and re-binds to (etc.) an immobilized peptide construct. The number of transitions between bound (localized fluorescent signal) and dissociated (no localized signal) states per unit time serves as a quantitative kinetic parameter of the strength of binding (77) which can be used to differentiate binding events to the correct cognate epitope from binding events to a similar but slightly different epitope. This fine level structural recognition further amplifies the specificity of peptide detection.
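The transition-counting parameter described above can be illustrated with a minimal sketch. It assumes a fluorescence intensity trace sampled at a fixed frame interval for one immobilized construct and a simple intensity threshold separating bound from dissociated states; the function names, the threshold approach, and the example values are hypothetical and are not taken from any specific imaging platform.

```python
def count_transitions(trace, threshold):
    """Count changes between bound (>= threshold) and dissociated (< threshold) states."""
    states = [value >= threshold for value in trace]
    return sum(1 for previous, current in zip(states, states[1:]) if previous != current)


def transition_rate(trace, threshold, frame_interval_s):
    """Transitions per second over the observed duration of the trace."""
    duration_s = (len(trace) - 1) * frame_interval_s
    if duration_s <= 0:
        raise ValueError("trace must contain at least two frames")
    return count_transitions(trace, threshold) / duration_s


# Hypothetical intensity trace for one location (arbitrary units), 0.5 s per frame.
example_trace = [0, 0, 10, 10, 0, 10, 0]
print(transition_rate(example_trace, threshold=5, frame_interval_s=0.5))
```

Comparing such a rate against the rate expected for the cognate epitope is one possible way to flag binding to a similar but non-cognate epitope; real traces would likely need noise filtering before thresholding.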
In some embodiments, peptides are chemically modified before, during or after a series of imaging detection steps. In some embodiments these modifications alter the detectability of specific peptides, such that a method of detection (e.g., imaging of a BINDER bound to an epitope of the peptide) that produces a positive signal when used before the modification does not produce a signal after the modification has taken place (or vice versa). For example, a modification that perturbs, disrupts or cleaves the peptide in an epitope can result in the failure of a BINDER specific for the original intact epitope to bind to the peptide after the modification has taken place (or for the BINDER to exhibit altered binding kinetics as discussed above). By comparing the detection result (BINDER binds to the peptide) obtained before the modification with the result after the peptide has been modified to disrupt the BINDER’S epitope (BINDER fails to bind to the peptide), it is possible to infer that the peptide in question did, in fact, include a site that was modified. Likewise, a specific peptide modification can introduce a structural feature that enables binding of a BINDER that does not bind (or binds weakly) to the original intact epitope.
In some embodiments, a sequence-specific proteolytic cleavage is used as a modification - in this case the cleavage can result in release of the end portion of the peptide that is not immobilized on the support. Sequence-specific enzymes such as trypsin, ArgN, AspN, GluC, chymotrypsin, pepsin, papain and the like may be used to cleave peptides at specific sites - only peptides comprising such sites will be cleaved, and the positions of the sites in the cleaved peptide sequences, in relation to the BINDER epitopes, determine whether or not BINDER binding is affected.
In some embodiments, specific amino acids within a peptide sequence are modified. For example, protein kinase enzymes may be used to add a phosphate group to specific serine, threonine or tyrosine residues within a sequence. Addition of a phosphate group to an amino acid within a binding epitope is likely to have a significant impact on BINDER binding to the epitope (typically diminishing binding). In some embodiments, BINDERS are used that specifically bind to a phosphorylated epitope but do not bind to the unphosphorylated epitope, and in this case the BINDER binds to the peptide only after the kinase modification has taken place.
In some embodiments cysteine SH groups are modified, e.g., by reaction with iodoacetamide, acrylamide, or any of a variety of N-ethylmaleimide compounds. For example, a peptide containing cysteine in a BINDER’S epitope can be kept unmodified (including un-oxidized) for recognition by the BINDER and subsequently modified covalently by reaction with iodoacetamide, after which reprobing with the same BINDER results in weaker (or no) binding.
In some embodiments a free n-terminal amino group (or similarly a free c-terminal carboxyl group) can be modified in a way that impacts the binding of a BINDER whose epitope included that terminal group (e.g., by acetylation of the amino group, by removal of a terminal group, or by enzymatic addition of a terminal amino acid).
In some embodiments, the first of a pair of FRET donor-acceptor fluorophores is added to a site on the peptide (e.g., the n-terminal amino group, a cysteine SH group, a linker joined to the peptide) and the second to a BINDER capable of binding to an epitope near the site of the first. The intensity of the resulting FRET fluorescence provides a measurement of the distance between the two fluorophores that can contribute to the identification of the peptide.
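The relationship between FRET signal and fluorophore separation can be sketched using the standard Förster relation, E = 1 / (1 + (r/R0)^6), where R0 is the Förster radius of the donor-acceptor pair. This is a general textbook relation rather than a formula given in this document, and the function names and the example Förster radius below are illustrative assumptions.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """FRET efficiency for a donor-acceptor separation, using the Forster relation."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def distance_from_efficiency(efficiency, forster_radius_nm):
    """Invert the Forster relation to estimate separation from a measured efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical donor-acceptor pair with an assumed Forster radius of 5 nm.
for separation_nm in (3.0, 5.0, 7.0):
    print(separation_nm, fret_efficiency(separation_nm, 5.0))
```

Because efficiency falls off with the sixth power of distance, the estimate is most informative for separations near R0, which is one reason the choice of labeling site relative to the epitope matters.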
In some embodiments one member of a pair of FRET donor-acceptor fluorophores is added to each of 2 BINDERS specific for adjacent epitopes on a peptide. Binding of the two BINDERS in proximity to one another (i.e., to their adjacent epitopes) creates the conditions required for FRET detection, thus confirming correct binding to these epitopes.
In some embodiments one or more BINDERS capable of distinguishing terminal amino acids (or the terminal pair of amino acids) is used to determine this feature of the peptide sequence, thus adding considerable specificity to the overall detection scheme. BINDERS such as those described above for use in peptide sequencing by cyclical degradation/identification can be used for this purpose. In some embodiments repeated cycles of Edman or enzymatic removal of one or two terminal amino acids allows identification of multiple terminal amino acids.
7.8 USE OF THE INVENTION WITH SINGLE MOLECULE DEGRADATIVE SEQUENCING AND COUNTING TECHNOLOGIES
7.8.1 Peptide sequencing by cyclical degradation with “reverse translation” to DNA
In some embodiments, peptide molecules can be “reverse-translated” into nucleic acid sequences using a cyclic procedure involving recognition of peptide n-terminal amino acid residues, or a pair of n-terminal residues, and using this recognition to add or transfer an oligo sequence tag specific for the detected amino acid (or pair) to a growing DNA oligo, which is subsequently sequenced to identify and count the reverse-translated TARGET and STANDARD sequences. This technology, described in US Patent Application 16/760,028, can also be used to identify and count TARGET and STANDARD peptide molecules of the invention. In some embodiments making use of the detection schemes that reverse translate peptide sequence into nucleic acid sequence information, additional information comprising a peptide’s identity as a TARGET or STANDARD, sample identity (e.g., a sample barcode), identity of the BINDER that bound the peptide during an enrichment step (e.g., a BINDER barcode), and other pertinent information can be added to a growing DNA oligo using any of the methods well-known in the art, including copying of a sequence (e.g., by a polymerase), ligation of an oligo onto the growing chain, insertion of a sequence using CRISPR and related technologies, etc. The information thus collected characterizes the peptide in a variety of ways useful in the interpretation of the peptide molecule’s sequence and its significance in an assay. This information may be read out using any of the well-known nucleic acid detection (e.g., PCR) or sequencing methodologies (nanopores, sequencing by synthesis, etc.). This readout can be accomplished either with or without first removing the peptide from the nucleic acid component of the construct.
7.8.2 Peptide sequencing by cyclical degradation with optical detection
In some embodiments, the TARGET and STANDARD peptide molecules of the invention can be arrayed by binding to a surface, or by distribution in an array of pre-formed wells or zones on a surface, and the molecules can be observed individually by a position-sensitive detection means, e.g., optical detection means or electronic detection means. Appl. No. 16/686,028 describes such a method that can be used to decode the sequence of individual peptide molecules anchored in individual wells of an array of wells in a semiconductor chip. The method enables identification of individual molecules by matching to TARGET peptide or STANDARD sequences, and tabulation of the numbers of such molecules occurring in the array of wells (described as millions of wells on a semiconductor chip substrate (35)). As described herein, it may suffice to sequence only 3, or 4, or 5 or 6 amino acids from the end of a selected set of TARGET and STANDARD peptides in order to identify them (discussed elsewhere herein). A further alternative detection means capable of detecting the step-wise removal of fluorescent labels on selected amino acid side chains of peptides immobilized on a surface during cycles of Edman degradation has been described (18, 78). This approach can also be used to identify and count TARGET and STANDARD peptide molecules.
7.8.3 Peptide sequencing by cyclical degradation with electronic detection
In some embodiments, peptide molecules and/or VEHICLE constructs are immobilized on a surface and their identities (e.g., as TARGET or STANDARD peptides) determined by electronic detection of the presence of BINDERS recognizing peptide n-terminal amino acid residues (35). Analogous technological means can be used in the same or similar platforms to read DNA sequences present in peptide:oligo constructs (79).
7.9 CALIBRATORS AND CONTROLS TO SUPPORT ACCURATE PEPTIDE QUANTITATION.
A majority of biomarker tests for proteins deliver a result based on quantity (e.g., the concentration of the target protein in a biological sample) rather than reporting a sequence. In order to make use of sequence sensitive single molecule detection for quantitation of peptides and proteins, it is important to consider the calibration of these detection systems (e.g., using “calibrators”), as well as the confirmation that calibration is reliable (i.e., through analysis of “controls”).
Use of external calibrator and control samples, analyzed alongside experimental samples to be analyzed, is well known in the analytical art, and widely used for specific assays (e.g., immunoassays) in clinical diagnostics and in research. Typically, data obtained by analysis of a calibrator is used to determine one or more adjustable parameters that bring the system’s analytical result into concordance with an established external reference system. If a measurement system is inherently linear, a single point calibration can be used to provide a calibration factor by which detector output is multiplied to yield standard abundance or concentration units. In non-linear detection systems, calibrators with multiple levels of analyte may be used to produce a non-linear “standard curve” to translate detector output into an accurate abundance or concentration value. Control samples are typically provided to confirm that calibration has been effective: values obtained by analysis of one or more control samples are compared, after calibration adjustments, to pre-assigned values as a test of the calibration validity (i.e., controls provide quality control for the assay and its calibration). In some embodiments, calibrators and controls of this type are provided to be analyzed in the same sequence sensitive single molecule detection workflow as experimental samples, and thus provide calibration of the entire workflow. In some embodiments an additional level of calibration and control is provided to ensure optimal operation of the sequence sensitive single molecule detector itself (i.e., focusing on the detector alone, instead of the entire workflow that includes digestion, any chemical modifications, etc.). This is particularly relevant because sequence sensitive single molecule detectors can produce errors, specifically misidentification of nucleic acid bases, amino acids, or whole molecules. 
In the case of nanopore detection, these errors can arise from several sources, including errors in assembly of peptides into sequenceable constructs, the movement of molecules through the pore, defects in a nanopore itself, statistical fluctuations in current flowing through a nanopore, electronic noise in the device measuring through-pore current, and a variety of errors contributed by the complex mathematical algorithms, including deep multi-layer machine learning software systems, used to interpret the current traces. These errors are inherently difficult to address by simple inspection because individual nanopore current traces are extremely difficult to interpret “manually” (i.e., by visual or elementary mathematical inspection), and essentially impossible to evaluate manually in large numbers (as required for useful quantitative applications). Software systems that are capable of rapid and accurate evaluation of nanopore current traces are typically complex, multilayered machine learning systems, which are well known in the art to be essentially impossible to understand in detail in terms of human-perceivable logic processes. It is therefore desirable to provide means for a) minimizing, and b) evaluating the sum of these errors so as to provide accurate peptide quantitation and a precise understanding of the magnitude and nature of errors remaining.
In some embodiments, calibrator TARGET and STANDARD constructs are provided to address this issue, and these can perform either or both of at least two functions: 1) calibration of the relationship between the numbers of TARGET and STANDARD molecules reported by the detection system and the numbers expected based on prior validated measurement of the numbers present in the calibrator material, and 2) tuning and assessment of the accuracy with which TARGET and STANDARD molecules are classified.
In some embodiments a calibrator sample is provided that is capable of being read by a nanopore under conditions that are the same as, or similar to, those pertaining when sample peptides are read using workflows of the invention. In some embodiments, a calibrator comprises a polymer VEHICLE (which may comprise polymer segments capable of threading a nanopore and oligonucleotide segments capable of engaging an oligonucleotide motor to control movement through a nanopore), with TARGET and STANDARD peptide molecules attached or incorporated therein, and in which the ratio between the numbers of TARGET and STANDARD peptide molecules in the sample’s population of calibrator constructs is known. Nanopore analysis and current trace interpretation of such a calibrator sample will generate an experimentally determined TARGET:STANDARD ratio, which may be compared to the ratio known a priori to be present in the calibrator. Any discrepancy can be used as a basis for calculating and applying a correction factor to the TARGET:STANDARD ratio reported by the analytical system on other samples. For example, if the known TARGET:STANDARD ratio in the calibrator is 1.0, and the nanopore result (TARGET:STANDARD ratio calculated from the counts of TARGET and STANDARD molecules by the analytical system) is 1.2, then the measured ratios for other samples can be multiplied by 1/1.2 to provide a calibrated result.
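The single-point correction in the example above can be written out as a short sketch. The function names and the sample counts are hypothetical; the only assumption beyond the text is that the correction is a simple multiplicative factor equal to the known calibrator ratio divided by the measured calibrator ratio.

```python
def correction_factor(known_ratio, measured_ratio):
    """Multiplicative factor that maps the measured calibrator ratio onto the known one."""
    if measured_ratio <= 0:
        raise ValueError("measured ratio must be positive")
    return known_ratio / measured_ratio


def calibrated_ratio(target_count, standard_count, factor):
    """Apply the calibrator-derived factor to a sample's raw TARGET:STANDARD count ratio."""
    if standard_count <= 0:
        raise ValueError("standard count must be positive")
    return (target_count / standard_count) * factor


# Calibrator with a known ratio of 1.0 that the detection system reports as 1.2.
factor = correction_factor(known_ratio=1.0, measured_ratio=1.2)
# Hypothetical experimental sample: 600 TARGET and 500 STANDARD molecules counted.
print(calibrated_ratio(600, 500, factor))
```

A single factor of this kind is appropriate only if the detector response is approximately linear; otherwise a multi-level calibration would be needed.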
In some embodiments, a calibrator is used to tune the detection system itself. In this case, the calibrator comprises constructs of TARGET and STANDARD peptide molecules on a VEHICLE in a manner that identifies TARGET and STANDARD molecules to the detection system. For example, the calibrator can comprise a mixture of two constructs consisting of a) a plurality of TARGET peptides on a type of VEHICLE and b) a plurality of STANDARD peptides on the same or a different type of VEHICLE. In this case, each construct comprises multiple copies of either the TARGET or the STANDARD peptide, but not both. In some embodiments, TARGET and STANDARD peptides are coupled to different VEHICLES that provide independent identification of which peptide is present (e.g., by incorporating different DNA or other recognizable sequences). In some embodiments, TARGET and STANDARD peptides are present in the construct in an order or arrangement (e.g., alternating order) that allows the sequencing systems to accurately infer the identity of each peptide. In analyzing the calibrator, the detection system is able to recognize a separate set of valid current traces for each of the two types (TARGET and STANDARD). These sets of trace data, correctly labeled as to identity based on the calibrator’s design, provide an opportunity to test the accuracy of the system’s assignment resulting from processing the current traces, resulting in a typical confusion matrix composed of the correct and incorrect calls for each of the two peptides. In some embodiments a similar calibrator approach (providing sets of TARGET and STANDARD trace data, correctly labeled as to identity) is used to adjust parameters of nanopore current trace evaluating algorithms themselves.
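The confusion matrix mentioned above can be tabulated from calibrator data with a few lines of code. This is a minimal sketch that assumes each calibrator trace comes with a true label (known from the calibrator design) and a label called by the trace-interpretation software; the function names and the example labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, called_labels, classes=("TARGET", "STANDARD")):
    """Nested dict: matrix[true][called] = number of traces with that combination."""
    counts = Counter(zip(true_labels, called_labels))
    return {t: {c: counts[(t, c)] for c in classes} for t in classes}


def per_class_accuracy(matrix):
    """Fraction of traces of each true class that were called correctly."""
    accuracy = {}
    for true_class, row in matrix.items():
        total = sum(row.values())
        accuracy[true_class] = row[true_class] / total if total else float("nan")
    return accuracy


# Hypothetical calibrator run: four traces of each type, with one miscall per type.
true_labels = ["TARGET"] * 4 + ["STANDARD"] * 4
called_labels = ["TARGET", "TARGET", "TARGET", "STANDARD",
                 "STANDARD", "STANDARD", "TARGET", "STANDARD"]
matrix = confusion_matrix(true_labels, called_labels)
print(matrix)
print(per_class_accuracy(matrix))
```

The off-diagonal entries estimate the rates at which TARGET molecules are miscalled as STANDARD and vice versa, which are the quantities needed when assessing or tuning the classifier.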
A number of studies have suggested that direct application of machine learning to the raw current traces can provide better accuracy in identifying a sequence among a set of candidates (e.g., a DNA or peptide “barcode”) than evaluation through “base-calling” and comparisons of the resulting sequences. It is clearly demonstrated in the art that numerous alternative methods of machine learning, algorithm designs and parameter sets can be applied with varying levels of success. Hence the accurately labeled current trace data provided by a calibrator as described can be used to create or refine the algorithm used to identify TARGET and STANDARD peptides, as well as test its accuracy (as in the preceding embodiment).
In some embodiments collections of constructs each comprising only one or a few peptide molecules are provided, and in such cases each peptide’s true identity is determined from highly reliable barcode labels in the constructs.
In some embodiments, a plurality of calibrator constructs is provided for the calibration and/or optimization of detection of a plurality of TARGETS and cognate STANDARDS.
In some embodiments, a peptide digest prepared from a complex protein sample (e.g., a plasma sample) can be processed according to the invention to create a large collection of different TARGET constructs (e.g., incorporating a TARGET code). A second aliquot of the same protein sample can be processed according to the invention to create a large collection of constructs labeled with a different code (e.g., incorporating a STANDARD code). Any pair of distinct codes can be used instead of TARGET and STANDARD codes for this special purpose. The two preparations (the same peptides labeled with different tags) can be mixed in a specified ratio (e.g., 1 part of the first mixture and 10 parts of the second) to provide a calibrator in which two construct versions (e.g., labeled with TARGET and STANDARD tags) of many different peptides can be detected. Observation of the expected ratio (1:10 in this example) for each detected peptide confirms the linearity of a single molecule detection system. In some embodiments, one or more calibrators are analyzed separately from experimental samples (e.g., before or after a run of experimental samples), and the results used for the purposes described above.
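The ratio check on a mixed calibrator of this kind can be sketched as follows. The sketch assumes per-peptide counts for the two code versions are already available; the peptide names, counts, function names, and the tolerance are hypothetical and chosen only for illustration.

```python
def ratio_deviations(counts, expected_ratio):
    """Observed/expected ratio per peptide; counts maps peptide -> (first_code, second_code)."""
    deviations = {}
    for peptide, (first_code_count, second_code_count) in counts.items():
        if second_code_count == 0:
            continue  # ratio undefined; skip peptides with no second-code counts
        observed = first_code_count / second_code_count
        deviations[peptide] = observed / expected_ratio
    return deviations


def within_tolerance(deviations, tolerance):
    """True if every peptide's observed ratio is within the fractional tolerance of expected."""
    return all(abs(d - 1.0) <= tolerance for d in deviations.values())


# Hypothetical counts for two peptides in a calibrator mixed 1 part to 10 parts.
example_counts = {"PEPTIDE_A": (100, 1000), "PEPTIDE_B": (52, 500)}
deviations = ratio_deviations(example_counts, expected_ratio=0.1)
print(deviations, within_tolerance(deviations, tolerance=0.10))
```

In practice the tolerance would need to reflect counting statistics, since peptides with few counts will scatter around the expected ratio even in a perfectly linear system.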
In some embodiments, one or more calibrators are mixed with an experimental sample to provide calibration within a nanopore run. In this case it is advantageous to definitively identify the calibrator constructs by incorporating specific sequence barcode labels in addition to a set of homogenous peptide molecules.
In some embodiments, calibrator construct nanopore current traces are evaluated for individual nanopores among a plurality of available nanopores in a device and used to identify poorly performing nanopores, which are then deactivated or whose data are otherwise suppressed. In some embodiments, calibrator construct current traces are used to optimize the machine learning algorithms used to analyze the data from each individual nanopore.
In some embodiments, algorithm parameters are adjusted based on evaluation of calibrator traces to provide a specified level of certainty of peptide assignment. For example, when few copies of a TARGET peptide are detected, it can be preferable to ensure that these few copies are correctly identified and are not incorrectly assigned STANDARD (or other) molecules. The current trace interpretive algorithm can be modified to count only high confidence identifications while assigning lower confidence identifications to an “unassigned” category. This modification increases the certainty that these TARGET peptide molecules are correct identifications, at the cost of reducing the number of identifications, and hence increasing the CV of the measurement. It will be clear to those skilled in the art that tradeoffs between the accuracy of nanopore trace identification on the one hand and overall TARGET: STANDARD ratio precision on the other result from such adjustments, and that these must be taken into account in the overall optimization of assay performance.
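The tradeoff described above between assignment certainty and counting precision can be illustrated with a minimal sketch. It assumes the trace-interpretation software emits a label and a confidence score between 0 and 1 for each molecule, and that counting precision follows simple counting statistics (CV of approximately one over the square root of the count); the function names, scores, and thresholds are hypothetical.

```python
import math
from collections import Counter


def assign_calls(scored_calls, min_confidence):
    """Count calls per label, moving calls below the confidence threshold to UNASSIGNED."""
    counts = Counter()
    for label, confidence in scored_calls:
        counts[label if confidence >= min_confidence else "UNASSIGNED"] += 1
    return counts


def counting_cv(count):
    """Approximate CV from counting statistics: 1 / sqrt(count)."""
    return 1.0 / math.sqrt(count) if count > 0 else float("inf")


# Hypothetical scored calls from a run with few TARGET molecules.
calls = [("TARGET", 0.99), ("TARGET", 0.60), ("STANDARD", 0.95),
         ("STANDARD", 0.97), ("TARGET", 0.92)]
for threshold in (0.5, 0.9):
    counts = assign_calls(calls, threshold)
    print(threshold, dict(counts), counting_cv(counts["TARGET"]))
```

Raising the threshold moves the low-confidence TARGET call into the unassigned category, which lowers the TARGET count and therefore raises its CV, matching the tradeoff noted in the text.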
In some embodiments the false positive and negative detection rates of TARGET peptides and STANDARDS are used in statistical calculations to provide improved estimates of the precision of the respective molecule counts and the precision of the ratio between TARGET and STANDARD counts. Those knowledgeable in the art will understand that a variety of advanced statistical methods exist for the incorporation of multiple measures of uncertainty and error into an overall estimate of precision. A fully elaborated model of assay precision is of considerable importance in establishing the clinical utility of assays according to the invention.
In some embodiments, one or more well-characterized samples similar to or representative of the experimental samples to be analyzed are used as “controls”.
7.10 STOICHIOMETRIC FLATTENING AND ITS IMPACT ON THE EFFICIENCY OF PEPTIDE QUANTITATION BY COUNTING.
In some embodiments, quantitative results provided by use of the invention include the ratio of the number of molecules identified and counted as TARGET peptide constructs to the number of molecules identified and counted as STANDARD constructs (STANDARD being present in known or at least consistent amount across a set of samples being analyzed). The precision afforded by counting molecules, where counts are distributed in an approximately Gaussian manner, is estimated to be governed by the ratio of the square root of the number of counts to the number of counts (a ratio often referred to as the Coefficient of Variation, or CV). In various applications, CVs of 20% or less (for research assays), of 5% or less (for critical diagnostic assays) or 2-3% or less (for sensitive longitudinal tracking of biomarker levels) are desired. In such a model the number of counts theoretically needed to achieve a target CV is (1/CV)^2: thus, CVs of 20%, 5% and 2% would require respectively 25, 400 and 2,500 molecule counts for a single TARGET or STANDARD construct.
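The counting-statistics relationship above (CV equal to the square root of the count divided by the count, so that the required count is the square of 1/CV) can be checked with a short sketch. The function names are illustrative, and the small rounding step is only there to guard against floating-point noise.

```python
import math


def cv_from_counts(count):
    """CV under the counting model: sqrt(count) / count = 1 / sqrt(count)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return math.sqrt(count) / count


def counts_for_cv(target_cv):
    """Smallest whole number of counts giving a CV at or below target_cv."""
    if target_cv <= 0:
        raise ValueError("target CV must be positive")
    # Round first so floating-point noise (e.g. 25.000000000000004) does not bump the result.
    return math.ceil(round((1.0 / target_cv) ** 2, 9))


for cv in (0.20, 0.05, 0.02):
    print(cv, counts_for_cv(cv))
```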
The CV of the ratio of the number of molecules identified and counted as TARGET peptide to the number of molecules identified and counted as STANDARD is more complicated, but is dominated by the count with the higher CV (i.e., the peptide with the fewer molecules counted, and hence the lower precision). In quantitative applications, e.g., measuring clinical protein biomarkers in blood, the amount of STANDARD added to a sample as internal standard may be set approximately equal to the average level of the TARGET peptide observed in a set of samples from a relevant human population (i.e., at the population average level, such that the averaged TARGET:STANDARD ratio is 1.0). While different biomarkers exhibit different levels of quantitative variation among individuals (59), in most cases normal variation occurs within a range of 10-fold below and 10-fold above the population average (i.e., from 10 to 1,000 units for a biomarker whose average level is 100 units in a relevant population (59)), though some biomarkers show less variation (e.g., occur within a range of 0.5-2.0-fold from the mean) and a few others can change by >1,000-fold (e.g., CRP in cases of extreme inflammation). In some samples, more TARGET peptide molecules will be counted than STANDARD, and in some samples the reverse. The CV of the ratio will be dominated by the CV of the variable with the fewer counts, since the variable with more counts will have a smaller CV. As a general estimate we can expect the CV of the ratio to be no more than 1.5 times the larger of the TARGET peptide and STANDARD CVs. For example, a TARGET peptide whose ratio to its STANDARD is to be measured with a CV of 3% or better needs to produce a CV of 3%/1.5 = 2% for the lower of the two counts, which in turn requires 2,500 molecules of the less frequent peptide version (i.e., TARGET or STANDARD) to be counted (an estimate based on counting statistics as discussed above).
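The "no more than 1.5 times" estimate can be compared with a common first-order approximation. The sketch below assumes, beyond what the text states, that the TARGET and STANDARD counts are independent and follow counting (Poisson-like) statistics, in which case the CV of the ratio is approximately the square root of the sum of the squared individual CVs. The function name and example counts are illustrative.

```python
import math


def ratio_cv(target_count, standard_count):
    """First-order CV of target/standard for independent counts: sqrt(1/Nt + 1/Ns)."""
    if target_count <= 0 or standard_count <= 0:
        raise ValueError("counts must be positive")
    return math.sqrt(1.0 / target_count + 1.0 / standard_count)


# The less frequent version is counted 2,500 times (2% CV); vary the more frequent one.
for more_frequent in (2_500, 25_000, 250_000):
    larger_cv = 1.0 / math.sqrt(2_500)
    print(more_frequent, ratio_cv(2_500, more_frequent), 1.5 * larger_cv)
```

Under this assumption the worst case is equal counts, where the ratio CV is about 1.41 times the individual CV, so the 1.5-fold bound in the text is slightly conservative; as the more frequent count grows, the ratio CV approaches the CV of the less frequent count alone.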
If the more frequent version of a peptide is 10x as abundant as the less frequent (at the edge of the intended dynamic range of the assay), then 25,000 molecules of the more frequent version would need to be counted before 2,500 molecules of the less frequent are detected. If the more frequent version of a peptide is 100x as abundant as the less frequent (at the edge of the intended dynamic range of the assay), then 250,000 molecules of the more frequent version would need to be counted. The total number of molecules of the TARGET peptide to be counted (TARGET + STANDARD) would be a maximum of 27,500 molecules in the first case (10x range) and 252,500 molecules in the second case (100x range). This requirement for counting large numbers of peptide molecules, and in particular larger numbers to achieve better precision (lower CVs) and wider dynamic range, provides strong motivation to optimize the design of assays using the stoichiometric flattening method of the invention.
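The counting burden at the edge of the dynamic range follows from simple arithmetic: the more frequent version must be counted (fold range) times the minimum count, and the total is that number plus the minimum count for the less frequent version. A minimal sketch, with an illustrative function name:

```python
def counts_at_range_edge(min_counts, fold_range):
    """Return (more-frequent count, total count) when one version is fold_range times the other."""
    if min_counts <= 0 or fold_range < 1:
        raise ValueError("min_counts must be positive and fold_range at least 1")
    more_frequent = min_counts * fold_range
    return more_frequent, more_frequent + min_counts


# 2,500 counts of the less frequent version, at 10x and 100x abundance differences.
for fold in (10, 100):
    print(fold, counts_at_range_edge(2_500, fold))
```

The total scales almost linearly with the fold range, which is why widening the dynamic range is costly for a counting-based readout.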
In clinically important sample types such as whole blood and blood plasma, proteins of diagnostic interest can vary in abundance by more than 10^10 (10 billion-fold) (1, 8), a range that significantly exceeds the practical dynamic range of available detection technologies, including mass spectrometry and molecule counting. However, given that peptide quantitation according to the invention makes use of counts of TARGET peptides compared to counts of STANDARD internal standard molecules (e.g., as a ratio between the two), it is not necessary to capture all, or even a large fraction, of the molecules of a high-abundance peptide in order to accurately measure its concentration in a sample. Thus in some embodiments the invention instead provides for adjustment of the amount of each peptide TARGET + STANDARD pair captured, e.g., by adjusting the amount of each peptide’s specific enrichment reagent (e.g., amount of cognate BINDER) or the circumstances of enrichment (e.g., duration of binding and washing steps, solution conditions, etc.) so as to capture only the amount of the cognate TARGET peptide and STANDARD pair that is needed to allow counting the minimum required number of peptide molecules (specifically the minimum number required to deliver the desired measurement precision for the less abundant of the TARGET and STANDARD molecules: the more abundant of the two will by definition have more counts and thus a better precision, so that the ratio of TARGET and STANDARD measurements will have a precision similar to that of the less abundant molecule alone). By adjusting the amounts of each peptide recovered, stoichiometric differences between different TARGET peptides are reduced, effectively “flattening” the stoichiometry across the set of TARGET peptides. This differential adjustment of enrichment recoveries for different TARGET peptide constructs requires co-enrichment of cognate STANDARDS - otherwise the desired quantitative information on TARGET abundance is lost.
By flattening the stoichiometric differences (i.e., reducing the abundance differences) between peptides that are present at very different abundance levels in the original sample digest, the total number of molecules that need to be counted in the assay, and hence the cost and time involved, is minimized.
In some embodiments, stoichiometric flattening enriches low-abundance peptides, and specifically “de-enriches” or depletes selected high-abundance peptides to a relative abundance level specified in the assay design (typically much less than 100% but greater than 0% of the initial amount), and is therefore distinct from the general concept of “enriching” TARGET peptides as a means of increasing assay sensitivity by capturing all of a rare analyte from a large sample. In conventional methodologies, low abundance analytes are enriched as efficiently as possible so as to allow measurement of all the analyte present in a sample: i.e., the enrichment target is 100% recovery, and different assay targets are not generally enriched to different levels in a multiplex assay design. It is important to note that the ability to differentially enrich different TARGET peptides as practiced in the invention relies on the presence of STANDARDS to preserve information on the quantity of the TARGET peptide in the sample: i.e., the constancy of the ratio between TARGET and STANDARD amounts before, and after, enrichment. Stoichiometric flattening is not practicable in situations where an internal standard is not included for each analyte, nor in situations where each analyte is not specifically enriched by a different BINDER whose capture amount can be adjusted.
In some embodiments of a multi-analyte panel of TARGET peptides and their respective STANDARDS designed to measure multiple proteins, the amounts of the respective BINDERS are adjusted so as to deliver approximately equal numbers of TARGET plus STANDARD peptide molecules for each TARGET peptide, assuming that STANDARDS are added to the sample at levels approximately equal to the expected level of the cognate TARGET peptide. In some embodiments the process of adjusting BINDER amounts is carried out in a series of steps, beginning with a combination of the BINDERS in certain amounts (which may for convenience initially be equal amounts), measuring the numbers of each TARGET peptide (or STANDARD) molecule detected after enrichment and then reducing the amount of BINDERS for which a large number of peptides were detected and/or increasing the amount of BINDERS for which few peptide molecules were counted. Applied in an iterative approach, this empirical method allows progressive adjustments of the relative amounts of the BINDERS towards the goal of similar peptide counts for each TARGET peptide and STANDARD pair. It will be evident to those skilled in the art that such a process can be terminated at any point where the numbers of peptides counted for the panel components meet the needs of a specific application, or it can be continued to progressively improve performance towards an optimum defined by any of a range of well-known statistical measures of overall precision and accuracy. This method of tuning is fully empirical and independent of any prior knowledge of the relative kinetic properties of the various BINDERS, of the relative abundances of the target proteins in samples of interest, or the details of data analysis (e.g., the probabilities of errors in identification of specific peptides by the sequencing platform). 
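The iterative adjustment described above can be sketched as a simple loop. This is only an illustration: the multiplicative update rule, the function names, and the simulated run (in which counts scale linearly with BINDER amount and with a made-up abundance) are all assumptions introduced here; in practice each round would be an actual enrichment and counting experiment.

```python
def tune_binders(amounts, measure_counts, goal, rounds=5):
    """Iteratively rescale BINDER amounts toward a common count goal.

    amounts: dict peptide -> BINDER amount (arbitrary units)
    measure_counts: callable returning dict peptide -> molecules counted
                    for a given set of amounts (stands in for a real run)
    """
    amounts = dict(amounts)
    for _ in range(rounds):
        counts = measure_counts(amounts)
        for pep, n in counts.items():
            # Reduce BINDER where many molecules were counted and
            # increase it where few were counted.
            amounts[pep] *= goal / max(n, 1)
    return amounts

# Hypothetical relative abundances, used only to drive the simulation.
ABUNDANCE = {"HbA": 1e6, "sTfR": 1.0}

def simulated_run(amounts):
    """Assumed linear capture model; not a measured relationship."""
    return {p: amounts[p] * ABUNDANCE[p] for p in amounts}

tuned = tune_binders({"HbA": 1.0, "sTfR": 1.0}, simulated_run, goal=1000)
print(tuned)
print(simulated_run(tuned))
```

The loop needs no knowledge of BINDER kinetics or protein abundance; it uses only the counts returned by each round, consistent with the empirical character of the method.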
Once the relative amounts of the various BINDERS are established for a panel, the recipe can be locked down as a reproducible product until changes in one or more BINDERS (e.g., development of different BINDER reagents), STANDARD amounts, or panel composition are required.
In some embodiments, two or more stages of BINDER capture are used: a first capture to collect TARGET and STANDARD peptides from a standardized sample digest (i.e., having hundreds of thousands of different peptides), and one or more secondary BINDER capture steps to further purify or concentrate these relatively pure peptides, or transfer them to a different immobilized format (e.g., a smaller number of larger beads). In some embodiments, the process of stoichiometric flattening as described is carried out by adjustments of relative amounts of different BINDERS in the first capture stage, or else in the second capture stage, or in multiple capture stages. In some embodiments, a first BINDER capture stage is used to collect amounts of the TARGET and STANDARD peptides from a complex standardized digest, and may not, because of variations in the character of different samples, yield the desired level of stoichiometric flattening (i.e., roughly equal amounts of all the peptides); however, adjustments of BINDER amounts or properties in a second stage capture, which begins with a relatively pure peptide sample, can provide a much flatter stoichiometry and thus better detection efficiency.
7.10.1 Computing the benefit of stoichiometric flattening
Figure 33 illustrates the value of stoichiometric flattening in reducing the number of molecules that must be counted to ensure precise measurement of peptides present in a sample in widely disparate amounts. In a small sample of whole human blood (e.g., 10 uL), the most abundant protein (hemoglobin derived from red blood cells; Hb) is typically present at a level of 33,000,000 fmol (~2e16 molecules), while soluble transferrin receptor (sTfR) is present at a level of 37 fmol (~2.2e10 molecules) - a difference of almost 1,000,000-fold. Such a sample, which would be considered small in current clinical laboratories, contains many more molecules than must be counted to achieve precise quantitation (low CVs) using single molecule counting as disclosed in the invention. If an embodiment requires a CV of ~5%, it is necessary to count only ~400 molecules of the less abundant of the TARGET and STANDARD pair of peptides representing each of the protein targets (according to simple counting statistics; the square root of 400 divided by 400 being 20/400 = 5%). In the case of sTfR, the estimated maximum level expected in a sample is slightly more than the population average, and so the combined number of counts needed for sTfR TARGET peptide and STANDARD is estimated to be 913 total molecules. Without using stoichiometric flattening, ~822,000,000 TARGET and STANDARD peptide molecules corresponding to HbA would be counted during the time it took to accumulate the 913 molecules of sTfR peptide (almost a million molecules for each sTfR molecule counted). In this case the single molecule counter is spending 99.9999% of its time counting HbA while waiting for sTfR counts to accumulate, as a result of the vast difference in their relative abundances in the sample. This level of inefficiency would render the use of single molecule counting for quantitating these molecules impractical.
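The molecule numbers and the 400-count requirement quoted above can be reproduced with a short calculation. This is a minimal sketch using the Avogadro constant and Poisson counting statistics; the function names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def fmol_to_molecules(fmol):
    """Convert an amount in femtomoles to a number of molecules."""
    return fmol * 1e-15 * AVOGADRO

def counts_for_cv(cv):
    """Minimum count N for which 1 / sqrt(N) does not exceed cv."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(1.0 / cv ** 2, 6))

hb = fmol_to_molecules(33_000_000)   # hemoglobin in ~10 uL of whole blood
stfr = fmol_to_molecules(37)         # soluble transferrin receptor

print(f"Hb:   {hb:.2e} molecules")
print(f"sTfR: {stfr:.2e} molecules")
print(f"abundance ratio: {hb / stfr:,.0f}-fold")
print(f"counts needed for 5% CV: {counts_for_cv(0.05)}")
```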
However, using stoichiometric flattening as provided in the invention, only a tiny fraction of the HbA TARGET and STANDARD peptide molecules are captured by a correspondingly tiny amount of the cognate BINDER, while a much larger fraction of the sTfR TARGET and STANDARD peptide molecules are captured by a correspondingly larger amount of their cognate BINDER (or use of a much higher affinity BINDER). As a result, while 913 molecules of sTfR peptides are counted, only 844 molecules of HbA are required (a smaller number than sTfR because the variation in the amount of HbA across a population is less), for a total of 1,757 peptide molecules.
As a result, in this example stoichiometric flattening reduces the total number of molecules to be counted from ~822,000,913 to 1,757, a reduction of almost 500,000-fold, while delivering the same precision. This difference makes practical what would otherwise be entirely impractical.
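The fold reduction in this two-protein example follows directly from the counts given above; the arithmetic can be checked with a few lines (variable names are illustrative).

```python
# Counts taken from the two-protein example (HbA and sTfR).
unflattened = 822_000_000 + 913   # HbA + sTfR counts without flattening
flattened = 844 + 913             # HbA + sTfR counts with flattening

fold = unflattened / flattened
hba_fraction = 822_000_000 / unflattened

print(f"total counts with flattening: {flattened}")
print(f"reduction: {fold:,.0f}-fold")
print(f"share of unflattened counts spent on HbA: {hba_fraction:.4%}")
```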
Figure 34 presents a more complete example in which stoichiometric flattening is used to improve measurement of a panel of 26 proteins measured in small human blood samples. As in the previous case, the lowest abundance protein is sTfR and the highest is HbA, with a series of clinically relevant protein biomarkers occurring at various abundance levels in between. Without stoichiometric flattening, a total of approximately 1,000,000,000 peptide molecules must be counted while 913 counts of sTfR are accumulated (as in the previous example). However, using stoichiometric flattening to equalize the amounts of the peptides (e.g., by adjusting the amounts of their respective BINDERS used in the enrichment step), and taking into account the greater range of variation of some proteins compared to others in the panel, a total of only ~64,000 peptide molecules would need to be counted to provide 5% CV measurements of all, resulting in an efficiency improvement of ~16,000-fold overall.
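For the 26-protein panel, the same arithmetic gives the overall efficiency gain and the average counting budget per protein. The sketch below uses only the approximate totals quoted above; the per-protein average is a derived illustration, not a figure stated in the specification.

```python
panel_unflattened = 1_000_000_000   # approximate total without flattening
panel_flattened = 64_000            # approximate total with flattening
proteins = 26

fold_panel = panel_unflattened / panel_flattened
per_protein = panel_flattened / proteins

print(f"efficiency improvement: {fold_panel:,.0f}-fold")
print(f"average counts per protein (TARGET + STANDARD): {per_protein:,.0f}")
```

The average budget per protein is well above the 400-count minimum for the rarer member of each pair, consistent with the allowance for proteins that vary more widely across a population.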
7.10.2 Stoichiometric flattening in nanopore sequencing
In some embodiments that employ nanopore sequencing for peptide detection (an essentially serial technique), the increased efficiency provided by stoichiometric flattening translates directly into a dramatic reduction in the time and number of pores required to analyze a sample, which in this case is the time required to accumulate the required numbers of counts. In such systems multiple samples can be analyzed together (i.e., multiplexed) using some form of molecular barcoding technology (as used for example in genomic sequencing on Oxford Nanopore platforms), and given sufficient pore throughput capacity, this enables more samples in a given time and thus higher overall throughput.
In some embodiments, the capability of a nanopore reader to identify portions of a sequence early during the read operation, and eject a molecule whose sequence is not of interest (or is surplus to requirements in the current context) can be used to further reduce the stoichiometric differences between high and low abundance peptide reads. This approach, termed “computational enrichment of target sequences” or “Read Until” (74) can provide modest (e.g., max 10-fold) improvements in yield of target sequences in a DNA context, but its value depends on having long reads in order to have the opportunity of “rejecting” a significant amount of sequenceable material. In the context of the invention, this approach would yield little or no benefit for constructs carrying one or a few peptide molecules. However, in embodiments that employ lengthy concatenated constructs carrying many copies of the same TARGET and STANDARD peptide pair, the use of computational enrichment can allow rejection of large numbers of peptide molecules of which sufficient numbers (defined by the statistical needs of the assay) have already been read.
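The rejection logic behind computational enrichment can be sketched as a per-peptide quota check. This is a schematic illustration only: it does not use the Oxford Nanopore Read Until interface, and the function name, the quota value, and the stream of early identifications are hypothetical.

```python
from collections import Counter

def should_eject(peptide_id, counts, quota):
    """True once enough reads of this peptide have been accumulated."""
    return counts[peptide_id] >= quota

counts = Counter()
quota = 400                                # e.g., counts needed for a 5% CV
stream = ["HbA"] * 1000 + ["sTfR"] * 5     # hypothetical early identifications

kept = 0
for peptide_id in stream:
    if should_eject(peptide_id, counts, quota):
        continue                           # reject; the pore is freed early
    counts[peptide_id] += 1
    kept += 1

print(dict(counts), kept)
```

As noted above, the benefit of such a scheme depends on how much sequenceable material remains unread when the decision is made, so it is most relevant to long concatenated constructs.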
7.10.3 Stoichiometric flattening in degradative sequencing applications
In some embodiments employing peptide sequencing by stepwise peptide disassembly while recording amino acids one after another (i.e., degradative sequencing methods), the number of peptides counted may be determined by the capacity of the detection system (e.g., millions of peptide sites on arrays used by Quantum-Si, Encodia or Google platforms), and the time required for analysis of an initially fixed number of molecules is determined by the number of amino acids that must be serially decoded to accurately identify the TARGET and STANDARD peptides for counting. In degradative sequencing methods that record the peptide sequence as DNA (Encodia or Google), the analysis further requires a DNA sequencing step; however, current NGS sequencing platforms provide such an enormous capacity that this step is probably not a significant throughput limitation. Example sets of TARGET and STANDARD peptide sequences designed to measure a panel of proteins can be constructed so as to allow recognition of each peptide by sequencing only 3 or 4 amino acids from either terminus. In some embodiments it will be advantageous to sequence further (more amino acids) in order to decrease potential for misidentification and/or provide for recognition of any unwanted peptides with sequences similar to, but different from, the expected TARGET and STANDARD sequences. In any case, using stepwise degradative sequencing technology the time required may be somewhat adjustable (e.g., by adjusting the number of amino acids required to be read) but the overall number of peptide molecules being processed is determined by the geometry of the detection system itself. For this reason, stoichiometric flattening is key to ensuring that there is sufficient counting capacity to provide acceptable precision in the measurement of a series of target proteins.
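The number of residues that must be read to recognize each member of a panel can be estimated by finding the shortest terminal stretch that is unique to every peptide. The sketch below is illustrative: the function name is an assumption, and the panel contains only the three TARGET sequences named in the Examples (the corresponding STANDARD sequences are not listed here and would lengthen the required read in a real design).

```python
def min_unique_prefix(peptides, from_c_terminus=False):
    """Smallest k such that the first k residues, read from the chosen
    terminus, are distinct for every peptide in the panel."""
    seqs = [p[::-1] if from_c_terminus else p for p in peptides]
    for k in range(1, max(map(len, seqs)) + 1):
        if len({s[:k] for s in seqs}) == len(seqs):
            return k
    return None  # the panel contains identical sequences

panel = ["ESDTSYVSLK", "EGYYGYTGAFR", "GFVEPDHYVVVGAQR"]
print(min_unique_prefix(panel))                        # 2
print(min_unique_prefix(panel, from_c_terminus=True))  # 2
```

Reading beyond this minimum, as the text suggests, trades analysis time for a lower risk of misidentification.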
7.10.4 Stoichiometric flattening in single molecule imaging detectors
Current single molecule optical imaging detection platforms have on the order of 10^10 attachment sites (54), all of which can theoretically be loaded with single molecules and imaged. While 10^10 molecules might allow detection of a few copies of a peptide from a low abundance protein, that protein could not be quantitated accurately (too few counts), while a high abundance protein (like plasma albumin) might contribute 0.5x10^10 of the total molecules imaged (an enormous overburden of no measurement value). In embodiments employing stoichiometric flattening according to the invention, however, accurate measurements can be made with a few thousand molecules per TARGET, allowing tens of thousands of samples to be analyzed for a hundred or more proteins in a single run, with correspondingly enormous decreases in cost per measurement and capacity to run large sample sets.
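The sample capacity implied by these figures can be estimated as follows. The value of 3,000 molecules per TARGET is an assumed stand-in for "a few thousand", and complete, uniform loading of all sites is likewise an idealization.

```python
sites = 10**10                  # attachment sites on the imaging surface
proteins = 100                  # proteins measured per sample
molecules_per_target = 3_000    # assumed value for "a few thousand"

samples = sites // (proteins * molecules_per_target)
print(samples)  # 33333
```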
7.11 MULTI-ANALYTE PANEL EMBODIMENTS.
Since the sequence-sensitive single molecule detection approach of the invention can distinguish between different peptide sequences, it can be used to measure multiple different TARGET peptides and their respective STANDARDS, potentially representing multiple different sample proteins, at the same time in the same sample. As has been demonstrated using the SISCAPA method (14), multiple specific affinity reagents (e.g., BINDERs) can be used together (e.g., immobilized on magnetic beads) to enrich their cognate peptide sequences from a complex sample digest without significant interference between peptides. Figure 35 illustrates a multiplex panel embodiment in which 10 peptides, along with their respective STANDARD peptides designed according to the invention are measured by nanopore sequencing in the form of concatamers and counted to provide quantitative measurements of the presence in a clinical sample of SARS-CoV-2 NCAP protein, antibodies to SARS-CoV-2 NCAP and Spike proteins, levels of three host inflammation markers (CRP, LPSBP and Hp), and the RNA genome of SARS-CoV-2. This collection of analytes, determined by a single nanopore sequencing run, provides broad coverage of COVID-19 infection and patient response.
8 DISCUSSION OF SOME NOVEL FEATURES OF THE INVENTION.
8.1 DESIGN OF ORIENTED PEPTIDE-OLIGONUCLEOTIDE CONSTRUCTS
The invention provides novel components and workflows for modifying proteolytic peptides to create heterogeneous molecular constructs suitable for single molecule detection using several different detector technologies.
8.2 DESIGN OF INTERNAL STANDARDS FOR QUANTITATION: STANDARD CONSTRUCTS
The invention provides novel internal standards (STANDARD constructs) and methods of preparing these, as well as barcoding methods enabling multiplex analysis of multiple samples.
Some embodiments make use of an internal standard (STANDARD) for quantitation in a single molecule sequencing system. The STANDARD is designed and selected to allow optimal differential single molecule detection compared to its TARGET peptide (and other peptides likely to be present in an enriched sample) while minimizing any differences in binding of TARGET and STANDARD by a specific affinity reagent used to enrich these peptides from a complex sample (e.g., a sample digest) or in subsequent reactions involved in assembly of detectable constructs. Use of a cognate-sequence internal standard is novel for quantitation in a sequence-sensitive detection system, and provides the means to achieve precise quantitation of a TARGET peptide in a sample digest.
8.3 ENRICHMENT OF CONSTRUCTS USING BINDERS
The invention provides novel internal standards (STANDARD constructs) and methods of preparing these, as well as barcoding methods enabling multiplex analysis of multiple samples.
8.4 PEPTIDE MODIFICATION WHILE A PEPTIDE IS CAPTURED BY A BINDER.
Some embodiments make use of linkages of polymers or other functional groups to either n-terminus or c-terminus (or both) of a peptide, and may involve chemical reactions with peptides while the peptide is bound to a specific enrichment reagent (e.g., an anti-peptide antibody) which may in turn be bound to a solid support (e.g., a magnetic bead or column). The ability to carry out assembly of a heteropolymer or modified peptide construct while the peptide is held non-covalently on a solid support is novel and provides an immense simplification compared to conventional methods of construct assembly in stages, which usually require separative steps between stages.
8.5 STOICHIOMETRIC FLATTENING TO DRAMATICALLY IMPROVE SINGLE MOLECULE DETECTION EFFICIENCY
Some embodiments make use of specific enrichment to recover a relatively larger proportion of a low abundance TARGET peptide and a relatively lower proportion of a higher abundance TARGET peptide, thereby reducing the abundance difference between them, while preserving information as to their respective abundances in a sample through the inclusion of internal standards (STANDARD). This stoichiometric flattening, which can be achieved by tuning the amounts of the specific affinity reagents used to capture different TARGET peptides at one or more stages, can be used to compress the dynamic range required for peptide detection, thereby enabling detection of TARGET peptides having very large abundance differences in the sample using a detector with smaller dynamic range (e.g., a detector capable of counting molecules, and whose precision is thus governed by counting statistics requiring detection of some minimum number of molecules in the lowest abundance of a series of peptides to be detected).
8.6 POTENTIAL FOR ULTIMATE ASSAY SENSITIVITY.
Efficient enrichment of TARGET peptides from complex samples, combined with direct detection of single molecules, offers a path to ultimate sensitivity (i.e., detecting and counting all the analyte molecules present) using small, inexpensive equipment and without sacrificing specificity.
9 EXAMPLES
9.1 EXAMPLE 1: ASSEMBLY OF AN IN-LINE NANOPORE-SEQUENCEABLE OLIGO:PEPTIDE:OLIGO CONSTRUCT.
A sequenceable construct was assembled by insertion of a tryptic target peptide sequence derived from C-reactive protein (proteotypic peptide ESDTSYVSLK, having a nominal length of 4nm) into a nanopore sequenceable “loop insertion” VEHICLE (Figure 36 and Figure 37).
The method makes use of three types of linkage reactions: 1) NHS reaction with the two peptide amino groups (n-terminal and lysine epsilon-amino) to introduce BCN (bicyclo[6.1.0]nonyne) click groups into the peptide; 2) click chemical reaction between BCN on peptide ends with azide (incorporated into the termini of synthetic oligos A and B); and 3) enzymatic ligation of oligos assembled using complementary template (oligos D, E, F).
BCN click groups were added to the amino groups at both ends of the peptide by mixing 27uL of 10mM synthetic peptide ESDTSYVSLK (Vivitide), 33uL of 1M HEPES buffer (pH 8.5) and 20uL of 40mM BCN-NHS ester (BroadPharm) in acetonitrile, followed by incubation at room temperature for 1hr. This peptide conjugate (BCN-ESDTSYVSLK-BCN) was diluted to 100uM peptide in a final concentration of 12.5mM HEPES pH 8.5.
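The reagent quantities in this step can be cross-checked with a short calculation. This is a minimal sketch using only the volumes and concentrations stated above; it assumes two reactive amino groups per peptide (n-terminus and lysine side chain) and dilution with a buffer-free diluent, which is an assumption made here for illustration.

```python
def nmol(volume_ul, conc_mM):
    """Amount in nmol: uL x mmol/L = nmol."""
    return volume_ul * conc_mM

peptide = nmol(27, 10)       # synthetic peptide ESDTSYVSLK
bcn_nhs = nmol(20, 40)       # BCN-NHS ester
amines = 2 * peptide         # n-terminal and lysine epsilon-amino groups
total_ul = 27 + 33 + 20      # reaction volume

excess = bcn_nhs / amines
peptide_mM = peptide / total_ul
hepes_mM = nmol(33, 1000) / total_ul
dilution = peptide_mM / 0.1              # fold dilution to reach 100 uM
hepes_final = hepes_mM / dilution        # assumes buffer-free diluent

print(f"BCN-NHS excess over amino groups: {excess:.2f}-fold")
print(f"dilution to 100 uM peptide: {dilution:.2f}-fold")
print(f"HEPES after dilution: {hepes_final:.1f} mM")
```

The result (about 12.2 mM HEPES after a roughly 34-fold dilution) is close to the stated final concentration of 12.5mM.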
A non-peptidic control insert (endo-BCN-PEG2-NHS ester, BroadPharm) was dissolved in DMSO at a final concentration of 10mM.
A set of 6 oligonucleotides was designed and synthesized (Integrated DNA Technologies), combined in equal amounts (16.7 uM final concentration), heated for 5min at 70C and cooled to create a double-stranded VEHICLE design as shown in Figure 36. Oligo A comprises a sequence serving as a TARGET tag. Oligos A and B comprise azide click groups at their 3’ and 5’ termini, respectively, and are aligned by hybridization to complementary template oligo E to create a gap of about 2.5nm between the azide groups. These 3 oligos comprise a structure into which BCN-conjugated double-amino peptides can be inserted by click reaction with the two nearby azide groups. The proximity of the azide groups favors reaction with the two BCN groups of one peptide molecule, instead of coupling to two different peptide molecules. Oligo C, which comprises a sequence serving as a Sample ID tag, is aligned at the 3’ end of oligo B by hybridization with complementary template oligo F, whose 5’ (right) end is complementary to the Sample ID tag, and whose 3’ (left) end is complementary to the universal construct oligo B. Complementary template oligo D hybridizes with oligo A to provide a dA overhang site compatible with a commercial Y-adapter for nanopore sequencing. Oligos A and C comprise 5’ phosphate groups to facilitate ligation by T4 ligase.
A T/S ID 5'- /5Phos/CCTGAACCTATCCAGTGAGATAACACACAGGC/iAzide
B Loop follower 5'- /iAzide/ACACAGGCAGCCTACTATGCACCTCATGGAAT
C Sample ID 5'- /5Phos/CAGTTCCACCGTATAT
D Y-adapter align 5'- ACTGGATAGGTTCAGGA
E Loop template 5'- AGTAGGCTGCCTGTGTTTTTTTTTGCCTGTGTGTTATCTC
F Sample ID linker 5'- ATATACGGTGGAACTGATTCCATGAGGTGCAT
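The hybridization design described above can be checked directly from the listed sequences. The following sketch strips the modification codes and verifies that each template oligo (D, E, F) is the reverse complement of the segments it is said to align; the 16-nucleotide segment lengths and the 8-T spacer in oligo E are read off the sequences themselves, and the helper name is illustrative.

```python
# Oligo sequences from the table above, without modification codes.
A = "CCTGAACCTATCCAGTGAGATAACACACAGGC"
B = "ACACAGGCAGCCTACTATGCACCTCATGGAAT"
C = "CAGTTCCACCGTATAT"
D = "ACTGGATAGGTTCAGGA"
E = "AGTAGGCTGCCTGTGTTTTTTTTTGCCTGTGTGTTATCTC"
F = "ATATACGGTGGAACTGATTCCATGAGGTGCAT"

def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# D pairs with the 5' end of A and leaves a single dA overhang.
assert D == revcomp(A[:16]) + "A"
# E pairs with the 5' end of B and the 3' end of A, separated by 8 T.
assert E == revcomp(B[:16]) + "T" * 8 + revcomp(A[-16:])
# F pairs with all of C (Sample ID) and with the 3' end of B.
assert F == revcomp(C) + revcomp(B[-16:])
print("all template oligos match the described design")
```

The unpaired T spacer in oligo E holds the azide-bearing ends of oligos A and B apart, which corresponds to the gap into which the BCN-conjugated peptide is inserted.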
Insertion experiments were performed by combining 10uL of the double-stranded oligo construct with three test inserts (defined below), incubating for 90min at room temperature, and diluting to a final amount of 50fmol construct in 25uL:
[Table of test inserts: Figure imgf000190_0001, image not reproduced]
Each construct sample (30uL) was mixed with 5uL T4 ligase (New England Biolabs), 12.5uL ligation buffer, and 2.5uL sequencing adapter mix (AMX-F, Oxford Nanopore) and allowed to react for 10min to ligate oligo A to the Y-adapter, and oligo B to oligo C. These samples were purified by binding to AMPure XP beads, washed twice in Short Fragment Buffer and eluted in 7uL of Elution buffer (Oxford Nanopore). Each sample was finally mixed with 15uL Sequencing buffer II and 10uL Loading beads II (Oxford Nanopore) immediately prior to loading on a separate Flongle chip for sequencing on a MinION device (Oxford Nanopore). The construct design and experimental reads (basecalled sequences from nanopore traces) from the experiment are shown in Figure 36. In the design, sequences of oligos A, B and E are underlined for clarity. Downward pointing arrows indicate sites of oligo ligation. In the experimental reads, regions that are identical with the design sequence are underlined.
Sample 135, which was prepared without a BCN-activated insert, generated sequence covering the 3’ end of the Y-adapter and the 5’ portion of oligo A, confirming the effective ligation of the adapter and oligo A, as expected. Having no insert to connect the two azide groups, oligos B and C are missing from the sequenced construct.
Sample 144, which was prepared with a non-peptidic BCN-C2-BCN polyethylene glycol control insert, generated sequence covering the 3’ end of the Y-adapter and the 5’ portion of oligo A, and extensive sequence covering oligos B and C, confirming the effective ligation of the adapter and oligo A, and oligo B with oligo C, as expected. The fact that oligos B and C are present in the construct demonstrated that the two azide groups were linked by reaction with the BCN-C2-BCN insert, used here at high concentration due to its very limited solubility.
Sample 146, which was prepared with peptide insert BCN-ESDTSYVSLK-BCN, generated sequence covering the 3’ end of the Y-adapter and the 5’ portion of oligo A, and extensive sequence covering oligos B and C, confirming the effective ligation of the adapter and oligo A, and oligo B with oligo C, as expected. The fact that oligos B and C are present in the construct demonstrated that the two azide groups were linked by reaction with the BCN-ESDTSYVSLK-BCN insert, thereby confirming the derivatization of the peptide with two BCN groups and its reaction with the oligo azide groups to create a complete oligo-peptide construct capable of passing through a nanopore and generating correct sequence data from all oligo components.
Figure 37 presents nanopore traces obtained from single molecules present in samples 135, 144 and 146. As is typical in nanopore current traces, distinctive features of the traces are recognizable but do not coincide perfectly in time (passage of molecules through a pore does not proceed perfectly smoothly). The trace produced by the molecule from Sample 135 includes only the first half of the traces obtained from samples 144 and 146, as expected since only the adapter and oligo A are present in Sample 135. Samples 144 and 146 produce longer traces that are almost identical (as expected since they cover the adapter, and oligos A, B and C). However, the traces differ significantly in the central region where the insert is expected to appear, and include, for the Sample 146 peptide insert, extreme current swings indicative of abrupt opening (current increasing) and closing (current decreasing) of the nanopore associated with passage of peptide ESDTSYVSLK.
These results confirm successful assembly of a nanopore-sequenceable oligonucleotide VEHICLE incorporating an in-line peptide segment.
9.2 EXAMPLE 2: ASSEMBLY OF AN ORIENTED OLIGO-PEPTIDE-OLIGO CONSTRUCT USING 2-STEP DIGESTION
A 2-step digestion method (shown in Figure 6) is used to discriminate between peptide n-terminal and c-terminal linkage of oligos useful in nanopore sequencing, both to facilitate transport of the peptide through a nanopore and to provide contextual information on the peptide molecule (its status as TARGET or STANDARD, and optionally what BINDER captured it during enrichment, and what sample the peptide was derived from). The steps of the example are shown in Figure 12. The final construct produced in this example passes through a nanopore starting at the 5’ end, producing nanopore current traces (squiggles) that are decoded to provide sequences of the Adapter, a Sample ID, a BINDER ID, the TARGET/STANDARD identifying flag sequence, and finally a squiggle characterizing a peptide linked in-line with the oligos, with its c-terminus in the 5’ direction, and a following DNA Trailer segment. Modifications of this workflow can omit the Sample ID if multiple samples are not being multiplexed together, and can omit the BINDER ID if this information is not required.
Sample proteins are digested with the enzyme Lys-C, and the amino groups of the resulting peptides modified by reaction with NHS-BCN to introduce BCN click groups at both the n-terminus and the side chain amino group of peptides having c-terminal lysine residues (i.e., most peptides resulting from Lys-C digestion). After this reaction has reached completion, amine-modified magnetic beads are added in sufficient quantity to react with and sequester any excess NHS-BCN reagent, and subsequently removed from the digest. Trypsin is then added to cleave peptides with internal Arg residues, resulting in new free amino groups on the n-termini thus created. As shown schematically in Figure 12A, sample peptides with c-terminal BCN (attached to the c-terminal lysine side chain amino group) are reacted with an oligonucleotide comprising a sequence defined as the label (or flag) for sample-derived peptides (“T_ID”) and having a 5’ phosphate group and a 3’ azide functionality. The BCN-peptide and azide-oligo then react to form the peptide:oligo construct shown in Figure 12B. A previously made synthetic STANDARD construct comprising the same peptide sequence joined to an oligo comprising a sequence defined as the label (or flag) for STANDARD peptides (“S_ID”) is added in known amount to act as an internal reference for quantitation of the TARGET, creating a standardized sample. The distinct T_ID and S_ID sequences comprise a region of identical sequence capable of hybridizing with a region of the Assembler molecule shown below. Cognate TARGET and STANDARD peptide:oligo constructs are enriched by binding to a cognate BINDER attached to magnetic beads (Figure 12C). The BINDER has an attached oligo sequence “B tag” identifying the specificity of the BINDER (i.e., which peptide sequence it captures). Unbound peptides are washed away.
Multiple TARGET peptides together with their cognate STANDARDS are bound and enriched from the standardized sample by a combination of their respective cognate BINDERS.
Three partially double-stranded DNA constructs are added to the BINDER-captured peptide:oligo constructs: 1) an “Assembler” bridge strand comprising a sequence complementary to the BINDER B tag and a second sequence complementary to an oligo having a sequence that identifies the BINDER specificity; 2) an oligo comprising a sequence that identifies the sample from which the peptides are derived (to enable multiplexing of samples for nanopore readout) and a short complementary sequence “S’”; and 3) a conventional nanopore sequencing Y-adapter comprising a DNA motor protein. The Assembler hybridizes to i) the B tag of the BINDER having the complementary sequence; ii) the sequence shared by T_ID and S_ID sequences; and iii) a universal subsequence shared by the set of Sample ID sample barcodes. The Sample ID barcodes further comprise an A/T overhang matching the requirements for ligation to the sequencing Y-adapter. Each of the S_ID, T_ID, BINDER ID, and Sample ID oligos comprises a 5’ phosphate group. Once the oligos are assembled on the Assembler based on hybridization of complementary sequences (Figure 12E), a DNA ligase (e.g., T4 ligase) is used to covalently link the Adapter, Sample ID, BINDER ID, T/S ID and peptide molecules into a continuous linear construct (ligation is indicated in the figure by >||<). In Figures 12E-G the presence of a mixture of TARGET and STANDARD (T_ID and S_ID) versions of a peptide present on the enriching BINDER is indicated as “T/S ID”.
The remaining free amino group at the peptide n-terminus is reacted with NHS-BCN, followed by removal of excess NHS-BCN using amine-modified magnetic beads (as carried out previously), and finally linkage to a further “Trailer” oligo. The Trailer functions with the DNA motor to regulate passage of the preceding peptide through the nanopore and allow recording of a “squiggle” trace capable of identifying the peptide among the set of expected TARGETS.
Figure 13A shows the sequences of oligo components in a specific implementation of the workflow used in this example up to, but not including, ligation of the nanopore Y-Adapter. Figure 13B shows the construct after dissociation of the peptide from the BINDER and the B tag from the Assembler, at which point the BINDER beads are removed and the construct is ligated with a nanopore adapter using an A/T overhang (ligation indicated in the figure by >||<).
9.3 EXAMPLE 3: PREPARATION OF AN ENRICHED STANDARDIZED PEPTIDE LIBRARY FOR SINGLE MOLECULE IMAGING DETECTION
In this example, a simplified protocol is used to prepare TARGET peptides EGYYGYTGAFR from serum transferrin (Tf) and GFVEPDHYVVVGAQR from soluble transferrin receptor (sTfR) for detection by super-resolution single molecule microscopy. A set of samples of 10uL human plasma (or equivalently 20uL of whole blood) were digested with trypsin according to a published automated protocol (13).
The selected peptides have a c-terminal Arginine residue and thus have a single amino group at their n-terminus available for reaction with an NHS-ester reagent. The tryptic sample digest is reacted with a quantity of commercially-available endo-BCN-PEG3-NHS ester (BroadPharm) equal to a 1.5-fold molar excess over the peptide amino groups in the digest. This reagent was dissolved in acetonitrile to overcome its limited solubility in aqueous solvents and added to the digest. After completion of the NHS reaction (60min at room temperature) a 2-fold excess of TARGET ID oligonucleotide tag comprising a 3’ azide group and a 5’ phosphate was added and allowed to react overnight. These reactions can be executed sequentially as described, or simultaneously as indicated in Figure 38A.
STANDARD constructs of the two target peptides are prepared separately using synthetic peptides having the same sequences as the TARGETS, and have the same structure as their cognate TARGET constructs generated in the foregoing steps, with the exception that the TARGET ID (or T_ID) flag is replaced by the STANDARD ID (S_ID) flag oligo sequence. In this example, the TARGET and STANDARD tags comprise a 5’ region in which their sequences are the same, and a 3’ region in which they have different sequences capable of distinguishing one from another. STANDARD constructs are added to the sample digests in molar amounts approximately equal to the amounts of the respective target peptides in a typical plasma sample (in this case the same volume of STANDARD is added to each sample digest). Following addition of the STANDARD constructs (Figure 38B), the digest samples constitute standardized digests (with respect to these two peptides).
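The tag layout described above (a shared 5’ region followed by a distinguishing 3’ region) can be illustrated with a minimal Python sketch. The function name and the example sequences are hypothetical and are not taken from the figures; the sketch assumes error-free tag reads written 5’ to 3’.

```python
def classify_tag(read, shared_5p, target_3p, standard_3p):
    """Classify a tag read (written 5'->3') as TARGET or STANDARD.

    shared_5p   : 5' region common to both tags (hypothetical sequence)
    target_3p   : 3' region unique to the TARGET tag (hypothetical)
    standard_3p : 3' region unique to the STANDARD tag (hypothetical)
    Returns "TARGET", "STANDARD", or None if the read matches neither.
    """
    # Both tag types must begin with the common 5' region.
    if not read.startswith(shared_5p):
        return None
    # The remainder of the read carries the distinguishing 3' region.
    tail = read[len(shared_5p):]
    if tail.startswith(target_3p):
        return "TARGET"
    if tail.startswith(standard_3p):
        return "STANDARD"
    return None


# Illustrative sequences only.
SHARED = "ACGTACGT"
T_ID = SHARED + "AAAATTTT"
S_ID = SHARED + "CCCCGGGG"
print(classify_tag(T_ID, SHARED, "AAAATTTT", "CCCCGGGG"))
print(classify_tag(S_ID, SHARED, "AAAATTTT", "CCCCGGGG"))
```

A practical decoder would likely tolerate read errors (for example by using error-correcting barcodes as in references 40 and 41); exact prefix matching is used here only to keep the sketch short.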
Specific BINDERS (rabbit anti-peptide antibodies) were generated to each of these peptides and covalently immobilized on magnetic beads by reaction with tosyl Dynabeads (ThermoFisher) according to the manufacturer’s instructions, at an approximate load of 1ug antibody per 5uL of bead suspension. 5uL of beads with the sTfR peptide BINDER and 1uL of beads with the Tf BINDER are added to the standardized digests and incubated at room temperature for 30min, after which the beads are collected by magnets placed beside the digest vessels, the digests are removed, and the beads are washed in buffer (PBS) twice using an automated liquid handling system as described (4). The resulting samples of TARGET and STANDARD constructs for each of the two peptides (Figure 38C) constitute enriched sample digests.
Constructs present in the different sample digests are labeled by addition of Sample ID oligos having a 3’ region of fixed sequence (the same sequence for all Sample ID tags) and a 5’ region of variable sequence that encodes the identity of each sample. In this example the Sample ID oligos additionally comprise a tetrazine (Tz) click group at the 5’ end.
An additional synthetic oligo is added (the “Assembler” in Figure 38C) that has a 3’ region complementary to the 3’ end of the Sample ID oligo, and a 5’ region complementary to the 5’ end present in both the TARGET and STANDARD tags (i.e., the pool of T/S ID tags). The Assembler hybridizes with both the T/S ID oligo tags and the Sample ID oligos present in each sample, aligning the Sample ID 3’ end with the 5’ phosphate at the end of the T/S ID components of the antibody-bound constructs. T4 DNA ligase (New England Biolabs) is used according to the manufacturer’s instructions to ligate the Sample ID oligos to the T/S ID oligos, creating continuous covalent constructs each comprising a peptide, a T/S ID oligo flag and a Sample ID oligo flag. These additions and reactions are carried out while the initial peptide constructs remain bound to the BINDERS on the magnetic beads, allowing the use of small reaction volumes and efficient washing between steps.
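The splint geometry of the Assembler can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example sequences are hypothetical, and the sketch assumes that the Assembler is simply the reverse complement of the ligation junction (the fixed 3’ region of the Sample ID oligo followed by the shared 5’ region of the T/S ID tags).

```python
# Watson-Crick complement table for an unmodified DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_assembler(sample_id_3p, ts_id_5p):
    """Return a splint (5'->3') spanning the Sample ID / T/S ID junction.

    sample_id_3p : fixed 3' region of the Sample ID oligo (hypothetical)
    ts_id_5p     : shared 5' region of the TARGET/STANDARD tags (hypothetical)

    After ligation the product reads 5'-...sample_id_3p|ts_id_5p...-3',
    so the splint is the reverse complement of that junction. Its 5' region
    pairs with the T/S ID 5' end and its 3' region pairs with the Sample ID
    3' end, as described in the text.
    """
    junction = sample_id_3p + ts_id_5p
    return reverse_complement(junction)


# Illustrative sequences only.
assembler = design_assembler("GATTACAG", "CCTGAAGT")
print(assembler)
```

In this sketch the regions are 8 bases long for brevity; the lengths needed for stable hybridization under ligation conditions are not specified here and would have to be chosen experimentally.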
Following ligation, the beads are washed free of ligation reaction components, and the enriched constructs are eluted from the BINDERS on the beads using an acidic eluent (0.5% formic acid/ 0.03% CHAPS detergent). After removal of the beads from the eluates, the eluates are neutralized with Tris buffer, and the eluates (now labeled with Sample ID tags) are combined to create a pooled sample for analysis.
The pooled construct sample is diluted and loaded onto a clean glass slide that was previously derivatized with TCO-PEG3-triethoxysilane at low density (about 1-10 TCO groups per square micron) and passivated to reduce non-specific binding (80). Following reaction of the construct 5’-Tz click groups with the TCO click groups on the slide surface, the slide is washed, assembled into a flow cell and positioned in the light path of a fluorescence microscope equipped with total internal reflection (TIRF) optics, laser illumination and a high-efficiency camera.
A series of recognition reagents are labeled with Cy5 fluorophore: the peptide BINDERS (each capable of specifically recognizing one of the peptides), two oligos respectively complementary to the unique portions of the TARGET and STANDARD tags (distinguishing sample-derived TARGET peptides from added internal STANDARDS), and oligos complementary to the unique portions of the Sample ID tags (identifying the respective samples from which each construct originated). Each labeled recognition reagent is passed over the immobilized constructs in sequence, and the resulting bound Cy5 signals are recorded as an image. The slide is washed with acid eluent after each antibody recognition reagent and heated to 60°C after each oligo recognition reagent in order to remove all fluorescence before addition of the following recognition reagent. The sequential images are aligned and interpreted as indicated schematically in Figure 31 to provide counts of each peptide and its internal standard in each sample. From these counts, the ratio of TARGET molecules to STANDARD molecules in each sample can be calculated and multiplied by the amount of STANDARD added to each sample to arrive at a measure of the amount of each peptide (and its parent protein) in each sample.
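The final calculation (TARGET:STANDARD count ratio multiplied by the amount of STANDARD added) can be sketched as follows. The data layout, function name, and numeric values are assumptions made for illustration; the sketch presumes that image alignment has already assigned each immobilized molecule a sample, a peptide, and a tag type.

```python
from collections import Counter


def quantify(observations, standard_amounts):
    """Estimate peptide amounts per sample from single molecule counts.

    observations     : iterable of (sample_id, peptide, tag_type) tuples,
                       one per decoded molecule, with tag_type either
                       "TARGET" or "STANDARD" (hypothetical layout)
    standard_amounts : dict mapping peptide -> amount of STANDARD construct
                       added to each sample (any consistent unit)
    Returns a dict mapping (sample_id, peptide) -> estimated amount, or
    None where no STANDARD molecules were counted.
    """
    counts = Counter(observations)
    pairs = {(sample, peptide) for (sample, peptide, _tag) in counts}
    results = {}
    for sample, peptide in sorted(pairs):
        n_target = counts[(sample, peptide, "TARGET")]
        n_standard = counts[(sample, peptide, "STANDARD")]
        if n_standard == 0:
            # Without standard counts the ratio is undefined.
            results[(sample, peptide)] = None
        else:
            ratio = n_target / n_standard
            results[(sample, peptide)] = ratio * standard_amounts[peptide]
    return results


# Illustrative counts only: 30 TARGET and 20 STANDARD molecules of the Tf
# peptide in sample S1, with 100 units of STANDARD added per sample.
obs = [("S1", "Tf", "TARGET")] * 30 + [("S1", "Tf", "STANDARD")] * 20
print(quantify(obs, {"Tf": 100.0}))
```

Because both members of each TARGET/STANDARD pair are enriched by the same BINDER and counted on the same slide, the ratio is expected to be insensitive to losses that affect both constructs equally, which is the rationale for the internal standard.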
9.4 EXAMPLE 4: PREPARATION OF AN ENRICHED PEPTIDE LIBRARY FOR DEGRADATIVE SINGLE MOLECULE DETECTION
The method of the invention was adapted (as shown in Figure 39) to provide peptide:oligo constructs suitable for analysis in a detection platform in which peptide sequence is reverse translated into DNA by sequential recognition and removal of n-terminal amino acids (81). In this application it is essential that the peptide n-terminus is accessible for chemical modification and recognition by specific affinity reagents, and an associated DNA oligo with a free 3’ end is required to which short base sequences can be added recording the successive amino acids detected and removed from the peptide in a cyclical reverse translation process. This is achieved by modifying the structure of the TARGET/STANDARD identification code oligos (T_ID and S_ID oligos) to provide for click attachment to an internal, rather than a 3’ terminal, azide-labeled base (Figure 39A and B). The peptide is thereby connected to a site 2 or more bases in from the 3’ end of the oligo, leaving the 3’ terminus accessible for addition of bases (e.g., by DNA polymerase copying a strand associated with one of a series of n-terminal amino acid recognition reagents), but not within the 5’ end that hybridizes with the Assembler.
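The two positional constraints on the internal attachment site can be expressed as a simple check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, positions are 0-based from the 5’ end, and "2 or more bases in from the 3’ end" is interpreted here as at least two bases lying 3’ of the modified base.

```python
def attachment_site_ok(oligo_len, site_index, assembler_region_len,
                       min_bases_3p=2):
    """Check whether an internal azide-labeled base is acceptably placed.

    oligo_len            : total length of the T_ID or S_ID oligo
    site_index           : 0-based position of the modified base (from 5')
    assembler_region_len : length of the 5' region that hybridizes with
                           the Assembler (hypothetical value)
    min_bases_3p         : minimum number of bases 3' of the site
                           (assumed reading of the text)
    """
    if not 0 <= site_index < oligo_len:
        return False
    # The site must lie outside the 5' region that pairs with the Assembler.
    outside_assembler_region = site_index >= assembler_region_len
    # Enough bases must remain 3' of the site to keep the terminus accessible.
    bases_3p_of_site = oligo_len - 1 - site_index
    return outside_assembler_region and bases_3p_of_site >= min_bases_3p


# Illustrative values only: a 30-base oligo with a 12-base Assembler region.
print(attachment_site_ok(30, 27, 12))
print(attachment_site_ok(30, 28, 12))
```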
The process of reverse translation adds bases to the 3’ end of the oligo construct (81) in the process of reading out the identity of the n-terminal amino acid as recognized by a specific BINDER. When this cyclical process is stopped (e.g., when sufficient amino acids have been decoded and bases added to unambiguously identify the TARGET sequence, or determine that the peptide is not a relevant sequence), the extended oligo is cleaved from the substrate, and any necessary adapter sequences added to allow its introduction into a conventional DNA sequencer for analysis (e.g., using sequencing by synthesis as developed and commercialized by Illumina). The peptide may be removed from the construct if it presents a barrier to copying the oligo.
10 REFERENCES
1. N. L. Anderson, The Clinical Plasma Proteome: A Survey of Clinical Assays for Proteins in Plasma and Serum. Clin. Chem. 56, 177-185 (2010).
2. A. N. Hoofnagle, M. H. Wener, The fundamental flaws of immunoassays and potential solutions using tandem mass spectrometry. J. Immunol. Methods 347, 3-11 (2009).
3. N. L. Anderson, N. G. Anderson, L. R. Haines, D. B. Hardie, T. W. Pearson, Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J. Proteome Res. 3, 235-244 (2004).
4. M. Razavi, N. Leigh Anderson, M. E. Pope, R. Yip, T. W. Pearson, High precision quantification of human plasma proteins using the automated SISCAPA Immuno-MS workflow. New Biotechnol. 33, 494-502 (2016).
5. C. V. Cheng, DISCREPANCIES BETWEEN SISCAPA LC-MS/MS AND ROCHE COBAS e601 THYROGLOBULIN REVEAL UNEXPECTEDLY HIGH RATE OF HETEROPHILE ANTIBODY INTERFERENCE IN IMMUNOASSAY. , 1.
6. A. N. Hoofnagle, M. Y. Roth, Improving the Measurement of Serum Thyroglobulin With Mass Spectrometry. J. Clin. Endocrinol. Metab. 98, 1343-1352 (2013).
7. K. K. Mangalaparthi, S. Chavan, A. K. Madugundu, S. Renuse, P. M. Vanderboom, A. D. Maus, J. Kemp, B. R. Kipp, S. K. Grebe, R. J. Singh, A. Pandey, A SISCAPA-based approach for detection of SARS-CoV-2 viral antigens from clinical samples. Clin. Proteomics 18, 25 (2021).
8. N. L. Anderson, N. G. Anderson, The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics MCP 1, 845-867 (2002).
9. L. Anderson, M. Razavi, M. E. Pope, R. Yip, L. Cameron, A. Bassini-Cameron, T. W. Pearson, Precision multiparameter tracking of inflammation on timescales of hours to years using serial dried blood spots. Bioanalysis 12, 937-955 (2020).
10. J. O. Becker, A. N. Hoofnagle, Replacing immunoassays with tryptic digestion-peptide immunoaffinity enrichment and LC-MS/MS. Bioanalysis 4, 281-290 (2012).
11. S. Gustafsdottir, E. Schallmeiner, S. Fredriksson, M. Gullberg, O. Soderberg, M. Jarvius, J. Jarvius, M. Howell, U. Landegren, Proximity ligation assays for sensitive and specific protein analyses. Anal. Biochem. 345, 2-9 (2005).
12. A. Joshi, M. Mayr, In Aptamers They Trust: Caveats of the SOMAscan Biomarker Discovery Platform From SomaLogic. Circulation 138, 2482-2485 (2018).
13. M. Razavi, N. Leigh Anderson, M. E. Pope, R. Yip, T. W. Pearson, High precision quantification of human plasma proteins using the automated SISCAPA Immuno-MS workflow. New Biotechnol. 33, 494-502 (2016).
14. M. Razavi, N. L. Anderson, R. Yip, M. E. Pope, T. W. Pearson, Multiplexed longitudinal measurement of protein biomarkers in DBS using an automated SISCAPA workflow. Bioanalysis 8, 1597-1609 (2016).
15. M. Razavi, L. E. Frick, W. A. Lamarr, M. E. Pope, C. A. Miller, N. L. Anderson, T. W. Pearson, High-Throughput SISCAPA Quantitation of Peptides from Human Plasma Digests by Ultrafast, Liquid Chromatography-Free Mass Spectrometry. J. Proteome Res. , 121119143208008-8 (2012).
16. J. Zecha, S. Satpathy, T. Kanashova, S. C. Avanessian, M. H. Kane, K. R. Clauser, P. Mertins, S. A. Carr, B. Kuster, TMT Labeling for the Masses: A Robust and Cost-efficient, In-solution Labeling Approach. Mol. Cell. Proteomics 18, 1468-1478 (2019).
17. N. Z. Fantoni, A. H. El-Sagheer, T. Brown, A Hitchhiker’s Guide to Click-Chemistry with Nucleic Acids. Chem. Rev. 121, 7122-7154 (2021).
18. J. Swaminathan, A. A. Boulgakov, E. M. Marcotte, D. B. Searls, Ed. A Theoretical Justification for Single Molecule Peptide Sequencing. PLOS Comput. Biol. 11, e1004080 (2015).
19. J. A. Alfaro, P. Bohlander, M. Dai, M. Filius, C. J. Howard, X. F. van Kooten, S. Ohayon, A. Pomorski, S. Schmid, A. Aksimentiev, E. V. Anslyn, G. Bedran, C. Cao, M. Chinappi, E. Coyaud, C. Dekker, G. Dittmar, N. Drachman, R. Eelkema, D. Goodlett, S. Hentz, U. Kalathiya, N. L. Kelleher, R. T. Kelly, Z. Kelman, S. H. Kim, B. Kuster, D. Rodriguez-Larrea, S. Lindsay, G. Maglia, E. M. Marcotte, J. P. Marino, C. Masselon, M. Mayer, P. Samaras, K. Sarthak, L. Sepiashvili, D. Stein, M. Wanunu, M. Wilhelm, P. Yin, A. Meller, C. Joo, The emerging landscape of single-molecule protein sequencing technologies. Nat. Methods 18, 604-617 (2021).
20. C. G. Brown, J. Clarke, Nanopore development at Oxford Nanopore. Nat. Biotechnol. 34, 810-811 (2016).
21. M. Jain, H. E. Olsen, B. Paten, M. Akeson, The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
22. F. Hu, B. Angelov, S. Li, N. Li, X. Lin, A. Zou, Single-Molecule Study of Peptides with the Same Amino Acid Composition but Different Sequences by Using an Aerolysin Nanopore. ChemBioChem 21, 2467-2473 (2020).
23. S. Yan, J. Zhang, Y. Wang, W. Guo, S. Zhang, Y. Liu, J. Cao, Y. Wang, L. Wang, F. Ma, P. Zhang, H.-Y. Chen, S. Huang, Single Molecule Ratcheting Motion of Peptides in a Mycobacterium smegmatis Porin A (MspA) Nanopore. Nano Lett. 21, 6703-6710 (2021).
24. M. Miyagi, S. Takiguchi, K. Hakamada, M. Yohda, R. Kawano, Single polypeptide detection using a translocon EXP2 nanopore. PROTEOMICS , 2100070 (2021).
25. S. Biswas, W. Song, C. Borges, S. Lindsay, P. Zhang, Click Addition of a DNA Thread to the N-Termini of Peptides for Their Translocation through Solid-State Nanopores. ACS Nano 9, 9652-9664 (2015).
26. M. A. V. Fahie, Ed., Nanopore Technology: Methods and Protocols (Springer US, New York, NY, 2021; http://link.springer.com/10.1007/978-1-0716-0806-7).
27. J. Zuo, N.-N. Song, J. Wang, X. Zhao, M.-Y. Cheng, Q. Wang, W. Tang, Z. Yang, K. Qiu, Review — Single-Molecule Sensors Based on Protein Nanopores. J. Electrochem. Soc. 168, 126502 (2021).
28. T. Albrecht, Single-Molecule Analysis with Solid-State Nanopores. Annu. Rev. Anal. Chem. 12, 371-387 (2019).
29. L. Restrepo-Perez, S. John, A. Aksimentiev, C. Joo, C. Dekker, SDS-assisted protein transport through solid-state nanopores. Nanoscale 9, 11685-11693 (2017).
30. P. Mallick, High-density and scalable protein arrays for single-molecule proteomic studies. , 33.
31. J. D. Egertson, D. DiPasquo, A. Killeen, V. Lobanov, S. Patel, P. Mallick, A theoretical framework for proteome-scale single-molecule protein identification using multi-affinity protein binding reagents (Systems Biology, 2021; http://biorxiv.org/lookup/doi/10.1101/2021.10.11.463967).
32. C. M. Stawicki, T. E. Rinker, M. Burns, S. S. Tonapi, R. P. Galimidi, D. Anumala, J. K. Robinson, J. S. Klein, P. Mallick, Modular fluorescent nanoparticle DNA probes for detection of peptides and proteins. Sci. Rep. 11, 19921 (2021).
33. J. van Ginkel, M. Filius, M. Szczepaniak, P. Tulinski, A. S. Meyer, C. Joo, Single-molecule peptide fingerprinting. Proc. Natl. Acad. Sci. 115, 3338-3343 (2018).
34. J. M. Hong, M. Gibbons, A. Bashir, D. Wu, S. Shao, Z. Cutts, M. Chavarha, Y. Chen, L. Schiff, M. Foster, V. A. Church, L. Ching, S. Ahadi, A. Hieu-Thao Le, A. Tran, M. Dimon, M. Coram, B. Williams, P. Jess, M. Berndl, A. Pawlosky, ProtSeq: Toward high-throughput, single-molecule protein sequencing via amino acid conversion into DNA barcodes. iScience 25, 103586 (2022).
35. B. D. Reed, M. J. Meyer, V. Abramzon, O. Ad, P. Adcock, F. R. Ahmad, G. Alppay, J. A. Ball, J. Beach, D. Belhachemi, A. Bellofiore, M. Bellos, J. F. Beltran, A. Betts, M. W. Bhuiya, K. Blacklock, R. Boer, D. Boisvert, N. D. Brault, A. Buxbaum, S. Caprio, C. Choi, T. D. Christian, R. Clancy, J. Clark, T. Connolly, K. F. Croce, R. Cullen, M. Davey, J. Davidson, M. M. Elshenawy, M. Ferrigno, D. Frier, S. Gudipati, S. Hamill, Z. He, S. Hosali, H. Huang, L. Huang, A. Kabiri, G. Kriger, B. Lathrop, A. Li, P. Lim, S. Liu, F. Luo, C. Lv, X. Ma, E. McCormack, M. Millham, R. Nani, M. Pandey, J. Parillo, G. Patel, D. H. Pike, K. Preston, A. Pichard-Kostuch, K. Rearick, T. Rearick, M. Ribezzi-Crivellari, G. Schmid, J. Schultz, X. Shi, B. Singh, N. Srivastava, S. F. Stewman, T. R. Thurston, P. Trioli, J. Tullman, X. Wang, Y.-C. Wang, E. A. G. Webster, Z. Zhang, J. Zuniga, S. S. Patel, A. D. Griffiths, A. M. van Oijen, M. McKenna, M. D. Dyer, J. M. Rothberg, Real-time dynamic single-molecule protein sequencing on an integrated semiconductor device (Biophysics, 2022; http://biorxiv.org/lookup/doi/10.1101/2022.01.04.475002).
36. Y. Yao, M. Docter, J. van Ginkel, D. de Ridder, C. Joo, Single-molecule protein sequencing through fingerprinting: computational assessment. Phys. Biol. 12, 055003 (2015).
37. E. T. Hernandez, J. Swaminathan, E. M. Marcotte, E. V. Anslyn, Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing. New J. Chem. 41, 462-469 (2017).
38. X. Qu, D. Wu, L. Mets, N. F. Scherer, Nanometer-localized multiple single-molecule fluorescence microscopy. Proc. Natl. Acad. Sci. 101, 11298-11303 (2004).
39. Q. Xu, M. R. Schlabach, G. J. Hannon, S. J. Elledge, Design of 240,000 orthogonal 25mer DNA barcode probes. Proc. Natl. Acad. Sci. 106, 2289-2294 (2009).
40. T. Buschmann, L. V. Bystrykh, Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics 14, 272 (2013).
41. J. A. Hawkins, S. K. Jones, I. J. Finkelstein, W. H. Press, Indel-correcting DNA barcodes for high-throughput sequencing. Proc. Natl. Acad. Sci. 115 (2018), doi: 10.1073/pnas.1802640115.
42. P. Somervuo, P. Koskinen, P. Mei, L. Holm, P. Auvinen, L. Paulin, BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing. BMC Bioinformatics 19, 257 (2018).
43. R. R. Wick, L. M. Judd, K. E. Holt, Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).
44. F. Schueder, E. M. Unterauer, M. Ganji, R. Jungmann, DNA-Barcoded Fluorescence Microscopy for Spatial Omics. PROTEOMICS 20, 1900368 (2020).
45. K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, X. Zhuang, Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
46. R. W. Hamming, Error Detecting and Error Correcting Codes. Bell Syst. Tech. J. 29, 147-160 (1950).
47. D. T. Flood, C. Kingston, J. C. Vantourout, P. E. Dawson, P. S. Baran, DNA Encoded Libraries: A Visitor’s Guide. Isr. J. Chem. 60, 268-280 (2020).
48. K. Miyamoto, W. Aoki, Y. Ohtani, N. Miura, S. Aburaya, Y. Matsuzaki, K. Kajiwara, Y. Kitagawa, M. Ueda, M. Antopolsky, Ed. Peptide barcoding for establishment of new types of genotype-phenotype linkages. PLOS ONE 14, e0215993 (2019).
49. A. Magi, R. Semeraro, A. Mingrino, B. Giusti, R. D’Aurizio, Nanopore sequencing data analysis: state of the art, applications and challenges. Brief. Bioinform. (2017), doi: 10.1093/bib/bbx062.
50. M. J. MacCoss, J. Alfaro, M. Wanunu, D. A. Faivre, N. Slavov, Sampling the proteome by emerging single-molecule and mass-spectrometry methods. , 18.
51. E. Lerner, A. Barth, J. Hendrix, B. Ambrose, V. Birkedal, S. C. Blanchard, R. Borner, H. Sung Chung, T. Cordes, T. D. Craggs, A. A. Deniz, J. Diao, J. Fei, R. L. Gonzalez, I. V. Gopich, T. Ha, C. A. Hanke, G. Haran, N. S. Hatzakis, S. Hohng, S.-C. Hong, T. Hugel, A. Ingargiola, C. Joo, A. N. Kapanidis, H. D. Kim, T. Laurence, N. K. Lee, T.-H. Lee, E. A. Lemke, E. Margeat, J. Michaelis, X. Michalet, S. Myong, D. Nettels, T.-O. Peulen, E. Ploetz, Y. Razvag, N. C. Robb, B. Schuler, H. Soleimaninejad, C. Tang, R. Vafabakhsh, D. C. Lamb, C. A. Seidel, S. Weiss, FRET-based dynamic structural biology: Challenges, perspectives and an appeal for open-science practices. eLife 10, e60416 (2021).
52. N. L. Anderson, C. L. Hunter, Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics MCP 5, 573-588 (2006).
53. D. Branton, D. W. Deamer, Nanopore sequencing: an introduction (World Scientific, New Jersey, 2018).
54. T. Aksel, H. Qian, P. Hao, P. F. Indermuhle, C. Inman, S. Paul, K. Chen, R. Seghers, J. K. Robinson, M. De Garate, B. Nortman, J. Tan, S. Hendricks, S. Sankar, P. Mallick, High-density and scalable protein arrays for single-molecule proteomic studies (Bioengineering, 2022; http://biorxiv.org/lookup/doi/10.1101/2022.05.02.490328).
55. A. N. Hoofnagle, J. O. Becker, M. H. Wener, J. W. Heinecke, Quantification of thyroglobulin, a low-abundance serum protein, by immunoaffinity peptide enrichment and tandem mass spectrometry. Clin. Chem. 54, 1796-1804 (2008).
56. R. Beardsley, J. Karty, Enhancing the intensities of lysine-terminated tryptic peptide ions in matrix-assisted laser desorption/ionization mass spectrometry - Beardsley - 2000 - Rapid Communications in Mass Spectrometry - Wiley Online Library. ... Mass Spectrom. (2000) (available at http://onlinelibrary.wiley.com/doi/10.1002/1097-0231(20001215)14:23%3C2147::AID-RCM145%3E3.0.CO;2-M/full).
57. S. Saveliev, M. Bratz, R. Zubarev, M. Szapacs, H. Budamgunta, M. Urh, Trypsin/Lys-C protease mix for enhanced protein mass spectrometry analysis. Nat. Methods 10, i-ii (2013).
58. Q. Li, Y. Feng, M.-J. Tan, L.-H. Zhai, Evaluation of Endoproteinase Lys-C/Trypsin Sequential Digestion Used in Proteomics Sample Preparation. Chin. J. Anal. Chem. 45, 316-321 (2017).
59. C. Ricos, V. Alvarez, F. Cava, Current databases on biological variation: pros, cons and progress. Scand. J. Clin. Lab. Invest. 59, 491-500 (1999).
60. N. L. Anderson, A. Jackson, D. Smith, D. Hardie, C. Borchers, T. W. Pearson, SISCAPA peptide enrichment on magnetic beads using an in-line bead trap device. 8, 995-1005 (2009).
61. G. Xu, S. B. Y. Shin, S. R. Jaffrey, Chemoenzymatic Labeling of Protein C-Termini for Positive Selection of C-Terminal Peptides. ACS Chem. Biol. 6, 1015-1020 (2011).
62. S. D. Brown, D. Graham, Conjugation of an oligonucleotide to Tat, a cell-penetrating peptide, via click chemistry. Tetrahedron Lett. 51, 5032-5034 (2010).
63. F. J. Flett, J. G. A. Walton, C. L. Mackay, H. Interthal, Click Chemistry Generated Model DNA-Peptide Heteroconjugates as Tools for Mass Spectrometry. Anal. Chem. 87, 9595-9599 (2015).
64. N. Inoue, A. Onoda, T. Hayashi, Site-Specific Modification of Proteins through N- Terminal Azide Labeling and a Chelation-Assisted CuAAC Reaction. Bioconjug. Chem. 30, 2427-2434 (2019).
65. H. Brinkerhoff, A. S. W. Kang, J. Liu, A. Aksimentiev, C. Dekker, Multiple rereads of single proteins at single-amino acid resolution using nanopores. Science , eabl4381 (2021).
66. M. Mag, S. Lüking, J. W. Engels, Synthesis and selective cleavage of an oligodeoxynucleotide containing a bridged internucleotide 5’-phosphorothioate linkage. Nucleic Acids Res. 19, 1437-1441 (1991).
67. Z. Hu, J. Yang, F. Xu, G. Sun, X. Pan, M. Xia, S. Zhang, X. Zhang, Site-Specific Scissors Based on Myeloperoxidase for Phosphorothioate DNA. J. Am. Chem. Soc. 143, 12361-12368 (2021).
68. M. T. Noakes, H. Brinkerhoff, A. H. Laszlo, I. M. Derrington, K. W. Langford, J. W. Mount, J. L. Bowman, K. S. Baker, K. M. Doering, B. I. Tickman, J. H. Gundlach, Increasing the accuracy of nanopore DNA sequencing using a time-varying cross membrane voltage. Nat. Biotechnol. 37, 651-656 (2019).
69. J. Nivala, D. B. Marks, M. Akeson, Unfoldase-mediated protein translocation through an α-hemolysin nanopore. Nat. Biotechnol. 31, 247-250 (2013).
70. S. Zhang, G. Huang, R. C. A. Versloot, B. M. H. Bruininks, P. C. T. de Souza, S.-J. Marrink, G. Maglia, Bottom-up fabrication of a proteasome-nanopore that unravels and processes single proteins. Nat. Chem. (2021), doi: 10.1038/s41557-021-00824-w.
71. B. Albada, J. F. Keijzer, H. Zuilhof, F. van Delft, Oxidation-Induced “One-Pot” Click Chemistry. Chem. Rev. 121, 7032-7058 (2021).
72. METHODS FOR DELIVERING AN ANALYTE TO TRANSMEMBRANE PORES: US20210147904A1 (available at https://patentimages.storage.googleapis.com/45/5b/04/e35513c92f0382/US20210147904A1.pdf).
73. SYSTEMS AND METHODS OF DELIVERING TARGET MOLECULES TO A NANOPORE: US20200284783A1 (available at https://patentimages.storage.googleapis.com/6f/5a/e5/85ccf2aa15f897/US20200284783A1.pdf).
74. Y. Bao, J. Wadden, J. R. Erb-Downward, P. Ranjan, W. Zhou, T. L. McDonald, R. E. Mills, A. P. Boyle, R. P. Dickson, D. Blaauw, J. D. Welch, SquiggleNet: real-time, direct classification of nanopore signals. Genome Biol. 22, 298 (2021).
75. M. Lelek, M. T. Gyparaki, G. Beliu, F. Schueder, J. Griffie, S. Manley, R. Jungmann, M. Sauer, M. Lakadamyali, C. Zimmer, Single-molecule localization microscopy. Nat. Rev. Methods Primer 1, 39 (2021).
76. H. Yang, G. Garcia-Manero, K. Sasaki, G. Montalban-Bravo, Z. Tang, Y. Wei, T. Kadia, K. Chien, D. Rush, H. Nguyen, A. Kalia, M. Nimmakayalu, C. Bueso-Ramos, H. Kantarjian, L. J. Medeiros, R. Luthra, R. Kanagal-Shamanna, High-resolution structural variant profiling of myelodysplastic syndromes by optical genome mapping uncovers cryptic aberrations of prognostic and therapeutic significance. Leukemia (2022), doi: 10.1038/s41375-022-01652-8.
77. T. Chatterjee, A. Knappik, E. Sandford, M. Tewari, S. W. Choi, W. B. Strong, E. P. Thrush, K. J. Oh, N. Liu, N. G. Walter, A. Johnson-Buck, Direct kinetic fingerprinting and digital counting of single protein molecules. Proc. Natl. Acad. Sci. 117, 22815-22822 (2020).
78. J. Swaminathan, A. A. Boulgakov, E. T. Hernandez, A. M. Bardo, J. L. Bachman, J. Marotta, A. M. Johnson, E. V. Anslyn, E. M. Marcotte, Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures. Nat. Biotechnol. 36, 1076-1082 (2018).
79. J. M. Rothberg, W. Hinz, T. M. Rearick, J. Schultz, W. Mileski, M. Davey, J. H. Leamon, K. Johnson, M. J. Milgrew, M. Edwards, J. Hoon, J. F. Simons, D. Marran, J. W. Myers, J. F. Davidson, A. Branting, J. R. Nobile, B. P. Puc, D. Light, T. A. Clark, M. Huber, J. T. Branciforte, I. B. Stoner, S. E. Cawley, M. Lyons, Y. Fu, N. Homer, M. Sedova, X. Miao, B. Reed, J. Sabina, E. Feierstein, M. Schorn, M. Alanjary, E. Dimalanta, D. Dressman, R. Kasinskas, T. Sokolsky, J. A. Fidanza, E. Namsaraev, K. J. McKernan, A. Williams, G. T. Roth, J. Bustillo, An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348-352 (2011).
80. R. Roy, S. Hohng, T. Ha, A practical guide to single-molecule FRET. Nat. Methods 5, 507-516 (2008).
81. M. S. Chee, S. Diego, D. A. Routenberg, S. Diego, US2018/0201980, 47.

Claims

What is claimed is:
1. A molecular construct and vehicle comprising:
(a) a molecular construct comprising a peptide comprising a target peptide sequence derived from proteolytic cleavage of a target protein and a molecular tag defining the source of said peptide, and
(b) a vehicle capable of presenting the construct for analysis by a sequence-sensitive single molecule detector.
2. The molecular construct and vehicle of claim 1, wherein the molecular tag is a target tag that identifies the peptide as a peptide created by proteolytic digestion of a biological sample.
3. The molecular construct and vehicle of claim 1, wherein the peptide comprises a synthetic peptide and the molecular tag is a standard tag that identifies the synthetic peptide as an internal standard.
4. The molecular construct and vehicle of claim 2, further comprising a sample barcode identifying the sample of origin.
5. The molecular construct and vehicle of claim 1 further comprising a binder barcode identifying a binder to which the construct has been bound.
6. The molecular construct and vehicle of claim 4, wherein the barcode or the tag is an oligonucleotide.
7. A standardized sample digest derived from a proteolytic digest of a biological sample, comprising: an amount of a molecular construct comprising a target tag and a target peptide, said construct being a target peptide construct and an amount of a molecular construct comprising a standard tag and a peptide whose sequence is the same or similar to the sequence of said target peptide, said construct being a standard peptide construct, wherein the target peptide is generated by proteolytic digestion of a target protein in said biological sample, wherein said target and standard tags can be distinguished by a single molecule detector and comprise chemical or structural groups covalently joined to peptides in their respective constructs, wherein said target tag is covalently attached to a plurality of the peptides present in said sample digest, wherein said target peptide construct comprises more than 90% of the target peptide molecules present in said sample digest and wherein said standard peptide construct is prepared separately and added to said digest in a known amount, or in a consistent relative amount across a multiplicity of samples.
8. The standardized sample digest of claim 7, wherein the number of molecules of the standard peptide construct added to the sample digest differs by no more than a factor of 100 from the number of molecules of the target peptide construct in said sample digest.
9. The standardized sample digest of claim 7, further comprising one or more additional standard peptide constructs having a different standard tag from each other and with each construct at a different relative abundance.
10. The standardized sample digest of claim 7, wherein the target tag is covalently attached to a majority of the peptides generated by proteolytic digestion of said sample.
11. The standardized sample digest of claim 7 wherein said tags are oligonucleotides.
12. An enriched standardized sample digest, comprising a bound fraction of the standardized sample digest of claim 7 bound by a binder, wherein said bound fraction comprises a target peptide construct and a standard peptide construct in a ratio equal within 2%, 5%, 10% or 20% to the ratio in which they are present in said standardized sample digest.
13. A stoichiometrically-flattened standardized sample, comprising a plurality of pairs of cognate standard and target peptide constructs enriched from a standardized proteolytic digest of a biological sample by binding to their respective cognate binders, wherein a pre-enrichment ratio calculated by dividing the number of molecules of a first target peptide construct that is the most numerous of said target peptide constructs in the standardized sample digest by the number of molecules of a second target peptide construct that is the least numerous of said target peptide constructs in the standardized sample digest is more than 10 times larger than a post-enrichment ratio calculated by dividing the number of molecules of said first target peptide construct by the number of molecules of said second target peptide construct in said enriched sample.
14. A method for measuring the amount of a selected target protein in a biological sample, comprising: proteolytically digesting said sample, modifying a plurality of peptides in the digested sample by adding a target tag to form a plurality of constructs comprising a selected target peptide derived from, and proteotypic of, said target protein, said plurality of constructs being target construct molecules, adding an amount that is known and/or consistent between a set of samples of a prepared standard peptide construct that is a cognate of said selected target peptide construct and comprises a standard tag, forming a standardized digest, enriching said cognate target and standard peptide constructs by contacting said standardized digest with a cognate binder, forming bound constructs, separating said bound constructs from unbound constructs to form enriched constructs, releasing said enriched constructs from said binder, linking said enriched constructs to a vehicle capable of presenting said enriched constructs to a sequence-sensitive single molecule detector, counting said enriched target construct molecules and said enriched standard construct molecules using a sequence-sensitive single molecule detector capable of distinguishing said target and standard tags and identifying said peptides, calculating the amount of said protein in said sample.
15. The method of claim 14, wherein the calculating is performed by multiplying the amount of standard construct added by the ratio of the number of target construct molecules counted to the number of standard construct molecules counted by said detector.
16. The method of claim 14, wherein said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides are covalently modified between two such steps (or wherein said first sequential step cleaves at lysine residues).
17. The method of claim 14, wherein said proteolytic digestion comprises at least two sequential steps resulting in peptide cleavage at different sites, and wherein peptides retain an unmodified n-terminal amino group when presented to said detector.
18. The method of claim 14, wherein a sample barcode is linked to said constructs encoding the identity, or relative position within a sample set, of said standardized samples; a plurality of said standardized samples is pooled; said sample barcodes associated with construct molecules are read using a sequence-sensitive single molecule detector; and the counts of target and standard construct molecules for each sample are separated based on said sample ID barcode identifying the sample from which they were enriched, and wherein said barcode may be an oligonucleotide.
19. The method of claim 14, wherein a binder barcode is linked to said constructs identifying the binder by which they were enriched, and wherein said barcode may be an oligonucleotide.
20. The method of claim 14, wherein said construct molecules are joined together into concatamers prior to presentation to said detector.
PCT/US2022/080781 2021-12-01 2022-12-01 Enriched peptide detection by single molecule sequencing WO2023102502A2 (en)

Applications Claiming Priority (18)

Application Number Priority Date Filing Date Title
US202163284990P 2021-12-01 2021-12-01
US63/284,990 2021-12-01
US202163288987P 2021-12-13 2021-12-13
US63/288,987 2021-12-13
US202263296196P 2022-01-04 2022-01-04
US63/296,196 2022-01-04
US202263303417P 2022-01-26 2022-01-26
US63/303,417 2022-01-26
US202263313760P 2022-02-25 2022-02-25
US63/313,760 2022-02-25
US202263340001P 2022-05-10 2022-05-10
US63/340,001 2022-05-10
US202263348213P 2022-06-02 2022-06-02
US63/348,213 2022-06-02
US202263352925P 2022-06-16 2022-06-16
US63/352,925 2022-06-16
US202263373875P 2022-08-30 2022-08-30
US63/373,875 2022-08-30

Publications (3)

Publication Number Publication Date
WO2023102502A2 WO2023102502A2 (en) 2023-06-08
WO2023102502A3 WO2023102502A3 (en) 2023-09-14
WO2023102502A4 WO2023102502A4 (en) 2023-10-12

Family

ID=86613124

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/080781 WO2023102502A2 (en) 2021-12-01 2022-12-01 Enriched peptide detection by single molecule sequencing

Country Status (1)

Country Link
WO (1) WO2023102502A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050255491A1 (en) * 2003-11-13 2005-11-17 Lee Frank D Small molecule and peptide arrays and uses thereof
EP2867662B1 (en) * 2012-06-27 2021-01-06 Siscapa Assay Technologies, Inc. Multipurpose mass spectrometric assay panels for peptides
ES2739430T3 (en) * 2013-08-19 2020-01-31 Singular Bio Inc Assays for the detection of a single molecule and their use
US20200348307A1 (en) * 2017-10-31 2020-11-05 Encodia, Inc. Methods and compositions for polypeptide analysis

Also Published As

Publication number Publication date
WO2023102502A3 (en) 2023-09-14
WO2023102502A4 (en) 2023-10-12

Similar Documents

Publication Publication Date Title
US20230081326A1 (en) Increasing dynamic range for identifying multiple epitopes in cells
US20210002733A1 (en) Methods for identifying multiple epitopes in selected sub-populations of cells
US20230236198A1 (en) Kits for analysis using nucleic acid encoding and/or label
US20210040538A1 (en) Methods of identifying multiple epitopes in cells
EP2209893B1 (en) Use of aptamers in proteomics
WO2010065322A1 (en) Concurrent identification of multitudes of polypeptides
US20190219592A1 (en) Mass spectrometry technique for single cell proteomics
US20230416806A1 (en) Polymorphism detection with increased accuracy
US20200348310A1 (en) Srm methods in alzheimer's disease and neurological disease assays
Kelstrup et al. Pinpointing phosphorylation sites: Quantitative filtering and a novel site-specific x-ion fragment
WO2020236846A1 (en) Methods and related kits for spatial analysis
CN114929887A (en) Method for sequencing and reconstructing single polypeptide
US20220214353A1 (en) Methods for spatial analysis of proteins and related kits
CN114127281A (en) Proximity interaction analysis
US11459598B2 (en) Multiplex DNA immuno-sandwich assay (MDISA)
WO2023102502A2 (en) Enriched peptide detection by single molecule sequencing
US20200157603A1 (en) Methods of identifying multiple epitopes in cells
US20200362392A1 (en) Methods of identifying multiple epitopes in cells
CN114929888A (en) Methods, kits and devices for preparing samples for multiplex polypeptide sequencing
US20230212647A1 (en) Systems and methods for rapid identification of proteins
US11959922B2 (en) Macromolecule analysis employing nucleic acid encoding
US20240125792A1 (en) Kits for analysis using nucleic acid encoding and/or label
US20220127754A1 (en) Methods and compositions of accelerating reactions for polypeptide analysis and related uses
WO2023114732A2 (en) Single-molecule peptide sequencing through molecular barcoding and ex-situ analysis
CN115175998A (en) Automated processing of macromolecules for analysis and related apparatus

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22902411

Country of ref document: EP

Kind code of ref document: A2