US20210380971A1 - Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample - Google Patents

Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample

Publication number
US20210380971A1
Authority
US
United States
Prior art keywords
dna
sample
adapter
5hmc
nucleic acid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/282,694
Other languages
English (en)
Inventor
Patrick A. Arensdorf
Damek Spacek
Christopher E. Ellison
Samuel Levy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ClearNote Health Inc
Original Assignee
Bluestar Genomics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bluestar Genomics Inc
Priority to US17/282,694
Assigned to BLUESTAR GENOMICS, INC.: assignment of assignors' interest (see document for details). Assignors: ELLISON, CHRISTOPHER E; ARENSDORF, PATRICK A.; LEVY, SAMUEL; SPACEK, Damek
Publication of US20210380971A1
Assigned to CLEARNOTE HEALTH, INC.: change of name (see document for details). Assignors: BLUESTAR GENOMICS, INC.
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6804: Nucleic acid analysis using immunogens

Definitions

  • the present invention relates generally to epigenetic analysis, and more particularly relates to combined workflow methods for obtaining multiple types of information from a single biological sample.
  • the invention finds utility in the fields of genomics, medicine, diagnostics, and epigenetic research.
  • a cell-free DNA (cfDNA) sample typically contains only a few nanograms of DNA per mL of plasma.
  • cfDNA DNA sequence information and/or methylation data
  • the invention is directed to the aforementioned need in the art and, in one embodiment, provides a combined workflow method for the analysis of a biological sample to determine multiple types of information therefrom without need for many independent analytical steps, a plurality of data-generating modalities, or a large quantity of sample.
  • the types of information that may be obtained from a patient's blood sample include the presence and concentration of specific plasma proteins; the number, location, and types of histone modifications associated with cfDNA (e.g., DNA from the cell-free fraction of a blood sample); the sequence of cfRNA and cfDNA in that fraction; and epigenetic information pertaining to the cell-free DNA, such as hydroxymethylation and methylation profiles, i.e., the distribution of 5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC) residues, respectively.
  • the invention additionally pertains to a classical sequencing-based method for analyzing a biological sample to determine one or more non-classical sequence features of the sample, where a “non-classical sequence feature” refers to a feature other than the identity and order of the four primary bases (i.e., adenine, cytosine, guanine, and thymine for DNA, and adenine, cytosine, guanine, and uracil for RNA) of a nucleic acid molecule in the sample. That is, the method comprises determination of classical nucleic acid sequence information from which the non-classical sequence feature of interest can be derived.
  • the non-classical sequence feature may be information related to the composition of a nucleic acid, such as the distribution of modified cytosine residues, e.g., 5hmC or 5mC, or it may be unrelated to the composition of a nucleic acid and pertain instead to the presence and concentration of plasma proteins in a blood sample, histone modifications observed in a cell-free nucleosome fraction of the blood sample, and the like.
  • the method may be implemented to determine a single non-classical sequence feature of a biological sample, more than one non-classical sequence feature of a biological sample, or a combination of classical sequence information and one or more non-classical sequence features.
  • the analysis involves conversion of a non-classical sequence feature of interest, such as the identity of a plasma protein, the concentration of a plasma protein, the number, location and types of histone modifications, the hydroxymethylation profile of a nucleic acid (e.g., the 5hmC profile of cell-free DNA in a cell-free nucleic acid fraction of a biological sample), or the methylation profile of a nucleic acid (e.g., the 5mC profile of cell-free DNA in a cell-free nucleic acid fraction of a biological sample), into classical sequence data.
  • the classical sequence data obtained includes at least one specific nucleic acid sequence in the range of about 4 to about 36 base pairs in length which serves as a Unique Feature Identifier (UFI) sequence, where the UFI is incorporated within a double-stranded DNA (dsDNA) molecule deriving from an analyte of interest in the biological sample.
  • the classical sequence data may also comprise a cDNA sequence, thus providing information regarding the corresponding sequence of RNA template molecules, such as cell-free RNA in a cell-free nucleic acid fraction of a biological sample.
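As a rough illustration of the UFI length range stated above, the following sketch checks whether a candidate sequence falls within 4 to 36 bases and computes how many distinct sequences a given length can encode. The function names are hypothetical, and treating the "about 4 to about 36" range as hard limits is a simplifying assumption rather than something the source requires.

```python
UFI_MIN_LEN = 4   # lower end of the stated range, treated as a hard limit (assumption)
UFI_MAX_LEN = 36  # upper end of the stated range, treated as a hard limit (assumption)


def is_valid_ufi(seq: str) -> bool:
    """Return True if seq uses only A, C, G, T and its length is within the range."""
    return UFI_MIN_LEN <= len(seq) <= UFI_MAX_LEN and set(seq) <= set("ACGT")


def ufi_capacity(length: int) -> int:
    """Number of distinct sequences of the given length over the four primary DNA bases."""
    return 4 ** length
```

Under these assumptions a 4-base UFI can distinguish at most 256 features, which is one reason longer UFIs may be preferable when many samples or analytes are pooled.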
  • the invention provides an improved proximity extension assay for identifying a plurality of protein analytes in a biological sample by providing a plurality of probe pairs each comprising a first proximity probe and a second proximity probe, with each probe pair targeting a specific protein analyte, and generating a double-stranded DNA (dsDNA) segment between the probes of each probe pair in the presence of the corresponding protein analyte, wherein the improvement comprises:
  • each protein-specific nucleic acid sequence is contained within an adapter, and step (a) is carried out by end-ligating the adapters to the dsDNA segments.
  • an improved proximity extension assay for identifying a plurality of protein analytes in a biological sample by providing a plurality of probe pairs each comprising a first proximity probe and a second proximity probe, with each probe pair targeting a specific protein analyte, and generating a dsDNA segment between the probes of each probe pair in the presence of the corresponding protein analyte, wherein the improvement comprises:
  • step (a) is carried out by end-ligating the dsDNA segments with adapters each comprising a protein-specific nucleic acid sequence and the capture sequence.
  • one or more 5hmC residues in the capture sequence can be functionalized to facilitate removal of the dsDNA template molecule from the sample, from a fraction of a sample, or from an admixture comprising a plurality of biomolecules. This is particularly useful in the context of a combined workflow analysis of a single biological sample from which multiple types of information are extracted.
  • an improved proximity extension assay for identifying a plurality of protein analytes in a biological sample by providing a plurality of probe pairs each comprising a first proximity probe and a second proximity probe, with each probe pair targeting a specific protein analyte, and generating a dsDNA segment between the probes of each probe pair in the presence of the corresponding protein analyte, wherein the improvement comprises:
  • the proximity extension assay further comprises, prior to step (b), combining at least one protein concentration control composition with the dsDNA template molecules.
  • the control composition together with the molecular barcode, enables the determination of the original concentration of at least one protein analyte in the sample by comparing the number of sequence reads indicative of a specific protein analyte with sequence reads generated by the protein concentration control composition.
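The read-count comparison described above can be sketched as follows. This is a minimal illustration, assuming that reads are first collapsed to unique molecules using the molecular barcode and that read counts scale linearly with input amount; the function names, identifiers, and numbers are hypothetical and are not taken from the source.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Collapse reads to unique molecules per identifier.

    reads: iterable of (identifier_barcode, molecular_barcode) tuples.
    Returns a dict mapping each identifier to its number of distinct molecular barcodes.
    """
    seen = defaultdict(set)
    for identifier, molecular_barcode in reads:
        seen[identifier].add(molecular_barcode)
    return {identifier: len(barcodes) for identifier, barcodes in seen.items()}


def estimate_concentration(counts, analyte, control, control_conc):
    """Scale the known control concentration by the analyte/control count ratio.

    Assumes a linear relationship between molecule counts and concentration.
    """
    control_count = counts.get(control, 0)
    if control_count == 0:
        raise ValueError("no reads observed for the control composition")
    return control_conc * counts.get(analyte, 0) / control_count
```

For example, if a hypothetical analyte yields 2 unique molecules and a control spiked in at 8.0 concentration units yields 4, the estimate is 8.0 × 2 / 4 = 4.0 units.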
  • an improved proximity extension assay for identifying a plurality of protein analytes in each of a plurality of biological samples, wherein, for each biological sample, the assay comprises providing a plurality of probe pairs each comprising a first proximity probe and a second proximity probe, with each probe pair targeting a specific protein analyte, and generating a dsDNA segment between the probes of each probe pair in the presence of the corresponding protein analyte, wherein the improvement comprises:
  • steps (a), (b), and (c) can be carried out simultaneously for at least 300 biological samples, at least 500 biological samples, or at least 1500 biological samples.
  • the invention provides a method for identifying a plurality of protein analytes in a biological sample using a DNA sequence-based technique, the method comprising:
  • step (e) identifying the protein analytes in the biological sample from the protein identifier barcodes observed in the sequence reads generated in step (b).
  • the method is carried out on a fraction of the biological sample, typically on plasma obtained from a blood sample.
  • a combined workflow method in which protein analytes in one or more biological samples are analyzed as set forth with respect to any of the above embodiments, and a cell-free nucleic acid sample from the same biological sample is analyzed as well.
  • the information obtained for the cell-free nucleic acid sample, in a first embodiment of a combined workflow method provided herein, pertains to the presence or quantity of one or more histone modifications within nucleosomes in the cell-free nucleic acid sample.
  • the histone modifications may be covalent post-translational modifications (PTMs), alterations in histone structure that affect gene expression.
  • Particular histone modifications of interest, in one aspect of this embodiment, are histone modification biomarkers for assessing a disease state in a subject.
  • the information obtained for the cell-free nucleic acid sample includes at least one sequence of cfDNA in the cell-free nucleic acid sample.
  • the information obtained for the cell-free nucleic acid sample includes at least one sequence of cfRNA in the cell-free nucleic acid sample.
  • the information obtained for the cell-free nucleic acid sample includes epigenetic data pertaining to cfDNA hydroxymethylation.
  • the information obtained for the cell-free nucleic acid sample includes epigenetic data pertaining to cfDNA methylation.
  • a combined workflow method in which protein analytes in one or more biological samples are analyzed as described above, and a cell-free nucleic acid sample from the same biological sample is analyzed with respect to at least two of: histone modifications; cfDNA sequence; cfRNA sequence; cfDNA hydroxymethylation; and cfDNA methylation.
  • the invention provides a method for preparing a cell-free nucleic acid sample to enable identification of at least one histone modification in a nucleosome contained therein using a DNA sequencing-based technique.
  • the method comprises:
  • a proximity probe comprising, at a first terminus, a histone modification binding domain that specifically binds to a histone modification of interest; at a second terminus, a nucleic acid binding domain complementary to a terminal hybridizing region; and a non-hybridizing region therebetween comprising a nucleic acid sequence that corresponds to the histone modification of interest and thereby serves as a histone modification barcode, wherein the proximity probe is dimensioned to allow for simultaneous binding of the histone modification binding domain to the histone modification of interest and hybridization of the complementary nucleic acid binding domain with the hybridizing nucleic acid region;
  • step (c) comprises providing a plurality of proximity probes each targeting a different histone modification.
  • the method additionally includes amplifying the histone modification-barcoded dsDNA template molecules.
  • the method also includes sequencing the amplified, histone modification-barcoded dsDNA template molecules and determining information about the type and location of histone modifications from the histone modification barcodes observed in the sequence reads generated.
  • Another embodiment of the invention pertains to a method for using adapters that comprise at least one 5hmC residue in the preparation of cfDNA for extraction from a cell-free nucleic acid sample.
  • the method involves (a) ligating DNA adapters comprising capture sequences that comprise a 5hmC residue onto the ends of end-blunted DNA in the cell-free nucleic acid sample to provide adapter-ligated DNA; and (b) functionalizing the 5hmC residue with an affinity tag that allows selective removal of tagged cfDNA.
  • the affinity tag may be a biotin moiety, such as biotin per se or, more typically, biotin that has been covalently modified to include a reactive site.
  • the biotinylated 5hmC site(s) are then used to enable extraction from the sample by reaction with an avidin-coated or streptavidin-coated support.
  • the adapters additionally include a UFI sequence, generally at least two UFI sequences, each indicating a non-sequence feature, or characteristic, of the cfDNA in the cell-free nucleic acid sample.
  • the non-sequence feature(s) of interest may be determined from the UFI sequences observed in the sequence reads.
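Reading non-sequence features back out of the UFI sequences can be sketched as below. This is only an illustration, assuming two fixed-length UFIs sit at the start of each read (a source identifier followed by a process barcode); the barcode sequences, lookup tables, and function name are hypothetical, since the source does not list actual UFI sequences or positions.

```python
# Hypothetical lookup tables; real UFI sequences are not given in the source.
SOURCE_BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}
PROCESS_BARCODES = {"GGAA": "5hmC-enriched", "CCTT": "5mC-enriched"}


def decode_read(read, source_len=4, process_len=4):
    """Split a read into (source, process, insert).

    Assumes the read starts with the source identifier UFI, immediately
    followed by the process UFI. Unknown barcodes decode to None.
    """
    source_ufi = read[:source_len]
    process_ufi = read[source_len:source_len + process_len]
    insert = read[source_len + process_len:]
    return (SOURCE_BARCODES.get(source_ufi),
            PROCESS_BARCODES.get(process_ufi),
            insert)
```

With these tables, `decode_read("ACGTGGAATTTCCC")` yields the sample of origin, the enrichment step the molecule passed through, and the remaining insert sequence for alignment.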
  • the invention provides a method for preparing cell-free DNA and cell-free RNA in a single cell-free nucleic acid sample for simultaneous, sequencing-based analysis.
  • the method involves (a) ligating DNA adapters comprising a first adapter sequence that includes at least one UFI sequence onto the ends of end-blunted DNA in the cell-free sample to provide adapter-ligated DNA, where the at least one UFI sequence includes a source identifier barcode; (b) purifying the adapter-ligated DNA and RNA to provide a cell-free admixture of adapter-ligated DNA and RNA; (c) synthesizing a first strand of cDNA from the RNA; (d) synthesizing a second strand of cDNA complementary to the first strand to provide a cDNA duplex; and (e) covalently attaching to at least one terminus of the cDNA duplex, in the absence of a ligase, a cDNA adapter comprising a second adapter sequence that includes
  • a combined workflow process for extracting multiple types of data from a single, cell-free nucleic acid sample using a sequencing-based analysis, where the data includes the hydroxymethylation profile of cfDNA in the sample as well as sequence information for cfRNA.
  • the data may also include DNA sequence information.
  • the process comprises: (a) ligating DNA adapters comprising a first adapter sequence that includes at least one UFI sequence onto the ends of end-blunted DNA in the cell-free nucleic acid sample to provide adapter-ligated DNA, wherein the at least one UFI sequence includes a source identifier barcode; (b) synthesizing cDNA from RNA in the sample and covalently attaching a cDNA adapter comprising the source identifier barcode and an RNA indicator barcode to at least one terminus of the cDNA, thereby providing adapter-bound cDNA in a cell-free composition that also comprises the adapter-ligated DNA; (c) functionalizing 5hmC residues in the cell-free composition with an affinity tag that allows selective removal of 5hmC-containing DNA from the cell-free composition; (d) removing the 5hmC-containing DNA from the cell-free composition, with untagged DNA and adapter-bound cDNA remaining; (e) appending a 5hmC process barcode to the 5h
  • step (e) is carried out by incorporation of the 5hmC process barcode into the DNA adapters.
  • a combined workflow process for extracting multiple types of data from a single, cell-free nucleic acid sample using a sequencing-based analysis, where the data includes the hydroxymethylation profile of cfDNA in the sample, cfRNA sequence information, and, optionally, DNA sequence information, as above, and further comprises the methylation profile of cfDNA in the sample.
  • the process comprises: (a) ligating DNA adapters comprising a first adapter sequence that includes at least one UFI sequence onto the ends of end-blunted DNA in the cell-free nucleic acid sample to provide adapter-ligated DNA, wherein the at least one UFI sequence includes a source identifier barcode; (b) synthesizing cDNA from RNA in the sample and covalently attaching a cDNA adapter comprising the source identifier barcode and an RNA indicator barcode to at least one terminus of the cDNA, thereby providing adapter-bound cDNA in a cell-free composition that also comprises the adapter-ligated DNA; (c) functionalizing 5hmC residues in the cell-free composition with an affinity tag that allows selective removal of 5hmC-containing DNA from the cell-free composition; (d) removing the 5hmC-containing DNA from the cell-free composition, with untagged DNA and adapter-bound cDNA remaining; (e) appending a 5hmC process barcode to the 5h
  • a combined workflow process for extracting at least two types of data from a single, cell-free nucleic acid sample using a sequencing-based analysis, where the data includes the hydroxymethylation profile of cfDNA in the sample, cfRNA sequence information, and, optionally, DNA sequence information.
  • the process comprises: (a) ligating DNA adapters comprising a first adapter sequence that includes at least one molecular barcode comprising a source identifier barcode onto the ends of end-blunted DNA in the sample to provide adapter-ligated DNA; (b) synthesizing cDNA from RNA in the sample and covalently attaching, to at least one terminus of the cDNA, a cDNA adapter comprising a 5hmC residue, the source identifier barcode, and an RNA indicator barcode, thereby providing barcoded, adapter-bound cDNA; (c) functionalizing 5hmC residues in the sample with an affinity tag that allows selective removal of 5hmC-containing species from the cell-free sample; (d) removing the 5hmC-containing DNA and the barcoded, adapter-bound cDNA from the cell-free sample; and (e) amplifying and sequencing a pooled admixture of the 5hmC-containing DNA and the barcoded-adapter-bound
  • a combined workflow process for extracting at least two types of data from a single, cell-free nucleic acid sample using a sequencing-based analysis, where the data includes the presence or quantity of one or more histone modifications within nucleosomes in the cell-free nucleic acid sample, and sequence information for cfRNA in the sample.
  • the process comprises: (a) ligating an adapter comprising a hybridizing nucleic acid region to each terminus of nucleosome-associated DNA, thereby providing a modified cell-free nucleic acid sample comprising nucleosomes associated with adapter-ligated DNA; (b) providing a proximity probe comprising a histone modification binding domain at a first terminus, a nucleic acid binding domain complementary to the hybridizing nucleic acid region at an opposing second terminus, and a non-hybridizing region therebetween comprising a nucleic acid sequence selected to correspond to a specific histone modification and thereby serve as a histone modification barcode, wherein the proximity probe is dimensioned to allow for simultaneous binding of the histone modification binding domain to the histone modification and the hybridization of the complementary nucleic acid binding domain with the hybridizing nucleic acid region; (c) incubating the modified cell-free nucleic acid
  • the process further includes (i) amplifying and sequencing the histone modification-barcoded dsDNA template molecule and the adapter-bound cDNA, wherein the histone modification-barcoded dsDNA template molecule and the adapter-bound cDNA are normally amplified and sequenced together in a pooled admixture.
  • the process further includes incorporating an analysis of cfDNA in the sample to determine the hydroxymethylation profile thereof.
  • the process comprises carrying out steps (a) through (h) of the embodiment and then: (i) functionalizing 5hmC residues in the nucleic acid composition with a first affinity tag that allows selective removal of 5hmC-containing species; (j) removing the tagged 5hmC-containing DNA from the composition, with untagged DNA and adapter-bound cDNA remaining; (k) appending a 5hmC process barcode to the tagged 5hmC-containing DNA; and (l) amplifying and sequencing the 5hmC-containing DNA, the untagged DNA (including the histone modification-barcoded dsDNA template molecules generated in step (d)), and the adapter-bound cDNA, wherein amplification and sequencing are normally carried out with a pooled admixture of the various species.
  • the process further includes the determination of the methylation profile of cfDNA in the sample.
  • the process comprises carrying out steps (a) through (k) delineated above, and then (l) converting methylcytosine residues in the remaining sample to oxidized methylcytosine residues; (m) functionalizing the oxidized methylcytosine residues with a second affinity tag that allows selective removal of the functionalized species from the sample; (n) removing the tagged 5mC-containing DNA, with untagged DNA and adapter-bound cDNA remaining; (o) appending a 5mC process barcode to the tagged 5mC-containing DNA; and (p) amplifying and sequencing the tagged 5hmC-containing DNA, the tagged 5mC-containing DNA, the untagged DNA (including, as before, the histone modification-barcoded dsDNA template molecules), and the adapter-bound cDNA, wherein amplification and sequencing are, again, typically carried out with a pooled admixture of
  • a combined workflow process for carrying out both a plasma protein analysis on a blood sample and an analysis of a cell-free nucleic acid fraction of the blood sample.
  • the plasma protein analysis involves the generation of a protein-barcoded dsDNA template molecule using a proximity extension assay and ultimately pooling that dsDNA template molecule with one or more of the various DNA template molecules generated in the analysis of the cell-free nucleic acid sample, i.e., the histone modification-barcoded dsDNA template, the tagged 5hmC-containing DNA, the tagged 5mC-containing DNA, the untagged DNA, and the adapter-bound cDNA.
  • the invention provides a sequencing-based method for determining a non-classical sequence feature of a nucleic acid template molecule, comprising: appending an identifier sequence to the nucleic acid template molecule which designates a specific non-sequence feature of the template molecule; amplifying the nucleic acid template molecule and the appended identifier sequence to give a plurality of amplicons each including the appended identifier sequence; and sequencing the amplicons and determining the non-sequence feature from the sequence reads obtained.
  • a further embodiment of the invention pertains to a double-stranded DNA template molecule that comprises a protein-specific nucleic acid sequence derived from a known protein analyte in a proximity extension assay and thereby serving as a protein identifier barcode.
  • Still another embodiment of the invention provides a combination of sample fractions each comprising adapter-ligated, barcoded, double-stranded DNA template molecules derived from a single blood sample, the combination comprising: (a) a plasma-derived sample fraction comprising at least one protein-related dsDNA template molecule, each of which comprises a protein-specific nucleic acid sequence corresponding to a specific protein analyte and thereby serving as a protein identifier barcode; and (b) at least one cfDNA-derived sample fraction comprising a double-stranded cfDNA template molecule obtained from a cell-free nucleic acid sample obtained from the blood sample, wherein the cfDNA template molecule is end-ligated with a set of adapters that comprise a UFI sequence selected from a source identifier barcode, a fragment identifier barcode, a strand identifier barcode, a histone modification barcode, a random barcode, and combinations thereof.
  • the aforementioned combination of sample fractions comprises a pooled admixture of the sample fractions, where the DNA template molecules in the admixture may then be amplified and sequenced simultaneously.
  • methods and compositions are provided for improving the efficiency of adapter ligation, in turn improving a process for sequencing DNA.
  • the aforementioned methods and compositions of the invention are especially useful in the analysis of cfDNA, insofar as the concentration of DNA in a cell-free sample is already very low.
  • the methods and compositions are particularly useful in the sequencing and quantitation of 5mC-containing DNA and 5hmC-containing DNA, since these modified cytosine residues occur relatively infrequently, representing about 1% and 0.1% of all DNA bases, respectively.
  • the invention provides improved methods and compositions for sequencing cfDNA, e.g., cfDNA containing 5mC residues, 5hmC residues, or both 5mC and 5hmC residues, where the improvement comprises the use of truncated sequencing adapters that facilitate a single template ligation reaction such that adapter-ligated cfDNA is indexed by sample only upon amplification, e.g., PCR amplification.
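To put the abundance figures quoted above (about 1% of bases for 5mC and about 0.1% for 5hmC) in perspective, the sketch below estimates how many modified residues a single fragment might carry. It assumes, purely for illustration, that modified bases are spread uniformly and independently along the DNA, which real genomes do not satisfy; the function names and the 200-base fragment length are hypothetical.

```python
FRACTION_5MC = 0.01    # about 1% of all DNA bases, per the text
FRACTION_5HMC = 0.001  # about 0.1% of all DNA bases, per the text


def expected_modified_bases(fragment_len: int, fraction: float) -> float:
    """Expected number of modified bases in a strand of fragment_len bases."""
    return fragment_len * fraction


def prob_at_least_one(fragment_len: int, fraction: float) -> float:
    """Probability that a strand carries at least one modified base.

    Assumes each base is modified independently with probability `fraction`.
    """
    return 1.0 - (1.0 - fraction) ** fragment_len


if __name__ == "__main__":
    n = 200  # hypothetical fragment length in bases
    print(expected_modified_bases(n, FRACTION_5MC), prob_at_least_one(n, FRACTION_5MC))
    print(expected_modified_bases(n, FRACTION_5HMC), prob_at_least_one(n, FRACTION_5HMC))
```

Under this simplified model most short fragments would carry no 5hmC at all, which is consistent with the text's point that losses at the adapter ligation step are especially costly for these analytes.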
  • a method for adding an identifier barcode to a dsDNA molecule comprising:
  • sequencing adapters in the form of a Y-construct having a double-stranded segment comprising in the range of 2 base pairs to 50 base pairs and two single-stranded segments each comprising in the range of 2 bases to 25 bases;
  • the barcoded primer comprises: (i) a first region that is not complementary to any sequence in the adapter and comprises an identifier barcode; and (ii) a second region that is sufficiently complementary to a single-stranded segment of the adapter to hybridize thereto, such that extension of the barcoded primer in the presence of a polymerase results in a double-stranded complex of the second region of the primer and the single-stranded segment of the adapter, with the first region comprising the identifier barcode extending beyond the end of the double-stranded complex as a single-stranded oligonucleotide tail.
  • the invention provides a kit for amplifying and sequencing a dsDNA template molecule, comprising:
  • a sequencing adapter in the form of a Y-construct having a double-stranded segment comprising in the range of 2 base pairs to 50 base pairs and two single-stranded segments each comprising in the range of 2 bases to 25 bases;
  • a barcoded primer comprising (i) a first region that is not complementary to any sequence in the adapter and comprises an identifier barcode; and (ii) a second region that is sufficiently complementary to a single-stranded segment of the adapter to hybridize thereto;
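The relationship between the Y-construct adapter and the barcoded primer described above can be sketched as follows. This is a simplified illustration: the sequences and function names are hypothetical, the primer's second region is taken to be the exact reverse complement of the adapter's single-stranded segment (the text only requires it to be "sufficiently complementary"), and the non-complementarity of the barcode is tested only as an exact substring check.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_y_adapter(double_stranded_bp: int, arm_lengths: Tuple[int, int]) -> bool:
    """Check the size ranges given in the text: a 2-50 bp double-stranded
    segment and two single-stranded segments of 2-25 bases each."""
    return 2 <= double_stranded_bp <= 50 and all(2 <= n <= 25 for n in arm_lengths)


def build_barcoded_primer(barcode: str, adapter_arm: str) -> str:
    """Build a primer, 5' to 3': barcode tail (first region) followed by a
    region complementary to the adapter's single-stranded segment (second region)."""
    # Simplistic check that the barcode itself cannot pair with the arm.
    if reverse_complement(barcode) in adapter_arm:
        raise ValueError("barcode is complementary to part of the adapter arm")
    return barcode + reverse_complement(adapter_arm)
```

In this sketch, annealing the primer leaves the barcode as a single-stranded 5′ tail beyond the duplex formed with the adapter arm, and the barcode is copied into the product during the subsequent amplification cycles.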
  • FIG. 1 schematically illustrates the conversion of information regarding a protein analyte in a biological sample to classical sequence information using a proximity extension assay and protein UFI sequences.
  • FIG. 2 schematically illustrates the use of a cell-free ChIP (cfChIP) method to convert information regarding histone modifications in a nucleosome to classical sequence information using proximity probes and histone modification UFI sequences.
  • FIG. 3 schematically illustrates a comprehensive combined workflow process of the invention.
  • FIG. 4 schematically illustrates another combined workflow process of the invention in which cfRNA analysis is omitted.
  • FIG. 5 schematically illustrates an additional combined workflow process of the invention in which plasma proteomics is not included.
  • FIG. 6 illustrates a prior art template/adapter/primer construct used in adding an identifier sequence (“[index]” in the figure).
  • FIG. 7 illustrates a corresponding construct using a truncated adapter of the invention in combination with a barcoded primer.
  • FIG. 8 schematically illustrates the use of indexed primers and truncated adapters in a PCR process.
  • FIG. 9 shows the size distribution profiles obtained following DNA fragmentation as described in part (a) of Example 2.
  • FIG. 10 is a plot of library concentration (ng/μL) versus the concentration of adapter input, along with the MT group whole genome sequencing (WGS).
  • FIG. 11 illustrates the head-to-head adapter comparison results, in a plot of the fraction of templates sampled versus assumed PCR efficiency.
  • FIG. 12 provides a side-by-side comparison of truncated adapter efficiency and standard adapter efficiency, as described in Example 3.
  • an adapter refers not only to a single adapter but also to two or more adapters that may be the same or different
  • a template molecule refers to a single template molecule as well as a plurality of template molecules, and the like.
  • nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • sample as used herein relates to a material or mixture of materials, typically, although not necessarily, in liquid form, containing one or more analytes of interest.
  • biological sample as used herein relates to a sample derived from a biological fluid, cell, tissue, or organ of a human subject, comprising a mixture of biomolecules including proteins, peptides, lipids, nucleic acids, and the like.
  • the sample is a blood sample such as a whole blood sample, a serum sample, or a plasma sample.
  • nucleic acid sample refers to a biological sample comprising nucleic acids.
  • the nucleic acid sample may be a cell-free nucleic acid sample that comprises nucleosomes, in which case the nucleic acid sample is sometimes referred to herein as a “nucleosome sample.”
  • the nucleic acid sample may also be comprised of cell-free DNA wherein the sample is substantially free of histones and other proteins, such as will be the case following cell-free DNA purification.
  • the nucleic acid samples herein may also contain cell-free RNA.
  • sample fraction refers to a subset of an original biological sample, and may be a compositionally identical portion of the biological sample, as when a blood sample is divided into identical fractions.
  • sample fraction may be compositionally different, as will be the case when, for example, certain components of the biological sample are removed, with extraction of cell-free nucleic acids being one such example.
  • cell-free nucleic acid encompasses both cell-free DNA and cell-free RNA, where the cell-free DNA and cell-free RNA may be in a cell-free fraction of a biological sample comprising a body fluid.
  • the body fluid may be blood, including whole blood, serum, or plasma, or it may be urine, cyst fluid, or another body fluid.
  • the biological sample is a blood sample
  • a cell-free nucleic acid sample is extracted therefrom using now-conventional means known to those of ordinary skill in the art and/or described in the pertinent texts and literature; kits for carrying out cell-free nucleic acid extraction are commercially available (e.g., the AllPrep® DNA/RNA Mini Kit and QIAamp DNA Blood Mini Kit, both available from Qiagen, or the MagMAX Cell-Free Total Nucleic Acid Kit and the MagMAX DNA Isolation Kit, available from ThermoFisher Scientific). Also see, e.g., Hui et al.; Fong et al. (2009) Clin. Chem. 55(3):587-598.
  • nucleotide is intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
  • nucleotide includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well.
  • Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • modified cytosine residues including 5-methylcytosine and oxidized forms thereof, such as 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine.
  • nucleic acid and “polynucleotide” are used interchangeably herein to describe a polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, and up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides. Nucleic acids may be produced enzymatically, chemically synthesized, or naturally obtained.
  • oligonucleotide denotes a single-stranded multimer of nucleotides of from about 2 to 200 nucleotides, up to 500 nucleotides in length.
  • Oligonucleotides may be synthetic or may be made enzymatically, and, in some embodiments, are 30 to 150 nucleotides in length. Oligonucleotides may contain ribonucleotide monomers (i.e., may be oligoribonucleotides) and/or deoxyribonucleotide monomers. An oligonucleotide may be 10 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80, 80 to 100, 100 to 150 or 150 to 200 nucleotides in length, for example.
  • hybridization refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art.
  • a nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.).
  • duplex and “duplexed” are used interchangeably herein to describe two complementary polynucleotides that are base-paired, i.e., hybridized together.
  • a DNA duplex is referred to herein as “double-stranded DNA” or “dsDNA” and may be an intact molecule or a molecular segment.
  • barcoded and adapter-ligated is an intact molecule
  • the dsDNA formed between the nucleic acid tails of proximity probes in a proximity extension assay is a dsDNA segment.
  • strand refers to a single strand of a nucleic acid made up of nucleotides linked together by covalent bonds, e.g., phosphodiester bonds.
  • DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands.
  • complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, “positive” and “negative” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands.
  • the assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure.
  • nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions e.g., BACs, assemblies, chromosomes, etc.
  • primer refers to a synthetic oligonucleotide, which, upon forming a duplex with a polynucleotide template, is capable of acting as a point of initiation of nucleic acid synthesis and being extended from its 3′ end along the template so that an extended duplex is formed.
  • the sequence of nucleotides added during the extension process is determined by the sequence of the template polynucleotide.
  • primers are extended by a DNA polymerase.
  • Primers are generally of a length compatible with their use in synthesis of primer extension products, and are usually in the range of 8 to 100 nucleotides in length, such as 10 to 75, 15 to 60, 15 to 40, 18 to 30, 20 to 40, 21 to 50, 22 to 45, 25 to 40, and so on.
  • Typical primers can be in the range of 10 to 50 nucleotides long, such as 15-45, 18-40, 20-30, 21-25 and so on, and any length between the stated ranges.
  • the primers are usually not more than about 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, or 70 nucleotides in length.
  • an adapter is also a short synthetic oligonucleotide that serves a specific purpose in a biological analysis.
  • Adapters can be single-stranded or double-stranded, although the preferred adapters herein are double-stranded.
  • an adapter may be a hairpin adapter (i.e., one molecule that base pairs with itself to form a structure that has a double-stranded stem and a loop, where the 3′ and 5′ ends of the molecule ligate to the 5′ and 3′ ends of a double-stranded DNA molecule, respectively).
  • an adapter may be a Y-adapter.
  • an adapter may itself be composed of two distinct oligonucleotide molecules that are base paired with each other.
  • a ligatable end of an adapter may be designed to be compatible with overhangs made by cleavage by a restriction enzyme, or it may have blunt ends or a 5′ T overhang.
  • the term “adapter” refers to double-stranded as well as single-stranded molecules.
  • An adapter can be DNA or RNA, or a mixture of the two.
  • An adapter containing RNA may be cleavable by RNase treatment or by alkaline hydrolysis.
  • An adapter may be 15 to 100 bases, e.g., 50 to 70 bases, although adapters outside of this range are envisioned.
  • adapter-ligated refers to a nucleic acid that has been ligated to an adapter.
  • the adapter can be ligated to a 5′ end and/or a 3′ end of a nucleic acid molecule.
  • the term “adding adapter sequences” refers to the act of adding an adapter sequence to the end of fragments in a sample. This may be done by filling in the ends of the fragments using a polymerase, adding an A tail, and then ligating an adapter comprising a T overhang onto the A-tailed fragments.
  • Adapters are usually ligated to a DNA duplex using a ligase, while with RNA, adapters are covalently or otherwise attached to at least one end of a cDNA duplex preferably in the absence of a ligase.
  • asymmetric adapter refers to an adapter that, when ligated to both ends of a double stranded nucleic acid fragment, will lead to a top strand that contains a 5′ tag sequence that is not the same as or complementary to the tag sequence at the 3′ end. Examples of asymmetric adapters are described in U.S. Pat. Nos. 5,712,126 and 6,372,434 to Weissman et al., and International Patent Publication No. WO 2009/032167 to Bignell et al.
  • An asymmetrically tagged fragment can be amplified by two primers: a first primer that hybridizes to a first tag sequence added to the 3′ end of a strand; and a second primer that hybridizes to the complement of a second tag sequence added to the 5′ end of a strand.
  • Y-adapters and hairpin adapters are examples of asymmetric adapters.
  • Y-adapter refers to an adapter that contains: a double-stranded region and a single-stranded region in which the opposing sequences are not complementary.
  • the end of the double-stranded region can be joined to target molecules such as double-stranded fragments of genomic DNA, e.g., by ligation or a transposase-catalyzed reaction.
  • Each strand of an adapter-tagged double-stranded DNA that has been ligated to a Y-adapter is asymmetrically tagged in that it has the sequence of one strand of the Y-adapter at one end and the other strand of the Y-adapter at the other end.
  • Amplification of nucleic acid molecules that have been joined to Y-adapters at both ends results in an asymmetrically tagged nucleic acid, i.e., a nucleic acid that has a 5′ end containing one tag sequence and a 3′ end that has another tag sequence.
  • hairpin adapter refers to an adapter that is in the form of a hairpin.
  • the hairpin loop can be cleaved to produce strands that have non-complementary tags on the ends.
  • the loop of a hairpin adapter may contain a uracil residue, and the loop can be cleaved using uracil DNA glycosylase and endonuclease VIII, although other methods are known.
  • adapter-ligated sample refers to a sample that has been ligated to an adapter.
  • a sample that has been ligated to an asymmetric adapter contains strands that have non-complementary sequences at the 5′ and 3′ ends.
  • amplifying refers to generating one or more copies, or “amplicons,” of a template nucleic acid, such as may be carried out using any suitable nucleic acid amplification technology, such as PCR (polymerase chain reaction) amplification (including nested PCR and multiplex PCR), RCA (rolling circle amplification), NASBA (nucleic acid sequence-based amplification), TMA (transcription-mediated amplification), and SDA (strand displacement amplification).
  • enrichment refers to a partial purification of template molecules that have a certain feature (e.g., nucleic acids that contain 5-hydroxymethylcytosine) from analytes that do not have the feature (e.g., nucleic acids that do not contain hydroxymethylcytosine).
  • Enrichment typically increases the concentration of the analytes that have the feature by at least 2-fold, at least 5-fold or at least 10-fold relative to the analytes that do not have the feature.
  • at least 10%, at least 20%, at least 50%, at least 80% or at least 90% of the analytes in a sample may have the feature used for enrichment.
  • at least 10%, at least 20%, at least 50%, at least 80% or at least 90% of the nucleic acid molecules in an enriched composition may contain a strand having one or more hydroxymethylcytosines that have been modified to contain a capture tag.
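The two enrichment measures described above (fold increase relative to analytes lacking the feature, and the fraction of molecules carrying the feature after enrichment) can be computed as follows. This is a minimal sketch; the function names and the molecule counts are hypothetical.

```python
def fold_enrichment(with_before, without_before, with_after, without_after):
    """Ratio of (feature : non-feature) after enrichment to the same ratio
    before enrichment."""
    if min(with_before, without_before, without_after) <= 0:
        raise ValueError("counts used as denominators must be positive")
    return (with_after / without_after) / (with_before / without_before)


def enriched_fraction(with_after, without_after):
    """Fraction of molecules in the enriched composition that carry the feature."""
    total = with_after + without_after
    if total <= 0:
        raise ValueError("enriched composition is empty")
    return with_after / total


# Hypothetical counts: 10 of 1000 input molecules carry 5hmC; after capture,
# 9 tagged molecules remain alongside 9 untagged carry-over molecules.
fold = fold_enrichment(10, 990, 9, 9)
fraction = enriched_fraction(9, 9)
```

With these made-up counts the feature-bearing molecules are enriched roughly 99-fold and make up half of the enriched composition.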
  • sequencing refers to a method by which the identity of at least 10 consecutive nucleotides (e.g., the identity of at least 20, at least 50, at least 100 or at least 200 or more consecutive nucleotides) of a polynucleotide is obtained.
  • next-generation sequencing or “high-throughput sequencing”, as used herein, refer to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms currently employed by Illumina, Life Technologies, Roche, etc.
  • Next-generation sequencing methods may also include nanopore sequencing methods such as that commercialized by Oxford Nanopore Technologies, electronic detection methods such as Ion Torrent technology commercialized by Life Technologies, and single-molecule fluorescence-based methods such as that commercialized by Pacific Biosciences.
  • read refers to the raw or processed output of sequencing systems, such as massively parallel sequencing.
  • the output of the methods described herein is reads.
  • these reads may need to be trimmed, filtered, and aligned, resulting in raw reads, trimmed reads, and aligned reads.
  • a “UFI” is a unique feature identifier that characterizes a group of nucleic acid molecules.
  • a UFI may be a particular sequence of nucleic acids in what is sometimes referred to as a “barcode” (sometimes referred to herein as a “UFI sequence” or “UFI barcode”) or it may be a chemical tag as will result from glycosylation, biotinylation, or the like.
  • a UFI may also be the absence of a particular feature such as an appended or incorporated moiety; for instance, a UFI may be the absence of a particular barcode, or the absence of glycosylated or biotinylated moieties, or the like.
  • a UFI sequence is typically a relatively short nucleic acid sequence that serves to identify a feature of a nucleic acid molecule.
  • Nucleic acid template molecules and amplicons thereof that contain a UFI are sometimes referred to herein as “barcoded” template molecules or amplicons. Examples of UFI sequence types include, without limitation, the following:
  • a “molecular UFI sequence” is a short sequence of nucleic acids that is appended to every nucleic acid template molecule in a sample, such that, providing the UFI sequence is of sufficient length, every nucleic acid template molecule is attached to a unique UFI sequence.
  • the molecular UFI sequences are usually designed as a string of random nucleotides, partially degenerate nucleotides, or, in some cases, i.e., with a limited number of template molecules, defined nucleotides.
  • Molecular UFI sequences can be used to account for and offset amplification and sequencer errors, allow a user to track duplicates and remove them from downstream analysis, enable molecular counting, and, in turn, the determination of an analyte concentration. See, e.g., Casbon et al. (2011) Nucl. Acids Res. 39(12):1-8.
  • sample UFI sequence (or “sample barcode” or “indexed UFI”) is a sequence of nucleic acids that is appended to every nucleic acid template molecule in a sample, such that a plurality of samples can be combined, processed, and sequenced together, with the sample UFI sequence enabling the sorting and grouping of reads by sample (i.e., de-multiplexing).
  • sample UFI sequence identifies the individual from whom the sample was obtained.
  • a “source identifier sequence” (or “source UFI” or “source barcode”) identifies the source of origin.
  • a source UFI will normally be a sample UFI. In certain instances, however, for example when different types of samples are obtained from the same individual (e.g., blood sample, cyst fluid, or the like), a source UFI will indicate the physiological source of the sample rather than the patient from whom the sample was obtained. When multiple samples are combined that include two or more sample types obtained from a single individual, both a sample barcode and a source barcode should be used.
  • fragment identifier sequence (or “fragment UFI” or “fragment barcode”): In a nucleic acid sample in which nucleic acids comprise a population of many fragments (as occurs naturally in cell-free DNA, or can be engineered through multiple known fragmentation techniques, e.g., physical, sonication, enzymatic, etc.), each fragment in a sample is barcoded with a corresponding fragment identifier sequence. Sequence reads that have non-overlapping fragment identifier sequences represent different original nucleic acid template molecules, while reads that have the same fragment identifier sequences, or substantially overlapping fragment identifier sequences, likely represent fragments of the same template molecule. The unique feature identified here is the template nucleic acid molecule from which a fragment derives.
  • a “strand identifier sequence” (or “strand UFI” or “strand barcode”) independently tags each of the two strands of a DNA duplex, so that the strand from which a read originates can be determined, i.e., as the W strand or the C strand.
  • a “protein identifier sequence” (or “protein UFI” or “protein barcode”) is contained within, adjacent to, or near the hybridized region formed between the nucleic acid tails of a pair of proximity probes in the presence of the corresponding protein to which the proximity probes specifically bind.
  • the protein identifier sequence when read, thus identifies the presence of the protein analyte targeted by a pair of proximity probes.
  • a “histone modification identifier sequence” (or “histone modification UFI” or “histone modification barcode”) is used in the cell-free chromatin immunoprecipitation (cfChIP) technique described herein to identify histone modifications identified in a nucleosome.
  • the histone modification identifier sequence is contained within, adjacent to, or near the hybridized region formed between the nucleic acid tail of a probe, i.e., at the first terminus of the probe, and a terminus of the DNA wrapped around the histone. The other terminus of the probe binds to a histone modification of interest. Accordingly, the histone modification identifier sequence, when read, identifies the presence of the histone modification.
  • a “5hmC identifier sequence” (or “5hmC barcode”) identifies DNA fragments originating from 5hmC-containing cell-free DNA template molecules in a sample, i.e., “hydroxymethylated” DNA.
  • a “5mC identifier sequence” (or “5mC barcode”) identifies DNA fragments originating from 5mC-containing cell-free DNA template molecules that do not contain 5hmC.
  • a “cell-free RNA identifier sequence” (or “cfRNA UFI”) identifies cDNA fragments as originating from cfRNA template molecules.
  • UFIs provide the basis for conversion of a non-classical sequence feature—such as the presence and concentration of plasma proteins, the location and type of histone modifications, hydroxymethylation profile, methylation profile, and the like—to classical sequence data from which the non-classical sequence feature can be derived.
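As an illustration of how sample UFIs and molecular UFIs work together, the sketch below sorts reads by sample barcode (de-multiplexing) and then collapses reads that share a molecular UFI, so that each original template molecule is counted once. It assumes the barcodes have already been parsed out of each read; the barcode strings, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads, sample_sheet):
    """reads: iterable of (sample_ufi, molecular_ufi) pairs, one per read.
    sample_sheet: dict mapping sample UFI -> sample name.
    Returns {sample name: number of distinct molecular UFIs observed}."""
    molecules = defaultdict(set)
    for sample_ufi, molecular_ufi in reads:
        sample = sample_sheet.get(sample_ufi)
        if sample is None:
            continue  # unknown sample barcode: read cannot be assigned
        molecules[sample].add(molecular_ufi)  # PCR duplicates collapse here
    return {sample: len(ufis) for sample, ufis in molecules.items()}


sample_sheet = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = [
    ("ACGT", "AAGGCCTT"),
    ("ACGT", "AAGGCCTT"),  # duplicate of the same template molecule
    ("ACGT", "CCTTAAGG"),
    ("TGCA", "AAGGCCTT"),  # same molecular UFI, different sample
    ("GGGG", "TTTTAAAA"),  # barcode not on the sample sheet
]
counts = count_unique_molecules(reads, sample_sheet)
```

Counting distinct molecular UFIs per sample, rather than raw reads, is what makes molecular counting insensitive to amplification duplicates.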
  • the application is not limited to the aforementioned types of UFIs, and other types of UFIs are also envisioned.
  • Many types of “process identifier sequences,” or “process UFIs,” for example, may be used to identify any one of a number of processes used to partition an initial pool of non-amplified template DNA fragments based on non-sequence features.
  • In addition to histone modification UFIs, protein UFIs, and epigenetic UFIs (including 5hmC UFIs and 5mC UFIs), all of which may be characterized as process UFIs, there are other types of UFIs that can be advantageously used in conjunction with the present invention, including UFIs indicating the presence or identity of adjacent genomic regions outside the sequence of a template molecule, such as CTCF binding sites across genomic spans.
  • a UFI may have a length in the range of from 1 to about 35 nucleotides, e.g., from 2 to 30 nucleotides, 4 to 30 nucleotides, 4 to 24 nucleotides, 4 to 16 nucleotides, 4 to 12 nucleotides, 6 to 20 nucleotides, 6 to 16 nucleotides, 6 to 12 nucleotides, etc.
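The stated length range matters because a UFI of n nucleotides can take at most 4^n distinct values. The sketch below computes that capacity, the shortest length whose capacity covers a given number of molecules, and a standard birthday-problem approximation for the chance that randomly assigned UFIs are all distinct. The function names are hypothetical, and the random-assignment model is an assumption rather than something the source specifies.

```python
import math


def ufi_capacity(length):
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


def min_ufi_length(n_molecules):
    """Shortest length whose capacity is at least n_molecules."""
    length = 1
    while ufi_capacity(length) < n_molecules:
        length += 1
    return length


def probability_all_unique(n_molecules, length):
    """Approximate probability that n randomly drawn UFIs are all different
    (birthday-problem approximation, assuming uniform random UFIs)."""
    capacity = ufi_capacity(length)
    return math.exp(-n_molecules * (n_molecules - 1) / (2 * capacity))


capacity_8 = ufi_capacity(8)
needed = min_ufi_length(1_000_000)
```

Under random assignment, collisions appear well before the capacity is exhausted, so a length somewhat above `min_ufi_length` is generally needed if every molecule is to carry a unique UFI.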
  • the UFI may be error-detecting and/or error-correcting, meaning that even if there is an error (e.g., if the sequence of the molecular barcode is mis-synthesized, mis-read or distorted during any of the various processing steps leading up to the determination of the molecular barcode sequence) then the code can still be interpreted correctly.
  • error-correcting sequences are described in the literature (e.g., in U.S. Patent Publication Nos. U.S. 2010/0323348 to Hamati et al. and U.S. 2009/0105959 to Braverman et al., both of which are incorporated herein by reference).
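One simple way to make a barcode set error-correcting is to choose barcodes that differ from one another at several positions, then assign each observed sequence to the single known barcode within a small Hamming distance. The sketch below illustrates that general idea only; it is not the scheme of the cited publications, and the barcode set and function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, known_barcodes, max_distance=1):
    """Return the known barcode matching `observed` exactly, or the only one
    within max_distance mismatches; return None if none or several qualify."""
    if observed in known_barcodes:
        return observed
    candidates = [
        barcode for barcode in known_barcodes
        if len(barcode) == len(observed)
        and hamming(observed, barcode) <= max_distance
    ]
    return candidates[0] if len(candidates) == 1 else None


# Made-up barcode set; every pair differs at all six positions, so a single
# substitution error can always be resolved unambiguously.
known = {"ACGTAC", "TGCATG", "GATCGA"}
fixed = correct_barcode("ACGTAA", known)      # one mismatch from ACGTAC
rejected = correct_barcode("TTTTTT", known)   # too far from every barcode
```

A read whose barcode cannot be resolved is better discarded than guessed, which is why ambiguous or distant sequences return `None`.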
  • oligonucleotides that serve as UFI sequences herein may be incorporated into a DNA molecule using any effective means, where “incorporated into” is used interchangeably herein with “added to” and “appended to,” insofar as the UFI can be provided at the end of a DNA molecule, near the end of a DNA molecule, or within a DNA molecule.
  • multiple UFIs can be end-ligated to DNA using a selected ligase, in which case only the final UFI is at the “end” of the molecule.
  • the UFI may be contained within the nucleic acid tail of a proximity probe, at the end of the nucleic acid tail of a proximity probe, or within the hybridized region generated upon the binding of probes to the protein target.
  • protein analyte encompasses a plurality of peptidic species, including oligopeptides, polypeptides, and proteins, where, as an analyte, the species of interest may or may not be present in a particular sample. Accordingly, the “detection” of an analyte in a sample herein may involve detecting the presence or absence of the analyte, confirming the likely presence of the analyte, ascertaining the concentration of the detected analyte, or the like.
  • the term “detection” is used interchangeably with the terms “determining,” “measuring,” “evaluating,” “assessing,” “assaying,” and “analyzing,” to refer to any form of measurement, and include determining if an element is present or not. These terms include both quantitative and/or qualitative determinations. Assessing may be relative or absolute. “Assessing the presence of” thus includes determining the amount of a moiety present, as well as determining whether it is present or absent.
  • a “hydroxymethylation level” or “hydroxymethylation state” is the extent of hydroxymethylation within a locus of interest.
  • the extent of hydroxymethylation is normally measured as hydroxymethylation density, e.g., the ratio of 5hmC residues to total cytosines, both modified and unmodified, within a nucleic acid region.
  • Other measures of hydroxymethylation density are also possible, e.g., the ratio of 5hmC residues to total nucleotides in a nucleic acid region.
  • a “hydroxymethylation profile” or “hydroxymethylation signature” refers to a data set that comprises the hydroxymethylation level at each of a plurality of hydroxymethylation loci.
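The definitions above translate directly into a small calculation: the density at a locus is the number of 5hmC residues divided by the total number of cytosines (modified and unmodified), and a profile is the set of densities across loci. The locus names and counts below are hypothetical.

```python
def hydroxymethylation_density(n_5hmc, n_total_cytosines):
    """Ratio of 5hmC residues to total cytosines within one region."""
    if n_total_cytosines <= 0:
        raise ValueError("region must contain at least one cytosine")
    if not 0 <= n_5hmc <= n_total_cytosines:
        raise ValueError("5hmC count must lie between 0 and total cytosines")
    return n_5hmc / n_total_cytosines


def hydroxymethylation_profile(loci):
    """loci: dict mapping locus name -> (5hmC count, total cytosine count).
    Returns a dict mapping locus name -> hydroxymethylation density."""
    return {
        name: hydroxymethylation_density(n_5hmc, n_total)
        for name, (n_5hmc, n_total) in loci.items()
    }


profile = hydroxymethylation_profile({"locus_A": (3, 60), "locus_B": (0, 40)})
```

Swapping the denominator for the total nucleotide count of the region gives the alternative density measure mentioned above.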
  • a method for detecting a protein analyte in a biological sample may involve detection of the presence of the protein analyte as well as quantitation, i.e., determination of the amount or concentration of the analyte.
  • the information sought and obtained is derived from sequence reads generated by an appropriately barcoded nucleic acid template molecule. It will be appreciated that the method extends to the detection of each of a plurality of protein analytes in a sample, as will be described below.
  • A proximity ligation assay (PLA) involves binding a protein analyte to two “proximity probes,” which are antibodies coupled with DNA strands. When the strands are in sufficiently close proximity, as occurs when the probes are bound to the target analyte, they are united with a DNA ligase enzyme. The ligation product then serves as a template for quantitative PCR (qPCR), reflecting the amount of the protein analyte present.
  • The proximity extension assay (PEA) method is also known in the detection and quantification of protein analytes, and provides a significant improvement over PLA, insofar as PLA results in noticeable recovery loss, particularly with complex biological samples; see, e.g., Lundberg et al. (2011) Nucl. Acids Res. 39(15):1-8.
  • Like PLA, PEA relies on the use of two proximity probes, each of which is essentially an antibody coupled to a DNA strand.
  • the DNA “tail” of one probe of a probe pair hybridizes to the DNA tail of the other probe of the pair, resulting in a double-stranded DNA (dsDNA) segment formed between the probes, with a 3′ terminus originating with the first proximity probe of the pair.
  • a polymerase and a mixture of dNTPs are then employed to extend the 3′ terminus of the dsDNA segment along the second proximity probe.
  • the proximity extension product is then used, in conventional protein assays, as a template for qPCR to quantitate the analyte of interest.
  • the present invention eliminates the need for quantitative PCR and instead uses sequencing, typically NGS, to detect and quantitate at least one protein analyte in a biological sample.
  • the sequenced nucleic acid product generated with the improved proximity extension assay of the invention is an amplified protein-barcoded dsDNA template molecule, i.e., a dsDNA amplicon (or PCR product) that comprises a protein-specific UFI sequence. That protein-barcoded amplicon is sequenced, and the sequence reads deconvoluted to determine the presence and quantity of a protein analyte from the protein UFI sequences observed in the sequence reads.
  • the invention provides an improved proximity extension assay for identifying a plurality of protein analytes in a biological sample by providing a plurality of probe pairs each comprising a first proximity probe and a second proximity probe, wherein each probe pair targets a specific protein analyte, and generating a dsDNA segment between the probes of each probe pair in the presence of the corresponding protein analyte, where the improvement comprises the following: incorporating into the dsDNA segments generated between the probes of a probe pair a protein-specific UFI sequence, thereby forming protein-barcoded dsDNA template molecules; amplifying and sequencing the protein-barcoded dsDNA template molecules; and identifying the protein analytes in the biological sample from the protein-specific UFIs observed in the sequence reads generated.
  • the protein-specific UFI is incorporated into the dsDNA segments by end-ligation of a UFI-containing adapter to at least one end of a segment. The method is illustrated schematically in FIG. 1 .
  • the protein-barcoded dsDNA template molecule generated in the aforementioned process is also provided with a capture sequence that comprises a 5hmC residue.
  • the capture sequence may be a single 5hmC residue, or it may be a short oligonucleotide sequence that contains a single 5hmC residue, or a short oligonucleotide sequence that contains two or more 5hmC residues.
  • the presence of 5hmC residues allows capture by functionalization of the 5hmCs with an affinity tag such as biotin, which in turn enables removal of the biotinylated species from a sample or fraction thereof with an avidin or streptavidin surface.
  • the protein-barcoded dsDNA template molecules may all have the same capture sequence, while each dsDNA template molecule generated by a different probe pair has a unique protein UFI sequence corresponding to the protein analyte targeted by that probe pair.
  • the protein-specific UFI sequence and the capture sequence can be simultaneously added to the dsDNA template generated between the probes, in a single oligonucleotide sequence or adapter. Alternatively, the protein-specific UFI sequence can be added first, followed by the capture sequence.
  • a molecular UFI sequence can also be appended to each dsDNA template molecule generated by the proximity extension assay, along with the protein-specific UFI and the optional 5hmC-containing capture sequence.
  • the improved proximity extension assay delineated above additionally involves use of a protein concentration control composition. Sequence reads indicative of a specific protein analyte are compared with sequence reads generated by the protein concentration control composition, which is incorporated into the biological sample at the outset.
  • Protein concentration control compositions are known in the art, and include, by way of example, a spike-in control in which a known concentration of a protein is added into the sample prior to processing. In some embodiments, the spike-in control is used in conjunction with a concentration ladder with control compositions having different concentrations throughout a concentration range.
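The comparison against a spike-in control can be sketched as a simple proportionality between read counts and concentration. This is an assumption-laden illustration: it presumes read counts scale linearly with concentration, which the source does not state, and the function names, counts, and units are hypothetical. The second function shows how a concentration ladder could calibrate that relationship with a least-squares line through the origin.

```python
def estimate_concentration(analyte_reads, spike_reads, spike_concentration):
    """Single spike-in: scale the known spike-in concentration by the ratio
    of analyte reads to spike-in reads (assumes a linear response)."""
    if spike_reads <= 0:
        raise ValueError("spike-in control produced no reads")
    return analyte_reads / spike_reads * spike_concentration


def fit_ladder_slope(ladder):
    """ladder: list of (known concentration, read count) pairs.
    Returns reads per unit concentration from a least-squares fit of a
    line through the origin."""
    numerator = sum(conc * reads for conc, reads in ladder)
    denominator = sum(conc * conc for conc, _ in ladder)
    if denominator == 0:
        raise ValueError("ladder needs at least one non-zero concentration")
    return numerator / denominator


# Hypothetical values; concentrations are in arbitrary units (e.g. ng/mL).
single_point = estimate_concentration(500, 1000, 2.0)
slope = fit_ladder_slope([(1.0, 100), (2.0, 200), (4.0, 400)])
from_ladder = 250 / slope
```

A ladder spanning the expected concentration range also makes it possible to check whether the linear assumption actually holds before relying on the estimate.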
  • An advantage of the present method in which protein analytes are determined from sequence reads of protein-barcoded dsDNA templates is that a large number of biological samples, e.g., blood samples or fractions thereof, such as plasma samples or serum samples, can be processed simultaneously. At least 50, at least 100, at least 300, at least 500, at least 1000, or at least 1500 or more biological samples can readily be processed at the same time. It may be convenient to provide each sample to be processed in an individual well of a commercially available microwell plate, such as a 96-well, a 384-well, or a 1536-well plate. Another advantage is the capability of carrying out other types of analyses on the same sample and obtaining additional information via sequencing as well, as will be explained in detail infra.
  • a related method for identifying a plurality of protein analytes in a biological sample using a DNA sequencing-based technique, where the method comprises:
  • step (e) identifying the protein analytes in the biological sample from the protein identifier barcodes observed in the sequence reads generated in step (b).
  • each protein-binding domain comprises an antibody and each binding site comprises an epitope.
  • the biological sample is generally a blood sample and the protein analysis is performed on a fraction of the blood sample, such as serum or plasma from the sample, typically plasma.
  • the biological sample is a blood sample, with protein analyte detection carried out on a fraction of the sample, typically the plasma fraction, and other types of analyses carried out, if desired, on a cell-free fraction of the same sample.
  • As illustrated in FIG. 1, the amplified protein-barcoded dsDNA template molecules can be sequenced in a single pool along with other types of amplified, barcoded dsDNA template molecules generated by analysis of the cell-free sample fraction.
  • dsDNA template molecules deriving from processing of the cell-free sample fraction can include, for example, template molecules with a histone modification UFI, a 5hmC-related UFI, a 5mC-related UFI, a UFI designating a DNA duplex as cDNA deriving from cell-free RNA, and the like.
  • the various types of barcoded dsDNA template molecules can be pooled prior to amplification and amplified together in a single run, or each type can be amplified separately and the resulting amplicons pooled prior to sequencing.
  • Pre-amplification of a group of template molecules is another method appropriate in the same context. Pre-amplification involves separation of a group of template molecules sharing a barcoded feature from the biological sample or admixture of template molecules, followed by re-combination with the remaining template molecules and simultaneous amplification.
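  • Sorting a pooled sequencing run back into the template types described above can be sketched as follows. This is an illustrative sketch only: it assumes that each read begins with a fixed-length feature UFI, and the barcode sequences, the UFI length, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical feature UFIs; actual barcode sequences are not given in the text.
FEATURE_UFIS = {
    "ACGTAC": "protein",
    "TGCATG": "histone_modification",
    "GATTCA": "5hmC",
    "CTAAGT": "5mC",
    "AGGCTT": "cfRNA_cDNA",
}
UFI_LENGTH = 6  # assumed fixed-length UFI at the start of each read

def classify_read(read):
    """Return the feature class for a read, or 'unassigned' if no UFI matches."""
    return FEATURE_UFIS.get(read[:UFI_LENGTH], "unassigned")

def demultiplex(reads):
    """Count pooled reads per feature class."""
    return Counter(classify_read(read) for read in reads)

pooled_reads = [
    "ACGTACTTTTGGGA",  # protein-barcoded template
    "GATTCACCGGAATT",  # 5hmC-barcoded template
    "GATTCAAACCGGTT",  # 5hmC-barcoded template
    "NNNNNNACGTACGT",  # no recognizable UFI
]
counts = demultiplex(pooled_reads)
```

  • Exact matching is used here for brevity; a practical pipeline would typically tolerate sequencing errors in the UFI.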
  • the information obtained from a cell-free nucleic acid sample extracted from the same biological sample that contains the protein analytes, in combination with a proximity extension method as described in the preceding section, can include detection of the presence, identity, location, or quantity (or a combination thereof) of one or more histone modifications within nucleosomes in the cell-free nucleic acid sample.
  • Histone modifications include post-translational modifications (PTMs), many of which have been established to regulate gene expression by altering chromatin structure or by other means.
  • Histone modifications of particular interest herein are those comprising histone modification biomarkers for assessing a disease status in a subject. Methods for detecting histone modifications according to this and other embodiments of the invention are described in the next section.
  • Other information obtained from the cell-free sample can include: at least one sequence of cell-free DNA; at least one sequence of cell-free RNA; DNA methylation data; DNA hydroxymethylation data; and other information that may or may not be related to any of the foregoing. Detailed information regarding appropriate and preferred methodologies for obtaining the foregoing information is included infra.
  • methods for (1) preparing a cell-free nucleic acid sample to enable identification of at least one histone modification in a nucleosome contained therein using a sequencing-based technique, and (2) detecting histone modifications in a cell-free nucleic acid sample containing intact nucleosomes, where the presence, identity, location or quantity of the histone modifications, or a combination thereof, are detected.
  • Both methods involve analysis of a cell-free nucleic acid sample extracted from a biological sample such as a blood sample, where the cell-free nucleic acid sample contains intact nucleosomes.
  • the nucleosome is the basic unit of chromatin structure and is composed of a protein complex of eight highly conserved core histones, with two copies of each of the core histones H2A, H2B, H3, and H4. Approximately 146 base pairs of DNA are wrapped around the histone octamer to form the nucleosome “core.”
  • the core particles are connected by stretches of linker DNA, up to about 80 base pairs in length, which appear like “beads on a string” (Koller et al. (1979) J. Cell Biol. 83(2 Pt 1):403-427) until compacted with linker histones such as H1, H5, or their isoforms, to form chromatin.
  • Nucleosome position and nucleosome structure are also known to mediate epigenetic signaling. Histone PTMs have been linked to a variety of processes, including transcription, DNA replication, and DNA damage.
  • PTMs are typically located on the tails of the core histones, and include acetylation, methylation, dimethylation, trimethylation, propionylation, butyrylation, crotonylation, 2-hydroxy-isobutyrylation, malonylation, succinylation, formylation, ubiquitination, citrullination, phosphorylation, hydroxylation, sumoylation, O-GlcNAcylation, and ADP ribosylation, and the more common modifications include the acetylation, methylation or ubiquitination of lysine residues as well as methylation of arginine residues and phosphorylation of serine residues.
  • Mononucleosomes and oligonucleosomes have been detected by ELISA, as reported in Salgame et al. (1997) Nuc. Acids. Res. 25(3):680-1 and van Nieuwenhuijze et al. (2003) Ann. Rheum. Dis. 62(1):10-14.
  • Such assays typically employ an anti-histone antibody, such as anti-H2B, anti-H3 or anti-H1, H2A, H2B, H3 and H4, as capture antibody and an anti-DNA or anti-H2A-H2B-DNA complex antibody as detection antibody.
  • ChIP variants include cell-free chromatin immunoprecipitation (cfChIP), Native ChIP (NChIP), Carrier ChIP (CChIP), Fast ChIP (qChIP), Q2ChIP, MicroChIP (μChIP), Matrix ChIP, and Pathology-ChIP (PAT-ChIP).
  • the invention provides a method for preparing a cell-free nucleic acid sample to enable identification of at least one histone modification in a nucleosome contained therein using a DNA sequencing-based technique, where the method involves starting with a cell-free nucleic acid sample containing a plurality of nucleosomes each comprising a cfDNA molecule wound around a histone core, i.e., around the histone octamer composed of a pair of each of the four core histones.
  • Y adapters comprising terminal hybridizing regions are ligated to the ends of each histone-associated cfDNA molecule.
  • the adapters may contain a sample UFI sequence and a molecular UFI sequence as explained earlier herein. Ligation of adapters is illustrated at the top of FIG. 2 , and results in a modified cell-free nucleic acid sample comprising adapter-ligated cfDNA molecules each wound around a histone core.
  • a proximity probe is employed that comprises, at a first terminus, a histone modification binding domain that specifically binds to a histone modification of interest; and at a second terminus, a nucleic acid binding domain complementary to one of the terminal hybridizing regions provided by the adapters.
  • a non-hybridizing region between the histone modification binding domain and the nucleic acid binding domain comprises a nucleic acid sequence selected to correspond to the histone modification of interest, and this nucleic acid sequence serves as a histone modification UFI (or histone modification “barcode”).
  • the proximity probe is dimensioned to allow for simultaneous binding of the probe's histone modification binding domain to the histone modification of interest and hybridization of the probe's complementary nucleic acid binding domain to the hybridizing nucleic acid region.
  • the modified cell-free nucleic acid sample is incubated with the proximity probe under conditions effective to facilitate (i) binding of the probe's histone modification binding domain to the histone modification and (ii) hybridization of the probe's complementary nucleic acid binding domain to the hybridizing nucleic acid region. This results in the formation of a dsDNA segment with a 5′ terminus originating with the cell-free DNA and a 3′ terminus originating with the proximity probe and comprising the histone modification barcode.
  • the 5′ terminus of the dsDNA segment is extended along the non-hybridizing region of the proximity probe and the histone modification UFI by adding a polymerase and a mixture of dNTPs, in a manner similar to that described for the proximity extension assay in part (2) of this section.
  • Polymerase extension provides a histone modification-barcoded dsDNA template molecule, typically also barcoded with a sample UFI sequence and a molecular UFI sequence as noted above, which can then undergo amplification and sequencing.
  • the aforementioned cfChIP process involves the use of a plurality of proximity probes each targeting a different histone modification, so that the sequence reads obtained in the final step can be deconvoluted to deduce information about a plurality of histone modifications such as histone PTMs.
  • a sequencing-based method for detecting histone modifications in a cell-free nucleic acid sample containing intact nucleosomes, where the presence, identity, location or quantity of histone modifications, or a combination thereof, are detected.
  • the method involves carrying out the above method for preparing a cell-free nucleic acid sample to enable identification of at least one histone modification using a DNA sequencing-based technique, followed by amplification of the histone modification-barcoded dsDNA template molecule, sequencing of the resulting amplicons, and determining information about the type and location of histone modifications from the histone modification UFIs observed in the sequence reads.
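  • Deducing the type and location of histone modifications from the histone modification UFIs in the sequence reads can be sketched as follows. This is a hedged illustration: it assumes the reads have already been aligned to a reference genome, and the barcode sequences, the two example modifications, the bin width, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical histone modification UFIs (barcodes); not from the disclosure.
HISTONE_UFIS = {"AACCGG": "H3K4me3", "TTGGCC": "H3K27ac"}
BIN_SIZE = 200  # assumed bin width in base pairs

def deconvolute(records):
    """Tally reads per histone modification and genomic bin.

    records: iterable of (histone_ufi, chromosome, position) tuples taken
    from reads that are assumed to be already aligned.
    """
    table = defaultdict(lambda: defaultdict(int))
    for ufi, chrom, pos in records:
        modification = HISTONE_UFIS.get(ufi)
        if modification is None:
            continue  # unknown barcode: skip rather than guess
        bin_start = pos // BIN_SIZE * BIN_SIZE
        table[modification][(chrom, bin_start)] += 1
    return {mod: dict(bins) for mod, bins in table.items()}

records = [
    ("AACCGG", "chr1", 1050),
    ("AACCGG", "chr1", 1120),
    ("TTGGCC", "chr2", 5000),
    ("GGGGGG", "chr3", 10),  # unrecognized UFI, ignored
]
profile = deconvolute(records)
```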
  • adapters that include a capture sequence comprising at least one 5hmC residue.
  • This is an optional feature of many of the embodiments herein, and is particularly useful in a combined workflow method in which specific adaptor-ligated dsDNA template molecules are pulled down from a sample or from a mixture of dsDNA template molecules, with the remaining components then processed in the absence of the removed dsDNA template.
  • the pulled-down template molecules can be separately amplified or just set aside while the remainder of the sample undergoes chemical processing, with, ultimately, all dsDNA template molecules generated from a single sample pooled and sequenced together.
  • the capture sequence incorporated into a dsDNA template molecule by way of an adaptor facilitates the pull-down, or removal, of the dsDNA template molecule from the sample.
  • the capture sequence comprises a 5hmC residue; the sequence may be a single 5hmC residue, a short nucleic acid sequence containing a single 5hmC residue, or a short nucleic acid sequence containing two or more 5hmC residues.
  • an adapter containing a 5hmC-containing capture sequence is ligated to at least one end of a dsDNA template molecule, or, in a cfRNA analysis, is attached via ligase-free chemistry to at least one end of a cDNA molecule.
  • 5hmC residues in the adapters are functionalized with an affinity tag that allows selective removal of the affinity-tagged template.
  • the affinity tag is comprised of a biotin moiety such as biotin, desthiobiotin, oxybiotin, 2-iminobiotin, diaminobiotin, biotin sulfoxide, biocytin, or the like.
  • Tagging 5hmC residues with a biotin moiety or other affinity tag is accomplished by covalent attachment of a chemoselective group to 5hmC residues in the adapters, where the chemoselective group is capable of undergoing reaction with a functionalized affinity tag so as to link the affinity tag to the 5hmC residues.
  • the chemoselective group is UDP glucose-6-azide, which undergoes a spontaneous 1,3-cycloaddition reaction with an alkyne-functionalized biotin moiety, as described in Robertson et al. (2011) Biochem. Biophys. Res. Comm. 411(1):40-3, U.S. Pat. No.
  • the affinity-tagged dsDNA template molecules can then be pulled down using an avidin or streptavidin surface, as noted above, and set aside for later processing and analysis.
  • the supernatant remaining after removal of the affinity-tagged fragments contains dsDNA template molecules that do not contain 5hmC in their internal sequences or in appended adapters.
  • the remaining dsDNA template molecules can continue to undergo chemical processing and ultimately be re-pooled with the pulled-down template molecules for sequencing.
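  • The pull-down and re-pooling logic described above can be sketched as follows. This is a simplified model, assuming each template is represented by a record with hypothetical boolean fields indicating whether 5hmC is present in an appended adapter or in the internal sequence; the chemistry itself is not modeled.

```python
def pull_down(templates):
    """Split templates into a pulled-down fraction and a supernatant.

    templates: list of dicts with hypothetical boolean keys 'adapter_5hmc'
    (5hmC in an adapter capture sequence) and 'internal_5hmc' (5hmC within
    the template's own sequence).
    """
    captured, supernatant = [], []
    for template in templates:
        if template["adapter_5hmc"] or template["internal_5hmc"]:
            captured.append(template)  # affinity-tagged, bound to the surface
        else:
            supernatant.append(template)  # continues through chemical processing
    return captured, supernatant

def repool(captured, processed_supernatant):
    """Combine both fractions again ahead of sequencing."""
    return captured + processed_supernatant

templates = [
    {"id": "t1", "adapter_5hmc": True, "internal_5hmc": False},
    {"id": "t2", "adapter_5hmc": False, "internal_5hmc": True},
    {"id": "t3", "adapter_5hmc": False, "internal_5hmc": False},
]
captured, supernatant = pull_down(templates)
pool = repool(captured, supernatant)
```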
  • the invention encompasses 5hmC-containing adapter-bound cfDNA template molecules as novel compositions of matter, where the adapters may comprise, in addition to at least one 5hmC residue, a UFI sequence such as a source UFI sequence, a molecular UFI sequence, a strand-identifier UFI sequence, or a histone modification UFI sequence, as explained earlier herein.
  • Cell-free RNA (cfRNA), primarily derived from apoptotic bodies and exosomes, is generally highly degraded, has a very short half-life, and is present in a cell-free sample at a very low concentration. It is therefore challenging to prepare cDNA sequencing libraries from cfRNA, insofar as the low integrity of cfRNA eliminates the possibility of using standard RNA-Seq methodology in the preparation of a cDNA library. Methods that can be adapted for use herein are those that employ ligation-free cDNA synthesis and library preparation techniques in which adapters needed for amplification are covalently attached to the cDNA without need for ligases.
  • random primers are used to synthesize cDNA from cfRNA, preferably from rRNA-depleted RNA, as may be prepared with an RNase; see U.S. Pat. No. 9,745,570 to Sooknanan, the disclosure of which is incorporated by reference herein.
  • 5′ and 3′ linker tags for amplification and barcoding i.e., addition of a cfRNA UFI sequence to the cDNA
  • the process can be carried out using commercially available kits, such as the ScriptSeq™ v2 RNA-Seq Library Preparation Kit, available from Epicentre Biotechnologies (Illumina, Inc.). Additional description of the materials, reagents, and processes used in conjunction with ScriptSeq cDNA library preparation may be found in the ScriptSeq™ v2 RNA-Seq Library Preparation Guide [retrieved on Aug. 16, 2018 from support.illumina.com].
  • first-strand cDNA synthesis of 3′-polyadenylated RNA with a dT primer that includes an adapter sequence employs a template switching technique that makes use of the terminal transferase activity of the selected reverse transcriptase.
  • a short sequence of non-template nucleotides extends the first strand of cDNA when the 5′ end of the RNA is reached, and a template switching oligonucleotide containing a short sequence complementary to the added sequence (e.g., GGG) and a second adapter sequence that serves as a forward PCR primer hybridizes to the first strand extension and enables second strand synthesis and amplification via PCR.
  • a ligase-free method such as one of the above-described techniques is used to synthesize adaptor-bound cDNA from cfRNA in a biological sample, where the adaptor(s) comprise a cfRNA UFI sequence to identify the dsDNA template molecule as cfRNA-derived cDNA.
  • the adaptors also comprise at least one additional UFI sequence, such as a source UFI sequence, a molecular UFI sequence, a strand-identifier UFI sequence, or a histone modification UFI sequence, as explained earlier herein.
  • the adapter-bound cDNA can then be amplified and sequenced, and information regarding the cfRNA in the biological sample can be obtained by deconvolution of sequence reads.
  • the cfRNA may be mRNA or an RNA that is not translated into a protein, i.e., a non-coding RNA (ncRNA) such as tRNA; rRNA; small RNAs such as microRNAs (miRNAs), siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, and scaRNAs; and long ncRNAs such as Xist and HOTAIR.
  • the adaptor-bound cDNA can be amplified and sequenced at this point, or further analyses may be carried out in the context of an expanded combined workflow process.
  • Of particular interest are hydroxymethylation and/or methylation analyses of the dsDNA, as explained in Section 8.
  • a combined workflow process for preparing cfDNA and cfRNA in a single cell-free nucleic acid sample for simultaneous, sequencing-based identification.
  • the initial step here, following extraction of the cell-free nucleic acid sample from a biological sample, is the ligation of selected adapters to the cfDNA.
  • the adapters can be ligated onto the ends of cfDNA fragments in the cell-free nucleic acid sample to form adapter-ligated dsDNA template molecules. Standard ligation conditions and commercially available ligases can be used.
  • the adapters selected for ligation to the cfDNA fragments comprise a sample UFI sequence and, preferably, at least one additional UFI sequence such as a molecular UFI sequence or a strand-identifier UFI sequence.
  • the adapter-ligated cfDNA is then purified along with the cfRNA using conventional nucleic acid purification techniques, to provide a cell-free admixture of cfRNA and adapter-ligated DNA template molecules.
  • the cfRNA is processed in the cell-free admixture still containing the adapter-ligated cfDNA, as the present method obviates the need for removal of the adapter-ligated cfDNA prior to or during cDNA synthesis.
  • a first strand of cDNA is synthesized from the cfRNA, followed by synthesis of a second strand of cDNA complementary to the first strand, as is known in the art, to form a cDNA duplex.
  • cDNA synthesis is carried out as described in the preceding section, so as to attach adapters to the cDNA without need for ligase.
  • the cDNA adapters comprise a source identifier UFI and an RNA indicator UFI, thereby providing adapter-bound cDNA in a cell-free admixture that also comprises the adapter-ligated DNA.
  • the adaptor-ligated dsDNA template molecules and the cDNA template molecules can be amplified and sequenced at this point, or further analyses may be carried out in the context of an expanded combined workflow process, including hydroxymethylation and/or methylation analyses of the dsDNA, as explained in the following section.
  • Epigenetic control of gene expression in cells is mediated in part by modifications to DNA nucleotides including the cytosine methylation status and the cytosine hydroxymethylation status of DNA. It has been known in the art for some time that DNA may be methylated at the 5 position of cytosine nucleotides to form 5-methylcytosine. Methylated DNA in the form of 5-methylcytosine is reported to occur at positions in the DNA sequence where a cytosine nucleotide occurs next to a guanine nucleotide.
  • Regions of the genome that contain a high proportion of CpG sites are often termed “CpG islands”; the majority of human gene promoter sequences are associated with such CpG islands. In active genes these CpG islands are generally hypomethylated. Methylation of gene promoter sequences is associated with stable gene inactivation.
  • DNA methylation patterns observed in cancer cells differ from those of healthy cells. Repetitive elements, particularly around pericentromeric areas, are reported to be hypomethylated in cancer relative to healthy cells, but promoters of specific genes have been reported to be hypermethylated in cancer. The balance of these two effects is reported to result in global DNA hypomethylation in cancer cells.
  • Global DNA methylation has been studied in cells using immunohistochemistry (IHC) techniques as well as a number of other methods, but many of these methods are disadvantageous because they are labor-intensive and/or require large amounts of good quality extracted DNA.
  • 5hmC is a stable DNA modification, formed from the catalytic oxidation of 5mC by a Ten-Eleven Translocation (TET) enzyme such as TET1.
  • Bisulfite sequencing does not distinguish between 5mC and 5hmC, and, therefore, other methods for individually detecting 5mC and 5hmC residues are necessary.
  • 5hmC appears far less often than 5mC, so that any method for detecting 5hmC needs to exhibit high efficiency, with respect to the fraction of all 5hmC residues that are identified, as well as high selectivity, meaning that substantially all residues identified as 5hmC should, in fact, be 5hmC residues.
  • β-glucosyltransferase (β-GT)
  • the combined workflow methods preferably include a sequencing-based process for detecting modified cytosine residues in cell-free DNA, i.e., 5mC, 5hmC, or both 5mC and 5hmC. If a hydroxymethylation analysis is to be carried out along with a methylation analysis, hydroxymethylation should be the initial focus, followed by methylation, as will be understood from the following description of process flow.
  • the “hydroxymethylation profile” can be hydroxymethylation density, e.g., the ratio of 5hmC residues to total cytosines, both modified and unmodified, within a nucleic acid region. Other measures of 5hmC density are also envisioned, e.g., the ratio of 5hmC residues to total nucleotides in a locus.
  • the hydroxymethylation profile may also comprise hydroxymethylation information such as hydroxymethylation pattern, total 5hmC residues within a nucleic acid region, the location of 5hmC residues within a nucleic acid region, the relative positions of 5hmC residues within a nucleic acid region, and/or identification of a hydroxymethylated site as hemi-hydroxymethylated or fully hydroxymethylated.
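  • The density measures and positional information that make up a hydroxymethylation profile can be computed as in the following sketch. The single-letter encoding of a region, with "h" for 5hmC, "m" for 5mC, and "C" for unmodified cytosine, is an assumption made for illustration, as are the function names.

```python
def hydroxymethylation_density(region):
    """Ratio of 5hmC residues to total cytosines (modified and unmodified)."""
    n_5hmc = region.count("h")
    total_cytosines = region.count("C") + region.count("m") + n_5hmc
    return n_5hmc / total_cytosines if total_cytosines else 0.0

def density_per_nucleotide(region):
    """Alternative measure: ratio of 5hmC residues to total nucleotides."""
    return region.count("h") / len(region) if region else 0.0

def positions_5hmc(region):
    """Zero-based locations of 5hmC residues within the region."""
    return [index for index, base in enumerate(region) if base == "h"]

# Hypothetical region: two 5hmC, one 5mC, two unmodified cytosines.
region = "AChGmTCh"
density = hydroxymethylation_density(region)
per_nucleotide = density_per_nucleotide(region)
sites = positions_5hmc(region)
```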
  • This may be carried out by selectively glucosylating 5hmC residues with uridine diphospho (UDP) glucose functionalized at the 6-position with an azide moiety, a step that is followed by a spontaneous 1,3-cycloaddition reaction with alkyne-functionalized biotin via a “click chemistry” reaction, as described previously, in Section 5, with respect to 5hmC-containing capture sequences in adapters.
  • the DNA fragments containing the biotinylated 5hmC residues are adapter-ligated dsDNA template molecules that can then be pulled down with streptavidin beads in an “enrichment” step.
  • a 5hmC UFI sequence is added to the termini of the pulled-down adapter-ligated dsDNA template molecules, so that, after amplification, pooling, and sequencing, information regarding the hydroxymethylation profile can be deduced from the sequence reads obtained. That is, the sequence reads are analyzed to provide a quantitative determination of which sequences are hydroxymethylated in the cfDNA. This may be done by, e.g., counting sequence reads or, alternatively, counting the number of original starting molecules, prior to amplification, based on their fragmentation breakpoint and/or whether they contain the same molecular UFI.
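  • The two counting strategies mentioned above, raw read counts versus counts of original starting molecules, can be contrasted in a short sketch. It assumes that each aligned read is summarized by a locus label, its fragment start and end coordinates (the fragmentation breakpoints), and its molecular UFI; the record layout and function name are hypothetical.

```python
from collections import defaultdict

def count_molecules(reads):
    """Return {locus: (raw_read_count, unique_molecule_count)}.

    reads: iterable of (locus, start, end, molecular_ufi) tuples. Reads that
    share breakpoints and molecular UFI are treated as amplification copies
    of a single original molecule.
    """
    raw = defaultdict(int)
    unique = defaultdict(set)
    for locus, start, end, molecular_ufi in reads:
        raw[locus] += 1
        unique[locus].add((start, end, molecular_ufi))
    return {locus: (raw[locus], len(unique[locus])) for locus in raw}

reads = [
    ("geneA", 100, 265, "AAGT"),
    ("geneA", 100, 265, "AAGT"),  # PCR duplicate of the read above
    ("geneA", 100, 265, "CCTA"),  # same breakpoints, different molecule
    ("geneB", 900, 1070, "GGAC"),
]
summary = count_molecules(reads)
```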
  • Dual-Biotin Technique: After a cell-free nucleic acid sample has been extracted from a biological sample, with cfDNA having been adapter-ligated followed by cfRNA processing to provide adapter-bound cDNA (as described in Section 7), 5hmC residues in the cfDNA are selectively labeled with an affinity tag, e.g., a biotin moiety as explained earlier herein.
  • Biotinylation can be carried out by selective functionalization of 5hmC residues via βGT-catalyzed glucosylation with uridine diphosphoglucose-6-azide followed by a click chemistry reaction to covalently attach an alkyne-functionalized biotin moiety as explained previously.
  • An avidin or streptavidin surface (e.g., in the form of streptavidin beads) is then used to pull out all of the dsDNA template molecules biotinylated at the 5hmC locations, which are then placed in a separate container for UFI sequence attachment during amplification.
  • the remaining dsDNA template molecules in the supernatant are fragments that either have 5mC residues or have no modifications (the latter group including cDNA generated from cfRNA).
  • a TET protein is then used to oxidize 5mC residues in the supernatant to 5hmC; in this case, a TET mutant protein is employed to ensure that oxidation of 5mC does not proceed beyond hydroxylation.
  • Suitable TET mutant proteins for this purpose are described in Liu et al. (2017) Nature Chem. Bio. 13: 181-191, incorporated by reference herein.
  • the βGT-catalyzed glucosylation followed by biotin functionalization is then repeated.
  • the bead-bound DNA fragments are then barcoded during amplification with a different UFI sequence than that used in the first step, i.e., a 5mC UFI sequence.
  • Unmodified DNA fragments, i.e., fragments containing no modified cytosine residues, now remain in the supernatant.
  • sequence-specific probes can be used to hybridize to unmethylated DNA strands.
  • the hybridized complexes that result can be pulled out and tagged with a further UFI sequence during amplification, as before.
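  • The order of operations in the dual-biotin technique can be summarized in a short sketch. It models only the bookkeeping, not the chemistry: each fragment is represented by an identifier and a set of cytosine marks, and the UFI labels and function name are hypothetical.

```python
def dual_biotin(fragments):
    """Assign a process UFI label to each fragment by two rounds of pull-down.

    fragments: list of (fragment_id, marks), where marks is a set drawn from
    {'5hmC', '5mC'} describing the fragment's native cytosine modifications.
    """
    labels = {}
    supernatant = []
    # Round 1: biotinylate native 5hmC, pull down, tag with the 5hmC UFI.
    for fragment_id, marks in fragments:
        if "5hmC" in marks:
            labels[fragment_id] = "5hmC_UFI"
        else:
            supernatant.append((fragment_id, marks))
    # Round 2: oxidize 5mC to 5hmC (TET mutant), then repeat the pull-down.
    for fragment_id, marks in supernatant:
        oxidized = {"5hmC" if mark == "5mC" else mark for mark in marks}
        if "5hmC" in oxidized:
            labels[fragment_id] = "5mC_UFI"
        else:
            # Remaining fragments carry no modified cytosine residues.
            labels[fragment_id] = "unmodified_UFI"
    return labels

fragments = [
    ("f1", {"5hmC", "5mC"}),
    ("f2", {"5mC"}),
    ("f3", set()),
]
labels = dual_biotin(fragments)
```

  • Because the 5hmC pull-down comes first, a fragment carrying both marks is labeled as 5hmC-containing, which is why hydroxymethylation is analyzed before methylation.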
  • Pic-Borane Methodology: This is an alternative to the dual biotin technique, and also begins with biotinylation of 5hmC residues in adapter-ligated DNA fragments, followed by avidin or streptavidin pull-down. In this technique, however, the DNA containing unmodified 5mC residues remaining in the supernatant is oxidized beyond 5hmC, to 5caC and/or 5fC residues. Oxidation may be carried out enzymatically, using a catalytically active TET family enzyme.
  • a “TET family enzyme” or a “TET enzyme” as those terms are used herein refer to a catalytically active “TET family protein” or a “TET catalytically active fragment” as defined in U.S.
  • A preferred TET enzyme in this context is TET2; see Ito et al. (2011) Science 333(6047):1300-1303. Oxidation may also be carried out chemically, using a chemical oxidizing agent.
  • Suitable oxidizing agents include, without limitation: a perruthenate anion in the form of an inorganic or organic perruthenate salt, including metal perruthenates such as potassium perruthenate (KRuO4), tetraalkylammonium perruthenates such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP), and polymer supported perruthenate (PSP); and inorganic peroxo compounds and compositions such as peroxotungstate or a copper(II) perchlorate/TEMPO combination. It is unnecessary at this point to separate 5fC-containing fragments from 5caC-containing fragments, insofar as in the next step of the process, both 5fC residues and 5caC residues are converted to dihydrouracil (DHU).
  • After conversion of the 5fC and 5caC residues to DHU by reaction with an organic borane, the dsDNA template molecules contain DHU in place of the original 5mC residues, and can be amplified, pooled, and sequenced along with other dsDNA template molecules deriving from the same sample.
  • the organic borane may be characterized as a complex of borane and a nitrogen-containing compound selected from nitrogen heterocycles and tertiary amines.
  • the nitrogen heterocycle may be monocyclic, bicyclic, or polycyclic, but is typically monocyclic, in the form of a 5- or 6-membered ring that contains a nitrogen heteroatom and optionally one or more additional heteroatoms selected from N, O, and S.
  • the nitrogen heterocycle may be aromatic or alicyclic.
  • Preferred nitrogen heterocycles herein include 2-pyrroline, 2H-pyrrole, 1H-pyrrole, pyrazolidine, imidazolidine, 2-pyrazoline, 2-imidazoline, pyrazole, imidazole, 1,2,4-triazole, pyridazine, pyrimidine, pyrazine, 1,2,4-triazine, and 1,3,5-triazine, any of which may be unsubstituted or substituted with one or more non-hydrogen substituents.
  • Typical non-hydrogen substituents are alkyl groups, particularly lower alkyl groups, such as methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, and the like.
  • Exemplary compounds include pyridine borane, 2-methylpyridine borane (also referred to as 2-picoline borane), and 5-ethyl-2-pyridine. Further information concerning these organic boranes and reaction thereof to convert oxidized 5mC residues to DHU may be found in the Arensdorf patent application cited above, Provisional U.S. Patent Application Ser. No. 62/630,798, previously incorporated by reference herein.
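  • How template molecules carrying DHU in place of 5mC might be read out after sequencing can be sketched as follows. The sketch assumes that DHU is copied as thymine during amplification, so that an original 5mC appears as a C-to-T difference relative to the reference sequence. This readout assumption, the ungapped same-strand comparison, and the function name are illustrative and are not stated in the text above.

```python
def call_5mc_sites(reference, read):
    """Return zero-based positions where a reference C is read as T.

    Assumes the read is aligned to the reference without gaps and on the
    same strand, and that C-to-T differences arise from DHU conversion
    rather than from sequencing errors or genetic variants.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length")
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "T"
    ]

reference = "ACGTCCGA"
read = "ATGTCTGA"
sites_5mc = call_5mc_sites(reference, read)
```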
  • Biotin/Native 5mC Enrichment Method: This is an alternative to the dual biotin technique, and begins with biotinylation of 5hmC residues in adapter-ligated DNA fragments, followed by avidin or streptavidin pull-down.
  • an anti-5mC antibody or an MBD protein is used to capture and pull down native 5mC-containing fragments.
  • This technique is less preferred herein, insofar as it does not result in the generation of dsDNA template molecules that can be amplified, pooled, and sequenced with other dsDNA template molecules deriving from the same sample.
  • the barcoded, adapter-ligated dsDNA template molecules generated are thus dsDNA containing 5hmC and dsDNA containing 5mC and no 5hmC, and optionally further including dsDNA with no modified cytosine residues.
  • These template molecules are amplified, pooled, and sequenced along with at least one of:
  • Histone modification-barcoded dsDNA template molecules generated by the process in Section 2;
  • Protein-barcoded dsDNA template molecules generated from the same biological sample (e.g., a blood sample) by the process described in Section 2.
  • Sequencing the aforementioned admixture in a single run can thus provide information on nucleosomes, particularly histone modifications; cfRNA sequence; protein analyte identity and concentration; cfDNA hydroxymethylation profile; and cfDNA methylation profile.
  • Selected UFI sequences e.g., molecular UFI sequences, sample UFI sequences, process UFI sequences (including 5hmC UFI sequences and 5mC UFI sequences, as explained above)
  • RCA primers in the context of RCA techniques known to those in the art and/or described in the pertinent texts and literature.
  • a complementary strand can be generated and the present process then carried out on the dsDNA molecules as described elsewhere herein.
  • the invention thus provides a combined workflow method in which multiple types of information are obtained from a single biological sample by pooling and sequencing, in a single run, amplicons of dsDNA template molecules tagged to indicate various features of the biological sample.
  • the most comprehensive version of the process is schematically illustrated in FIG. 3 and comprises the following steps:
  • each dsDNA template molecule having a specific protein UFI sequence corresponds to the presence of that specific protein in the sample;
  • adapters include a source identifier UFI, to identify the source or sample of the DNA, a “random” molecule identifier UFI, to identify each cfDNA fragment as an original molecule in the sample, and optionally a “strand” identifier UFI, to identify the strand of each cfDNA fragment as C or W;
  • Combined workflow methods of the invention with one or two fewer analyses are schematically illustrated in FIGS. 4 and 5.
  • a significant advantage of the invention lies in the use of a classical sequencing-based technique to determine one or more non-classical sequence features of a biological sample, where a “non-classical sequence feature” refers to a feature other than the identity and order of the primary bases (i.e., adenine, cytosine, guanine, and thymine for DNA, and adenine, cytosine, guanine, and uracil for RNA) of a nucleic acid molecule in the sample.
  • the non-classical sequence features of interest may be information related to the composition of a nucleic acid, such as the distribution of modified cytosine residues, e.g., 5hmC or 5mC, or it may be unrelated to the composition of a nucleic acid and pertain instead to the presence and concentration of plasma proteins in a blood sample, histone modifications observed in a cell-free nucleosome fraction of the blood sample, and the like, as discussed in detail above.
  • the analysis involves conversion of a non-classical sequence feature of interest, such as the identity of a plasma protein, the concentration of a plasma protein, the number, location and types of histone modifications, the hydroxymethylation profile of a nucleic acid, or the methylation profile of a nucleic acid, into classical sequence data.
  • the classical sequence data obtained includes at least one UFI, i.e., a specific nucleic acid sequence in the range of about 4 to about 36 base pairs in length, where the UFI is incorporated within a dsDNA template molecule and relates to a specific feature of the biological sample, i.e., a non-classical sequence feature of interest, as explained above.
  • the invention provides a sequencing-based method for determining a non-classical sequence feature of a nucleic acid template molecule, comprising:
  • the nucleic acid template molecule is contained within a composition that comprises a plurality of different nucleic acid template molecules, and at least one identifier sequence designating a specific non-classical sequence feature of each template molecule is appended thereto.
  • the non-classical sequence feature may comprise an aspect of a protein with which the nucleic acid template molecule was associated at some point, e.g., a histone.
  • the non-classical sequence feature may also be the presence or concentration of a particular protein in the biological sample, with conversion of that feature to a classical sequence carried out using the proximity extension assay described in Section 2.
  • Other non-classical sequence features of interest include, by way of example, cfDNA hydroxymethylation profile and cfDNA methylation profile.
  • the invention additionally pertains to truncated sequencing adapters and their use in the amplification and sequencing of dsDNA template molecules.
  • the truncated adapters, used in conjunction with certain primer constructs, are useful in adding an identifier barcode to a dsDNA template molecule during PCR amplification.
  • the truncated sequencing adapters are in the form of a Y-construct having a double-stranded segment comprising in the range of 2 base pairs to 50 base pairs and two single-stranded segments each comprising in the range of 2 bases to 25 bases.
  • the double-stranded segment comprises in the range of 5 base pairs to 35 base pairs and the two single-stranded segments each comprise in the range of about 5 bases to 25 bases, e.g., in the range of 5 base pairs to 25 base pairs and in the range of about 5 bases to 20 bases, respectively.
  • the truncated sequencing adapters are first ligated to an end-blunted, A-tailed dsDNA template molecule using conventional means.
  • the adapter-ligated dsDNA template molecules so provided are then amplified in a PCR process using at least one barcoded primer, wherein the barcoded primer comprises: (i) a first region that is not complementary to any sequence in the adapter and comprises one or more identifier barcodes; and (ii) a second region that is sufficiently complementary to a single-stranded segment of the adapter to hybridize thereto, such that extension of the barcoded primer in the presence of a polymerase results in a double-stranded complex of the second region of the primer and the single-stranded segment of the adapter, with the first region that comprises the identifier barcode extending beyond the end of the double-stranded complex as a single-stranded oligonucleotide tail
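The two-region primer design described above can be sketched in Python as a simple sequence check. This is a minimal sketch under stated assumptions: "sufficiently complementary" is simplified to perfect complementarity, "not complementary to any sequence in the adapter" is tested only for the full first region, and the function names and any sequences passed in are hypothetical rather than taken from the specification.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def primer_regions_ok(first_region: str, second_region: str,
                      adapter_tail: str, adapter_strands: list[str]) -> bool:
    """Check the two primer regions against a truncated adapter.

    second_region must anneal to the adapter's single-stranded tail, i.e. its
    reverse complement must occur within the tail (simplified to an exact match).
    first_region (carrying the identifier barcode) must not anneal anywhere on
    the adapter, so it is left as a single-stranded overhang.
    """
    anneals = reverse_complement(second_region.upper()) in adapter_tail.upper()
    barcode_free = all(
        reverse_complement(first_region.upper()) not in strand.upper()
        for strand in adapter_strands
    )
    return anneals and barcode_free
```

In practice, partial complementarity and melting temperature would also matter; the exact-match test is used here only to keep the sketch short.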
  • Use of truncated adapters is exemplified in the experimental section herein. While the ratio of adapters to DNA template molecules can be varied, the ratio is generally in the range of about 1:5 to about 250:1 (w/w), e.g., 5:1 to 200:1, 10:1 to 150:1, or 20:1 to 100:1.
  • the invention also provides a kit for amplifying and sequencing a dsDNA template molecule, comprising:
  • a sequencing adapter in the form of a Y-construct having a double-stranded segment comprising in the range of 2 base pairs to 50 base pairs and two single-stranded segments each comprising in the range of 2 bases to 25 bases;
  • a barcoded primer comprising (i) a first region that is not complementary to any sequence in the adapter and comprises an identifier barcode; and (ii) a second region that is sufficiently complementary to a single-stranded segment of the adapter to hybridize thereto;
  • A representative truncated adapter is shown in FIG. 7, and may be compared with the standard prior art adapter shown in FIG. 6.
  • Use of indexed primers and truncated adapters in a PCR process is schematically illustrated in FIG. 8 .
  • the truncated adapter approach may be combined with any method of the invention described herein that involves ligation of DNA adapters to a dsDNA template molecule prior to amplification and sequencing.
  • the methods of the invention as described in detail herein can be combined with conventional techniques that involve sequencing of a biological sample.
  • the present methods can be combined with conventional (or hereafter discovered or developed) liquid biopsy methodologies that involve sequence-based enrichment, e.g., when specific genes or “hot spots” are selectively captured (i.e., using hybrid capture) and/or selectively amplified (e.g., using multiplex PCR amplicons or multiple RCA primers).
  • a sequence-based enrichment step, or multiple sequence-based enrichment steps, e.g., in the context of targeted sequencing, can be carried out in conjunction with any of the present methods, by separating out one or more groups of barcoded template molecules or amplification products thereof on the basis of sequence, analyzing that group, and optionally recombining, or pooling, that group with other nucleic acid fractions, prior to a combined sequencing step.
  • the present methods and aspects thereof can be combined with non-shotgun sequencing techniques, in the analysis of a more discrete number of targeted loci, using, for instance, multiplex PCR and arrays (where a hybridization probe incorporates a process barcode and/or other UFI sequence, allowing discrimination of the marks in specific loci without direct sequencing).
  • targeted sequencing approaches with which the present methods or aspects thereof can be combined include those described in So et al. (2016) Genomic Medicine 3:2; Gong et al., cited supra; Stahlberg et al. (2016) Nuc Acids Res 44(11; e105)1-7; Mamanova et al. (2010) Nature Methods 7(2):111-118; and others.
  • the aforementioned publications are incorporated by reference herein.
  • In Examples 1-3, an alternative sequencing adapter construct is evaluated which permits a single template ligation reaction that can be aliquoted to 5hmC enrichment or WGS and indexed by sample only upon PCR amplification.
  • the alternative adapters are “truncated” adapters (with single-stranded tails in the range of about 6 bases to about 30 bases in length), which are paired with modified indexed PCR sequences.
  • Example 1 describes the preparation of custom adapters, including the design of adapter sequences and generation of adapter constructs;
  • Example 2 describes optimization of library preparation with the adapters prepared in Example 1; and
  • Example 3 provides validation of at least equivalent 5hmC enrichment performance with the custom adapters, relative to enrichment performance seen with standard, commercially available “Y” adapters.
  • FIG. 6 illustrates a standard adapter configuration for the Illumina Y-adapter construct; as may be seen in the figure, the sample index barcode (indicated as [index]) is contained within the Y adapter.
  • FIG. 7 illustrates the truncated adapter configuration of the invention, with the sample index barcode (indicated as [indexRC]) within an amplification primer. Amplification primers are highlighted in both figures.
  • The creation of an indexed library using truncated adapters and indexed primers according to the invention is schematically illustrated in FIG. 8.
  • Whole blood was collected in Cell-Free DNA BCT® tubes according to the manufacturer's protocol (Streck, La Vista, Nebr.) (https://www.streck.com/collection/cell-free-dna-bct/). Tubes were maintained at 15° C. to 25° C. with plasma separation performed within 24 h of phlebotomy by centrifugation of whole blood at 1600 × g for 10 min at RT, followed by transfer of the plasma layer to a new tube for centrifugation at 16,000 × g for 10 min. Plasma was aliquoted for subsequent cfDNA isolation or storage at −80° C.
  • cfDNA was isolated using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Germantown Md.) following the manufacturer's protocol.
  • Whole blood genomic DNA was extracted using the DNA Mini Kit (Qiagen) and fragmented using dsDNA Fragmentase (NEB).
  • DNA was quantified by Bioanalyzer dsDNA High Sensitivity assay (Agilent Technologies Inc., Santa Clara, Calif.) and Qubit dsDNA High Sensitivity Assay (Thermo Fisher Scientific, Waltham, Mass.).
  • Spike-in amplicon preparation: To generate a spiked-in control, lambda DNA was PCR amplified by Taq DNA Polymerase (NEB) and purified by AMPure XP beads (Beckman Coulter) in nonoverlapping ~180 bp amplicons, with a cocktail of dATP/dGTP/dTTP and one of the following: dCTP, dmCTP or 10% dhmCTP (Zymo)/90% dCTP.
  • Primer sequences are as follows: dCTP FW-5′-CGTTTCCGTTCTTCTTCGTC-3′, RV-5′-TACTCGCACCGAAAATGTCA-3′; dmCTP FW-5′-GTGGCGGGTTATGATGAACT-3′, RV-5′-CATAAAATGCGGGGATTCAC-3′; 10% dhmCTP/90% dCTP FW-5′-TGAAAACGAAAGGGGATACG-3′, RV-5′-GTCCAGCTGGGAGTCGATAC-3′.
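For convenience, the six spike-in primer sequences listed above can be held in a small lookup table and given a basic sanity check. The Python sketch below reports length and GC fraction only; the dictionary keys and the function name `gc_fraction` are illustrative choices, not part of the described protocol.

```python
# Spike-in control primers as listed above, keyed by (nucleotide cocktail, direction).
PRIMERS = {
    ("dCTP", "FW"): "CGTTTCCGTTCTTCTTCGTC",
    ("dCTP", "RV"): "TACTCGCACCGAAAATGTCA",
    ("dmCTP", "FW"): "GTGGCGGGTTATGATGAACT",
    ("dmCTP", "RV"): "CATAAAATGCGGGGATTCAC",
    ("10% dhmCTP/90% dCTP", "FW"): "TGAAAACGAAAGGGGATACG",
    ("10% dhmCTP/90% dCTP", "RV"): "GTCCAGCTGGGAGTCGATAC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


if __name__ == "__main__":
    for (cocktail, direction), seq in PRIMERS.items():
        print(f"{cocktail:>20} {direction}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
```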
  • 5-Hydroxymethylcytosine assay enrichment: Sequencing library preparation and 5hmC enrichment were performed as described previously (Song et al. (2017) Cell Research 27:1231-1242), incorporated by reference herein. cfDNA was normalized to 10 ng total input for each assay and ligated to sequencing adapters. 5hmC bases were biotinylated via a two-step chemistry and subsequently enriched by binding to Dynabeads M270 Streptavidin (Thermo Fisher Scientific, Waltham, Mass.).
  • DNA sequencing and alignment: DNA sequencing was performed according to manufacturer's recommendations with 75 base-pair, paired-end sequencing using a NextSeq550 instrument with version 2 reagent chemistry (Illumina, San Diego, Calif.). Twenty-four libraries were sequenced per flowcell, and raw data processing and demultiplexing were performed using the Illumina BaseSpace Sequence Hub to generate sample-specific FASTQ output. Sequencing reads were aligned to the hg19 reference genome using BWA-MEM with default parameters (Li & Durbin (2010), “Fast and accurate long-read alignment with Burrows-Wheeler transform,” Bioinformatics 26: 589-595).
  • the custom oligonucleotides in Table 1 included three subsets: (1) truncated adapter oligonucleotides for hybridization and generation of adapter constructs; (2) indexing PCR oligonucleotides for amplification of adapter ligated products and incorporation of sample indexing; and (3) universal PCR oligonucleotides for re-amplification of libraries containing any index motif.
  • a master mix of truncated adapter oligonucleotides in STE buffer was generated in a 1.5 ml Eppendorf tube and aliquoted to three 0.2 ml thin-wall PCR tubes (40 µl each).
  • Oligonucleotides were hybridized on an Eppendorf Mastercycler Pro with a heated lid (105° C. for block temperatures > 40° C., otherwise ambient) under the following conditions:
  • denature, 95° C. for 5 minutes; hybridize, 95° C. to 20° C., ramping at −1° C. per minute; hold, 4° C. indefinitely (∞).
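The run time implied by the hybridization program can be estimated with a short calculation. The Python sketch below assumes a linear ramp from 95° C. to 20° C. at a rate of 1° C. per minute, preceded by the 5 minute denaturation; the function name `ramp_minutes` is illustrative only.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration of a linear temperature ramp, in minutes."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_min


DENATURE_MIN = 5.0                      # 95 C denaturation step
RAMP_MIN = ramp_minutes(95.0, 20.0, 1.0)  # assumed 1 C per minute cool-down
TOTAL_MIN = DENATURE_MIN + RAMP_MIN     # time before the 4 C hold begins

if __name__ == "__main__":
    print(f"ramp: {RAMP_MIN:.0f} min, total before hold: {TOTAL_MIN:.0f} min")
```

Under these assumptions the ramp takes 75 minutes, so the adapters reach the 4° C. hold about 80 minutes after the program starts.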
  • Hybridized adapters were pooled into a single 1.5 ml Eppendorf tube and stored at 4° C. through validation (long term storage is recommended at −20° C.). Adapters were diluted 1:250 in 1× STE buffer and evaluated on a Bioanalyzer High Sensitivity chip. A single large peak was visible in the electropherogram trace of the adapter, indicating successful hybridization.
  • This example describes optimization of library preparation using the truncated adapters prepared as described in Example 1.
  • Template DNA is limited in practice, so for the purposes of optimization and validation of truncated adapters, fragmented genomic DNA was believed to offer the best solution for availability of significant, homogeneous DNA template.
  • the KAPA® HyperPlus Kit (Roche) was used for this purpose; although the HyperPlus Kit typically is used for combined fragmentation and library preparation (including adapter ligation), only the fragmentation portion was used for this example.
  • Stock brain and spleen genomic DNA was diluted to 500 ng per 35 µl in Tris-HCl (pH 8.0) buffer solution. Two replicate preparations of each tissue were prepared for a total of 1 µg genomic DNA per tissue type.
  • concentrations and reaction volume were as follows: stock concentration, 250 ng/µl; final concentration, 10.7 ng/µl; reaction volume, 1.5 µl; buffer EB, 33.5 µl.
  • Fragmentation buffer and enzyme were thawed on ice and added to each genomic DNA sample in 0.2 ml thin-wall PCR tubes. Concentrations and 1× reaction volumes in the fragmentation reaction mix were as follows: dsDNA stock concentration, 10.7 ng/µl; dsDNA reaction concentration, 7.5 ng/µl; dsDNA 1× reaction volume, 35 µl; fragmentation buffer stock concentration, 10×; fragmentation buffer reaction concentration, 1×; fragmentation buffer 1× reaction volume, 5 µl; fragmentation enzyme stock concentration, 5×; fragmentation enzyme reaction concentration, 1×; fragmentation enzyme 1× reaction volume, 10 µl.
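The concentrations stated for the dilution and the fragmentation reaction follow from the usual relation C1·V1 = C2·V2. The Python sketch below reproduces the arithmetic, taking the dsDNA volume in the fragmentation reaction as 35 µl (so that the total reaction volume is 35 + 5 + 10 = 50 µl), which is the reading consistent with the stated 7.5 ng/µl reaction concentration. The function name is illustrative.

```python
def diluted_concentration(stock_conc: float, stock_volume_ul: float,
                          final_volume_ul: float) -> float:
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_conc * stock_volume_ul / final_volume_ul


# Dilution of stock gDNA: 1.5 ul of 250 ng/ul stock plus 33.5 ul buffer EB.
dna_diluted = diluted_concentration(250.0, 1.5, 1.5 + 33.5)   # about 10.7 ng/ul

# Fragmentation reaction: 35 ul DNA + 5 ul buffer + 10 ul enzyme.
reaction_ul = 35.0 + 5.0 + 10.0
dna_in_reaction = diluted_concentration(10.7, 35.0, reaction_ul)  # about 7.5 ng/ul
buffer_x = diluted_concentration(10.0, 5.0, reaction_ul)          # 10x stock -> 1x
enzyme_x = diluted_concentration(5.0, 10.0, reaction_ul)          # 5x stock -> 1x

if __name__ == "__main__":
    print(f"diluted gDNA: {dna_diluted:.1f} ng/ul")
    print(f"gDNA in reaction: {dna_in_reaction:.1f} ng/ul")
    print(f"buffer: {buffer_x:g}x, enzyme: {enzyme_x:g}x")
```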
  • genomic DNA samples were then fragmented on an Eppendorf Mastercycler Pro (heated lid off) under the following conditions: chill, 4° C. for 1 minute; fragment, 37° C. for 35 minutes.
  • Fragmented samples were removed immediately from the thermal cycler and purified using a 2× ratio AMPure XP bead protocol described below:
  • Warmed AMPure XP beads to room temperature for at least 30 minutes prior to purification;
  • the fragmented DNA was quantified using the Qubit dsDNA assay. 1 µl of each sample was evaluated. Fragmented DNA samples were evaluated for size distribution on a Bioanalyzer High Sensitivity chip; the size distribution profiles obtained are indicated in FIG. 9.
  • Fragmentation of genomic DNA samples was successful, with slightly higher yield in the spleen gDNA preparation than in the brain gDNA preparation. Fragment sizes observed were within the range of standard cfDNA size distributions centered on 167 bp. Yield of fragmented gDNAs was sufficient for (1) a titer of adapter input to library preparation and (2) a head-to-head evaluation of truncated adapters versus standard adapters.
  • Adapters were titered into WGS library preparation for brain (10 ng input) and spleen (20 ng input) fragmented cfDNA templates over a 100-fold range spanning approximately 5-fold to approximately 500-fold adapter-to-template DNA ratios (5:1; 20:1; 50:1; 100:1; 250:1; and 500:1).
  • the fragmented DNA was normalized to 10 ng (brain) or 20 ng (spleen) in a 50 µl volume.
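The adapter input implied by each point of the titer can be tabulated as follows. This Python sketch assumes the adapter-to-template ratios are weight-to-weight, as in the general ratio range given earlier in the description; if the ratios were instead molar, the masses would differ. The names used are illustrative.

```python
RATIOS = [5, 20, 50, 100, 250, 500]           # adapter : template, as listed above
TEMPLATE_NG = {"brain": 10.0, "spleen": 20.0}  # normalized template input per reaction


def adapter_mass_ng(template_ng: float, ratio: float) -> float:
    """Adapter mass for a given template mass, assuming a w/w ratio."""
    return template_ng * ratio


# Adapter mass (ng) per tissue and ratio, e.g. TITER["brain"][5].
TITER = {
    tissue: {ratio: adapter_mass_ng(ng, ratio) for ratio in RATIOS}
    for tissue, ng in TEMPLATE_NG.items()
}

# Fold range covered by the titration series.
FOLD_RANGE = max(RATIOS) / min(RATIOS)

if __name__ == "__main__":
    for tissue, row in TITER.items():
        print(tissue, {f"{r}:1": f"{m:g} ng" for r, m in row.items()})
```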
  • An end-repair and A-tailing enzyme mix was prepared, and the fragmented genomic DNA was end-repaired and A-tailed on an Eppendorf Mastercycler Pro (heated lid on), using the following conditions: end repair at 20° C. for 30 min; heat inactivation at 65° C. for 30 min; and a 4° C. hold.
  • a ligation master mix was prepared having the following components:
  • End-repaired fragmented gDNA samples were ligated to adapter at room temperature for 30 minutes, and the ligated products were purified using a standard 1.2× ratio AMPure XP bead protocol.
  • a PCR master mix was prepared containing the following components in a 10:1:1:2:6 volume ratio: 2× KAPA HiFi HotStart Ready Mix; 10 µM Universal Primer; 10 µM Index Primer (1-10); ligated DNA; and HPLC water.
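The 10:1:1:2:6 volume ratio can be converted to per-reaction volumes once a total reaction volume is chosen. The Python sketch below uses a 50 µl reaction purely as an assumed example (the total volume is not stated above), and the function and dictionary names are illustrative.

```python
# Volume parts for each component, in the order given above (sum = 20 parts).
PARTS = {
    "2x KAPA HiFi HotStart Ready Mix": 10,
    "10 uM Universal Primer": 1,
    "10 uM Index Primer": 1,
    "ligated DNA": 2,
    "HPLC water": 6,
}


def mix_volumes(total_ul: float) -> dict[str, float]:
    """Per-component volumes (ul) for a reaction of the given total volume."""
    total_parts = sum(PARTS.values())
    return {name: total_ul * parts / total_parts for name, parts in PARTS.items()}


if __name__ == "__main__":
    # Assumed 50 ul reaction, for illustration only.
    for name, volume in mix_volumes(50.0).items():
        print(f"{name}: {volume:g} ul")
```

Because the Ready Mix makes up 10 of 20 parts, it is at 1× in the final reaction, and each 10 µM primer (1 of 20 parts) is at 0.5 µM, whatever total volume is used.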
  • PCR cycling conditions were as follows: initial denature, 98° C. for 45 sec; denature, 98° C. for 15 sec; anneal, 60° C. for 30 sec.
  • the amplified products were purified using a standard 1.2× ratio AMPure XP bead protocol, as before.
  • PCR amplified libraries were diluted 25× in Buffer EB and evaluated on a Bioanalyzer High Sensitivity chip and with a Qubit High Sensitivity dsDNA assay. The data are shown below.
  • Truncated adapter titer library concentrations:
  • WGS library concentration (ng/µL) was plotted versus the concentration of adapter input, shown in FIG. 10, along with the MT group WGS.
  • RPKM values were compared across treatments for WGS and for 5hmC preparations. Generally, preparations were relatively similar to one another, albeit with some noise across comparisons. Notably, 5hmC preparations from brain genomic DNA were remarkably similar to one another, particularly between replicate preparations.
  • RPKM distributions for WGS data were, as expected, narrowly distributed with a modal value approaching 1 (random distribution of reads) and a secondary distribution approaching 0.5 corresponding to X chromosome gene bodies having 50% dosage in these male samples.
  • RPKM distributions of WGS libraries are largely congruent, as are 5hmC library RPKM distributions from brain gDNA.
  • significant variability is observed in 5hmC library RPKM distributions from spleen gDNA; notably two libraries appear to approximate a WGS preparation rather than a 5hmC preparation, indicating possible background noise in the enrichment process for these libraries.
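For reference, the RPKM values compared above can be computed with the conventional definition, reads per kilobase of feature per million mapped reads. The Python sketch below is a generic implementation of that definition and is not taken from the analysis pipeline actually used; the function name and the example counts are hypothetical.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical counts: 100 reads on a 2 kb gene body in a 10 million read library.
    print(rpkm(100, 2_000, 10_000_000))
```

Because RPKM scales linearly with read count, a gene body present at half dosage (such as an X chromosome gene in a male sample) is expected to show about half the RPKM of an autosomal gene body of the same length in a WGS library.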
  • the relative function of truncated adapters and indexed PCR primers can be estimated by comparing the yield of whole genome libraries prepared from a single template with alternative adapter strategies with limited PCR cycling.

US17/282,694 2018-10-04 2019-10-03 Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample Pending US20210380971A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/282,694 US20210380971A1 (en) 2018-10-04 2019-10-03 Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862741473P 2018-10-04 2018-10-04
PCT/US2019/054582 WO2020072829A2 (en) 2018-10-04 2019-10-03 Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample
US17/282,694 US20210380971A1 (en) 2018-10-04 2019-10-03 Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample

Publications (1)

Publication Number Publication Date
US20210380971A1 true US20210380971A1 (en) 2021-12-09

Family

ID=68296828

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/282,694 Pending US20210380971A1 (en) 2018-10-04 2019-10-03 Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample

Country Status (9)

Country Link
US (1) US20210380971A1
EP (1) EP3861132A2
JP (1) JP2022504078A
CN (1) CN113166796A
AU (1) AU2019354789A1
CA (1) CA3114606A1
MX (1) MX2021003847A
SG (1) SG11202102954QA
WO (1) WO2020072829A2

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113517026B (zh) * 2021-06-16 2022-08-19 苏州拉索生物芯片科技有限公司 应用于生物制品的标签序列的生成方法、系统、智能终端及计算机可读存储介质
WO2023288222A1 (en) * 2021-07-12 2023-01-19 The Trustees Of The University Of Pennsylvania Modified adapters for enzymatic dna deamination and methods of use thereof for epigenetic sequencing of free and immobilized dna
WO2023287876A1 (en) * 2021-07-15 2023-01-19 University Of Washington Efficient duplex sequencing using high fidelity next generation sequencing reads
EP4405496A4 (en) * 2021-09-23 2025-01-01 Genomic Testing Cooperative, LCA COMPOSITIONS AND METHODS FOR TARGETED NGS SEQUENCING OF CFRNA AND CFTNA
US20230086611A1 (en) * 2021-09-23 2023-03-23 Genomic Testing Cooperative, LCA Compositions and Methods for Targeted NGS Sequencing of cfRNA and cfTNA
WO2023097295A1 (en) * 2021-11-24 2023-06-01 Alida Biosciences, Inc. Rna and dna analysis using engineered surfaces
CN114934110A (zh) * 2022-06-15 2022-08-23 江西烈冰生物科技有限公司 用于获取基因表达的原始位置的生物芯片、试剂盒及方法
EP4623079A1 (en) * 2022-11-23 2025-10-01 Alida Biosciences, Inc. Chromatin profiling compositions and methods
CN116403645B (zh) * 2023-03-03 2024-01-09 阿里巴巴(中国)有限公司 转录因子结合位点的预测方法及装置

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017075265A1 (en) * 2015-10-28 2017-05-04 The Broad Institute, Inc. Multiplex analysis of single cell constituents

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5712126A (en) 1995-08-01 1998-01-27 Yale University Analysis of gene expression by display of 3-end restriction fragments of CDNA
US6287825B1 (en) 1998-09-18 2001-09-11 Molecular Staging Inc. Methods for reducing the complexity of DNA sequences
CA2639819C (en) 2005-11-30 2012-10-23 Epicentre Technologies Corporation Selective terminal tagging of nucleic acids
CN101720359A (zh) 2007-06-01 2010-06-02 454生命科学公司 从多重混合物中识别个别样本的系统和方法
US20090093378A1 (en) 2007-08-29 2009-04-09 Helen Bignell Method for sequencing a polynucleotide template
WO2010037001A2 (en) 2008-09-26 2010-04-01 Immune Disease Institute, Inc. Selective oxidation of 5-methylcytosine by tet-family proteins
US20100323348A1 (en) 2009-01-31 2010-12-23 The Regents Of The University Of Colorado, A Body Corporate Methods and Compositions for Using Error-Detecting and/or Error-Correcting Barcodes in Nucleic Acid Amplification Process
EP2816111B1 (en) 2009-08-14 2016-04-13 Epicentre Technologies Corporation Methods, compositions, and kits for generating rRNA-depleted samples or isolating rRNA from samples
WO2011127136A1 (en) 2010-04-06 2011-10-13 University Of Chicago Composition and methods related to modification of 5-hydroxymethylcytosine (5-hmc)
GB201107863D0 (en) * 2011-05-11 2011-06-22 Olink Ab Method and product
GB2507231B (en) * 2011-07-29 2014-07-30 Cambridge Epigenetix Ltd Methods for Detection of Cytosine Modification
ES2875998T3 (es) * 2013-09-30 2021-11-11 Vesicode Ab Procedimientos de perfilado de complejos moleculares mediante el uso de códigos de barras dependientes de la proximidad
CA2923812C (en) 2013-10-17 2023-10-17 Clontech Laboratories, Inc. Methods for adding adapters to nucleic acids and compositions for practicing the same
US10844428B2 (en) * 2015-04-28 2020-11-24 Illumina, Inc. Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIS)
AU2017246318B2 (en) 2016-04-07 2023-07-27 The Board Of Trustees Of The Leland Stanford Junior University Noninvasive diagnostics by sequencing 5-hydroxymethylated cell-free DNA
US20170298422A1 (en) 2016-04-18 2017-10-19 The Board Of Trustees Of The Leland Stanford Junior University Simultaneous single-molecule epigenetic imaging of dna methylation and hydroxymethylation
WO2018031897A1 (en) * 2016-08-12 2018-02-15 Cdi Laboratories, Inc. Compositions and methods for analyzing nucleic acids associated with an analyte
CN106367485B (zh) * 2016-08-29 2019-04-26 厦门艾德生物医药科技股份有限公司 一种用于检测基因突变的多定位双标签接头组及其制备方法和应用
US20180080021A1 (en) 2016-09-17 2018-03-22 The Board Of Trustees Of The Leland Stanford Junior University Simultaneous sequencing of rna and dna from the same sample
EP3752515A1 (en) * 2018-02-14 2020-12-23 Bluestar Genomics, Inc. Methods for the epigenetic analysis of dna, particularly cell-free dna

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Lundberg et al. Nucleic Acids Research. 2011. 39(15):e102. (Year: 2011) *
Masunaga et. al. Highly sensitive detection of ESR1 mutations in cell-free DNA from patients with metastatic breast cancer using molecular barcode sequencing. Breast Cancer Res Treat 167, 49–58 (2018) (Year: 2018) *
McAnena et al. Cancers (Basel). 2017. 9(1):5. (Year: 2017) *

Also Published As

Publication number Publication date
EP3861132A2 (en) 2021-08-11
CA3114606A1 (en) 2020-04-09
JP2022504078A (ja) 2022-01-13
CN113166796A (zh) 2021-07-23
AU2019354789A1 (en) 2021-05-27
WO2020072829A3 (en) 2020-08-13
WO2020072829A2 (en) 2020-04-09
MX2021003847A (es) 2021-05-27
SG11202102954QA (en) 2021-04-29

Legal Events

Date Code Title Description
AS Assignment

Owner name: BLUESTAR GENOMICS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARENSDORF, PATRICK A.;SPACEK, DAMEK;ELLISON, CHRISTOPHER E;AND OTHERS;SIGNING DATES FROM 20181027 TO 20200820;REEL/FRAME:055812/0573

Owner name: BLUESTAR GENOMICS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNORS:ARENSDORF, PATRICK A.;SPACEK, DAMEK;ELLISON, CHRISTOPHER E;AND OTHERS;SIGNING DATES FROM 20181027 TO 20200820;REEL/FRAME:055812/0573

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: CLEARNOTE HEALTH, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:BLUESTAR GENOMICS, INC.;REEL/FRAME:062857/0027

Effective date: 20221221

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCB Information on status: application discontinuation

Free format text: ABANDONMENT FOR FAILURE TO CORRECT DRAWINGS/OATH/NONPUB REQUEST

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED