CN116157533A - Capturing genetic targets using hybridization methods

Info

Publication number: CN116157533A
Application number: CN202180031221.3A
Authority: CN
Inventors: A·S·科韦; F·梅斯基; J·陈; J·G·阿瑟; N·德莱尼; Z·本特; K·普法伊弗; A·J·希尔; L·J·阿尔瓦拉杜马丁内斯
Original assignee: 10X Genomics Ltd
Current assignee: 10X Genomics Ltd
Priority date: 2020-02-21
Filing date: 2021-02-19
Publication date: 2023-05-23
Also published as: AU2021224760A1; EP4107282A1; WO2021168261A1

Abstract

Provided herein are methods of determining the location of an analyte using hybridization as a method of enhancing analyte detection. In particular, capture probes comprising a spatial barcode and a capture domain are used to capture analytes in a biological sample in contact with a substrate. The analyte may be a nucleic acid or a protein. Decoy oligonucleotides are used to enrich for nucleic acids of interest prior to sequencing.

Description

Capturing genetic targets using hybridization methods

Cross Reference to Related Applications

The present application claims priority to U.S. Provisional Patent Application No. 62/979,652, filed February 21, 2020; U.S. Provisional Patent Application No. 62/980,124, filed February 21, 2020; and U.S. Provisional Patent Application No. 63/077,019, filed September 11, 2020. The contents of the above-mentioned applications are incorporated herein by reference in their entirety.

Background

Cells within a subject tissue differ in cell morphology and/or function due to different analyte abundances (e.g., gene and/or protein expression) within different cells. Specific locations of cells within a tissue (e.g., locations of cells relative to neighboring cells or locations of cells relative to the tissue microenvironment) may affect, for example, morphology, differentiation, fate, viability, proliferation, behavior of cells, and signal transduction and crosstalk with other cells in the tissue.

Spatial heterogeneity has previously been studied using techniques that either provide data for only a small number of analytes in whole or partial tissue, or provide a large amount of analyte data for single cells but no information about the location of each single cell in the parent biological sample (e.g., a tissue sample).

Whole exome sequencing provides coverage for each transcript in the sample. There is a need in the art for transcriptome-specific methods for high fidelity enrichment of nucleic acid molecules for targeted sequencing while reducing cost, maximizing efficiency, and minimizing redundancy.

Sequence listing

The present application contains a sequence listing that was submitted on compact disc and is incorporated herein by reference in its entirety. The compact disc, created on February 19, 2021, is named 47706-0198WO1_sequence_running_CORRECTED.txt and is 316,281,109 bytes in size. Three copies of the disc were submitted.

Summary of The Invention

Disclosed herein are methods for identifying the abundance and location of an analyte in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof and (ii) all or part of a sequence of an analyte, an analyte derivative, or a complement thereof, and wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises (i) a capture domain that hybridizes to all or part of the sequence of the analyte, the analyte derivative, or the complement thereof and (ii) a molecular tag; (b) capturing the decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent that binds to the molecular tag; and (c) determining (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of the analyte in the biological sample.
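By way of non-limiting illustration, the determining step (c) above can be sketched computationally as follows. The sketch assumes that sequencing has already been reduced to records of spatial barcode, unique molecular sequence, and analyte identity, and that a lookup from spatial barcode to spot position is available. The names `BARCODE_TO_SPOT` and `tabulate_abundance`, the example barcodes, and the use of the unique molecular sequence to collapse amplification duplicates are assumptions made for illustration only and are not taken from the present disclosure.

```python
from collections import defaultdict

# Hypothetical lookup: spatial barcode -> (row, column) of the capture spot.
BARCODE_TO_SPOT = {
    "AACGT": (0, 0),
    "GGTCA": (0, 1),
}


def tabulate_abundance(reads, barcode_to_spot):
    """Count distinct molecules per (spot, analyte).

    Each read is a (spatial_barcode, unique_molecular_sequence, analyte)
    tuple. Reads sharing all three values are treated as amplification
    duplicates of a single molecule (an assumption of this sketch).
    Reads whose spatial barcode is not in the lookup are skipped.
    """
    molecules = defaultdict(set)
    for barcode, ums, analyte in reads:
        spot = barcode_to_spot.get(barcode)
        if spot is None:
            continue
        molecules[(spot, analyte)].add(ums)
    return {key: len(seen) for key, seen in molecules.items()}


reads = [
    ("AACGT", "U1", "GENE_A"),
    ("AACGT", "U1", "GENE_A"),  # amplification duplicate of the read above
    ("AACGT", "U2", "GENE_A"),
    ("GGTCA", "U3", "GENE_B"),
    ("TTTTT", "U4", "GENE_A"),  # barcode not present in the lookup
]
counts = tabulate_abundance(reads, BARCODE_TO_SPOT)
```

In this sketch the spatial barcode supplies the location and the number of distinct molecules per spot supplies the abundance, which mirrors how the sequences determined in (i) and (ii) are used together.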

In some embodiments, the analyte is a nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids by: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality of capture probes comprises (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid; (b) hybridizing the capture probe to the nucleic acid; (c) extending the 3' end of the capture probe using the nucleic acid bound to the capture domain as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.
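By way of non-limiting illustration, the hybridizing and extending steps above can be modeled on plain sequence strings. The sketch assumes exact-match hybridization, a capture probe laid out 5' to 3' as spatial barcode followed by capture domain, and extension that runs to the 5' end of the template. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_capture_probe(spatial_barcode, capture_domain, template):
    """Model hybridization followed by 3' extension of a capture probe.

    The capture domain is taken to hybridize where the template contains
    its reverse complement (exact match, a simplification). Extension from
    the 3' end of the probe then copies the template 5' of that site, so
    the appended bases are the reverse complement of that upstream region.
    Returns None when the capture domain finds no site on the template.
    """
    site = reverse_complement(capture_domain)
    start = template.find(site)
    if start == -1:
        return None
    probe = spatial_barcode + capture_domain
    return probe + reverse_complement(template[:start])


# Hypothetical template whose 3' portion (CTGAAG) is bound by the capture domain.
template = "ATGGCC" + "CTGAAG"
extended = extend_capture_probe("ACGT", "CTTCAG", template)
```

The extended capture probe in this model carries the spatial barcode at its 5' end and the complement of the captured sequence at its 3' end, so a single molecule links position to sequence.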

In some embodiments, the extended nucleic acid is released from the extended capture probe.

In some embodiments, the analyte is a protein.

In some embodiments, the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode, and an analyte capture sequence.

In some embodiments, the method further comprises generating the plurality of nucleic acids by: (a) contacting a plurality of analyte capture agents with the biological sample, wherein an analyte capture agent of the plurality of analyte capture agents comprises (i) an analyte binding moiety that binds to the protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence; (b) contacting the plurality of analyte capture agents with a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain, and wherein the capture domain binds to the analyte capture sequence; (c) extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (d) amplifying the extended capture probe to produce the extended nucleic acid.

In some embodiments, step (a) and step (b) are performed substantially simultaneously. In some embodiments, step (a) is performed before step (b). In some embodiments, step (b) is performed before step (a).
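By way of non-limiting illustration, the role of the analyte binding moiety barcode in the protein workflow above can be sketched as a lookup: the barcode carried into the extended nucleic acid stands in for the protein bound by the analyte binding moiety. The read layout (spatial barcode of fixed length at the start of the read), the panel contents, and the function name are assumptions made for illustration only.

```python
# Hypothetical panel: analyte binding moiety barcode -> protein it reports on.
MOIETY_BARCODE_TO_PROTEIN = {
    "GATTACA": "PROTEIN_A",
    "CCGGTTA": "PROTEIN_B",
}


def decode_read(read, spatial_barcode_length, moiety_barcodes):
    """Split a read into its spatial barcode and the protein it reports.

    Assumes the spatial barcode occupies the first `spatial_barcode_length`
    bases and that the analyte binding moiety barcode appears somewhere in
    the remainder. Returns (spatial_barcode, protein), where protein is
    None if no barcode from the panel is found.
    """
    spatial_barcode = read[:spatial_barcode_length]
    remainder = read[spatial_barcode_length:]
    for barcode, protein in moiety_barcodes.items():
        if barcode in remainder:
            return spatial_barcode, protein
    return spatial_barcode, None


# Spatial barcode, then a short spacer, then the analyte binding moiety barcode.
read = "AACGT" + "TTGG" + "GATTACA"
decoded = decode_read(read, 5, MOIETY_BARCODE_TO_PROTEIN)
```

Because the protein itself is never sequenced, the identity of the protein is recovered only through this barcode, while the spatial barcode recovers its location.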

Also provided herein are methods for enriching a biological sample for an analyte or analyte derivative, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or a complement thereof and (ii) all or part of a sequence of an analyte, an analyte derivative, or a complement thereof, and wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises (i) a capture domain that hybridizes to all or part of the sequence of the analyte, the analyte derivative, or the complement thereof and (ii) a molecular tag; (b) capturing a complex of the decoy oligonucleotide bound to the extended nucleic acid using a substrate comprising an agent that binds to the molecular tag; and (c) isolating the complex of the decoy oligonucleotide bound to the extended nucleic acid, thereby enriching the biological sample for the analyte or analyte derivative.
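By way of non-limiting illustration, the enrichment described above can be modeled in silico as a partition of a library into captured and uncaptured molecules. The sketch assumes exact-match hybridization between a decoy capture domain and its reverse complement within an extended nucleic acid, and it does not model the molecular tag or the substrate beyond treating every hybridized molecule as captured. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def enrich(library, decoy_capture_domains):
    """Partition a library into (captured, flow_through).

    A molecule counts as captured when it contains the reverse complement
    of at least one decoy capture domain, i.e. a site the decoy could
    hybridize to under the exact-match assumption of this sketch.
    """
    target_sites = [reverse_complement(d) for d in decoy_capture_domains]
    captured, flow_through = [], []
    for molecule in library:
        if any(site in molecule for site in target_sites):
            captured.append(molecule)
        else:
            flow_through.append(molecule)
    return captured, flow_through


# Each molecule: hypothetical spatial barcode followed by analyte-derived sequence.
library = [
    "AACGT" + "ATGGCCCTGAAG",  # contains the target site CTGAAG
    "GGTCA" + "TTTACCGGTACA",  # off-target, no site for the decoy
    "AACGT" + "CCCTGAAGTTTA",  # contains the target site CTGAAG
]
decoys = ["CTTCAG"]  # capture domain; its reverse complement is CTGAAG
captured, flow_through = enrich(library, decoys)
```

The captured molecules retain their spatial barcodes, so enrichment narrows what is sequenced without discarding positional information.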

In some embodiments, the analyte is a nucleic acid.

In some embodiments, the analyte is a protein.

In some embodiments, the analyte from the biological sample is associated with a disease or disorder.

In some embodiments, the capture domain of the decoy oligonucleotide binds to a 3' portion, a 5' portion, an intron, an exon, a 3' untranslated region, or a 5' untranslated region of the sequence of the analyte, the analyte derivative, or the complement thereof.

In some embodiments, the capture domain of the decoy oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

In some embodiments, the molecular tag comprises a protein, a small molecule, a nucleic acid, or a carbohydrate.

In some embodiments, the molecular tag is streptavidin, avidin, biotin, or a fluorophore.

In some embodiments, the agent that binds to the molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

In some embodiments, the molecular tag is biotin and the agent bound to the molecular tag is avidin or streptavidin.

In some embodiments, an agent that specifically binds to a molecular tag is attached to a substrate. In some embodiments, the substrate is a bead, well, or slide.

In some embodiments, the extended nucleic acid is a DNA molecule (e.g., a cDNA molecule).

In some embodiments, the extended nucleic acid further comprises a primer sequence or complement thereof; a unique molecular sequence or its complement; or other primer binding sequences or complements thereof.

In some embodiments, the biological sample is a tissue sample selected from a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample.

In some embodiments, the biological sample is pre-stained with a detectable label or with hematoxylin and eosin (H&E), or is pre-stained using immunofluorescence or immunohistochemistry.

In some embodiments, the biological sample is a permeabilized biological sample.

In some embodiments, the determining step described herein comprises sequencing (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of a nucleic acid from the biological sample.

In some embodiments, the sequencing is high throughput sequencing.

In some embodiments, the analyte is deregulated or differentially expressed in cancer cells, immune cells, cell signaling pathways, or neural cells.

Also disclosed herein are methods for identifying nucleic acid abundance and location in a biological sample, the method comprising: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality of capture probes comprises (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in a nucleic acid; (b) hybridizing the capture probe to the nucleic acid; (c) extending the 3' end of the capture probe using the nucleic acid bound to the capture domain as a template to generate an extended capture probe; (d) amplifying the extended capture probe to produce an extended nucleic acid, wherein the extended nucleic acid comprises (i) the spatial barcode or a complement thereof and (ii) all or part of the sequence of the nucleic acid or a complement thereof; (e) releasing the extended nucleic acid from the extended capture probe; (f) contacting a plurality of released nucleic acids with a plurality of decoy oligonucleotides, the released nucleic acids comprising the extended nucleic acid from step (e), wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises (i) a capture domain that hybridizes to all or part of the sequence of the nucleic acid or the complement thereof and (ii) a molecular tag; (g) capturing the decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent that binds to the molecular tag; and (h) determining (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of the nucleic acid in the biological sample.

Also disclosed herein are methods for identifying protein abundance and location in a biological sample, the method comprising: (a) contacting a plurality of analyte capture agents with the biological sample, wherein an analyte capture agent of the plurality of analyte capture agents comprises (i) an analyte binding moiety that binds to a protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence; (b) contacting the plurality of analyte capture agents with a substrate comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain, and wherein the capture domain binds to the analyte capture sequence; (c) extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; (d) amplifying the extended capture probe to produce an extended nucleic acid, wherein the extended nucleic acid comprises (i) the spatial barcode or a complement thereof and (ii) all or part of the sequence of the oligonucleotide or a complement thereof; (e) releasing the extended nucleic acid from the extended capture probe; (f) contacting a plurality of released nucleic acids with a plurality of decoy oligonucleotides, the released nucleic acids comprising the extended nucleic acid from step (e), wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises (i) a capture domain that hybridizes to all or part of the sequence of the oligonucleotide or its complement and (ii) a molecular tag; (g) capturing the decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent that binds to the molecular tag; and (h) determining (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of the protein in the biological sample.

Also disclosed herein are compositions comprising a decoy oligonucleotide and an extended nucleic acid, wherein the decoy oligonucleotide is bound to the extended nucleic acid; wherein the extended nucleic acid comprises (i) a spatial barcode or a complement thereof and (ii) all or part of a sequence of an analyte, an analyte derivative, or a complement thereof; wherein the decoy oligonucleotide comprises a molecular tag; and wherein the molecular tag is selected from streptavidin, avidin, biotin, or a fluorophore.

In some embodiments, the decoy oligonucleotide binds to the extended nucleic acid through a capture domain that hybridizes to all or a portion of the sequence of the analyte, analyte derivative, or complement thereof.

In some embodiments, the compositions as described herein further comprise an agent that binds to the molecular tag.

In some embodiments, the compositions as described herein further comprise a substrate, wherein the agent that binds to the molecular tag is attached to the substrate. In some embodiments, the substrate is a bead, well, or slide.

Also provided herein are kits comprising: an array comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of decoy oligonucleotides; and instructions for performing the methods described herein.

Also provided herein are kits comprising: a plurality of analyte capture agents, wherein an analyte capture agent of the plurality of analyte capture agents comprises an analyte binding moiety, an analyte binding moiety barcode, and an analyte capture sequence; an array comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain; a plurality of decoy oligonucleotides; and instructions for performing the methods described herein.

In some embodiments, the kits described herein further comprise reagents and/or enzymes for performing the methods.

Also disclosed herein are methods for identifying the location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids from the biological sample with a plurality of decoy oligonucleotides, wherein a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof and (ii) a portion of a sequence of a nucleic acid from the biological sample or a complement thereof, and wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises a molecular tag and a capture domain that specifically binds to all or part of the sequence of the nucleic acid from the biological sample or the complement thereof; (b) capturing the decoy oligonucleotides that are specifically bound to the nucleic acids using a substrate comprising an agent that specifically binds to the molecular tag; and (c) determining (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of the nucleic acid from the biological sample, and using the sequences determined in (i) and (ii) to identify the location of the nucleic acid in the biological sample.

In some embodiments, the capture domain of the decoy oligonucleotide binds specifically to all or part of the sequence of a nucleic acid from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to a 3' portion of the sequence of a nucleic acid or complement thereof from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to a 5' portion of the sequence of a nucleic acid or complement thereof from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to an intron in the sequence of a nucleic acid or complement thereof from the biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to an exon in the sequence of a nucleic acid or complement thereof from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to a 3' untranslated region of a nucleic acid or its complement from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to a 5' untranslated region of a nucleic acid or its complement from a biological sample.

In some embodiments, the nucleic acid from the biological sample is associated with a disease or disorder. In some embodiments, the nucleic acid from the biological sample comprises a mutation. In some embodiments, the nucleic acid from the biological sample comprises a Single Nucleotide Polymorphism (SNP). In some embodiments, the nucleic acid from the biological sample comprises a trinucleotide repeat.

In some embodiments, the molecular tag comprises a moiety. In some embodiments, the moiety is streptavidin, avidin, biotin, or a fluorophore. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the capture domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' to the capture domain in the decoy oligonucleotide. In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some embodiments, the agent that specifically binds to the molecular tag comprises a small molecule. In some embodiments, the agent that specifically binds to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a functional sequence. In some embodiments, the functional sequence is a primer sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or complement thereof. In some embodiments, the nucleic acid further comprises other primer binding sequences or complements thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample is pre-stained with a detectable label. In some embodiments, the biological sample is pre-stained. In some embodiments, the biological sample is pre-stained with hematoxylin and eosin (H & E). In some embodiments, the biological sample is pre-stained using immunofluorescence or immunohistochemistry. In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilizing agent selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or a combination thereof. In some embodiments, the permeabilizing agent is a cross-linking agent. In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the determining in step (c) comprises sequencing (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of the nucleic acid from the biological sample. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, the sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids by: (a) contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality of capture probes comprises (i) a spatial barcode and (ii) a capture domain that specifically binds to a sequence present in an analyte; (b) extending the 3' end of the capture probe using the analyte that is specifically bound to the capture domain as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplification is isothermal.

In some embodiments, the nucleic acid produced is released from the extended capture probe.

In some embodiments, the nucleic acid is deregulated or differentially expressed in cancer cells. In some embodiments, the nucleic acid is deregulated or differentially expressed in immune cells. In some embodiments, the nucleic acid is deregulated in the cell signaling pathway. In some embodiments, the nucleic acid is deregulated or differentially expressed in the neural cells.

Also disclosed herein are methods for identifying the location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof and (ii) a portion of an analyte binding moiety barcode or a complement thereof, and wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises a molecular tag and a capture domain that specifically binds to (i) all or part of the nucleic acid or a complement thereof and/or (ii) all or part of the analyte binding moiety barcode or a complement thereof; (b) capturing a complex of the decoy oligonucleotide specifically bound to the nucleic acid or complement thereof, or a complex of the decoy oligonucleotide specifically bound to the analyte binding moiety barcode or complement thereof, using a substrate comprising an agent that specifically binds to the molecular tag; and (c) determining all or part of the sequence of (i) the nucleic acid or complement thereof and/or (ii) the analyte binding moiety barcode or complement thereof, and using the sequences determined in (i) and (ii) to identify the location of the nucleic acid in the biological sample.

In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to all or part of a nucleic acid sequence from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to all or part of the analyte binding moiety barcode.

In some embodiments, the nucleic acid from the biological sample is associated with a disease or disorder.

In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the capture domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' to the capture domain in the decoy oligonucleotide. In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some embodiments, the agent that specifically binds to the molecular tag comprises a small molecule.

In some embodiments, an agent that specifically binds to a molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a primer binding sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or complement thereof. In some embodiments, the nucleic acid further comprises other primer binding sequences or complements thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample is pre-stained with a detectable label. In some embodiments, the biological sample is pre-stained. In some embodiments, the biological sample is pre-stained with hematoxylin and eosin (H & E). In some embodiments, the biological sample is pre-stained using immunofluorescence or immunohistochemistry. In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilizing agent selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or a combination thereof. In some embodiments, the permeabilizing agent is selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or combinations thereof. In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the determining in step (c) comprises sequencing all or part of the sequence of (i) the nucleic acid or complement thereof and (ii) the analyte binding moiety barcode. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating the plurality of nucleic acids by: (a) contacting a plurality of capture agents with a biological sample disposed on a substrate, wherein a capture agent of the plurality of capture agents comprises (i) a binding moiety that specifically binds to a nucleic acid in the biological sample, (ii) a binding moiety barcode, and (iii) a capture sequence, and wherein the substrate comprises a plurality of capture probes, a capture probe of the plurality of capture probes comprising a spatial barcode and a capture domain that specifically binds to the analyte capture sequence; (b) extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce the nucleic acid. In some embodiments, the amplification is isothermal.

In some embodiments, the nucleic acid produced is released from the extended capture probe. In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in cancer cells. In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in immune cells. In some embodiments, the nucleic acid is a nucleic acid that is deregulated in a cell signaling pathway. In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in neural cells.

Also disclosed herein are methods for enriching nucleic acids in a biological sample, the method comprising: (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein a nucleic acid of the plurality of nucleic acids comprises (i) a spatial barcode or a complement thereof and (ii) a portion of an analyte binding moiety barcode or a complement thereof, and wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises a molecular tag and a capture domain that specifically binds to (i) all or part of the nucleic acid or a complement thereof and/or (ii) all or part of the analyte binding moiety barcode or a complement thereof; (b) capturing a complex of the decoy oligonucleotide specifically bound to the nucleic acid or complement thereof, or a complex of the decoy oligonucleotide specifically bound to the analyte binding moiety barcode or complement thereof, using a substrate comprising an agent that specifically binds to the molecular tag; and (c) isolating the complex of the decoy oligonucleotide specifically bound to the nucleic acid or its complement, or the complex of the decoy oligonucleotide specifically bound to the analyte binding moiety barcode or its complement, thereby enriching the nucleic acid in the biological sample.

In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to all or part of a nucleic acid sequence from a biological sample. In some embodiments, the capture domain of the decoy oligonucleotide specifically binds to all or part of the analyte binding moiety barcode. In some embodiments, the nucleic acid from the biological sample is associated with a disease or disorder. In some embodiments, the capture domain of the decoy oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the capture domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' to the capture domain in the decoy oligonucleotide. In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some embodiments, the agent that specifically binds to the molecular tag comprises a small molecule. In some embodiments, the agent that specifically binds to the molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample is pre-stained with a detectable label. In some embodiments, the biological sample is pre-stained. In some embodiments, the biological sample is pre-stained with hematoxylin and eosin (H & E). In some embodiments, the biological sample is pre-stained using immunofluorescence or immunohistochemistry.

In some embodiments, the biological sample is a permeabilized biological sample that has been permeabilized with a permeabilizing agent selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or a combination thereof. In some embodiments, the permeabilizing agent is selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or combinations thereof.

In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule.

In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in cancer cells. In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in immune cells. In some embodiments, the nucleic acid is a nucleic acid that is deregulated in a cell signaling pathway. In some embodiments, the nucleic acid is a nucleic acid that is deregulated or differentially expressed in neural cells.

Also disclosed herein are methods for identifying the location of a nucleic acid in a biological sample, the method comprising: (a) contacting a plurality of capture agents with a biological sample disposed on a substrate, wherein a capture agent of the plurality of capture agents comprises (i) a binding moiety that specifically binds to the nucleic acid in the biological sample, (ii) a binding moiety barcode, and (iii) a capture sequence, wherein the substrate comprises a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain, and wherein the capture domain specifically binds to the analyte capture sequence; (b) extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; (c) amplifying the extended capture probe to produce a nucleic acid, wherein the nucleic acid comprises (i) the spatial barcode or a complement thereof and (ii) a portion of the analyte binding moiety barcode or a complement thereof; (d) releasing the resulting nucleic acid from the extended capture probe; (e) contacting the plurality of released nucleic acids from step (d) with a plurality of decoy oligonucleotides, wherein a decoy oligonucleotide of the plurality of decoy oligonucleotides comprises a molecular tag and a capture domain that specifically binds to all or part of (i) the nucleic acid or a complement thereof and/or (ii) the analyte binding moiety barcode or a complement thereof; (f) capturing a complex of the decoy oligonucleotide specifically bound to the nucleic acid or a complement thereof, or a complex of the decoy oligonucleotide specifically bound to the analyte binding moiety barcode or a complement thereof, using a substrate comprising an agent that specifically binds to the molecular tag; and (g) determining all or part of the sequence of (i) the nucleic acid or a complement thereof and/or (ii) the analyte binding moiety barcode or a complement thereof, and using the sequences determined in (i) and (ii) to identify the location of the nucleic acid in the biological sample.
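By way of non-limiting illustration only, the selection performed in steps (e) and (f) can be mimicked computationally. The following sketch assumes that each released nucleic acid is represented as a pair of a spatial barcode and an insert sequence, and that a decoy oligonucleotide "binds" when the insert contains the exact reverse complement of its capture domain. The function names, the tuple layout, and the exact-match criterion are assumptions made for this example; physical hybridization tolerates mismatches and depends on reaction conditions that are not modeled here.

```python
# Illustrative in-silico analogue of decoy (bait) based enrichment.
# All names and the exact-match rule are assumptions for this sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def enrich(library, decoy_capture_domains):
    """Keep library molecules to which at least one decoy could hybridize.

    library: list of (spatial_barcode, insert_sequence) tuples
    decoy_capture_domains: list of capture-domain sequences of the decoys
    """
    # A decoy hybridizes to the region complementary to its capture domain.
    target_sites = [reverse_complement(d) for d in decoy_capture_domains]
    return [
        (barcode, insert)
        for barcode, insert in library
        if any(site in insert for site in target_sites)
    ]
```

In this simplified model, the spatial barcode carried by each retained molecule is what would subsequently be used in step (g) to assign the molecule to a location.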

Also disclosed herein are compositions comprising a decoy oligonucleotide bound to a nucleic acid, wherein the nucleic acid comprises (i) a spatial barcode or a complement thereof and (ii) a portion of an analyte binding moiety barcode or a complement thereof.

In some embodiments, the decoy oligonucleotide binds to the nucleic acid through a capture domain that specifically binds to (i) all or a portion of the nucleic acid and/or (ii) all or a portion of the analyte binding moiety barcode or its complement.

In some embodiments, the composition further comprises a molecular tag. In some embodiments, the composition further comprises an agent that specifically binds to the molecular tag. In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is streptavidin, avidin, or biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' of the domain in the decoy oligonucleotide.

In some embodiments, the agent comprises streptavidin, avidin, or biotin. In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some embodiments, the agent that specifically binds to the molecular tag comprises a small molecule.

In some embodiments, the composition further comprises a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent application, or information item was specifically and individually indicated to be incorporated by reference. To the extent that publications, patents, patent applications, and information items incorporated by reference contradict the disclosure contained in this specification, it is intended that this specification take precedence over any conflicting material.

Where a range is recited, it is understood that the description includes disclosure of all possible sub-ranges within the range, as well as disclosure of particular values within the range, whether or not the particular values or sub-ranges are explicitly recited.

The term "each" when referring to a group of items is intended to identify an individual item in the collection, but does not necessarily refer to each item in the collection unless specifically stated otherwise or unless the context of the usage clearly indicates otherwise.

As used herein, the term "analyte derivative" refers to a molecule (e.g., a nucleic acid or protein molecule) derived from an analyte (e.g., a nucleic acid) or conveying information (e.g., identity information) with respect to the analyte (e.g., a protein). In some embodiments, the analyte derivative includes all or part of an analyte (e.g., a nucleic acid) described herein. In some embodiments, the analyte derivative includes all or part of the analyte capture agents described herein (e.g., all or part of the analyte binding moiety, analyte binding moiety barcode, and/or analyte capture sequence).

Various embodiments of features of the present invention are described herein. However, it should be understood that these embodiments are provided by way of example only and that many changes, modifications, and substitutions may be made by one of ordinary skill in the art without departing from the scope of the present disclosure. It should also be understood that various alternatives to the specific embodiments described herein are also within the scope of the present disclosure.

Drawings

The following drawings illustrate certain embodiments of the features and advantages of the present invention. These embodiments are not intended to limit the scope of the appended claims in any way. Like reference symbols in the drawings indicate like elements.

FIG. 1 is a schematic diagram illustrating an example of a barcoded capture probe as described herein.

FIG. 2 is a schematic diagram illustrating a cleavable capture probe, wherein the cleaved capture probe can enter a non-permeabilized cell and bind to a target analyte within a sample.

FIG. 3 is a schematic diagram of an exemplary multiplexed spatially barcoded feature.

FIG. 4 is a schematic diagram of an exemplary analyte capture agent.

FIG. 5 is a schematic diagram depicting an exemplary interaction between a feature-immobilized capture probe 524 and an analyte capture agent 526.

FIGS. 6A-6C are schematic diagrams illustrating how streptavidin cell tags are utilized in an array-based system to generate spatially barcoded cells or cell contents.

FIG. 7 is a schematic of a workflow showing targeted spatial gene expression.

FIGS. 8A-8B show on-target, enrichment, and complexity values obtained using different spatial libraries. The spatial libraries comprised samples from heart, breast cancer, or lymph. Either the hl1k_200.1 panel or the Immune panel was used.

FIG. 9 is an exemplary workflow showing targeted spatial gene expression.

FIG. 10 is an exemplary workflow for hybridization and capture-based enrichment of targeted nucleic acid sequences using a spatial gene expression library.

FIG. 11 is a schematic showing the correlation of UMI counts in mouse brain tissue spatial arrays between a Visium control whole transcriptome analysis (X-axis) and a 65-gene targeted enrichment neuroscience panel (Y-axis).

FIGS. 12A-12B are exemplary images showing A) OLIG2 gene expression from a Visium whole transcriptome spatial array (control) and B) OLIG2 gene expression from the 65-gene targeted enrichment neuroscience panel. The images correspond to the data of FIG. 11.

FIGS. 13A-13B are exemplary graphs showing UMI count correlation between A) mapped reads of whole transcriptome Visium versus 4 Visium targeted gene panels and B) a Visium whole transcriptome control (X-axis) versus the Visium targeted pan-cancer gene panel (Y-axis).

FIGS. 14A-14B are exemplary graphs showing targeting metrics for spatial arrays of 12 human tissue types using 4 targeted gene enrichment panels, compared with matched Visium whole transcriptome spatial arrays: A) the fraction of genes recovered and B) the UMI R-squared concordance.

FIGS. 15A-15C show clustering of human cerebral cortex into 6 gene clusters for A) Visium whole transcriptome and B) the neuroscience targeted gene enrichment panel. FIG. 15C shows the UMI correlation between FIG. 15A and FIG. 15B.

FIGS. 16A-16D show spatial gene expression in human breast cancer tissue sections using the pan-cancer gene panel: A) pathologist annotation (control), B) whole transcriptome Visium data, C) 196 breast cancer genes from the pan-cancer enrichment panel, and D) ERBB2 gene expression from the pan-cancer enrichment panel.

FIGS. 17A-17D provide details regarding the genetic targets included in a cancer panel according to embodiments of the present disclosure.

FIGS. 18A-18C provide details regarding the genetic targets included in an immunology panel according to embodiments of the present disclosure.

FIGS. 19A-19D provide details regarding the genetic targets included in a pathway panel according to embodiments of the present disclosure.

FIGS. 20A-20D provide details regarding the genetic targets included in a neuroscience panel according to embodiments of the present disclosure.

Detailed Description

I. Introduction

The spatial analysis methods and compositions described herein can provide expression data for a large number of analytes and/or a wide variety of analytes within a biological sample with high spatial resolution while preserving the native spatial context. Spatial analysis methods and compositions can include, for example, the use of a capture probe that includes a spatial barcode (e.g., a nucleic acid sequence that provides information about the location or position of an analyte within a cell or tissue sample (e.g., a mammalian cell or mammalian tissue sample)) and a capture domain that is capable of binding to an analyte (e.g., a protein and/or nucleic acid) produced by and/or present in a cell. The spatial analysis methods and compositions may also include the use of a capture probe having a capture domain that captures an intermediate agent for the indirect detection of an analyte. For example, the intermediate agent may include a nucleic acid sequence (e.g., a barcode) associated with the intermediate agent. Detection of the intermediate agent is therefore indicative of the analyte in the cell or tissue sample.

Non-limiting aspects of spatial analysis methods and compositions are described in U.S. Patent Nos. 10,774,374, 10,724,078, 10,480,022, 10,059,990, 10,041,949, 10,002,316, 9,879,313, 9,783,841, 9,727,810, 9,593,365, 8,951,726, 8,604,182, and 7,709,198; U.S. Patent Application Publication Nos. 2020/239946, 2020/080136, 2020/0277663, 2020/024341, 2019/330617, 2019/264268, 2020/256867, 2020/224244, 2019/194709, 2019/161796, 2019/085383, 2019/055594, 2018/216161, 2018/051322, 2018/0245142, 2017/241911, 2017/089811, 2017/067096, 2017/029875, 2017/0016053, 2016/108458, 2015/000854, and 2013/171621; WO 2018/091676 and WO 2020/176788; Rodriques et al., Science 363(6434):1463-1467, 2019; Lee et al., Nat. Protoc. 10(3):442-458, 2015; Trejo et al., PLoS ONE 14(2):e0212031, 2019; Chen et al., Science 348(6233):aaa6090, 2015; Gao et al., BMC Biol. 15:50, 2017; and Gupta et al., Nature Biotechnol. 36:1197-1202, 2018; the Visium Spatial Gene Expression Reagent Kits User Guide (e.g., Rev C, dated June 2020); and/or the Visium Spatial Tissue Optimization Reagent Kits User Guide (e.g., Rev C, dated July 2020), both of which are available at the 10x Genomics support documentation website. These can be used in any combination. Other non-limiting aspects of the spatial analysis methods and compositions are described herein.

Some general terms that may be used in the present disclosure may be found in part (I) (b) of WO2020/176788 and/or U.S. patent application publication No. 2020/0277663. Typically, a "barcode" is a label or identifier that conveys or is capable of conveying information (e.g., information about analytes, beads, and/or capture probes in a sample). The barcode may be part of the analyte or may be independent of the analyte. The barcode may be attached to the analyte. A particular barcode may be unique relative to other barcodes. For purposes of the present disclosure, an "analyte" may include any biological substance, structure, moiety, or component to be analyzed. The term "target" may similarly refer to an analyte of interest.

Analytes can be broadly divided into two categories: nucleic acid analytes and non-nucleic acid analytes. Examples of non-nucleic acid analytes include, but are not limited to, lipids, carbohydrates, peptides, proteins, glycoproteins (N-linked or O-linked), lipoproteins, phosphoproteins, specific phosphorylated or acetylated variants of proteins, amidated variants of proteins, hydroxylated variants of proteins, methylated variants of proteins, ubiquitinated variants of proteins, sulfated variants of proteins, viral proteins (e.g., viral capsids, viral envelopes, viral shells, viral appendages, viral glycoproteins, viral spikes, etc.), extracellular and intracellular proteins, antibodies, and antigen binding fragments. In some embodiments, the analyte can be localized to a subcellular location, including, for example, organelles such as mitochondria, Golgi apparatus, endoplasmic reticulum, chloroplasts, endocytic vesicles, exocytic vesicles, vacuoles, lysosomes, and the like. In some embodiments, the analyte may be a peptide or protein, including but not limited to antibodies and enzymes. Other examples of analytes can be found in WO2020/176788 part (I) (c) and/or U.S. patent application publication No. 2020/0277663. In some embodiments, the analyte can be detected indirectly, for example, by detecting an intermediate, such as a ligated probe (e.g., ligation product) or an analyte capture agent (e.g., an oligonucleotide-coupled antibody), such as those described herein.

A "biological sample" is typically obtained from a subject for analysis using any of a variety of techniques, including but not limited to biopsy, surgery, and Laser Capture Microscopy (LCM), and typically includes cells and/or other biological material from the subject. In some embodiments, the biological sample may be a tissue slice. In some embodiments, the biological sample may be a fixed and/or stained biological sample (e.g., a fixed and/or stained tissue section). Non-limiting examples of staining agents include tissue staining agents (e.g., hematoxylin and/or eosin) and immunostaining agents (e.g., fluorescent staining agents). In some embodiments, a biological sample (e.g., a fixed and/or stained biological sample) can be imaged. Biological samples are also described in WO2020/176788 part (I) (d) and/or U.S. patent application publication No. 2020/0277663.

In some embodiments, the biological sample is permeabilized with one or more permeabilization reagents. For example, permeabilization of a biological sample can facilitate capture of an analyte. Exemplary permeabilizing agents and conditions are described in WO2020/176788 part (I) (d) (ii) (13) or the Exemplary Embodiments section and/or U.S. patent application publication No. 2020/0277663.

Array-based spatial analysis methods involve transferring one or more analytes from a biological sample to an array of features on a substrate, where each feature is associated with a unique spatial location on the array. Subsequent analysis of the transferred analyte includes determining the identity of the analyte and the spatial location of the analyte in the biological sample. The spatial location of the analyte in the biological sample is determined based on the feature of the array to which the analyte binds (e.g., directly or indirectly) and the relative spatial location of that feature on the array.

A "capture probe" refers to any molecule capable of capturing (directly or indirectly) and/or labeling an analyte (e.g., an analyte of interest) in a biological sample. In some embodiments, the capture probe is a nucleic acid or a polypeptide. In some embodiments, the capture probe includes a barcode (e.g., a spatial barcode and/or a Unique Molecular Identifier (UMI)) and a capture domain. In some embodiments, the capture probe can include a cleavage domain and/or a functional domain (e.g., a primer binding site, e.g., for Next Generation Sequencing (NGS)).
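By way of non-limiting illustration only, the array-based determination of spatial location described above amounts to looking up the feature associated with an observed spatial barcode. The sketch below assumes a hypothetical table relating each spatial barcode to the (row, column) position of its feature; the barcode sequences, positions, and function names are invented for this example and are not taken from the disclosure.

```python
# Hypothetical table mapping each spatial barcode to the (row, column)
# position of its feature on the array; the values are illustrative only.
BARCODE_TO_FEATURE = {
    "AAACGT": (0, 0),
    "CCGTAA": (0, 1),
    "GGTACC": (1, 0),
}


def locate_analyte(spatial_barcode):
    """Return the array position of the feature with this barcode, or None."""
    return BARCODE_TO_FEATURE.get(spatial_barcode)


def locate_all(observations):
    """Map (spatial_barcode, analyte) pairs to {analyte: set of positions}."""
    positions = {}
    for barcode, analyte in observations:
        position = locate_analyte(barcode)
        if position is None:
            continue  # barcode is not on the array; skip rather than guess
        positions.setdefault(analyte, set()).add(position)
    return positions
```

Observations carrying a barcode that is absent from the table are dropped in this sketch; an actual analysis might instead attempt error correction of the barcode.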

FIG. 1 is a schematic diagram illustrating an exemplary capture probe as described herein. As shown, capture probe 102 is optionally coupled to feature 103 through cleavage domain 101, e.g., a disulfide bond. The capture probe may include a functional sequence 104 useful for subsequent processing. Functional sequence 104 may include all or part of a sequencer-specific flow cell attachment sequence (e.g., a P5 or P7 sequence), all or part of a sequencing primer sequence (e.g., an R1 primer binding site, an R2 primer binding site), or a combination thereof. The capture probe may also include a spatial barcode 105. The capture probe may also include a Unique Molecular Identifier (UMI) sequence 106. Although FIG. 1 shows spatial barcode 105 located upstream (5') of UMI sequence 106, it should be understood that capture probes wherein UMI sequence 106 is located upstream (5') of spatial barcode 105 are also suitable for use in any of the methods described herein. The capture probe may also include a capture domain 107 to facilitate capture of a target analyte. In some embodiments, the capture probe comprises one or more additional functional sequences, which may be located, for example, between the spatial barcode 105 and the UMI sequence 106, between the UMI sequence 106 and the capture domain 107, or after the capture domain 107. The capture domain may have a sequence complementary to the sequence of a nucleic acid analyte. The capture domain may have a sequence complementary to a ligated probe described herein. The capture domain may have a sequence complementary to a capture handle sequence present in an analyte capture agent. The capture domain may have a sequence complementary to a splint oligonucleotide. Such a splint oligonucleotide may have, in addition to the sequence complementary to the capture domain of the capture probe, the sequence of a nucleic acid analyte, a sequence complementary to a portion of a ligated probe described herein, and/or a capture handle sequence described herein.
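By way of non-limiting illustration only, the arrangement of segments shown in FIG. 1 can be represented as a simple record. The sketch below assumes the 5' to 3' order functional sequence, spatial barcode, UMI, capture domain; the segment lengths (16 and 12 nucleotides), the 30-nucleotide poly(T) default, and all names are assumptions chosen for the example rather than values stated in this disclosure.

```python
from collections import namedtuple

# Segments in the 5'->3' order drawn in FIG. 1 (other orders are possible).
CaptureProbe = namedtuple(
    "CaptureProbe", "functional spatial_barcode umi capture_domain"
)

# Assumed segment lengths, for illustration only.
BARCODE_LEN = 16
UMI_LEN = 12


def build_probe(functional, spatial_barcode, umi, capture_domain="T" * 30):
    """Assemble a probe record, checking the assumed segment lengths."""
    if len(spatial_barcode) != BARCODE_LEN or len(umi) != UMI_LEN:
        raise ValueError("unexpected spatial barcode or UMI length")
    return CaptureProbe(functional, spatial_barcode, umi, capture_domain)


def probe_sequence(probe):
    """Concatenate the segments into a single 5'->3' sequence."""
    return (probe.functional + probe.spatial_barcode
            + probe.umi + probe.capture_domain)


def parse_probe(sequence, functional_len):
    """Split a probe sequence back into its segments by fixed offsets."""
    b_start = functional_len
    u_start = b_start + BARCODE_LEN
    c_start = u_start + UMI_LEN
    return CaptureProbe(
        sequence[:b_start],
        sequence[b_start:u_start],
        sequence[u_start:c_start],
        sequence[c_start:],
    )
```

Fixed offsets are used here for simplicity; a probe in which the UMI precedes the spatial barcode would only require swapping the two slices.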

The functional sequences may generally be selected to be compatible with any of a variety of different sequencing systems, such as Ion Torrent Proton or PGM, Illumina sequencing instruments, PacBio, Oxford Nanopore, and the like, and the requirements thereof. In some embodiments, the functional sequences may be selected to be compatible with non-commercial sequencing systems. Examples of such sequencing systems and techniques for which suitable functional sequences may be used include, but are not limited to, Ion Torrent Proton or PGM sequencing, Illumina sequencing, PacBio SMRT sequencing, and Oxford Nanopore sequencing. Furthermore, in some embodiments, the functional sequences may be selected to be compatible with other sequencing systems (including non-commercial sequencing systems).

In some embodiments, the spatial barcode 105 and the functional sequence 104 are common to all probes attached to a given feature. In some embodiments, the UMI sequence 106 of the capture probe attached to a given feature is different from the UMI sequence of a different capture probe attached to a given feature.
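By way of non-limiting illustration only, because the spatial barcode is shared by the probes of a feature while the UMI differs between individual probes, captured molecules can be counted per feature by counting distinct UMIs. The sketch below assumes sequencing reads have already been reduced to (spatial barcode, UMI, target) tuples; that representation and the function name are assumptions for this example.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per (spatial barcode, target) pair.

    records: iterable of (spatial_barcode, umi, target) tuples.
    Reads sharing barcode, UMI, and target are treated as amplification
    copies of one captured molecule and are counted once.
    """
    seen = defaultdict(set)
    for barcode, umi, target in records:
        seen[(barcode, target)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}
```

This sketch collapses only exact duplicates and does not correct sequencing errors within the UMI.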

FIG. 2 is a schematic diagram illustrating a cleavable capture probe, wherein the cleaved capture probe can enter a non-permeabilized cell and bind to an analyte within a sample. The capture probe 201 comprises a cleavage domain 202, a cell penetrating peptide 203, a reporter 204 and a disulfide (-S-). 205 represents all other parts of the capture probe, e.g. the spatial barcode and the capture domain.

FIG. 3 is a schematic diagram of an exemplary multiplexed spatially barcoded feature. In FIG. 3, feature 301 may be coupled to spatially barcoded capture probes, where the spatially barcoded probes of a particular feature may have the same spatial barcode but different capture domains designed to associate the spatial barcode of the feature with more than one target analyte. For example, a feature may be coupled to four different types of spatially barcoded capture probes, each type having the spatial barcode 302. One type of capture probe associated with the feature includes the spatial barcode 302 in combination with a poly(T) capture domain 303, which is designed to capture mRNA target analytes. A second type of capture probe associated with the feature includes the spatial barcode 302 in combination with a random N-mer capture domain 304 for gDNA analysis. A third type of capture probe associated with the feature includes the spatial barcode 302 in combination with a capture domain complementary to the capture handle sequence of an analyte capture agent of interest 305. A fourth type of capture probe associated with the feature includes the spatial barcode 302 in combination with a capture domain that can specifically bind to a nucleic acid molecule 306 that can function in a CRISPR assay (e.g., CRISPR/Cas9). Although only four different capture probe barcoded constructs are shown in FIG. 3, capture probe barcoded constructs can be tailored for analysis of any given analyte that is associated with a nucleic acid and is capable of binding to such a construct. For example, the scheme shown in FIG. 3 may also be used for simultaneous analysis of other analytes disclosed herein, including but not limited to: (a) mRNA, lineage-tracing constructs, cell surface or intracellular proteins and metabolites, and gDNA; (b) mRNA, accessible chromatin (e.g., ATAC-seq, DNase-seq, and/or MNase-seq), cell surface or intracellular proteins and metabolites, and perturbation agents (e.g., CRISPR crRNA/sgRNA, TALEN, zinc finger nucleases, and/or antisense oligonucleotides as described herein); (c) mRNA, cell surface or intracellular proteins and/or metabolites, barcoded labeling agents (e.g., MHC multimers described herein), and V(D)J sequences of immune cell receptors (e.g., T cell receptors). In some embodiments, the perturbation agent may be a small molecule, an antibody, a drug, an aptamer, a miRNA, a physical environment (e.g., a temperature change), or any other known perturbation agent. See, for example, WO2020/176788, section (II) (b) (e.g., sections (i) - (vi)) and/or U.S. patent application publication No. 2020/0277663. The generation of capture probes may be achieved by any suitable method, including those described in section (II) (d) (ii) of WO2020/176788 and/or U.S. patent application publication No. 2020/0277663.

In some embodiments, any suitable multiplexing technique (e.g., as described in section (IV) of WO 2020/176788 and/or U.S. patent application publication No. 2020/0277663) may be employed to detect (e.g., simultaneously or sequentially detect) more than one analyte type (e.g., nucleic acid and protein) from a biological sample.

In some embodiments, detection of one or more analytes (e.g., protein analytes) may be performed using one or more analyte capture agents. As used herein, an "analyte capture agent" refers to an agent that interacts with an analyte (e.g., an analyte in a biological sample) and with a capture probe (e.g., a capture probe attached to a substrate or feature) to identify the analyte. In some embodiments, the analyte capture agent comprises: (i) an analyte binding moiety (e.g., that binds to an analyte), such as an antibody or antigen binding fragment thereof; (ii) an analyte binding moiety barcode; and (iii) a capture handle sequence (e.g., an analyte capture sequence). As used herein, the term "analyte binding moiety barcode" refers to a barcode associated with or otherwise identifying an analyte binding moiety. As used herein, the term "analyte capture sequence" or "capture handle sequence" refers to a region or portion that is configured to hybridize to, bind to, couple to, or otherwise interact with a capture domain of a capture probe. In some embodiments, the capture handle sequence is complementary to the capture domain of the capture probe. In some cases, the analyte binding moiety barcode (or a portion thereof) may be capable of being removed (e.g., cleaved) from the analyte capture agent.

FIG. 4 is a schematic diagram of an exemplary analyte capture agent 402 comprised of an analyte binding moiety 404 and an analyte binding moiety barcode domain 408. The exemplary analyte binding moiety 404 is a molecule capable of binding to analyte 406, and the analyte capture agent is capable of interacting with a spatially barcoded capture probe. The analyte binding moiety can bind to analyte 406 with high affinity and/or high specificity. The analyte capture agent can include an analyte binding moiety barcode domain 408, a nucleotide sequence (e.g., an oligonucleotide) that can hybridize to at least a portion or all of the capture domain of a capture probe. The analyte binding moiety barcode domain 408 may comprise an analyte binding moiety barcode and a capture handle sequence as described herein. Analyte binding moiety 404 may include a polypeptide and/or an aptamer. Analyte binding moiety 404 can include an antibody or antibody fragment (e.g., an antigen binding fragment).

FIG. 5 is a schematic diagram depicting an exemplary interaction between a feature-immobilized capture probe 524 and an analyte capture agent 526. The feature-immobilized capture probe 524 may include a spatial barcode 508 and functional sequences 506 and UMI 510, as described elsewhere herein. The capture probes may also include capture domains 512 capable of binding to analyte capture agents 526. The analyte capture agent 526 may include a functional sequence 518, an analyte binding moiety barcode 516, and a capture handle sequence 514 capable of binding to the capture domain 512 of the capture probe 524. The analyte capture agent can also include a linker 520 that couples the capture agent barcode domain 516 to the analyte binding moiety 522.

FIGS. 6A, 6B, and 6C are schematic diagrams illustrating how streptavidin cell tags are utilized in an array-based system to generate spatially barcoded cells or cell contents. For example, as shown in FIG. 6A, peptide-bound major histocompatibility complex (MHC) molecules may be individually associated with biotin (β2m) and bound to a streptavidin moiety such that the streptavidin moiety comprises multiple pMHC moieties. Each of these moieties can bind to a TCR such that the streptavidin binds to a target T cell through multiple MHC/TCR binding interactions. The multiple interactions act synergistically and can greatly increase binding affinity. This improved affinity may improve the labeling of T cells and may also reduce the likelihood of dissociation of the label from the T cell surface. As shown in FIG. 6B, the capture agent barcode domain 601 may be modified with streptavidin 602 and contacted with multiple molecules of biotinylated MHC 603 such that the biotinylated MHC 603 molecules are coupled to the streptavidin-conjugated capture agent barcode domain 601. The result is a barcoded MHC multimer complex 605. As shown in FIG. 6B, the capture agent barcode domain sequence 601 may identify the MHC as its associated label, and may also include optional functional sequences, such as sequences for hybridization with other oligonucleotides.
In certain instances, the capture probe 606 may first be associated with a feature (e.g., a gel bead) and released from the feature. In other embodiments, the capture probe 606 may hybridize to the capture agent barcode domain 601 of the MHC-oligonucleotide complex 605. The hybridized oligonucleotides (spacer C C C and spacer rGrGrG) may then be extended in a primer extension reaction such that a construct is generated comprising sequences corresponding to each of the two spatial barcode sequences (the spatial barcode associated with the capture probe and the barcode associated with the MHC-oligonucleotide complex). In certain instances, one or both of the corresponding sequences may be the complement of the original sequence in the capture probe 606 or the capture agent barcode domain 601. In other embodiments, the capture probe and the capture agent barcode domain are ligated together. The resulting construct may optionally be further processed, and sequences derived from the spatial barcode sequence on the capture agent barcode domain 601 can be used to identify the specific peptide MHC complex 604 bound on the cell surface (e.g., when using MHC peptide libraries for screening immune cells or immune cell populations).

Additional descriptions of analyte capture agents can be found in WO2020/176788 part (II) (b) (ix) and/or U.S. patent application publication No. 2020/0277663 part (II) (b) (viii).

There are at least two methods of associating a spatial barcode with one or more neighboring cells, such that the spatial barcode identifies the one or more cells and/or the contents of the one or more cells as being associated with a particular spatial location. One method is to promote analytes or analyte proxies (e.g., intermediates) out of the cell and toward a spatially barcoded array (e.g., including spatially barcoded capture probes). Another method is to cleave spatially barcoded capture probes from the array and promote the spatially barcoded capture probes toward and/or into or onto the biological sample.

In some cases, the capture probes may be configured to prime, replicate, and consequently yield optionally barcoded extension products from a template (e.g., a DNA or RNA template, such as an analyte or an intermediate (e.g., a ligated probe (e.g., a ligation product) or an analyte capture agent), or a portion thereof) or derivatives thereof (see, e.g., part (II) (b) (vii) of WO2020/176788 and/or U.S. patent application publication No. 2020/0277663 regarding extended capture probes). In some cases, the capture probes may be configured to form a ligated probe (e.g., a ligation product) with a template (e.g., a DNA or RNA template, such as an analyte or an intermediate, or a portion thereof), thereby producing ligation products that serve as proxies for the template.

As used herein, an "extended capture probe" refers to a capture probe having additional nucleotides added to the end (e.g., the 3' or 5' end) of the capture probe to extend the total length of the capture probe. For example, "extended 3' end" means that additional nucleotides are added to the most 3' nucleotide of the capture probe to extend the length of the capture probe, e.g., by polymerization reactions used to extend nucleic acid molecules, including templated polymerization catalyzed by a polymerase (e.g., DNA polymerase or reverse transcriptase). In some embodiments, extending the capture probe comprises adding to the 3' end of the capture probe a nucleic acid sequence that is complementary to a nucleic acid sequence of an analyte or intermediate that specifically binds to the capture domain of the capture probe. In some embodiments, the capture probe is extended using reverse transcription. In some embodiments, the capture probe is extended using one or more DNA polymerases. The extended capture probe includes the sequence of the capture probe and the spatial barcode sequence of the capture probe.
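
The 3' extension described above can be illustrated with a minimal Python sketch. It is not the method of the disclosure: it assumes sequences are plain strings, that the capture domain pairs perfectly with a single site on the template, and that extension runs to the 5' end of the template. The function names and the toy sequences are hypothetical.

```python
# Toy model of templated 3' extension of a capture probe (illustrative only).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Write an RNA or DNA sequence in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of an RNA or DNA sequence."""
    return to_dna(seq).translate(_DNA_COMPLEMENT)[::-1]


def extend_capture_probe(probe: str, capture_domain: str, template: str) -> str:
    """Append to the probe's 3' end the complement of the template region
    lying 5' of the site bound by the capture domain.

    All sequences are written 5'->3'. The probe is returned unchanged when
    the capture domain finds no complementary site on the template.
    """
    if not probe.endswith(capture_domain):
        raise ValueError("capture domain must sit at the 3' end of the probe")
    template_dna = to_dna(template)
    site = reverse_complement(capture_domain)   # site as it reads on the template
    start = template_dna.find(site)
    if start < 0:
        return probe                            # no hybridization, no extension
    # The polymerase copies the template from the bound site toward its 5' end.
    return probe + reverse_complement(template_dna[:start])


# Hypothetical example: 8-nt spatial barcode + poly(dT) capture domain, toy mRNA.
extended = extend_capture_probe("ACGTACGT" + "TTTTTT", "TTTTTT", "GGCAUCAAAAAA")
print(extended)
```

In this toy model the extended probe still begins with the original probe sequence, including the spatial barcode, which is what lets the added analyte-derived sequence be assigned to a location later.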

In some embodiments, the extended capture probes are amplified (e.g., in bulk solution or on an array) to produce an amount sufficient for downstream analysis (e.g., by DNA sequencing). In some embodiments, the extended capture probes (e.g., DNA molecules) serve as templates for an amplification reaction (e.g., polymerase chain reaction).

Other variations of the spatial analysis method, including in some embodiments an imaging step, are described in WO2020/176788, part (II) (a) and/or U.S. patent application publication No. 2020/0277663. Analysis of captured analytes (and/or intermediates or portions thereof), for example, includes sample removal, extension of the capture probes, sequencing (e.g., sequencing of cleaved extended capture probes and/or cDNA molecules complementary to extended capture probes), sequencing on an array (e.g., using, for example, in situ hybridization or in situ ligation methods), time domain analysis and/or proximity capture, as described in WO2020/176788, section (II) (g) and/or U.S. patent application publication No. 2020/0277663. Some quality control measures are also described in WO2020/176788 part (II) (h) and/or U.S. patent application publication No. 2020/0277663.

The spatial information may provide information of biological and/or medical importance. For example, the methods and compositions described herein may allow for: identifying one or more biomarkers of a disease or disorder (e.g., diagnosis, prognosis, and/or for determining treatment efficacy); determining candidate drug targets for treating a disease or disorder; identifying (e.g., diagnosing) a subject as having a disease or disorder; identifying a stage and/or prognosis of a disease or disorder in a subject; identifying the subject as having an increased likelihood of developing a disease or disorder; monitoring the progress of a disease or disorder in a subject; determining the efficacy of treating a disease or disorder in a subject; determining a patient subpopulation for which treatment is effective against the disease or disorder; modification of treatment of a subject suffering from a disease or disorder; selecting a subject for participation in a clinical trial; and/or selecting a treatment for a subject suffering from a disease or disorder.

The spatial information may provide information of biological importance. For example, the methods and compositions described herein may allow for: identifying transcriptome and/or proteome expression profiles (e.g., in healthy and/or diseased tissue); identifying multiple analyte types at close range (e.g., nearest neighbor analysis); determining genes and/or proteins up-regulated and/or down-regulated in diseased tissue; characterization of tumor microenvironment; characterization of tumor immune response; characterization of cell types and their co-localization in tissues; identification of genetic variation within a tissue (e.g., based on gene and/or protein expression profiles associated with specific disease or disorder biomarkers).

Typically, for spatial array-based methods, the substrate serves to support the attachment of capture probes directly or indirectly to the array features. A "feature" is an entity that serves as a support or repository for various molecular entities used in spatial analysis. In some embodiments, some or all of the features in the array are functionalized for analyte capture. Exemplary substrates are described in WO2020/176788, section (II) (c) and/or U.S. patent application publication No. 2020/0277663. Exemplary features and geometrical properties of the arrays can be found in sections (II) (d) (i), (II) (d) (iii) and (II) (d) (iv) of WO2020/176788 and/or in U.S. patent application publication No. 2020/0277663.

Typically, the analyte and/or intermediate (or portion thereof) may be captured when the biological sample is contacted with a substrate comprising capture probes (e.g., a substrate having capture probes embedded, spotted, printed, fabricated on the substrate, or a substrate having features (e.g., beads, wells) comprising capture probes). As used herein, contacting a biological sample with a substrate refers to any contact (e.g., direct or indirect) such that the capture probes can interact (e.g., covalently or non-covalently bind (e.g., hybridize)) with an analyte from the biological sample. The capturing may be effected actively (e.g., using electrophoresis) or passively (e.g., using diffusion). Analyte capture is further described in WO2020/176788 part (II) (e) and/or U.S. patent application publication No. 2020/0277663.

In some cases, spatial analysis may be performed by attaching and/or introducing molecules (e.g., peptides, lipids, or nucleic acid molecules) having a barcode (e.g., a spatial barcode) to a biological sample (e.g., to cells in a biological sample). In some embodiments, a plurality of molecules (e.g., a plurality of nucleic acid molecules) having a plurality of barcodes (e.g., a plurality of spatial barcodes) are introduced into a biological sample (e.g., a plurality of cells in a biological sample) for spatial analysis. In some embodiments, after attaching and/or introducing the molecule with the barcode to the biological sample, the biological sample may be physically separated (e.g., dissociated) into single cells or cell populations for analysis. Some such spatial analysis methods are described in WO2020/176788 part (III) and/or U.S. patent application publication No. 2020/0277663.

In some cases, spatial analysis may be performed by detecting a plurality of oligonucleotides hybridized to the analyte. In some cases, for example, spatial analysis may be performed using RNA-templated ligation (RTL). Methods of RTL have been described previously. See, e.g., Credle et al., Nucleic Acids Res. 2017 Aug 21;45(14):e128. Typically, RTL involves hybridization of two oligonucleotides to adjacent sequences on an analyte (e.g., an RNA molecule, such as an mRNA molecule). In some cases, the oligonucleotides are DNA molecules. In some cases, one of the oligonucleotides comprises at least two ribonucleic acid bases at the 3' end and/or the other oligonucleotide comprises a phosphorylated nucleotide at the 5' end. In some cases, one of the two oligonucleotides includes a capture domain (e.g., a poly(A) sequence, a non-homopolymeric sequence). After hybridization to the analyte, a ligase (e.g., SplintR ligase) ligates the two oligonucleotides together, producing a bound probe (e.g., ligation product). In some cases, the two oligonucleotides hybridize to sequences that are not adjacent to each other. For example, hybridization of the two oligonucleotides creates a gap between the hybridized oligonucleotides. In some cases, a polymerase (e.g., a DNA polymerase) may extend one of the oligonucleotides prior to ligation. After ligation, the bound probe (e.g., ligation product) is released from the analyte. In some cases, the bound probe (e.g., ligation product) is released using an endonuclease (e.g., RNase H). The released bound probe (e.g., ligation product) may then be captured by capture probes on the array (e.g., in lieu of direct capture of the analyte), optionally amplified, and sequenced, thereby determining the location and optionally the abundance of the analyte in the biological sample.
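
The adjacency requirement for ligation can be sketched in Python as follows. This is a simplified illustration, not a description of the actual chemistry: it assumes perfect complementarity, a single binding site per probe, and no gap filling (a gapped pair is simply reported as not ligatable). The function names and sequences are hypothetical.

```python
# Toy model of RNA-templated ligation of two probes (illustrative only).
from typing import Optional

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Write an RNA or DNA sequence in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of an RNA or DNA sequence."""
    return to_dna(seq).translate(_DNA_COMPLEMENT)[::-1]


def ligate_if_adjacent(target_rna: str, probe_3p: str, probe_5p_phos: str) -> Optional[str]:
    """Return the ligation product (5'->3') if both probes hybridize to
    directly adjacent sites on the target, otherwise None.

    probe_3p      : probe whose 3' end takes part in the ligation
    probe_5p_phos : probe carrying the 5' phosphate
    """
    target = to_dna(target_rna)
    site_3p = reverse_complement(probe_3p)          # binding site read on the target
    site_phos = reverse_complement(probe_5p_phos)
    pos_3p = target.find(site_3p)
    pos_phos = target.find(site_phos)
    if pos_3p < 0 or pos_phos < 0:
        return None                                 # at least one probe does not hybridize
    # Probes bind antiparallel to the target, so the 5'-phosphate probe's site
    # must end exactly where the other probe's site begins.
    gap = pos_3p - (pos_phos + len(site_phos))
    if gap != 0:
        return None                                 # gap filling is not modeled here
    return probe_3p + probe_5p_phos


# Hypothetical 20-nt target with two adjacent 8-nt binding sites.
target = "GGACUGAACCUUGGCAAUCG"
phos_probe = reverse_complement(target[2:10])
other_probe = reverse_complement(target[10:18])
print(ligate_if_adjacent(target, other_probe, phos_probe))
```

The product equals the reverse complement of the combined target region, which is why a captured ligation product can stand in for the analyte during sequencing.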

During spatial information analysis, sequence information of the spatial barcode associated with the analyte is obtained and can be used to provide information about the spatial distribution of the analyte in the biological sample. Various methods may be used to obtain the spatial information. In some embodiments, specific capture probes and analytes they capture are associated with specific locations in the feature array on the substrate. For example, a particular spatial barcode may be associated with a particular array location prior to array fabrication, and a sequence of spatial barcodes may be stored (e.g., in a database) with particular array location information such that each spatial barcode is uniquely mapped to a particular array location.

Alternatively, a particular spatial barcode may be deposited at predetermined locations in the array of features during manufacture such that at each location there is only one type of spatial barcode, whereby the spatial barcode is uniquely associated with a single feature of the array. If desired, the array may be decoded using any of the methods described herein so that each spatial barcode is uniquely associated with an array feature location, and the mapping may be stored as described above.

When sequence information for the capture probes and/or analytes is obtained during spatial information analysis, the location of the capture probes and/or analytes may be determined by reference to stored information that uniquely correlates each spatial barcode with a feature location of the array. In this way, specific capture probes and captured analytes are associated with specific locations in the feature array. Each array feature location represents a position relative to a coordinate reference point (e.g., an array location, a fiducial marker) of the array. Thus, each feature location has an "address" or location in the coordinate space of the array.
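
The stored barcode-to-location mapping described above can be pictured as a simple lookup table. The following Python sketch is only an illustration under stated assumptions: the barcodes, coordinates, gene names, and the function name `locate_reads` are all hypothetical, and reads are assumed to arrive already split into a (spatial barcode, gene) pair.

```python
# Toy lookup from spatial barcode to array address (illustrative only).
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Address = Tuple[int, int]   # (row, column) in the coordinate space of the array


def locate_reads(
    reads: Iterable[Tuple[str, str]],
    barcode_to_address: Dict[str, Address],
) -> Dict[Address, Counter]:
    """Count reads per gene at each array address.

    Reads whose barcode is absent from the stored mapping are skipped.
    """
    counts: Dict[Address, Counter] = defaultdict(Counter)
    for barcode, gene in reads:
        address = barcode_to_address.get(barcode)
        if address is None:
            continue                      # barcode not mapped to any feature
        counts[address][gene] += 1
    return dict(counts)


# Hypothetical stored mapping and sequenced reads.
mapping = {"AAACCG": (0, 0), "GGGTTA": (0, 1)}
reads = [("AAACCG", "ACTB"), ("AAACCG", "ACTB"), ("GGGTTA", "GAPDH"), ("NNNNNN", "ACTB")]
for address, genes in locate_reads(reads, mapping).items():
    print(address, dict(genes))
```

Because each barcode maps to exactly one address, the per-address counts directly give a location and a relative abundance for each analyte.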

Some exemplary spatial analysis workflows are described in the exemplary embodiments section of WO2020/176788 and/or in U.S. patent application publication No. 2020/0277663. See, for example, the exemplary embodiment beginning with "In some non-limiting examples of the workflows described herein, a sample may be immersed ……" in WO2020/176788 and/or U.S. patent application publication No. 2020/0277663. See also, e.g., the Visium Spatial Gene Expression Reagent Kits User Guide (e.g., Rev C, June 2020), the Targeted Gene Expression-Spatial User Guide (e.g., Rev A, October 2020), and/or the Visium Spatial Tissue Optimization Reagent Kits User Guide (e.g., Rev C, July 2020).

In some embodiments, spatial analysis may be performed using dedicated hardware and/or software, such as any of the systems described in parts (II) (e) (ii) and/or (V) of WO2020/176788 and/or U.S. patent application publication No. 2020/0277663, or any one or more of the devices or methods described in the following sections of WO 2020/123320: control slide for imaging; methods of using control slides and substrates for imaging; systems of using control slides and substrates for imaging; and/or sample and array alignment devices and methods; informational labels.

Suitable systems for performing spatial analysis may include components such as a chamber (e.g., a flow cell or sealable fluid tight chamber) for containing a biological sample. The biological sample may be immobilized, for example, in a biological sample container. One or more fluid chambers may be connected to the chambers and/or sample containers by fluid conduits, and fluids may be delivered into the chambers and/or sample containers by fluid pumps, vacuum sources, or other devices connected to the fluid conduits that create pressure gradients to drive the fluid flow. One or more valves may also be connected to the fluid conduit to regulate the flow of reagents from the reservoir to the chamber and/or sample container.

The system may optionally include a control unit comprising one or more electronic processors, input interfaces, output interfaces (e.g., a display), and storage units (e.g., solid-state storage media such as, but not limited to, magnetic, optical, or other solid-state, persistent, writable, and/or rewritable storage media). The control unit may optionally be connected to one or more remote devices via a network. The control unit (and its components) may generally perform any of the steps and functions described herein. Where the system is connected to a remote device, the remote device (or devices) may perform any of the steps or features described herein. The system may optionally include one or more detectors (e.g., CCD, CMOS) for capturing images. The system may also optionally include one or more light sources (e.g., LED-based, diode-based, laser-based) for illuminating the sample, a substrate having features, analytes from the biological sample captured on the substrate, and various control and calibration media.

The system may optionally include software instructions encoded and/or implemented in one or more tangible storage media and hardware components (e.g., application specific integrated circuits). The software instructions, when executed by a control unit (particularly an electronic processor) or integrated circuit, may cause the control unit, integrated circuit, or other component executing the software instructions to perform any of the method steps or functions described herein.

In some cases, the systems described herein can detect (e.g., register images) biological samples on an array. Exemplary methods of detecting biological samples on an array are described in PCT application No. 2020/061064 and/or U.S. patent application Ser. No. 16/951,854.

The biological sample may be aligned with the array prior to transferring the analyte from the biological sample to the array of features on the substrate. Alignment of the biological sample and the feature array comprising capture probes may facilitate spatial analysis, which may be used to detect differences in the presence and/or level of an analyte in different locations in the biological sample, e.g., to generate a three-dimensional map of the presence and/or level of the analyte. Exemplary methods for generating two- and/or three-dimensional maps of analyte presence and/or level are described in PCT application No. 2020/053655, and spatial analysis methods are generally described in WO2020/061108 and/or U.S. patent application serial No. 16/951,864.

In some cases, a map of analyte presence and/or level may be aligned with an image of the biological sample using one or more fiducial markers (e.g., objects placed in the field of view of an imaging system that appear in the generated image), as described in the substrate properties section and the control slide for imaging section of WO2020/123320, PCT application No. 2020/061066, and/or U.S. patent application serial No. 16/951,843. Fiducial markers may be used as reference points or measurement scales for alignment (e.g., to align a sample and an array, to align two substrates, to determine the position of a sample or array on a substrate relative to a fiducial marker) and/or for quantitative measurement of size and/or distance.

The sandwich process is described in PCT patent application publication WO 2020/123320, which is incorporated herein by reference in its entirety.

Targeted spatial gene expression profiling by hybridization and capture of spatial cDNA

(a) Introduction to the invention

Spatial analysis methods using capture probes and/or analyte capture agents provide information about the abundance and location of an analyte (e.g., a nucleic acid or protein). Traditionally, sequencing results have included unwanted genes, ribosomal or mitochondrial transcripts, and other reads that are not of interest. Detection of an analyte of interest therefore depends largely on the sequencing capacity available for all analytes captured on (at least a portion of) the array. Disclosed herein are methods and compositions for capturing target analytes of interest using decoy oligonucleotides specific for the target analytes of interest. In this way, the sequencing results contain a higher percentage of reads from the target analytes of interest, which improves the spatial resolution and reduces the sequencing cost for the target analytes of interest.

Disclosed herein are methods of capturing target analytes of interest using decoy oligonucleotides specific for the target analytes of interest. As disclosed herein, decoy oligonucleotides are short (40 bp to 160 bp) oligonucleotides that hybridize to transcript (e.g., mRNA) sequences, thereby allowing detection of the mRNA and its expression. Decoy oligonucleotides have been identified that hybridize to the 5' end of a transcript, the 3' end of a transcript, or any intervening sequence of a transcript. In particular, there are certain advantages to designing probes to hybridize to the intervening sequences (i.e., not the 5' and 3' ends) of transcripts. For example, many transcripts in the human genome have different sequences at the 5' and 3' ends. Thus, designing a decoy that targets an intervening transcript sequence, particularly a conserved sequence, can allow a single decoy to hybridize to multiple isoforms of the same gene. Thus, disclosed herein are compositions and methods comprising decoy oligonucleotides that hybridize to multiple isoforms of the same gene.
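
The idea of targeting an intervening sequence shared by several isoforms can be sketched as a search for windows common to all isoform sequences. The Python sketch below is an illustration only and is not the design procedure of the disclosure: it uses exact string matching, ignores melting temperature and off-target checks, and uses a toy window length far shorter than the 40 bp to 160 bp decoys described above. The function name and sequences are hypothetical.

```python
# Toy search for target windows shared by all isoforms of a gene (illustrative only).
from typing import List, Sequence


def shared_windows(isoforms: Sequence[str], length: int) -> List[str]:
    """Return every window of `length` nt that occurs in all isoforms,
    in order of first appearance in the first isoform."""
    if not isoforms:
        return []
    first, rest = isoforms[0], isoforms[1:]
    seen = set()
    found = []
    for start in range(len(first) - length + 1):
        window = first[start:start + length]
        if window in seen:
            continue
        if all(window in other for other in rest):
            seen.add(window)
            found.append(window)
    return found


# Hypothetical isoforms: different 5' and 3' ends around a shared internal region.
core = "ATGGCCAAGCTGACC"
isoform_a = "TTTTT" + core + "GGGGG"
isoform_b = "CCCAA" + core + "ATATA"
print(shared_windows([isoform_a, isoform_b], length=15))
```

A decoy complementary to any returned window would, in this toy model, hybridize to both isoforms, whereas windows overlapping the variable ends are rejected.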

Thus, provided herein are methods that include designing and using probes (e.g., decoy oligonucleotides, nucleic acid decoys, etc.) to capture full-length (e.g., non-fragmented) cDNA for sequencing analysis, rather than decoy oligonucleotides for capturing final library fragments comprising the 3' UTR. By targeting full-length cDNAs, it is possible to utilize regions of the target gene, e.g., coding sequences, that are more reliably annotated for individual transcripts (e.g., isoforms) of the individual genes. Thus, the steps of using the method to identify a target of interest in a sample include, but are not limited to, preparing a nucleic acid library so that a nucleic acid decoy can hybridize to one or more analytes of interest; hybridizing the nucleic acid decoy to the one or more analytes of interest; and determining the location and abundance of the target analyte in the biological sample.

Also featured herein are methods of detecting an analyte of interest that has been captured by a capture probe on a substrate (i.e., a spatial array). In some cases, after the analyte of interest is captured on the spatial array, a probe (e.g., a nucleic acid decoy) is used to identify the target analyte of interest. Thus, the steps of using the method to identify a target of interest in a sample include, but are not limited to, capturing an analyte in the sample using a capture probe; amplifying the hybridized capture probe/analyte products to create a cDNA library; preparing the cDNA library so that the nucleic acid decoy can hybridize to one or more analytes of interest; hybridizing the nucleic acid decoy to the one or more analytes of interest; and determining the location and abundance of the target analyte in the biological sample.

Profiling the phenotype of a biological sample using any of the methods described above, as well as those further illustrated herein, helps to avoid confounding biological factors, such as detection of strongly expressed genes that are not relevant to the study or cell cycle phenotypes that may mask biological differences. In addition, enhancing detection of one or more analytes of interest helps to significantly reduce the sequencing cost of obtaining information about the set of genes of interest by minimizing the number of reads that are consumed on genes that are not in that set (e.g., ribosomal protein transcripts, mitochondrial transcripts, etc.). As such, the methods described herein are also cost effective.
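
The cost argument above reduces to simple arithmetic on the fraction of reads that fall on genes of interest. The Python sketch below illustrates that arithmetic with invented read counts; the numbers, gene names, and function names are hypothetical and are not results from the disclosure.

```python
# Toy arithmetic for on-target read fraction and sequencing depth (illustrative only).
from typing import Dict, Set


def on_target_fraction(gene_counts: Dict[str, int], genes_of_interest: Set[str]) -> float:
    """Fraction of all reads that map to the genes of interest."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    on_target = sum(n for gene, n in gene_counts.items() if gene in genes_of_interest)
    return on_target / total


def reads_needed(wanted_on_target: int, gene_counts: Dict[str, int], genes_of_interest: Set[str]) -> int:
    """Total reads required to expect `wanted_on_target` on-target reads,
    assuming the observed on-target proportion holds at any depth."""
    total = sum(gene_counts.values())
    on_target = sum(n for gene, n in gene_counts.items() if gene in genes_of_interest)
    if on_target == 0:
        raise ValueError("no on-target reads observed")
    return -(-(wanted_on_target * total) // on_target)   # ceiling division on integers


# Hypothetical read counts before and after decoy-based enrichment.
genes = {"GENE_A", "GENE_B"}
before = {"GENE_A": 50, "GENE_B": 50, "RPL13": 600, "MT-CO1": 300}
after = {"GENE_A": 450, "GENE_B": 450, "RPL13": 60, "MT-CO1": 40}
for label, counts in (("before", before), ("after", after)):
    print(label, on_target_fraction(counts, genes), reads_needed(1000, counts, genes))
```

With these invented numbers, raising the on-target fraction from 0.1 to 0.9 cuts the reads required for 1,000 on-target reads from 10,000 to 1,112.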

(b) Biological sample, analyte and preparation thereof

(i) Biological sample and analyte

In some embodiments, the biological sample used in the methods disclosed herein is a cell culture sample. In some cases, the biological sample is a tissue sample. In some embodiments, the biological sample is a tissue sample slice. In some embodiments, the biological sample is a fresh tissue sample. In some embodiments, the biological sample is a fresh frozen tissue sample. In some embodiments, the biological sample is a tissue sample that has been Formalin Fixed and Paraffin Embedded (FFPE) (i.e., FFPE sample). In some embodiments, the biological sample is a tissue sample embedded in an Optimal Cutting Temperature (OCT) compound. In some embodiments, the biological sample has been pre-stained (e.g., immunohistochemical (IHC) or histological staining) and imaged, and optionally decolorized.

In some embodiments, biological samples are obtained from a subject for analysis using any of a variety of techniques, including but not limited to biopsies, surgery, and laser capture microscopy (LCM), and generally include cells and/or other biological materials from the subject. The biological sample may also be obtained from a prokaryote, such as a bacterium, e.g., Escherichia coli, a Staphylococcus, or Mycoplasma pneumoniae; an archaeon; a virus, such as hepatitis C virus or human immunodeficiency virus; or a viroid. Biological samples may also be obtained from eukaryotic organisms, such as patient-derived organoids (PDOs) or patient-derived xenografts (PDXs). The biological sample may be from a mammal, such as a human, a non-human mammal (e.g., a mouse), and the like. The subject from which the biological sample is obtained may be a healthy or asymptomatic individual, an individual having or suspected of having a disease (e.g., a patient having a disease such as cancer) or a predisposition to a disease, and/or an individual in need of treatment or suspected of requiring treatment.

The biological sample may include any number of macromolecules, such as cellular macromolecules and organelles (e.g., mitochondria and nuclei). The biological sample may be a nucleic acid sample and/or a protein sample. The biological sample may be a carbohydrate sample or a lipid sample. The biological sample may be obtained as a tissue sample, such as a tissue section, biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may be a liquid sample, such as a blood sample, a urine sample, or a saliva sample. The sample may be a skin sample, a colon sample, a cheek swab, a histological sample, a histopathological sample, a plasma or serum sample, a tumor sample, living cells, cultured cells, a clinical sample, such as whole blood or blood derived products, blood cells or cultured tissues or cells, including cell suspensions.

The biological sample may be from a homogeneous culture or population of the subjects or organisms described herein, or alternatively from a collection of several different organisms in, for example, a community or an ecosystem. In some embodiments, the biological sample is a human sample.

The biological sample may include one or more diseased cells. The diseased cells may have altered metabolic characteristics, gene expression, protein expression, and/or morphological characteristics. Examples of diseases include inflammatory disorders, metabolic disorders, neurological disorders, and cancers. Cancer cells may be derived from solid tumors, hematological malignancies, and cell lines, and may also be obtained in the form of circulating tumor cells.

The biological sample may also include fetal cells. For example, procedures such as amniocentesis may be performed to obtain a fetal cell sample from the maternal circulation. Fetal cell sequencing can be used to identify any of a number of genetic diseases, including, for example, aneuploidies such as Down's syndrome, Edwards syndrome, and Patau syndrome. In addition, the cell surface characteristics of fetal cells may be used to identify any of a variety of diseases or conditions.

The biological sample may also include immune cells. Sequence analysis of immune functions of such cells, including genomics, proteomics, and cell surface features, can provide rich information that helps to understand the state and function of the immune system. For example, determining the status of Minimal Residual Disease (MRD) (e.g., negative or positive) in a Multiple Myeloma (MM) patient after autologous stem cell transplantation is considered a predictor of MRD in MM patients (see, e.g., U.S. patent application publication No. 2018/0156784, which is incorporated herein by reference in its entirety).

Examples of immune cells in a biological sample include, but are not limited to, B cells, T cells (e.g., cytotoxic T cells, natural killer T cells, regulatory T cells, and helper T cells), natural killer cells, cytokine-induced killer (CIK) cells, and myeloid cells, such as granulocytes (basophils, eosinophils, neutrophils/hypersegmented neutrophils), monocytes/macrophages, mast cells, platelets/megakaryocytes, and dendritic cells.

In some embodiments, the biological sample is fixed to a slide. In some embodiments, the sample is stained prior to creating the nucleic acid library (e.g., the plurality of nucleic acids). In some embodiments, the biological sample is stained while on a slide. In some embodiments, the stained biological sample is imaged prior to creating the nucleic acid library (e.g., the plurality of nucleic acids).

In some embodiments, staining includes biological staining techniques, such as H & E staining. In some embodiments, staining includes identifying the analyte using a fluorescently conjugated antibody (e.g., immunofluorescence). In some embodiments, the biological sample is stained using two or more different types of stains or two or more different staining techniques (e.g., IF, IHC, and/or H & E staining). For example, a biological sample may be prepared by staining and imaging using one technique (e.g., H & E staining and bright field imaging), bleaching (e.g., quenching or photobleaching), and then staining and imaging the same biological sample using another technique (e.g., IHC/IF staining and fluorescence microscopy).

In some embodiments, the biological sample may be decolorized prior to creating the spatial nucleic acid library. Methods of destaining or decolorizing biological samples are known in the art and generally depend on the nature of the stain applied to the sample.

In some embodiments, the analyte in the biological sample is a nucleic acid. In some embodiments, the analyte (e.g., nucleic acid) is obtained from a biological sample. In some embodiments, the nucleic acid is DNA (e.g., genomic DNA, mitochondrial DNA, or exosomal DNA). In some embodiments, the nucleic acid is RNA. In some embodiments, the RNA is mRNA. Other examples of RNAs are, for example, various types of coding and non-coding RNAs. Different types of RNA analytes include messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), and viral RNA. The RNA can be a transcript (e.g., present in a tissue section). The RNA can be small (e.g., less than 200 nucleobases in length) or large (e.g., greater than 200 nucleobases in length). Small RNAs mainly include 5.8S ribosomal RNA (rRNA), 5S rRNA, transfer RNA (tRNA), microRNA (miRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), tRNA-derived small RNA (tsRNA), and small rDNA-derived RNA (srRNA). The RNA may be double-stranded RNA or single-stranded RNA. The RNA may be circular RNA. The RNA may be bacterial rRNA (e.g., 16S rRNA or 23S rRNA).

In some embodiments, the nucleic acid comprises DNA. Examples of DNA include genomic DNA, methylated DNA, specific methylated DNA sequences, fragmented DNA, mitochondrial DNA, PCR products synthesized in situ, and RNA/DNA hybrids.

(ii) Imaging and staining

In some cases, the biological sample may be stained using a variety of stains and staining techniques. In some cases, the biological sample is a slice of tissue (e.g., a 10 μm slice). In some cases, the biological sample is dried after being placed on the slide. In some cases, the biological sample is dried at 42 ℃. In some cases, drying occurs for about 1 hour, about 2 hours, about 3 hours, or until the slice becomes transparent. In some cases, the biological sample may be dried overnight (e.g., in a desiccator at room temperature).

In some embodiments, the sample may be stained using any number of biological stains including, but not limited to, acridine orange, Bismarck brown, carmine, Coomassie blue, cresyl violet, DAPI, eosin, ethidium bromide, acid fuchsin, hematoxylin, Hoechst stains, iodine, methyl green, methylene blue, neutral red, Nile blue, Nile red, osmium tetroxide, propidium iodide, rhodamine, or safranin. In some cases, the methods disclosed herein comprise imaging a biological sample. In some cases, sample imaging occurs prior to biological sample deamination. In some cases, the sample may be stained using known staining techniques, including May-Grünwald, Giemsa, hematoxylin and eosin (H & E), Jenner's, Leishman, Masson's trichrome, Papanicolaou, Romanowsky, silver, Sudan, Wright's, and/or PAS staining techniques. PAS staining is usually performed after formaldehyde or acetone fixation. In some cases, the stain is an H & E stain.

In some embodiments, a biological sample may be stained with a detectable label (e.g., radioisotope, fluorophore, chemiluminescent compound, bioluminescent compound, and dye) as described elsewhere herein. In some embodiments, the biological sample is stained using only one type of stain or one technique. In some embodiments, staining includes biological staining techniques, such as H & E staining. In some embodiments, staining comprises identifying the analyte using a fluorescent conjugated antibody. In some embodiments, the biological sample is stained using two or more different types of stains or two or more different staining techniques. For example, a biological sample may be prepared by staining and imaging using one technique (e.g., H & E staining and bright field imaging) followed by staining and imaging the same biological sample using another technique (e.g., IHC/IF staining and fluorescence microscopy).

In some embodiments, the biological sample may be decolorized. Methods of destaining or decolorizing biological samples are known in the art and generally depend on the nature of the stain applied to the sample. For example, H & E staining can be decolorized by washing the sample in HCl or any other acid (e.g., selenic acid, sulfuric acid, hydroiodic acid, benzoic acid, carbonic acid, malic acid, phosphoric acid, oxalic acid, succinic acid, salicylic acid, tartaric acid, sulfurous acid, trichloroacetic acid, hydrobromic acid, hydrochloric acid, nitric acid, orthophosphoric acid, arsenic acid, selenious acid, chromic acid, citric acid, hydrofluoric acid, nitrous acid, isocyanic acid, formic acid, hydrogen selenide, molybdic acid, lactic acid, acetic acid, carbonic acid, hydrogen sulfide, or combinations thereof). In some embodiments, decolorization may include 1, 2, 3, 4, 5 or more washes in acid (e.g., HCl). In some embodiments, decolorizing may include adding HCl to a downstream solution (e.g., permeabilization solution). In some embodiments, decolorization can include dissolving an enzyme (e.g., pepsin) used in the disclosed methods in an acid (e.g., HCl) solution. In some embodiments, after the hematoxylin is decolorized with acid, other reagents may be added to the decolorization solution to raise the pH for other applications. For example, SDS may be added to the acid decolorization solution to raise the pH as compared to the acid decolorization solution alone. As another example, in some embodiments, one or more immunofluorescent stains are applied to the sample by antibody coupling. These stains can be removed using techniques such as cleavage of disulfide bonds via treatment with a reducing agent and detergent washing, chaotropic salt treatment, treatment with antigen retrieval solution, and treatment with an acidic glycine buffer. For example, Bolognesi et al., J. Histochem. Cytochem. 2017;65(8):431-444, Lin et al., Nat. Commun. 2015;6:8390, Pirici et al., J. Histochem. Cytochem. 2009;57:567-75, and Glass et al., J. Histochem. Cytochem. 2009;57:899-905, the entire contents of each of which are incorporated herein by reference, describe methods of multiplexed staining and destaining.

In some embodiments, immunofluorescence or immunohistochemical protocols (direct and indirect staining techniques) may be performed as part of or in addition to the exemplary spatial workflow presented herein. For example, tissue sections may be fixed according to the methods described herein. The biological sample may be transferred to an array (e.g., a capture probe array) where the analyte (e.g., protein) is detected using an immunofluorescence protocol. For example, samples can be rehydrated, blocked and permeabilized (3X SSC, 2% BSA, 0.1% Triton X, 1 U/µl RNase inhibitor, 10 minutes at 4 °C) and then stained with fluorescent primary antibodies (1:100 in 3X SSC, 2% BSA, 0.1% Triton X, 1 U/µl RNase inhibitor, 30 minutes at 4 °C). Biological samples may be washed, coverslipped (in glycerol + 1 U/µl RNase inhibitor), imaged (e.g., using confocal microscopy or another device capable of fluorescence detection), washed, and processed (according to the analyte capture or spatial workflow described herein).

In some cases, a glycerol solution and a cover slip may be added to the sample. In some cases, the glycerol solution may include a counterstain (e.g., DAPI).

As used herein, antigen retrieval buffers may improve antibody capture in IF/IHC protocols. An exemplary protocol for antigen retrieval may be to preheat the antigen retrieval buffer (e.g., to 95 ℃), immerse the biological sample in the heated antigen retrieval buffer for a predetermined time, then remove the biological sample from the antigen retrieval buffer and wash the biological sample.

In some embodiments, optimizing permeabilization may be useful for identifying intracellular analytes. Permeabilization optimization may include selection of permeabilizing agent, concentration of permeabilizing agent, and duration of permeabilization. Tissue permeabilization is discussed elsewhere herein.

In some embodiments, blocking the array and/or biological sample during preparation of the labeled biological sample reduces non-specific binding of antibodies to the array and/or biological sample (i.e., reduces background). Some embodiments provide a blocking buffer/blocking solution that may be applied prior to and/or during application of the label, wherein the blocking buffer may include a blocking agent and optionally a surfactant and/or salt solution. In some embodiments, the blocking agent may be bovine serum albumin (BSA), serum, gelatin (e.g., fish gelatin), milk (e.g., skim milk powder), casein, polyethylene glycol (PEG), polyvinyl alcohol (PVA), polyvinylpyrrolidone (PVP), biotin blocking agent, peroxidase blocking agent, levamisole, Carnoy's solution, glycine, lysine, sodium borohydride, Pontamine sky blue, Sudan black, trypan blue, FITC blocking agent, and/or acetic acid. The blocking buffer/blocking solution may be applied to the array and/or the biological sample prior to and/or during the labeling of the biological sample (e.g., the application of fluorophore-conjugated antibodies).

(iii) Sample preparation for probe applications

In some cases, the biological sample is dewaxed. Dewaxing may be accomplished using any method known in the art. For example, in some cases, a biological sample is treated with a series of wash solutions comprising xylenes and various concentrations of ethanol. In some cases, the dewaxing process includes treatment with xylenes (e.g., 3 washes of 5 minutes each). In some cases, the method further comprises treating with ethanol (e.g., two 10-minute washes in 100% ethanol, two 10-minute washes in 95% ethanol, two 10-minute washes in 70% ethanol, and two 10-minute washes in 50% ethanol). In some cases, after the ethanol washes, the biological sample may be washed with deionized water (e.g., twice for 5 minutes each time). It will be appreciated that one skilled in the art can adjust these methods to optimize dewaxing.
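
As an illustration only, the example wash series above can be written down as a small schedule and its total wash time computed. The sketch below assumes that each ethanol concentration is applied as two washes of 10 minutes each; the names `DEWAX_STEPS`, `total_minutes`, and `scaled` are hypothetical and are not part of the disclosed methods.

```python
# Hypothetical encoding of the example dewaxing washes described in the text.
# Each entry is (reagent, number_of_washes, minutes_per_wash).
DEWAX_STEPS = [
    ("xylene", 3, 5),
    ("100% ethanol", 2, 10),
    ("95% ethanol", 2, 10),
    ("70% ethanol", 2, 10),
    ("50% ethanol", 2, 10),
    ("deionized water", 2, 5),
]


def total_minutes(steps):
    """Sum the wash time over all steps (handling time between washes is ignored)."""
    return sum(washes * minutes for _, washes, minutes in steps)


def scaled(steps, factor):
    """Return a copy of the schedule with every wash duration multiplied by `factor`."""
    return [(reagent, washes, minutes * factor) for reagent, washes, minutes in steps]


if __name__ == "__main__":
    print(total_minutes(DEWAX_STEPS), "minutes of washing in the example schedule")
```

Keeping the schedule as data makes it straightforward to adjust individual wash counts or durations when optimizing dewaxing for a particular sample type.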

In some cases, the biological sample is de-crosslinked. In some cases, the biological sample is de-crosslinked in a solution containing TE buffer (including Tris and EDTA). In some cases, the TE buffer is alkaline (e.g., at a pH of about 9). In some cases, the de-crosslinking occurs at about 50 ℃ to about 80 ℃. In some cases, the de-crosslinking occurs at about 70 ℃. In some cases, the de-crosslinking occurs at 70 ℃ for about 1 hour. Just prior to de-crosslinking, the biological sample may be treated with an acid (e.g., 0.1M HCl, about 1 minute). After the de-crosslinking step, the biological sample may be washed (e.g., with 1 x PBST).

In some cases, a method of preparing a biological sample for a probe application includes permeabilizing the sample. In some cases, the biological sample is permeabilized using phosphate buffer. In some cases, the phosphate buffer is PBS (e.g., 1 x PBS). In some cases, the phosphate buffer is PBST (e.g., 1 x PBST). In some cases, the permeabilization step is performed multiple times (e.g., 3 times for 5 minutes each).

In some cases, the method of preparing a biological sample for a probe application includes the steps of equilibrating and blocking the biological sample. In some cases, equilibration is performed using pre-hybridization (pre-Hyb) buffer. In some cases, the pre-Hyb buffer is RNase-free. In some cases, the pre-Hyb buffer is free of bovine serum albumin (BSA), Denhardt's solution, or other biological material that may be contaminated with nucleases.

In some cases, the equilibration step is performed multiple times (e.g., 2 times for 5 minutes each, 3 times for 5 minutes each). In some cases, the biological sample is blocked with a blocking buffer. In some cases, the blocking buffer includes a carrier (e.g., tRNA), such as yeast tRNA from Saccharomyces cerevisiae (e.g., at a final concentration of 10-20 μg/mL). In some cases, blocking may be performed for 5, 10, 15, 20, 25, or 30 minutes.

Any of the foregoing steps may be optimized for performance. For example, the temperature may be changed. In some cases, the prehybridization process is performed at room temperature. In some cases, the prehybridization method is performed at 4 ℃ (in some cases, the time ranges provided herein are altered).

(c) Nucleic acid library preparation

(i) Single cell library preparation

Disclosed herein are methods of preparing a nucleic acid library (e.g., a plurality of nucleic acids) from a cell or population of cells. In some embodiments, the biological sample may be obtained from a cell culture grown in vitro. Samples from cell cultures may include one or more suspension cells that are anchorage-independent within the cell culture. Examples of such cells include, but are not limited to, cell lines derived from hematopoietic cells and the following cell lines: Colo205, CCRF-CEM, HL-60, K562, MOLT-4, RPMI-8226, SR, HOP-92, NCI-H322M and MALME-3M.

Samples from cell cultures may include one or more adherent cells grown on the surface of a vessel containing the culture medium. Non-limiting examples of adherent cells include DU145 (prostate cancer) cells, H295R (adrenal cortical cancer) cells, HeLa (cervical cancer) cells, KBM-7 (chronic granulocytic leukemia) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-468 (breast cancer) cells, PC3 (prostate cancer) cells, SaOS-2 (bone cancer) cells, SH-SY5Y (neuroblastoma, cloned from myeloma) cells, T-47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, cell lines of the National Cancer Institute 60 cancer cell line panel (NCI60), Vero (African green monkey kidney epithelial cell line) cells, MC3T3 (embryonic) cells, GH3 (pituitary tumor) cells, PC12 (pheochromocytoma) cells, dog MDCK kidney epithelial cells, Xenopus A6 kidney epithelial cells, zebrafish AB9 cells, and Sf9 insect epithelial cells.

Other examples of samples that are considered cells or cell populations include, but are not limited to, those listed in priority file U.S. provisional patent application Ser. Nos. 62/979,652, 62/980,124, and 63/077,019, each of which is incorporated herein by reference in its entirety. It should be understood that the population of cells in the single cell library preparation may be from any of the cells or cell culture populations disclosed herein.

In some embodiments, a nucleic acid library (e.g., a plurality of nucleic acids) includes one or more nucleic acids of interest, for which detection may be enhanced using hybridization methods. Analytes may be isolated, amplified, and/or otherwise processed for subsequent analysis, such as fragmentation and sequencing library preparation.

In some embodiments, the biological sample is permeabilized using any of the methods described herein. The biological sample is permeabilized such that the analyte is released from one or more cells in the biological sample. In some embodiments, obtaining a set of analytes (e.g., mRNA) from a biological sample includes any nucleic acid extraction and/or isolation method known in the art and/or described herein. In some embodiments, the released and obtained analytes may be amplified. In some embodiments, a set of polyadenylated mRNAs is obtained from a biological sample in preparation for a sequencing method (e.g., by any of the methods included in a sequencing library preparation workflow).

Other reagents may be added to the biological sample to perform various functions prior to analyzing the sample. In some embodiments, DNase and RNase inactivating agents or inhibitors and/or chelating agents such as EDTA may be added to the sample. In some embodiments, the sample may be treated with one or more enzymes. For example, one or more endonucleases for fragmenting DNA or RNA, as well as polymerases for amplifying nucleic acids, may be added. Enzymes that may be added to the sample include, but are not limited to, polymerases, transposases, ligases, DNases, and RNases.

In some embodiments, reverse transcriptases, including enzymes having terminal transferase activity, primers, and template switching oligonucleotides may be added to a sample. Template switching can be used to increase the length of the cDNA, for example by appending a predefined nucleic acid sequence to the cDNA, from which nucleic acid extension can then be performed.

In some embodiments, obtaining the plurality of cDNA sequences includes first strand cDNA synthesis by reverse transcription of a corresponding set of analytes (e.g., polyadenylated mRNA). In some such embodiments, the cDNA is generated using a poly(T)-containing primer. In some embodiments, the generated cDNA is barcoded using a capture probe characterized by having a barcode sequence (and optionally a UMI sequence) that hybridizes to at least a portion of the generated cDNA. In some embodiments, a unique molecular identifier (UMI) is attached to the generated cDNA using a capture probe characterized by having a UMI that hybridizes to at least a portion of the generated cDNA. In some embodiments, the template switching oligonucleotide hybridizes to a poly(C) tail that is added to the 3' end of the cDNA by reverse transcriptase. In some such embodiments, the original mRNA template and template switching oligonucleotide are denatured from the cDNA, and the barcoded capture probe, optionally having a UMI, is hybridized to the cDNA to generate a complement of the cDNA.

In some embodiments, obtaining the plurality of cDNA sequences further comprises amplifying (e.g., PCR amplifying) and/or adaptor extending each of the plurality of cDNA sequences. In some embodiments, adapter extension occurs prior to cDNA amplification. In some embodiments, adapter extension occurs during first strand cDNA synthesis. In some such embodiments, the extension of the adapter occurs through hybridization of the RNA molecule to the capture probe. In some such embodiments, the extension of the adapter occurs through hybridization of the cDNA molecule to the capture probe.

In some embodiments, the plurality of cDNA sequences are not fragmented, such that each of the plurality of cDNA sequences is a full-length cDNA sequence. In some such embodiments, the plurality of cDNA sequences comprises intermediates of a gene expression library preparation workflow (e.g., unfragmented cDNA sequences generated using a 3' gene expression library preparation method).

In some embodiments, the plurality of cDNA sequences comprises a first subset of cDNA sequences, wherein each respective cDNA sequence in the first subset of cDNA sequences maps to a respective gene in the plurality of genes. The plurality of genes may be a targeted gene panel (e.g., a gene panel of interest). In some embodiments, the plurality of genes is between 5 genes and 20000 genes. In some embodiments, the plurality of genes is between 100 genes and 10000 genes. In some embodiments, the plurality of genes is between 500 genes and 2000 genes. In some embodiments, the plurality of genes is more than 10, more than 50, more than 100, more than 500, more than 1000, more than 2000, more than 5000, more than 10000, more than 15000, or more than 20000 genes.

In some embodiments, the plurality of cDNA sequences comprises a second subset of cDNA sequences, wherein each respective cDNA sequence in the second subset of cDNA sequences maps to a portion of the reference genome that is not represented by the plurality of genes (i.e., a portion to which the first subset of cDNA sequences does not map). The second subset of cDNA sequences may include cDNA sequences that map to regions of the reference genome that are not targeted and/or to other genes that are not included in the targeted gene panel.

In some embodiments, the plurality of cDNA sequences consists of a first subset of cDNA sequences and a second subset of cDNA sequences. In some such embodiments, each cDNA sequence in the plurality of cDNA sequences maps either to a gene of interest or to a non-targeted region of the reference genome.
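
The division of cDNA sequences into a first (targeted) subset and a second (non-targeted) subset can be sketched as follows. This is a minimal illustration that assumes each cDNA sequence has already been aligned and assigned either a gene name or no gene; the function name `partition_cdna` and the example identifiers are hypothetical.

```python
def partition_cdna(cdna_to_gene, target_genes):
    """Split cDNA identifiers into (first_subset, second_subset).

    cdna_to_gene: dict mapping a cDNA identifier to the gene it maps to,
                  or to None when it maps outside any annotated gene.
    target_genes: set of gene names in the plurality of genes of interest.
    """
    first_subset, second_subset = [], []
    for cdna_id, gene in cdna_to_gene.items():
        if gene is not None and gene in target_genes:
            first_subset.append(cdna_id)   # maps to a gene of interest
        else:
            second_subset.append(cdna_id)  # maps to a non-targeted region or gene
    return first_subset, second_subset


# Hypothetical alignment results and gene panel.
example = {"cdna1": "GENE_A", "cdna2": "GENE_B", "cdna3": None, "cdna4": "GENE_C"}
panel = {"GENE_A", "GENE_C"}
on_target, off_target = partition_cdna(example, panel)
```

In this sketch every cDNA sequence falls into exactly one of the two subsets, which corresponds to the case in which the plurality consists of the first and second subsets.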

Each respective gene of the plurality of genes may be characterized by any number of transcripts. For example, in some embodiments, 3 or more transcripts correspond to each respective gene. For example, in some embodiments, 4 or more transcripts correspond to each respective gene. In some embodiments, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more transcripts correspond to each respective gene. In some embodiments, the plurality of transcripts corresponding to each respective gene in the plurality of genes comprises a plurality of isoforms of each respective gene.

An isoform of a gene refers to an mRNA molecule (and corresponding cDNA molecule) derived from the same genomic locus but comprising a different nucleic acid sequence, including, but not limited to, differences in transcription start sites (TSSs), protein-coding DNA sequences (CDSs), and/or untranslated regions (UTRs). These differences are caused by alternative splicing, use of alternative promoters, gene fusion or deletion, single nucleotide polymorphisms (SNPs) and/or other mutations or post-transcriptional genetic modifications. Due to differences in the mRNA sequences of the coding sequences and/or the cis-regulatory elements in the promoter sequences, isoforms of a gene may have different functional capabilities.

Identification of different isoforms of a gene is an important step in accurately enriching the cDNA sequences of the corresponding gene. For example, when the different nucleic acid sequences of two or more isoforms of a target gene include different transcription start sites and/or untranslated regions, a decoy probe complementary to the annotated 3' or 5' end of the gene may not capture all possible isoforms. This may occur if a nucleic acid decoy is designed to hybridize to a region of a first isoform that extends beyond either end of a second isoform. While a minimal set of decoy probes is ideal for cost-effective and streamlined targeted sequencing analysis, in some cases, when two or more isoforms differ in length so significantly that they do not overlap, it may be necessary to design multiple nucleic acid decoys so that the various isoforms can be bound and enriched. In some such cases, each non-overlapping isoform hybridizes to a different unique decoy probe.

Thus, appropriate annotation of the genomic regions (e.g., coding and/or non-coding sequences) spanned by the various isoforms is necessary to ensure that the decoy probe design generates a plurality of decoy probes that represent (e.g., can hybridize to) each transcript of the target gene.
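
The trade-off between a minimal decoy set and full isoform coverage can be illustrated with a simplified sketch. It assumes, as simplifications not stated in this disclosure, that each isoform is a single contiguous interval on a shared coordinate axis and that a decoy is a single position rather than a probe of finite length. Under those assumptions, a standard greedy interval-covering procedure returns the fewest positions such that every isoform contains at least one of them. The function name and coordinates are hypothetical.

```python
def minimal_decoy_positions(isoforms):
    """Return the fewest positions such that every isoform interval contains one.

    isoforms: dict mapping an isoform name to (start, end), inclusive coordinates.
    Greedy strategy: visit isoforms in order of increasing end coordinate and
    place a decoy at the end of each isoform not already covered.
    """
    positions = []
    last = None  # most recently placed decoy position
    for _, (start, end) in sorted(isoforms.items(), key=lambda item: item[1][1]):
        if last is None or start > last:
            last = end
            positions.append(last)
    return positions


# Hypothetical isoforms: the first two overlap, the third overlaps neither.
example_isoforms = {
    "isoform_1": (100, 500),
    "isoform_2": (300, 900),
    "isoform_3": (1200, 1500),
}
decoys = minimal_decoy_positions(example_isoforms)
```

Here one position serves both overlapping isoforms, while the non-overlapping isoform requires its own decoy, which mirrors the situation described above. An actual design would also need to account for probe length, exon structure, and sequence composition.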

In some embodiments, each transcript of the plurality of transcripts is protein-coding. Alternatively, one or more transcripts of the plurality of transcripts may comprise non-coding sequences (e.g., 3' or 5' untranslated regions). For example, one or more transcripts of the plurality of transcripts may comprise a 3' UTR sequence downstream of a stop codon or a 5' UTR sequence upstream of a start codon. In some embodiments, one or more transcripts of the plurality of transcripts are protein-coding but comprise an incomplete coding sequence. For example, in some such embodiments, a transcript of the plurality of transcripts corresponding to the respective gene is coding sequence (CDS) 3' incomplete, CDS 5' incomplete, or both CDS 3' and CDS 5' incomplete. As used herein, CDS 3' incomplete refers to a protein-coding transcript that does not include a stop codon due to incomplete verification. As used herein, CDS 5' incomplete refers to a protein-coding transcript that does not include a start codon due to incomplete verification.
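
The CDS completeness categories defined above can be approximated from sequence alone, as in the following sketch. It is an assumption of this example that the annotated CDS is supplied as a DNA string and that completeness is judged only by the presence of an ATG start codon and a standard stop codon; annotation resources assign these labels from supporting evidence, so this function (`cds_completeness`, a hypothetical name) is illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_completeness(cds):
    """Label a coding sequence by whether it begins with ATG and ends with a stop codon."""
    cds = cds.upper()
    has_start = cds.startswith("ATG")
    has_stop = len(cds) >= 3 and cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "CDS 3' incomplete"   # start codon present, no stop codon
    if has_stop:
        return "CDS 5' incomplete"   # stop codon present, no start codon
    return "CDS 3' and CDS 5' incomplete"
```

For example, `cds_completeness("ATGAAAGGG")` returns "CDS 3' incomplete" because the sequence starts with ATG but does not end in a stop codon.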

In some embodiments, a reference database can be used to align transcripts corresponding to respective genes, e.g., GENCODE version 33 (GRCh38.p13). For example, annotations of the respective genes can be obtained from the reference database of the GENCODE Consortium. In other cases, annotations of the respective genes can be obtained from the Ensembl project. See Harrow et al., 2012, "GENCODE: the reference human genome annotation for The ENCODE Project," Genome Res. 22(9): 1760-1774, doi:10.1101/gr.135350.111; and Flicek et al., 2014, "Ensembl 2014," Nucleic Acids Res. 42 (database issue): D749-D755, doi:10.1093/nar/gkt1196, the entire contents of each of which are incorporated herein by reference.

(ii) Spatial library preparation

Disclosed herein are methods of generating a plurality of extended nucleic acids (e.g., any of the extended nucleic acids described herein) for detecting analytes including proteins and nucleic acids. In some cases, the method includes detecting the nucleic acid. In some embodiments, the method comprises detecting a protein.

As used herein, "extended nucleic acid" refers to a nucleic acid having additional nucleotides added to the end (e.g., the 3 'or 5' end) of the nucleic acid to extend the total length of the nucleic acid.

Disclosed herein are methods of making a library of target nucleic acids using decoy oligonucleotides, wherein the library of target nucleic acids is initially made from nucleic acids that hybridize to capture probes on a spatial array (i.e., a substrate). For example, the nucleic acid sequences captured on the substrate may be passed through a spatial workflow to provide resulting nucleic acids that may be isolated, amplified, and/or otherwise processed for subsequent analysis (e.g., fragmentation and sequencing library preparation). Capture probes, substrates and arrays have been described in the previous section of this application and are incorporated into this section.

In some cases, embodiments of any of the methods described herein can include contacting a biological sample with a substrate comprising a plurality of attached capture probes, wherein a capture probe of the plurality of capture probes comprises (i) a spatial barcode and (ii) a capture domain that specifically binds to a sequence (e.g., a nucleic acid) present in an analyte; extending the 3' end of the capture probe using the analyte that specifically binds to the capture domain as a template to generate an extended capture probe; and amplifying the extended capture probe. In some cases, embodiments of any of the methods described herein can include contacting a plurality of analyte capture agents with a biological sample, wherein an analyte capture agent of the plurality of analyte capture agents comprises (i) an analyte binding moiety that binds to an analyte (e.g., a protein) and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence; contacting the plurality of analyte capture agents with a substrate, the substrate comprising a plurality of capture probes, wherein a capture probe of the plurality of capture probes comprises a spatial barcode and a capture domain, wherein the capture domain binds to the analyte capture sequence; extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and amplifying the extended capture probe. Individual method steps and system features may be present in combination in many different embodiments; the particular combinations described herein do not in any way limit other combinations of steps or features.

When a biological sample is contacted with a substrate (e.g., an array) that includes capture probes, the analyte can be captured on the array. The capture probes interact with analytes released from the biological sample through the capture domains described throughout this application to capture the analytes. For example, the capture domain captures the analyte by hybridization to a nucleic acid sequence in a target nucleic acid molecule from the biological sample. In some cases, the sequence complementary to the capture domain is a polyadenylated (poly(A)) sequence. In some cases, the capture domain is designed to be specific for a sequence of interest (i.e., a sequence in the analyte of interest).
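
Capture by hybridization can be illustrated with a simple complementarity check. The sketch below assumes exact Watson-Crick complementarity over the full capture domain, which is a simplification (actual hybridization tolerates mismatches and depends on temperature and buffer conditions). The function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_capture(capture_domain, analyte):
    """True if the analyte contains a site exactly complementary to the capture domain.

    The analyte may be given in the RNA alphabet; U is treated as T.
    """
    target = analyte.upper().replace("U", "T")
    return reverse_complement(capture_domain) in target


# Hypothetical polyadenylated transcript and two kinds of capture domain.
transcript = "GGCATCGA" + "A" * 25
poly_t_domain = "T" * 20          # hybridizes to the poly(A) tail
specific_domain = "TCGATGCC"      # designed against a sequence of interest
```

With these inputs, both `can_capture(poly_t_domain, transcript)` and `can_capture(specific_domain, transcript)` are true, whereas a transcript lacking a poly(A) tail would not be captured by the poly(T) domain.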

In a nucleic acid detection setting, a nucleic acid analyte hybridizes directly to capture probes on an array (e.g., a substrate). In a protein detection setting, referring to FIG. 4, analyte binding moiety 402 comprises a protein binding moiety 404 and an oligonucleotide 408. In some cases, the analyte binding moiety is added to the biological sample. After the protein binds to the protein binding moiety 404, the oligonucleotide is captured by a capture domain of the array (e.g., a substrate). Thus, an analyte binding moiety as used herein can be considered an analyte derivative, and its oligonucleotides can be captured and analyzed in a similar manner as the analysis of a nucleic acid analyte. Embodiments of analyte binding moieties have been previously described, for example in WO 2020/176788 and/or U.S. patent application publication No. 2020/0277663, each of which is incorporated herein by reference in its entirety.

Alternatively, a surrogate (proxy) of the analyte may be captured by the capture domain of the capture probes on the array. For example, two probes can hybridize to a target nucleic acid, whereby they can be ligated together to produce a ligation product that can be used as a surrogate for the target sequence. The ligation product may comprise a sequence substantially complementary to the capture domain of the capture probes on the array. Ligation products can be captured on the array and serve as a surrogate for the presence of target nucleic acids.
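
The formation of a ligation product that serves as a surrogate for the target can be sketched as follows. The example assumes exact complementarity, that the two probes must hybridize to directly adjacent sites on the target, and that the capturable portion of the ligation product is a poly(A) tail carried by one probe; these are assumptions made for illustration, and the names used are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ligate_if_adjacent(target, probe_5p, probe_3p, capture_tail="A" * 20):
    """Return the ligation product if the two probes hybridize side by side, else None.

    probe_5p and probe_3p are written 5' to 3'. When joined, they must be exactly
    complementary to one contiguous stretch of the target. The capture tail
    (assumed here to be poly(A)) is appended to the 3' end of the product.
    """
    joined = probe_5p.upper() + probe_3p.upper()
    if reverse_complement(joined) in target.upper():
        return joined + capture_tail
    return None


# Hypothetical target with two adjacent probe-binding sites (GACCGT then AGGCTA).
target = "TTGACCGTAGGCTAACC"
product = ligate_if_adjacent(target, "TAGCCT", "ACGGTC")

# The same probes on a target where the sites are separated by three bases.
gapped_target = "TTGACCGTCCCAGGCTAACC"
no_product = ligate_if_adjacent(gapped_target, "TAGCCT", "ACGGTC")
```

Only the adjacent configuration yields a product, so detection of the product on the array indicates the presence of the target sequence.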

In some embodiments, the sample is stained prior to creating the nucleic acid library. In some embodiments, the biological sample is stained while on a slide. In some embodiments, the stained biological sample is imaged prior to creating the nucleic acid library. In some embodiments, staining includes biological staining techniques, such as H & E staining. In some embodiments, staining includes identifying the analyte using a fluorescent conjugated antibody (e.g., immunofluorescence). In some embodiments, the biological sample is stained using two or more different types of stains or two or more different staining techniques (e.g., IF, IHC, and/or H & E staining). For example, a biological sample may be prepared by: staining and imaging are performed by using one technique (e.g., H & E staining and bright field imaging), bleaching (e.g., quenching or photobleaching), and then staining and imaging the same biological sample using another technique (e.g., IHC/IF staining and fluorescence microscopy).

In some embodiments, a target-specific reaction is performed in the biological sample to enhance detection of one or more targets of interest prior to interaction with capture probes on the array. Examples of target-specific reactions include, but are not limited to, ligation of target-specific adaptors, probes, and/or other oligonucleotides, target-specific amplification using primers specific for one or more analytes, and target-specific detection using in situ hybridization, DNA microscopy, and/or antibody detection. In some embodiments, the capture probes comprise a capture domain that is targeted to a target-specific product (e.g., an amplification or ligation product). The target analyte may then be captured by a capture probe (e.g., as described throughout herein).

In some embodiments, the methods provided herein include a permeabilization step to release an analyte from a biological sample. In some embodiments, permeabilization occurs using a protease. In some embodiments, the protease is an endopeptidase. Endopeptidases that may be used include, but are not limited to, trypsin, chymotrypsin, elastase, thermolysin, pepsin, clostripain, glutamyl endopeptidase (GluC), ArgC, peptidyl-Asp endopeptidase (AspN), endopeptidase LysC, and endopeptidase LysN. In some embodiments, the endopeptidase is pepsin.

In some embodiments, the methods provided herein include permeabilization of the biological sample, such that the capture probes can more readily bind to the analyte or analyte derivative (i.e., as compared to no permeabilization). In some embodiments, a reverse transcription (RT) reagent may be added to the permeabilized biological sample. Incubation with RT reagents can produce spatially barcoded full-length cDNAs from captured analytes (e.g., polyadenylated mRNA). A second strand reagent (e.g., a second strand primer, an enzyme) may be added to the biological sample on the slide to initiate second strand synthesis.

In some cases, the permeabilization step includes applying a permeabilization buffer to the biological sample. In some cases, the permeabilization buffer comprises a buffer (e.g., Tris pH 7.5), MgCl₂, a sarkosyl detergent (e.g., sodium lauroyl sarcosinate) or other detergent, an enzyme (e.g., proteinase K, pepsin, collagenase, etc.), and nuclease-free water. In some cases, the permeabilization step is performed at 37 °C. In some cases, the permeabilization step is performed for about 20 minutes to 2 hours (e.g., about 20 minutes, about 30 minutes, about 40 minutes, about 50 minutes, about 1 hour, about 1.5 hours, or about 2 hours). In some cases, the releasing step is performed for about 40 minutes.

After permeabilization, in some cases, the analyte is captured by a capture probe and/or an analyte capture agent. In some embodiments, such methods of increasing the capture efficiency of a spatial array described herein comprise contacting the spatial array with a biological sample and allowing an analyte to interact with a capture probe and/or an analyte capture agent.

In some embodiments, using the hybridized analyte as a template, a capture probe hybridized to the analyte can be extended with a polymerase (e.g., reverse transcriptase) to produce an extended capture probe. In some embodiments, using the hybridized analyte as a template, a polymerase (e.g., reverse transcriptase) can be used to extend the 3' end of the capture probe hybridized to the analyte to generate an extended capture probe. In some embodiments, using the hybridized analyte as a template, a polymerase (e.g., reverse transcriptase) can be used to extend the 5' end of the capture probe hybridized to the analyte to generate an extended capture probe. The extended capture probe can be amplified (e.g., by second strand synthesis) to generate a single-stranded nucleic acid comprising a sequence complementary to the extended capture probe. Single stranded nucleic acids comprising sequences complementary to the extended capture probes may be used to generate a nucleic acid library or may be part of a nucleic acid library.

After hybridization of the analyte with the capture probe, the hybridized product is amplified. For example, obtaining a cDNA sequence library also includes amplifying (e.g., PCR amplifying) and/or adaptor extending the cDNA sequence. In some embodiments, adapter extension occurs prior to cDNA amplification. In some embodiments, adapter extension occurs during first strand cDNA synthesis.

In some embodiments, the method comprises amplifying all or a portion of the analyte using isothermal amplification or non-isothermal amplification. In some embodiments, amplification produces an amplification product comprising (i) all or part of the sequence of the analyte or analyte derivative bound to the capture probe or its complement, and (ii) all or part of the sequence of the spatial barcode or its complement. In some embodiments, the determining step comprises sequencing. A non-limiting example of sequencing that can be used to determine the sequence of an analyte, analyte derivative, and/or spatial barcode is in situ sequencing. In some aspects, in situ sequencing is performed by Sequencing By Synthesis (SBS), sequential fluorescent hybridization, ligation sequencing, nucleic acid hybridization, or high throughput digital sequencing techniques. In some embodiments, the analyte is DNA or RNA. In some embodiments, the analyte is a protein.
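
Because the amplification product carries both the spatial barcode and the analyte sequence, a sequenced product can be assigned back to an array location. The following sketch assumes a read layout in which the barcode comes first, followed by a UMI and then the analyte-derived insert; this layout, the segment lengths, and all names are hypothetical choices made for illustration, not a format specified in this disclosure.

```python
def decode_read(read, barcode_to_xy, barcode_len=16, umi_len=12):
    """Split a read into spatial barcode, UMI, and insert, and look up the array position.

    barcode_to_xy: dict mapping each known spatial barcode to an (x, y) array coordinate.
    Returns None when the barcode is not found on the array.
    """
    barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    insert = read[barcode_len + umi_len:]
    xy = barcode_to_xy.get(barcode)
    if xy is None:
        return None
    return {"xy": xy, "umi": umi, "insert": insert}


# Hypothetical array with two features and short segment lengths for readability.
array_layout = {"ACGT": (0, 0), "TTAA": (0, 1)}
decoded = decode_read("ACGTTTTGGGCCC", array_layout, barcode_len=4, umi_len=3)
```

The decoded record links the analyte-derived insert to the feature at position (0, 0), which is the basis for spatially resolving the detected analyte.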

In some embodiments, after contacting the biological sample with the substrate comprising capture probes, a removal step may optionally be performed to remove all or a portion of the biological sample from the substrate. In some embodiments, the removing step comprises enzymatic and/or chemical degradation of the biological sample cells. For example, the removing step may include treating the biological sample with an enzyme (e.g., a protease, e.g., proteinase K) to remove at least a portion of the biological sample from the substrate. In some embodiments, the removing step may include ablation of tissue (e.g., laser ablation).

In some embodiments, provided herein are methods for spatially detecting an analyte (e.g., detecting a location of an analyte (e.g., a biological analyte)) from a biological sample (e.g., present in a biological sample), the methods comprising: (a) Optionally staining and/or imaging a biological sample on a substrate; (b) Permeabilizing a biological sample on a substrate (e.g., providing a solution comprising a permeabilizing reagent); (c) Contacting the biological sample with an array comprising a plurality of capture probes, wherein the capture probes capture a biological analyte; and (d) analyzing the captured biological analyte, thereby spatially detecting the biological analyte; wherein the biological sample is removed from the substrate in whole or in part.

In some embodiments, the biological sample is not removed from the substrate. For example, the biological sample is not removed from the substrate before the capture probes (e.g., capture probes bound to the analyte) are released from the substrate. In some embodiments, such release comprises cleavage of the capture probes from the substrate (e.g., via a cleavage domain). In some embodiments, such release does not include release of the capture probes from the substrate (e.g., a copy of the analyte-bound capture probe may be prepared and the copy may be released from the substrate, e.g., by denaturation). In some embodiments, the biological sample is not removed from the substrate prior to analysis of the analyte bound to the capture probe after the capture probe is released from the substrate. In some embodiments, the biological sample remains on the substrate during removal of the capture probes from the substrate and/or during analysis of analytes bound to the capture probes after release from the substrate. In some embodiments, the biological sample remains on the substrate during removal (e.g., by denaturation) of a copy (e.g., a complement) of the capture probe. In some embodiments, analysis of analytes bound to capture probes from a substrate can be performed without subjecting the biological sample to enzymatic and/or chemical degradation of cells (e.g., permeabilized cells) or ablation of tissue (e.g., laser ablation).

In some embodiments, at least a portion of the biological sample is not removed from the substrate. For example, a portion of the biological sample may remain on the substrate prior to release of the capture probes (e.g., analyte-bound capture probes) from the substrate and/or analysis of the analyte released from the substrate that is bound to the capture probes. In some embodiments, at least a portion of the biological sample is not subjected to enzymatic and/or chemical degradation of cells (e.g., permeabilized cells) or ablation of tissue (e.g., laser ablation) prior to analysis of analytes bound to capture probes from the substrate.

In some embodiments, provided herein are methods for spatially detecting an analyte (e.g., detecting a location of the analyte, e.g., a biological analyte) from a biological sample (e.g., present in a biological sample), comprising: (a) Optionally staining and/or imaging a biological sample on a substrate; (b) Permeabilizing a biological sample on a substrate (e.g., providing a solution comprising a permeabilizing reagent); (c) Contacting the biological sample with an array comprising a plurality of capture probes, wherein the capture probes of the plurality of capture probes capture a biological analyte; and (d) analyzing the captured biological analyte, thereby spatially detecting the biological analyte; wherein the biological sample is not removed from the substrate.

In some embodiments, provided herein are methods for spatially detecting a biological analyte of interest from a biological sample, comprising: (a) staining and imaging a biological sample on a substrate; (b) providing a solution comprising a permeabilizing reagent to the biological sample on the substrate; (c) contacting the biological sample with an array on a substrate, wherein the array comprises one or more capture probes, such that the one or more capture probes capture the biological analyte of interest; and (d) analyzing the captured biological analyte to spatially detect the biological analyte of interest; wherein the biological sample is not removed from the substrate.

In some embodiments, the method further comprises performing a spatial transcriptomic analysis of the region of interest in the biological sample. In some embodiments, the one or more capture probes comprise a capture domain. In some embodiments, the one or more capture probes comprise a Unique Molecular Identifier (UMI). In some embodiments, the one or more capture probes comprise a cleavage domain. In some embodiments, the cleavage domain comprises a sequence recognized and cleaved by uracil DNA glycosylase, apurinic/apyrimidinic (AP) endonuclease (APE1), uracil-specific excision reagent (USER), and/or endonuclease VIII. In some embodiments, the one or more capture probes do not comprise a cleavage domain and are not cleaved from the array.

In some embodiments, the capture probes may be extended ("extended capture probes", e.g., as described herein). For example, extending the capture probe may comprise generating cDNA from captured (hybridized) RNA. This process involves synthesizing a complementary strand of the hybridized nucleic acid, e.g., generating cDNA based on the captured RNA template (RNA hybridized to the capture domain of the capture probe). Thus, in an initial step of extending the capture probe (e.g., cDNA generation), the captured (hybridized) nucleic acid (e.g., RNA) acts as a template for the extension (e.g., the reverse transcription step).

In some embodiments, the capture probe is extended using reverse transcription. For example, reverse transcription involves the synthesis of cDNA (complementary or copy DNA) from RNA (e.g., messenger RNA) using reverse transcriptase. In some embodiments, reverse transcription is performed while the tissue is still in place, producing an analyte library, wherein the analyte library comprises the spatial barcodes from the adjacent capture probes. In some embodiments, the capture probes are extended using one or more DNA polymerases.

In some embodiments, the capture domain of the capture probe includes primers for generating a complementary strand of nucleic acid hybridized to the capture probe, e.g., primers for DNA polymerase and/or reverse transcription. Nucleic acid (e.g., DNA and/or cDNA) molecules resulting from the extension reaction incorporate the sequence of the capture probe. Extension of the capture probes, such as DNA polymerase and/or reverse transcription reactions, may be performed using a variety of suitable enzymes and protocols.

In some embodiments, full-length DNA (e.g., cDNA) molecules are produced. In some embodiments, a "full-length" DNA molecule refers to the entire captured nucleic acid molecule. However, if the nucleic acid (e.g., RNA) is partially degraded in the tissue sample, the captured nucleic acid molecules will not be the same length as the original RNA in the tissue sample. In some embodiments, the 3' end of the extended probe (e.g., the first strand cDNA molecule) is modified. For example, a linker or adapter may be ligated to the 3' end of the extended probe. This can be accomplished using a single-stranded ligase such as T4 RNA ligase or CircLigase™ (available from Lucigen, Middleton, Wisconsin). In some embodiments, a template switching oligonucleotide is used to extend the cDNA to produce full-length cDNA (or as close as possible to full-length cDNA). In some embodiments, a second strand synthesis auxiliary probe (a partially double-stranded DNA molecule capable of hybridizing to the 3' end of the extended capture probe) may be ligated to the 3' end of the extended probe (e.g., a first strand cDNA molecule) using a double-stranded ligase (e.g., T4 DNA ligase). Other enzymes suitable for use in the ligation step are known in the art and include, for example, Tth DNA ligase, Taq DNA ligase, Thermococcus sp. (strain 9°N) DNA ligase (9°N™ DNA ligase, New England Biolabs), Ampligase™ (available from Lucigen, Middleton, Wisconsin), and SplintR (available from New England Biolabs, Ipswich, Massachusetts). In some embodiments, a polynucleotide tail (e.g., a poly(A) tail) is incorporated at the 3' end of the extended probe molecule. In some embodiments, the polynucleotide tail is incorporated using an enzyme having terminal transferase activity.

In some embodiments, the double-stranded extended capture probes are treated to remove any unextended capture probes prior to amplification and/or analysis (e.g., sequence analysis). This can be accomplished by a variety of methods, for example, by using an enzyme (e.g., an exonuclease) to degrade the unextended probes, or by using purification columns.

In some embodiments, the extended capture probe is amplified to produce an amount sufficient for analysis, such as by DNA sequencing. In some embodiments, the first strand of the extended capture probe (e.g., a DNA and/or cDNA molecule) is used as a template for an amplification reaction (e.g., a polymerase chain reaction).

In some embodiments, the amplification reaction uses primers that include affinity groups to incorporate the affinity groups onto an extended capture probe (e.g., an RNA-cDNA hybrid). In some embodiments, the primer includes an affinity group and the extended capture probe includes an affinity group. The affinity group may correspond to any of the affinity groups described previously.

In some embodiments, extended capture probes comprising an affinity group may be coupled to a substrate specific for the affinity group. In some embodiments, the substrate may comprise an antibody or antibody fragment. In some embodiments, the substrate comprises avidin or streptavidin, and the affinity group comprises biotin. In some embodiments, the substrate comprises maltose and the affinity group comprises a maltose binding protein. In some embodiments, the substrate comprises a maltose binding protein and the affinity group comprises maltose. In some embodiments, amplifying the extended capture probes may act to release the extended probes from the substrate surface as long as a copy of the extended probes is not immobilized to the substrate.

In some embodiments, the extended capture probe or its complement or amplicon is released. The step of releasing the extended capture probes or their complements or amplicons from the surface of the substrate can be accomplished in a variety of ways. In some embodiments, the extended capture probes or their complements are released from the array by nucleic acid cleavage and/or denaturation (e.g., denaturation of double-stranded molecules by heating).

In some embodiments, the extended capture probes or their complements or amplicons are physically released from the surface (e.g., array) of the substrate. For example, when the extended capture probes are immobilized indirectly on the array substrate, such as by hybridization to surface probes, it can be sufficient to disrupt the interaction between the extended capture probes and the surface probes. Methods of disrupting the interaction between nucleic acid molecules, including denaturing double-stranded nucleic acid molecules, are known in the art. A straightforward method of releasing DNA molecules (i.e., stripping the array of extended probes) is to use a solution that interferes with the hydrogen bonds of the double-stranded molecules. In some embodiments, the extended capture probe is released by applying a heated solution (e.g., water or buffer at a temperature of at least 85 ℃, such as water at a temperature of at least 90 ℃, 91 ℃, 92 ℃, 93 ℃, 94 ℃, 95 ℃, 96 ℃, 97 ℃, 98 ℃, or 99 ℃). In some embodiments, a solution that includes salts, surfactants, etc., which may further destabilize interactions between nucleic acid molecules, is added to release the extended capture probes from the substrate.

In some embodiments, the extended capture probes comprise cleavage domains, and the extended capture probes are released from the substrate surface by cleavage. For example, the cleavage domain of the extended capture probe may be cleaved by any of the methods described herein. In some embodiments, the extended capture probes are released from the substrate surface prior to the step of amplifying the extended capture probes, for example by cleaving cleavage domains in the extended capture probes.

In some embodiments, where the sample is barcoded directly by hybridization with capture probes or analyte capture agents that are hybridized, bound, or associated with the cell surface, or that are introduced into the cell, as described above, the entire sample may be sequenced.

A wide variety of different sequencing methods are available for analyzing the barcoded analytes or moieties. In general, the sequenced polynucleotide may be, for example, a nucleic acid molecule, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA or DNA/RNA hybrids, as well as nucleic acid molecules having nucleotide analogs).

Sequencing of polynucleotides can be performed by various systems. More generally, sequencing can be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR and droplet digital PCR (ddPCR), quantitative PCR, real-time PCR, multiplex PCR, PCR-based singleplex methods, emulsion PCR), and/or isothermal amplification. Non-limiting examples of methods of sequencing genetic material include DNA hybridization methods (e.g., Southern blotting), restriction enzyme digestion methods, Sanger sequencing methods, next-generation sequencing methods (e.g., single-molecule real-time sequencing, sequencing by synthesis, nanopore sequencing, and Polony sequencing), ligation methods, and microarray methods.

(iii) Treatment of nucleic acid libraries

In some embodiments, after a nucleic acid library (e.g., a plurality of nucleic acids) is created, one or more decoy oligonucleotides (e.g., from one or more of the sets described herein) are hybridized to the plurality of nucleic acids. The nucleic acid comprises all or part of the sequence of the analyte of interest or its complement and/or comprises all or part of the spatial barcode of interest or its complement. In some embodiments, one or more decoy oligonucleotides are hybridized to a nucleic acid that comprises all or a portion of the sequence of an analyte of interest or its complement.

In some embodiments, one or more nucleic acid libraries are pooled. In some cases, one or more libraries are incubated with Cot DNA (e.g., human Cot DNA). In some cases, one or more libraries are incubated with universal blocker nucleic acids that hybridize to one or more highly expressed nucleic acids to prevent unwanted nucleic acids from hybridizing to the decoy oligonucleotides.

In other embodiments, the library, nucleic acid, or enriched nucleic acid may be quantified using quantitative PCR (qPCR) prior to hybridization of the decoy oligonucleotide. In some embodiments, the library, nucleic acid, or enriched nucleic acid may be fragmented. In some embodiments, the library, nucleic acid, or enriched nucleic acid may be fragmented by an enzyme-based method (e.g., by restriction enzymes, nicking enzymes, and/or transposases). In some embodiments, the library, nucleic acid, or enriched nucleic acid may be fragmented by an endonuclease. In some embodiments, the library, nucleic acid, or enriched nucleic acid may be fragmented by mechanical shearing (e.g., sonic shearing, hydrodynamic shearing, and/or nebulization). In some embodiments, the library, nucleic acid, or enriched nucleic acid may be fragmented by a combination of enzyme-based methods and mechanical shearing. In some embodiments, the library, nucleic acid, or enriched nucleic acid may be processed by end repair, poly-A tailing, or a combination thereof. In some embodiments, an adapter is ligated to each nucleic acid or enriched nucleic acid sequence. In some embodiments, the adapter is ligated to the 3' end of the nucleic acid or enriched nucleic acid sequence. In some embodiments, the adapter is ligated to the 5' end of the nucleic acid sequence or the enriched nucleic acid sequence. In some embodiments, the adapter may be an additional functional nucleic acid sequence, e.g., a spacer sequence, a primer sequence/site, a barcode sequence, a Unique Molecular Identifier (UMI) sequence, a linker, and/or a sequencing adapter.

In some embodiments, the methods disclosed herein include Sample Index (SI) PCR, which adds a nucleic acid sequence (e.g., a barcode) to the 5' and/or 3' end of the nucleic acid sequence or enriched nucleic acid sequence. In some cases, the SI-PCR reaction is performed at 67 ℃. In some embodiments, SI-PCR is a PCR reaction that introduces sample index sequences (e.g., i5 and i7) at the 5' and/or 3' ends of the nucleic acid sequence or enriched nucleic acid sequence. In some embodiments, the method for SI-PCR adds an i5 sample index sequence. In some embodiments, the method for SI-PCR adds an i7 sample index sequence. In some embodiments, a P5 adapter is added to the nucleic acid sequence or the enriched nucleic acid sequence. In some embodiments, a P7 adapter is added to the nucleic acid sequence or the enriched nucleic acid sequence. In some embodiments, SI-PCR is performed prior to enrichment of the nucleic acid of interest with the decoy oligonucleotides. In some embodiments, SI-PCR is performed after enrichment of the nucleic acid of interest with the decoy oligonucleotides.

In some embodiments, the nucleic acid of interest (either before or after enrichment with decoy oligonucleotides) or the library generated therefrom may be dried. In some embodiments, drying includes a dehydration process, such as heating, vacuum, lyophilization, desiccation, filtration, and air drying. In some embodiments, a vacuum centrifuge is used to dry the sample. In some cases, drying is performed at about 50 ℃, about 55 ℃, about 60 ℃, about 63 ℃, about 65 ℃, about 67 ℃, about 70 ℃, or about 75 ℃. In some embodiments, drying may be performed for at least 1 hour, at least 2 hours, at least 3 hours, or at least 4 hours. If the sample is not used immediately, the sample may be stored (e.g., at -20 ℃). In some embodiments, the nucleic acid of interest (either before or after enrichment with decoy oligonucleotides) or the library generated therefrom is not dried.

(d) Targeted capture of analytes by hybridization using decoy oligonucleotides

After preparing a nucleic acid library from a biological sample, the library may be incubated with a plurality of decoy oligonucleotides, thereby selectively enriching the library for targets of interest. The target analytes can be separated from the library, thereby producing an enriched population of target analytes. While whole transcriptome spatial analysis provides very useful information, more targeted gene enrichment makes it possible to spatially localize a subset of targets of particular interest, such as cancer-related or disease-related genes and gene expression relevant to cancer or disease detection. Thus, research can focus on subsets of genes of interest and maximize spatial knowledge of these genes, while minimizing the costs and reagents associated with a spatial whole transcriptome workflow.

(i) Decoy oligonucleotide design

In some embodiments, the decoy oligonucleotide sets are designed to target and hybridize to multiple nucleic acids (e.g., a prepared spatial library, such as a prepared cDNA library). In some embodiments, the decoy oligonucleotide sets hybridize to target nucleic acids (e.g., cDNAs) from a broader set of library nucleic acids. In some embodiments, the hybridized product (e.g., decoy oligonucleotide and hybridized nucleic acid) is then captured by streptavidin beads. In some embodiments, the hybridized product (e.g., decoy oligonucleotide and hybridized nucleic acid) is then captured by avidin beads. Unhybridized nucleic acids are washed away. The target product is then re-amplified and sequenced. In some embodiments, the re-amplified target product may be fragmented, ligated to a linker sequence, and amplified by SI-PCR.

Disclosed herein are methods of designing and testing candidate decoy oligonucleotide sequences. Candidate decoy oligonucleotide sequences are designed such that each decoy oligonucleotide sequence theoretically hybridizes to a unique target of interest. Thus, the decoy oligonucleotide is designed to be at least 40 nucleotides in length. To identify decoy oligonucleotides of interest, aligners designed to align RNA-seq data were used to identify and align 40-nucleotide portions of the human transcriptome that are unique within the genome. In some embodiments, decoy oligonucleotides are designed to hybridize to a particular exon. In some embodiments, the decoy oligonucleotide is designed to span an exon-exon junction. In some embodiments, decoy oligonucleotides can hybridize to a target such that spliced and alternatively spliced transcripts in the transcriptome can be recognized. Alignment is used to identify and classify sequences that align one or more times with the genome. Each designed decoy may be tested against (i.e., compared to) sequences identified in the genome. If the decoy oligonucleotide and sequences in the genome do not match, the decoy may be tested in one or more groups as disclosed herein.
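By way of non-limiting illustration only, the uniqueness test described above can be sketched as a simple window count. The sketch below is not the aligner-based procedure of the disclosure: it assumes exact string matching on a single strand in place of an RNA-seq aligner, ignores splicing, and uses hypothetical function names (`kmer_counts`, `unique_windows`) and toy sequences. The window length is shortened to 4 nucleotides for readability; the text above uses 40.

```python
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every overlapping k-nucleotide window in the genome string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def unique_windows(transcript: str, genome_counts: Counter, k: int = 40):
    """Return (offset, window) pairs whose sequence occurs exactly once in the genome."""
    hits = []
    for i in range(len(transcript) - k + 1):
        window = transcript[i:i + k]
        if genome_counts[window] == 1:  # Counter yields 0 for windows never seen
            hits.append((i, window))
    return hits


# Toy data (hypothetical); k is 4 here only to keep the example short.
toy_genome = "ACGTACGTTTGACCA"
toy_transcript = "ACGTTTGA"
counts = kmer_counts(toy_genome, k=4)
unique = unique_windows(toy_transcript, counts, k=4)
```

In this toy case the window "ACGT" occurs twice in the genome string and is therefore excluded, while the remaining windows of the transcript are retained as candidate positions.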

The present disclosure provides a method of designing nucleic acid decoys for full-length cDNAs comprising obtaining coding sequences for each transcript (e.g., isoform) of each target gene in a targeted gene panel. When the coding sequence is less than a threshold length (e.g., 120 base pairs), the complete mRNA sequence may be used instead. The method further comprises, for each 120 base pair subsequence in each coding sequence, obtaining a count of the number of transcripts in which the 120 base pair subsequence occurs. The subsequences are ordered by the obtained count, and the first subsequence is selected from the subsequences that occur in the largest number of transcripts of the corresponding gene. The first subsequence is further subjected to filtering criteria (e.g., uniqueness, mappability, absence of repeated subsequences, and/or overall GC content). In some embodiments, if a first subsequence fails to meet one or more filtering criteria, the first subsequence is rejected and a new first subsequence is selected from subsequences that occur in the largest number of transcripts of the corresponding gene and that further meet the filtering criteria. In some embodiments, if the first subsequence fails to meet one or more filtering criteria, the first subsequence is modified (e.g., by truncating the first subsequence such that it meets the one or more filtering criteria and/or by shifting the first subsequence along the reference genome such that it meets the one or more filtering criteria).

The method further includes selecting a second subsequence (e.g., a subsequence that is not the first subsequence) from the subsequences that occur in the largest number of remaining transcripts of the corresponding gene. The method is iterated over all remaining transcripts until no transcripts remain (e.g., until at least one subsequence of the plurality of selected subsequences is present in each of the plurality of transcripts of the target gene).
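By way of non-limiting illustration only, the iterative selection described above resembles a greedy covering procedure and can be sketched as follows. The function name `select_decoys`, the `passes_filters` callback (a stand-in for the uniqueness, mappability, repeat, and GC-content criteria), and the toy transcripts are assumptions introduced for this sketch. Exact substring matching stands in for hybridization, and the fallback to the complete mRNA sequence for short coding sequences is omitted.

```python
from collections import defaultdict


def select_decoys(transcripts: dict, k: int = 120, passes_filters=lambda s: True):
    """Greedily pick k-length subsequences until every transcript is covered.

    transcripts maps a transcript identifier to its coding sequence.
    Returns (selected, uncovered), where selected is a list of
    (subsequence, covered_transcript_ids) pairs in selection order.
    """
    # Record, for each subsequence, the transcripts in which it occurs.
    occurrences = defaultdict(set)
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            occurrences[seq[i:i + k]].add(tid)

    remaining = set(transcripts)
    selected = []
    while remaining:
        best, covered = None, set()
        for sub, tids in occurrences.items():
            hit = tids & remaining
            # Prefer the subsequence found in the most remaining transcripts.
            if len(hit) > len(covered) and passes_filters(sub):
                best, covered = sub, hit
        if best is None:
            break  # no acceptable subsequence covers what is left
        selected.append((best, covered))
        remaining -= covered
    return selected, remaining


# Toy isoforms (hypothetical); k is 4 here only to keep the example short.
toy = {"t1": "AAAACCCCGGGG", "t2": "TTTTCCCCGGGG", "t3": "ACACACAC"}
selected, uncovered = select_decoys(toy, k=4)
```

Here a single subsequence shared by `t1` and `t2` is chosen first, and a second subsequence is then chosen for the remaining transcript `t3`, so two subsequences cover three transcripts.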

The methods provided in the present disclosure improve upon the current technology by utilizing full-length cDNA sequences rather than 3' fragmented sequencing libraries, allowing access to larger regions of reliably annotated sequences for nucleic acid decoy design. For example, full-length cDNA sequences contain coding sequences that are generally well annotated, ensuring better targeted hybridization results and on-target rates.

Because full-length cDNA sequences include common sequences shared between transcripts (e.g., isoforms), nucleic acid decoys can be designed to target multiple transcripts, reducing the number of nucleic acid decoys required by a factor of 10. Thus, the resulting plurality of nucleic acid decoys can be reduced in size and streamlined, thereby resulting in greater efficiency and lower cost for users who need nucleic acid decoys for large and/or custom targeted gene panels. For example, while a typical "tiled" plurality of nucleic acid decoys would comprise 38,000 to 68,000 nucleic acid decoys for 1X coverage of a 1000-gene panel, a plurality of nucleic acid decoys designed using the methods of the present disclosure would comprise 2,500 to 4,000 nucleic acid decoys for the same 1000-gene panel. Furthermore, the methods of the present disclosure may be used in any application where targeted analysis is desired (e.g., single cell RNA sequencing and/or spatial gene expression profiling). In some cases, the plurality of nucleic acid decoys comprises at least 500, at least 1000, at least 2000, at least 3000, at least 4000, or at least 5000 nucleic acid decoys.

In some cases, each respective nucleic acid decoy of the plurality of nucleic acid decoys hybridized to the cDNA sequences of the respective genes of the plurality of genes hybridizes (i) selectively to a first subset of transcripts in a corresponding plurality of transcripts of the plurality of transcripts corresponding to the respective genes, or (ii) to another subset of transcripts other than the first subset of transcripts in a corresponding plurality of transcripts of the plurality of transcripts corresponding to the respective genes. Each respective transcript of the plurality of transcripts corresponding to each respective gene of the plurality of genes is hybridizable to a nucleic acid decoy of the plurality of nucleic acid decoys.

For example, the nucleic acid decoy may hybridize to (e.g., may comprise a nucleic acid sequence complementary to) one or more nucleic acid sequences corresponding to the target gene. In some cases, the one or more nucleic acid sequences hybridized to the nucleic acid decoy represent a respective one or more transcripts or isoforms of the target gene. As used herein, a subset of transcripts (e.g., a subset of isoforms) is defined as a set of transcripts (e.g., isoforms) of a target gene that hybridize to a corresponding nucleic acid decoy.

In some embodiments, a nucleic acid decoy that hybridizes to a cDNA sequence of a target gene selectively hybridizes to each of a plurality of transcripts of the corresponding gene. For example, a single nucleic acid decoy may hybridize to all isoforms of a target gene such that a first subset of isoforms consists of multiple isoforms of the target gene. In some such cases, no other subset of isoforms are defined (e.g., multiple isoforms may be grouped into a single or first subset of isoforms).

In some cases, multiple transcripts of a target gene may be subdivided into multiple transcript subsets. The plurality of transcript subsets may include first and second transcript subsets.

In some embodiments, each of the plurality of transcript subsets is defined as a set of transcripts that hybridize to a corresponding nucleic acid decoy. In this case, the first subset of transcripts is defined as the set of transcripts with which at least the first nucleic acid decoy hybridizes, and the second subset of transcripts is defined as the set of transcripts with which at least the second nucleic acid decoy hybridizes. For example, a first set of transcripts (e.g., isoforms) of a target gene that hybridizes to a first nucleic acid decoy is defined as a first subset of transcripts (e.g., a first subset of isoforms). In addition, a second set of transcripts (e.g., isoforms) of the target gene that hybridizes to the second nucleic acid decoy is defined as a second subset of transcripts (e.g., a second subset of isoforms).

In some embodiments, the second subset of transcripts consists of transcripts not included in the first subset of transcripts (e.g., the first subset of isoforms and the second subset of isoforms may include mutually exclusive sets of isoforms).

In some cases, the first subset of isoforms may be selected by counting matches for each possible candidate decoy subsequence of a given length (e.g., subsequences of the coding sequence or mRNA sequence of the target gene).

The search for hybridization matches between each possible decoy subsequence and each position in an isoform is repeated for each isoform of the corresponding gene. The number of isoforms that match each decoy subsequence (e.g., that comprise a complementary sequence) is counted, the decoy subsequence with the highest number of matches is selected as the first nucleic acid decoy, and the subset of isoforms that match the first nucleic acid decoy is defined as the first subset of isoforms. In some cases, when the corresponding first subset of isoforms fails to account for all isoforms of the target gene, the process is repeated for all remaining isoforms that fail to match the first nucleic acid decoy. Thus, the decoy subsequence having the highest number of matches to the remaining isoforms (e.g., isoforms not included in the first subset of isoforms) is selected as the second nucleic acid decoy, and the subset of isoforms that match the second nucleic acid decoy is defined as the second subset of isoforms.

In some cases, the process can be repeated as many times as desired until a plurality of nucleic acid decoys are identified such that all of the plurality of transcripts of the respective genes are hybridizable to at least one nucleic acid decoy of the plurality of nucleic acid decoys.

In some embodiments, the plurality of transcript subsets comprises at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten transcript subsets. In some embodiments, each subset of transcripts corresponds to a single nucleic acid decoy. In some alternative embodiments, each transcript subset corresponds to a plurality of nucleic acid decoys.

In some embodiments, the subset of transcripts other than the first subset of transcripts consists of one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more transcripts. In some embodiments, the plurality of nucleic acid decoys comprises at least 2 × 10³, at least 3 × 10³, at least 4 × 10³, at least 5 × 10³, at least 1 × 10⁴, at least 2 × 10⁴, at least 3 × 10⁴, at least 4 × 10⁴, at least 5 × 10⁴, at least 6 × 10⁴, at least 7 × 10⁴, or at least 1 × 10⁵ nucleic acid decoys.

In some embodiments, the plurality of nucleic acid decoys comprises a minimum number of decoys required for selective hybridization of each respective transcript of the plurality of transcripts corresponding to the respective gene of the plurality of genes. For example, each respective transcript of the plurality of transcripts corresponding to each respective gene of the plurality of genes is capable of hybridizing to at least one nucleic acid decoy of the plurality of nucleic acid decoys.

In some embodiments, each nucleic acid decoy of the plurality of nucleic acid decoys is capable of hybridizing to a single transcript of the plurality of transcripts, and the number of nucleic acid decoys in the plurality of nucleic acid decoys is equal to the number of transcripts in the plurality of transcripts of each respective gene of the plurality of genes. In some embodiments, each nucleic acid decoy of the plurality of nucleic acid decoys is capable of hybridizing to a plurality of transcripts, and the number of nucleic acid decoys in the plurality of nucleic acid decoys is less than the number of transcripts in the plurality of transcripts of each respective gene of the plurality of genes.

In some such embodiments, the decoy coverage for each respective transcript in the plurality of transcripts of the respective gene is less than 1X. In some such embodiments, the decoy coverage for each respective isoform of the plurality of isoforms of the first target gene is less than 0.8X, less than 0.6X, less than 0.4X, less than 0.2X, or less than 0.1X.

In some embodiments, each respective nucleic acid decoy in the plurality of nucleic acid decoys has less than a threshold percentage of sequence identity to any other nucleic acid decoy in the plurality of nucleic acid decoys. For example, in some embodiments, each respective nucleic acid decoy of the plurality of nucleic acid decoys has less than 100%, less than 98%, less than 96%, less than 94%, less than 92%, less than 90%, less than 88%, less than 86%, less than 84%, less than 82%, less than 80%, less than 70%, less than 60%, less than 50%, or less than 40% identity to any other nucleic acid decoy of the plurality of nucleic acid decoys. In some embodiments, the percentage sequence identity threshold is 10%, 20%, 30%, or 5%-50%. In some embodiments, the threshold of shared sequence identity between a respective nucleic acid decoy of the plurality of nucleic acid decoys and any other nucleic acid decoy of the plurality of nucleic acid decoys determines the level of cross-hybridization of each respective nucleic acid decoy to off-target sequence reads. In some embodiments, each respective nucleic acid decoy in the plurality of nucleic acid decoys comprises a nucleic acid sequence that has at least 90% minimal identity to a reference genome. In some embodiments, each respective nucleic acid decoy in the plurality of nucleic acid decoys comprises a nucleic acid sequence that has at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% minimal identity to a reference genome.
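By way of non-limiting illustration only, a pairwise identity threshold of the kind described above can be sketched as follows. The disclosure does not specify how percent identity is computed; the sketch assumes an ungapped, position-by-position comparison anchored at the first nucleotide, which is a simplification relative to alignment-based identity. The function names `percent_identity` and `filter_by_identity` and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity, scored over the longer of the two sequences."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def filter_by_identity(decoys, threshold_pct: float):
    """Keep a decoy only if it is below the threshold against every decoy already kept."""
    kept = []
    for decoy in decoys:
        if all(percent_identity(decoy, other) < threshold_pct for other in kept):
            kept.append(decoy)
    return kept


# Toy decoys (hypothetical), far shorter than real decoys, for readability.
toy_decoys = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
kept = filter_by_identity(toy_decoys, threshold_pct=80.0)
```

With an 80% threshold, the second toy decoy (87.5% identical to the first) is dropped and the third (25% identical) is kept. The result depends on the order in which decoys are examined, which is a design choice of this sketch.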

In some embodiments, each respective nucleic acid decoy of the plurality of nucleic acid decoys that hybridizes to a transcript of the plurality of transcripts of the respective gene has a Tm relative to the transcript that is between a first threshold temperature and a second threshold temperature. In some such embodiments, the first threshold temperature is between 55 ℃ and 85 ℃, and the second threshold temperature is between 90 ℃ and 110 ℃. In some cases, hybridization occurs at 65 ℃. In some cases, hybridization occurs at 60 ℃.
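By way of non-limiting illustration only, a Tm window check can be sketched as follows. The disclosure does not state how Tm is calculated; the sketch assumes the widely used salt-adjusted GC-content approximation, Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) − 675/N, which is an assumption of this example and not the method of the disclosure. The function names, the default sodium concentration, and the threshold values in the usage lines are likewise hypothetical.

```python
import math


def gc_percent(seq: str) -> float:
    """Percentage of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def estimate_tm(seq: str, na_molar: float = 0.05) -> float:
    """Approximate Tm in degrees Celsius from GC content, length, and [Na+] (assumed formula)."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(seq)
            - 675.0 / len(seq))


def tm_in_window(seq: str, low_c: float, high_c: float, na_molar: float = 0.05) -> bool:
    """True when the estimated Tm lies between the first and second threshold temperatures."""
    return low_c <= estimate_tm(seq, na_molar) <= high_c


# Hypothetical 120-nucleotide decoy with 50% GC content, evaluated at 1 M Na+.
toy_decoy = "ACGT" * 30
tm = estimate_tm(toy_decoy, na_molar=1.0)
accepted = tm_in_window(toy_decoy, low_c=75.0, high_c=100.0, na_molar=1.0)
```

Nearest-neighbor thermodynamic models would generally give more accurate estimates than this approximation; the GC-based form is used here only because it is compact.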

In some embodiments, each respective nucleic acid decoy of the plurality of nucleic acid decoys hybridizes to a respective region of the gene that is at least a minimum threshold distance from any annotated start and/or end sites of the respective gene. In some embodiments, a respective nucleic acid decoy of a plurality of nucleic acid decoys mapped to a respective sequence read of a respective gene is located at least a minimum threshold distance from the 3' end of the respective sequence read. In some such embodiments, off-target hybridization of a corresponding nucleic acid decoy to a corresponding cDNA sequence may occur when there are unannotated poly-A sites or poly-A sequences in the genomic exons and/or mRNA sequences that result in mispriming of oligo-dT. Thus, in some such embodiments, the optimal position for hybridization of the nucleic acid decoy to the corresponding cDNA sequence is located at least a minimum threshold distance from the 3' end. Non-limiting examples of minimum threshold distances are 100-200 base pairs (bp), 200-300 bp, 300-400 bp, 400-500 bp, 500-600 bp, 600-700 bp, 700-800 bp, 800-900 bp, 900-1000 bp, or more than 1000 bp. In some embodiments, the percentage of cDNA sequences that preferentially hybridize to the nucleic acid decoy at least a minimum threshold distance from the 3' end is between 0% and 10%, between 10% and 20%, or between 20% and 30%. Furthermore, in some embodiments, nucleic acid decoys comprising unannotated poly-A sites or poly-A sequences in the mRNA sequence are removed from the plurality of nucleic acid decoys. In some cases, a modified decoy is designed if there is a match between a sequence in the decoy oligonucleotide and a sequence in the genome. To prepare a modified decoy, the original sequence can be shifted by up to +/-40 bp from the original position to identify potential new decoy oligonucleotides. For each design, the new decoy oligonucleotides are tested against the genome. After all such candidates are classified, the decoy oligonucleotides that are ultimately included in one or more of the sets described herein are sorted (i.e., ordered) first by decoy oligonucleotide length (i.e., longer decoys are prioritized), and then by distance from the originally intended position (i.e., sequences closer to the originally intended position are prioritized). However, if no candidate meets the required criteria, the decoy is discarded and no decoy is designed at that position.
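The shifting and ranking procedure described above can be illustrated with the following minimal sketch. The `Candidate` type, the function names, and the `passes` callback (a stand-in for the genome test, whose criteria are not detailed here) are all hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Candidate:
    offset: int    # shift in bp from the originally intended position
    sequence: str  # candidate decoy sequence at the shifted position


def shifted_candidates(transcript: str, start: int, length: int,
                       max_shift: int = 40) -> Iterator[Candidate]:
    """Yield every full-length window within +/- max_shift bp of `start`."""
    for offset in range(-max_shift, max_shift + 1):
        begin = start + offset
        if begin < 0 or begin + length > len(transcript):
            continue  # window would run off the transcript
        yield Candidate(offset, transcript[begin:begin + length])


def pick_decoy(candidates: Iterable[Candidate],
               passes: Callable[[str], bool]) -> Optional[Candidate]:
    """Return the best passing candidate, or None if none passes.

    Ranking: longer decoys first, then smaller distance from the
    originally intended position. `passes` is a hypothetical stand-in
    for the test of each candidate against the genome.
    """
    ok = [c for c in candidates if passes(c.sequence)]
    if not ok:
        return None  # no decoy is designed at this position
    return min(ok, key=lambda c: (-len(c.sequence), abs(c.offset)))
```

Returning `None` corresponds to the case in which no candidate meets the required criteria and the position is left without a decoy.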

In some embodiments, the decoy oligonucleotide sequence is 40 nucleotides in length. In some embodiments, the decoy oligonucleotide sequence is 40-160 nucleotides in length. In some embodiments, the decoy oligonucleotide sequence is 40-120 nucleotides in length. In some embodiments, the decoy oligonucleotide sequences have a length of about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, about 125, about 126, about 127, about 128, about 129, about 130, about 131, about 132, about 133, about 134, about 135, about 136, about 137, about 138, about 139, about 140, about 141, about 142, about 143, about 144, about 145, about 146, about 147, about 148, about 149, about 150, about 151, about 152, about 153, about 154, about 155, about 156, about 157, about 158, about 159, or about 160 nucleotides. In some cases, the decoy oligonucleotide is a single-stranded, 120-nucleotide-long DNA oligonucleotide with a 5' biotin modification. In some cases, each decoy targets a unique library molecule. Decoy oligonucleotides can span all mature mRNA sequences, including UTRs and all annotated isoforms.

In some embodiments, the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise domains that specifically bind to all or part of the spatial barcode or its complement. In some embodiments, the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise domains that specifically bind all or part of the sequence of an analyte or complement thereof from a biological sample. In some embodiments, the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise domains that specifically bind all or part of the spatial barcode or its complement or all or part of the sequence of an analyte or its complement from a biological sample. In some embodiments, the domain of the decoy oligonucleotide hybridizes to the analyte of interest. In some embodiments, the domain of the decoy oligonucleotide specifically binds to the analyte of interest. In some embodiments, the domain of the decoy oligonucleotide specifically binds to all or part of the spatial barcode or its complement. In some embodiments, the domain of the decoy oligonucleotide specifically binds to all or part of the sequence of the analyte from the biological sample. In some embodiments, the domain of the decoy oligonucleotide specifically binds to a 3' portion of the sequence of the analyte or complement thereof from the biological sample. In some embodiments, the domain of the decoy oligonucleotide specifically binds to the 5' portion of the sequence of the analyte or complement thereof from the biological sample. In some embodiments, the domain of the decoy oligonucleotide specifically binds to an intron in the sequence of the analyte or complement thereof from the biological sample. In some embodiments, the domain of the decoy oligonucleotide specifically binds to an exon in the sequence of an analyte or complement thereof from a biological sample. 
In some embodiments, the domain of the decoy oligonucleotide specifically binds to a 3' untranslated region of an analyte or its complement from a biological sample. In some embodiments, the domain of the decoy oligonucleotide specifically binds to a 5' untranslated region of an analyte or its complement from a biological sample.

In some embodiments, the domain of the decoy oligonucleotide sequence is 40 nucleotides in length. In some embodiments, the domain of the decoy oligonucleotide sequence is 40-160 nucleotides in length. In some embodiments, the domain of the decoy oligonucleotide sequence is 40-120 nucleotides in length. In some embodiments, the domains of the decoy oligonucleotide sequences have a length of about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 101, about 102, about 103, about 104, about 105, about 106, about 107, about 108, about 109, about 110, about 111, about 112, about 113, about 114, about 115, about 116, about 117, about 118, about 119, about 120, about 121, about 122, about 123, about 124, about 125, about 126, about 127, about 128, about 129, about 130, about 131, about 132, about 133, about 134, about 135, about 136, about 137, about 138, about 139, about 140, about 141, about 142, about 143, about 144, about 145, about 146, about 147, about 148, about 149, about 150, about 151, about 152, about 153, about 154, about 155, about 156, about 157, about 158, about 159, or about 160 nucleotides.

In some embodiments, the analyte from the biological sample is associated with a disease or disorder. In some embodiments, the analyte from the biological sample comprises a mutation. In some embodiments, the analyte from the biological sample comprises a Single Nucleotide Polymorphism (SNP). In some embodiments, the analyte from the biological sample comprises a trinucleotide repeat.

In some embodiments, the domain of the decoy oligonucleotide hybridizes to a specific exon of a transcript (i.e., an mRNA molecule). For example, a transcript may be processed such that an exon that would be excised in a normal setting is retained in the mature mRNA product in a different setting (e.g., a pathological setting such as cancer). In some embodiments, the domain of the decoy oligonucleotide recognizes (e.g., hybridizes to) one or more isoforms of an analyte, but not others. In some embodiments, for example, decoy oligonucleotides hybridize to specific exons detected in a pathological setting (e.g., cancer).

In some embodiments, there is more than one decoy oligonucleotide for a particular analyte of interest in the set. For example, the analyte may undergo alternative splicing (e.g., as in an mRNA molecule). In some embodiments, different baits may detect and enhance a particular transcript, thereby detecting whether a particular exon is contained in the analyte. Thus, in some embodiments, for one analyte, a panel will include at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more decoy oligonucleotides.

In some embodiments, the decoy oligonucleotide is fully complementary (i.e., 100% complementary) to a portion of the target analyte. In some embodiments, the decoy oligonucleotide is partially complementary (i.e., less than 100% complementary) to a portion of the analyte of interest. In some embodiments, the decoy oligonucleotide has at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a portion of the analyte of interest.

In some embodiments, the decoy oligonucleotide is partially complementary (i.e., less than 100% complementary) to a portion of the analyte of interest. In some embodiments, a portion of the decoy oligonucleotide hybridizes to a portion of the analyte of interest. In some embodiments, a portion of the decoy oligonucleotide has at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to a portion of the analyte of interest.

Decoy oligonucleotides are included in one or more panels (also referred to herein as groups or sets; e.g., clusters). A panel includes a set of decoy oligonucleotides targeted to a particular group of analytes. For example, a panel may include a collection of decoy oligonucleotides specific to a pathological setting. In some embodiments, more than one panel (e.g., a pool of decoy oligonucleotides) may be used to enhance detection of the analyte. For example, in some embodiments, at least 1, at least 2, at least 3, at least 4, at least 5, or more panels (e.g., collections of decoy oligonucleotides) may be used to enhance detection of an analyte. All panels of decoy oligonucleotides and target analytes disclosed are provided in the accompanying sequence listing disclosed herein.

In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of an analyte of interest in a cancer (e.g., a cancer panel). In some embodiments, the cancer panel enables quantification of aberrantly expressed analytes in the cancer transcriptome while avoiding the additional cost and time associated with sequencing the entire exome. Decoy oligonucleotides and target analytes for an exemplary cancer panel are provided in Table 1 and the accompanying sequence listing disclosed herein. In some embodiments, decoy oligonucleotides in a cancer panel can detect dysregulated analytes associated with biological processes such as apoptosis, metabolism, cell cycle (e.g., checkpoint analytes), DNA damage and repair, hypoxia, and stress toxicity. More specifically, decoy oligonucleotides in the cancer panel target cancer-specific analytes, such as analytes that function in pathways including, but not limited to, the Myc pathway, Hippo pathway, RTK/RAS pathway, TP53 and TP53-related pathways, TGFβ pathway, and Wnt pathway. Decoy oligonucleotides in the cancer panel can also detect cancer hotspots, tumor suppressor analytes, and immune disorder analytes. Decoy oligonucleotides and target analytes for an exemplary cancer panel are disclosed in U.S. patent application No. 62/970,066 (titled "Capture targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (titled "Capture targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety.

In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes associated with immune disorders (e.g., an immunology panel). In some embodiments, the immunology panel enables quantitative analysis of analytes associated with immune disorders while avoiding the additional cost and time associated with sequencing the entire exome. Decoy oligonucleotides and target analytes for an exemplary immunology panel are provided in Table 2 and the accompanying sequence listing disclosed herein. In some embodiments, decoy oligonucleotides in an immunology panel can detect dysregulated analytes associated with biological processes such as B cell function, T cell function, cell cycle, cell signaling, interleukin signaling, and metabolism. More specifically, the decoy oligonucleotides target analytes including, but not limited to, transcription factors, T cell activation markers, antigen presentation genes, metabolic genes, and SIRP family members. In some embodiments, the immunology panel has applications in detecting immune disorders, including examining innate immunity, adaptive immunity, and one or more inflammatory responses; detecting one or more infectious diseases (e.g., bacterial infection; viral infection); and examining immune responses of transplant recipients. More specifically, decoy oligonucleotides in the immunology panel target immune-specific biomarkers including, but not limited to, lineage markers, tissue markers, and cancer markers. In some embodiments, the decoy oligonucleotide enhances detection of an analyte expressed in bone marrow, intestine, lung, salivary gland, intestine, lymph node, stem cells, or a combination thereof. Exemplary panels of decoy oligonucleotides and target analytes are disclosed in U.S. patent application No. 62/970,066 (titled "Capture targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (titled "Capture targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety.

In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes indicative of a dysregulated pathway (e.g., a pathway panel or a gene signature panel). In some embodiments, the pathway panel enables quantification of analytes associated with pathway dysregulation while avoiding the additional cost and time associated with sequencing the entire exome. Decoy oligonucleotides and target analytes for an exemplary pathway panel are provided in Table 3 and the accompanying sequence listing disclosed herein. In some embodiments, decoy oligonucleotides in a pathway panel can detect dysregulated analytes associated with complex signal transduction pathways. Target analytes include analytes specific for a disease or drug target; G protein-coupled receptors (GPCRs); one or more kinases; one or more epigenetic markers; or one or more checkpoint analytes. Analytes in the pathway panel that are detected by decoy oligonucleotides include, but are not limited to, tissue markers of cancer and of central nervous system, inflammatory, metabolic, cardiovascular, respiratory, and reproductive disorders. Exemplary panels of decoy oligonucleotides and target analytes are disclosed in U.S. patent application No. 62/970,066 (titled "Capture targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (titled "Capture targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety.

In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes associated with neurodevelopment and/or neurological disorders (e.g., a neural panel). In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes associated with neurodevelopment. In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes associated with neurological disorders. In some embodiments, the panel includes decoy oligonucleotides that hybridize to and enhance detection of analytes associated with both neurodevelopment and neurological disorders. In some embodiments, the neural panel enables quantitative analysis of analytes associated with pathway dysregulation while avoiding the additional cost and time associated with sequencing the entire transcriptome. Decoy oligonucleotides and target analytes for an exemplary neural panel are provided in Table 4 and the accompanying sequence listing disclosed herein. In some embodiments, decoy oligonucleotides in the neural panel can detect dysregulated analytes associated with axonal targeting, hypoxia, and glioblastoma. In some embodiments, decoy oligonucleotides in the neural panel can detect a dysregulated analyte associated with axonal targeting. In some embodiments, decoy oligonucleotides in the neural panel can detect a dysregulated analyte associated with hypoxia. In some embodiments, decoy oligonucleotides in the neural panel can detect a dysregulated analyte associated with glioblastoma. In some embodiments, decoy oligonucleotides in the neural panel can detect a gene encoding a mitochondrial protein. In some embodiments, decoy oligonucleotides in the neural panel can detect genes encoding mitochondrial proteins in order to assess energy metabolism.

In some cases, additional (i.e., custom) baits may be added to each group. Furthermore, in some cases, fully customized groups may be prepared.

In some embodiments, any of the sets disclosed herein include decoy oligonucleotides that can enhance detection of at least about 100 analytes, about 200 analytes, about 300 analytes, about 400 analytes, about 500 analytes, about 600 analytes, about 700 analytes, about 800 analytes, about 900 analytes, about 1000 analytes, about 1100 analytes, about 1200 analytes, about 1300 analytes, about 1400 analytes, about 1500 analytes, about 1600 analytes, about 1700 analytes, about 1800 analytes, about 1900 analytes, about 2000 analytes, or more.

In some embodiments, the decoy oligonucleotide increases detection of the analyte of interest by about 1.5-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, about 500-fold, about 1000-fold, or more as compared to detection of the analyte without the decoy oligonucleotide.
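One way such a fold increase might be quantified is sketched below. The metric is an assumption rather than a definition taken from this disclosure: it compares the fraction of sequence reads mapping to the analyte of interest with and without decoy enrichment. The function name and argument names are hypothetical.

```python
def fold_enrichment(target_reads_without: int, total_reads_without: int,
                    target_reads_with: int, total_reads_with: int) -> float:
    """Ratio of on-target read fractions with versus without decoy enrichment.

    Assumption: detection is measured as the fraction of reads that map
    to the analyte of interest in each library.
    """
    if min(target_reads_without, total_reads_without, total_reads_with) <= 0:
        raise ValueError("reference counts and totals must be positive")
    # (with / total_with) / (without / total_without), rearranged so the
    # numerator and denominator are exact integer products
    return (target_reads_with * total_reads_without) / (
        total_reads_with * target_reads_without)
```

Under this assumed metric, a library in which 5% of reads map to the analyte before enrichment and 50% map to it afterward would show a 10-fold increase.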

In some embodiments, the decoy oligonucleotide comprises a molecular tag. As disclosed herein, the molecular tag of the decoy oligonucleotide is attached to (e.g., coupled to) the nucleic acid sequence of the decoy oligonucleotide. In some embodiments, the molecular tag includes one or more moieties. In some embodiments, the moiety comprises a label as described herein. The label is capable of detecting hybridization of the decoy oligonucleotide to the analyte. In some embodiments, the label is directly associated (i.e., coupled) with the decoy oligonucleotide. The detectable label may be directly detectable by itself (e.g., a radioisotope label or a fluorescent label), or, in the case of an enzymatic label, may be indirectly detectable, e.g., by catalyzing a chemical change in a chemical substrate compound or composition that is directly detectable. The detectable label may be suitable for small scale detection and/or for high throughput screening. Thus, suitable detectable labels include, but are not limited to, radioisotopes, fluorophores, chemiluminescent compounds, bioluminescent compounds, and dyes.

In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the target nucleic acid is single stranded. In some embodiments, the target nucleic acid is double stranded. In some embodiments, the nucleic acid is RNA. In some embodiments, the nucleic acid is DNA. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' of the domain in the decoy oligonucleotide.

In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some embodiments, the agent that specifically binds to the molecular tag comprises a small molecule. In some embodiments, an agent that specifically binds to a molecular tag is attached to a substrate. In some embodiments, the substrate is a bead. In some embodiments, the substrate is a well. In some embodiments, the substrate is a slide.

In some embodiments, the moiety is biotin. In some embodiments, the biotin molecule is directly associated (i.e., coupled) with the decoy oligonucleotide at the 3' end. In some embodiments, the biotin molecule is directly associated (i.e., coupled) with the decoy oligonucleotide at the 5' end. In some embodiments, and as disclosed below, a biotin molecule can be associated (e.g., coupled) with an avidin molecule, thereby enabling pull-down (pulldown) of the analyte. In some embodiments, and as disclosed below, a biotin molecule can be associated (e.g., coupled) with a streptavidin molecule, thereby enabling the pull-down of the analyte.

In some embodiments, the decoy oligonucleotide does not include a moiety attached to the sequence (i.e., the decoy oligonucleotide is a naked decoy oligonucleotide).

(ii) Hybridization method

In some cases, nucleic acids from a single nucleic acid library (i.e., a single sample) are hybridized to the decoy oligonucleotides. In some cases, multiple samples may be pooled prior to hybridization. In some cases, at least 2, 3, 4, 5, 6, 7, 8, or more samples are pooled. In some cases, each library is complexed with a barcode specific for a particular nucleic acid set. In some cases, the pooled libraries are from the same cell or tissue type. In some cases, the pooled libraries are from different cell or tissue types.

In an arrangement where multiple samples from different spots on one or more arrays are pooled, the pooling calculation takes into account the number of spots on the array that are covered by tissue. In some cases, the number of gene expression spots covered by tissue may be estimated visually, or a more accurate measurement may be made using an automated calculation method. In an arrangement having an array of about 5000 spots, the number of spots covered by a sample on the array can be calculated by multiplying the percentage of coverage by the total number of gene expression spots (e.g., about 5000).
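The coverage calculation described above amounts to simple arithmetic, sketched below for illustration. The second function, which weights each sample in the pool in proportion to its covered spots, is an assumption about how the spot count might enter the pooling calculation; it is not specified in this disclosure. Both function names are hypothetical.

```python
def covered_spots(coverage_percent: float, total_spots: int = 5000) -> int:
    """Number of spots covered by tissue: coverage percentage x total spots."""
    if not 0.0 <= coverage_percent <= 100.0:
        raise ValueError("coverage_percent must be between 0 and 100")
    return round(coverage_percent / 100.0 * total_spots)


def pooling_fractions(spots_per_sample):
    """Fraction of the pool allotted to each sample.

    Assumption: each sample is weighted in proportion to its covered spots.
    """
    total = sum(spots_per_sample)
    if total <= 0:
        raise ValueError("at least one sample must cover one or more spots")
    return [n / total for n in spots_per_sample]
```

For example, a sample covering 60% of an array of about 5000 spots would cover about 3000 spots.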

In some cases, a universal blocker is added to the pool during the prehybridization step of pooling nucleic acids. In some embodiments, human Cot DNA is added to the pool. Human Cot DNA is rich in the repetitive non-coding elements that are common in genomic DNA. These repetitive sequences often result in non-specific binding during the hybridization reaction. Adding Cot DNA to these reactions reduces the non-specific binding associated with these repetitive sequences, thereby improving accuracy.

In some cases, the decoy oligonucleotide hybridization methods disclosed herein include a variety of buffers and components to aid in nucleic acid capture. In some cases, the method includes a hybridization enhancer, a wash buffer, an equilibration buffer, a hybridization buffer, or any combination thereof. Any buffer may be in concentrated form, which may be diluted to an appropriate concentration by one skilled in the art using, for example, nuclease-free water. In some cases, the component comprises streptavidin beads. In some cases, the component comprises avidin beads.

In some embodiments where RNA is the nucleic acid analyte, one or more RNA analytes of interest may be selectively enriched. For example, one or more RNAs of interest may be selected by adding one or more oligonucleotides to the sample. In some embodiments, the added oligonucleotides are sequences used to prime a reaction by a polymerase. For example, one or more primer sequences having sequence complementarity to one or more RNAs of interest may be used to amplify the one or more RNAs of interest, thereby selectively enriching those RNAs. In some embodiments, an oligonucleotide (e.g., a probe) having sequence complementarity to the complementary strand of the captured RNA (e.g., cDNA) can bind to the cDNA. For example, biotinylated oligonucleotides having sequences complementary to one or more cDNAs of interest can bind to the cDNAs, which can then be selected by biotin-streptavidin affinity using any of a variety of methods known in the art (e.g., streptavidin beads). Other non-nucleic acid affinity moieties are known in the art, such as 2-(4-hydroxyphenylazo)benzoic acid (HABA) or the compounds listed in Table 5.

Alternatively, any of a variety of methods may be used to down-select (e.g., remove) one or more RNAs. For example, probes that selectively hybridize to ribosomal RNA (rRNA) can be applied to a sample, thereby reducing the pool and concentration of rRNA in the sample. Subsequent application of the capture probes to the sample may result in improved capture of other types of RNA due to the reduction of non-specific RNA present in the sample.

(1) Direct hybridization of decoy oligonucleotides

After creating a plurality of nucleic acids or libraries generated using the same, one or more decoy oligonucleotides can be used to enrich for one or more nucleic acids of interest.

In some embodiments, decoy oligonucleotides are added to a plurality of nucleic acids (e.g., a nucleic acid library). In some embodiments, the plurality of nucleic acids includes a nucleic acid having a partial sequence of a spatial barcode or complement thereof, an analyte from a biological sample, or complement thereof. In some embodiments, the spatial barcode includes a sequence corresponding to a region of interest in the biological sample. In some embodiments, the spatial barcode is capable of detecting and correlating to a particular region of interest in the biological sample. In some embodiments, the methods disclosed herein include identifying a region of interest. In some embodiments, the spatial barcode provides information about the location of the analyte in the biological sample. In some embodiments, the methods disclosed herein comprise identifying the location and/or abundance of an analyte in a biological sample.

In some embodiments, decoy oligonucleotides are added to a plurality of nucleic acids. In some embodiments, the decoy oligonucleotide comprises a domain that specifically binds to all or part of a spatial barcode or complement thereof and/or all or part of the sequence of an analyte or complement thereof. In some embodiments, complexes of decoy oligonucleotides that specifically bind to nucleic acids may be enriched. For example, the decoy oligonucleotide may include a molecular tag, and a reagent that specifically binds to the molecular tag may be used to enrich for complexes of the decoy oligonucleotide that specifically bind to nucleic acids. In some embodiments, the molecular tag may be attached (directly or indirectly) to a substrate (e.g., a slide, well, or bead). In some embodiments, the molecular tag may include a protein, a nucleic acid, a carbohydrate, a small molecule, or any combination thereof. In some embodiments, the agent that specifically binds to the molecular tag may be a protein, a nucleic acid, a carbohydrate, a small molecule, or any combination thereof. In some embodiments, the molecular tag may be avidin and the agent that specifically binds to the molecular tag may be biotin. In some embodiments, the molecular tag may be streptavidin and the agent that specifically binds the molecular tag may be biotin.

In some embodiments, the molecular tag can be biotin and the agent that specifically binds to the molecular tag can be avidin or streptavidin (e.g., streptavidin attached to a bead). In some cases, the beads are prepared using methods known in the art. In some cases, a buffer (e.g., equilibration buffer) is used to suspend the beads. In some cases, the beads are separated from the buffer using methods known in the art (e.g., using a magnetic separator; centrifugation). In some cases, the beads are resuspended in buffer for capture of the decoy oligonucleotides.

In some embodiments where the molecular tag is biotin and the method includes using avidin or streptavidin beads to enrich for nucleic acid complexed with a decoy oligonucleotide, the streptavidin beads may be washed using any method known in the art. In some embodiments, streptavidin beads may be washed 1, 2, 3, 4, 5, 6, 7, or more times. In some embodiments, streptavidin beads may be stringently washed 1, 2, 3, 4, 5, 6, 7, or more times. In some embodiments, the streptavidin bead wash can be performed at about 15 ℃, about 20 ℃, about 25 ℃, about 30 ℃, about 35 ℃, about 40 ℃, about 45 ℃, about 48 ℃, about 50 ℃, about 55 ℃, about 60 ℃, about 62 ℃, about 65 ℃, about 67 ℃, about 70 ℃, about 75 ℃, or higher. In some embodiments, streptavidin beads may be washed at about 67 ℃. In some cases, the temperature of the avidin or streptavidin bead wash is the same as the temperature of the decoy oligonucleotide hybridization step (e.g., 60 ℃ or 65 ℃). In some embodiments, after one or more wash steps, one or more nucleic acids that hybridize to one or more decoy oligonucleotides can be recovered and enriched.

In some embodiments, the recovered nucleic acid may be released from one or more decoy oligonucleotides and purified to remove avidin or streptavidin and biotin (or any other molecular tag and reagent that specifically binds to the molecular tag).

In some embodiments, the molecular tag can be biotin and the agent that specifically binds to the molecular tag can be avidin (e.g., avidin attached to a bead). In some embodiments where the molecular tag is biotin and the method includes using the avidin beads to enrich for nucleic acids complexed with decoy oligonucleotides, the avidin beads may be washed using any method known in the art. In some embodiments, the avidin beads may be washed 1, 2, 3, 4, 5, 6, 7, or more times. In some embodiments, avidin beads may be stringently washed 1, 2, 3, 4, 5, 6, 7, or more times. In some embodiments, the avidin bead wash can be performed at about 15 ℃, about 20 ℃, about 25 ℃, about 30 ℃, about 35 ℃, about 40 ℃, about 45 ℃, about 48 ℃, about 50 ℃, or higher. In some embodiments, after one or more wash steps, one or more nucleic acids that hybridize to one or more decoy oligonucleotides can be recovered and enriched. In some embodiments, a plurality of decoy oligonucleotides can be used in any of the methods described herein to enrich for one or more nucleic acids of interest from a plurality of nucleic acids. In some embodiments, the plurality of decoy oligonucleotides are designed to enrich for one or more nucleic acids including all or part of the sequence of an analyte of interest (e.g., one or more genes that function or are abnormally expressed in a particular cellular state or pathway) or a complement thereof. For example, in some embodiments, a plurality of decoy oligonucleotides can be used to enrich for nucleic acids including all or a portion of the sequence of a cancer-associated transcript or complement thereof (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture of targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture of targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, a plurality of decoy oligonucleotides can be used to enrich for nucleic acids including all or a portion of the sequence of an immune-related transcript or its complement (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture of targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture of targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, a plurality of decoy oligonucleotides can be used to enrich for nucleic acids including all or part of the sequence of a pathway-specific transcript or its complement (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture of targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture of targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, a plurality of decoy oligonucleotides can be used to enrich for nucleic acids comprising all or part of the sequence of a neurological-specific transcript or its complement, as set forth in Table 4 and the accompanying sequence listing.

In some embodiments, after hybridization of one or more decoy oligonucleotides to their target nucleic acids, the hybridized nucleic acids are enriched, thereby producing a nucleic acid set enriched for the particular nucleic acid of interest. In some cases, after hybridization and binding to streptavidin or avidin beads, the unhybridized nucleic acid is washed out of the sample, thereby enriching the nucleic acid of interest. In some embodiments, after hybridization of one or more decoy oligonucleotides to their target nucleic acids, the unhybridized nucleic acids are degraded (e.g., by nucleases), thereby enriching for hybridized nucleic acids. In some embodiments, after hybridization of one or more decoy oligonucleotides, the hybridized nucleic acids are degraded (e.g., by nucleases), thereby enriching for unhybridized nucleic acids; for example, the techniques may be used to reduce the amount of high abundance nucleic acids that are not of interest. In some embodiments, the decoy oligonucleotide may not include a detectable moiety. In some embodiments, the enriched nucleic acid is purified. In some embodiments, the enriched nucleic acid is sample indexed (e.g., prior to sequencing).

In some embodiments, the decoy oligonucleotide hybridizes to the nucleic acid at about 40 ℃, about 45 ℃, about 50 ℃, about 55 ℃, about 60 ℃, about 65 ℃, about 70 ℃, about 75 ℃, about 80 ℃ or more. In some embodiments, the decoy oligonucleotide hybridizes to the nucleic acid for at least 15 minutes, at least 30 minutes, at least 45 minutes, at least 1 hour, at least 90 minutes, at least 2 hours, at least 3 hours, at least 4 hours, at least 5 hours, or more.

In some embodiments, the decoy oligonucleotide hybridizes to the nucleic acid for about 2 hours. In some cases, the decoy oligonucleotide hybridizes to the nucleic acid overnight (e.g., at least about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, or more). In some cases, the decoy oligonucleotide hybridizes to the nucleic acid for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, or more. In some cases, after hybridization, the sample comprising decoy oligonucleotides hybridized to the nucleic acid of interest is washed. In some cases, washing is performed using any of the washing solutions described herein. In some cases, the washing occurs at 65 ℃. In some cases, the washing occurs at 60 ℃. In some cases, hybridization occurs overnight at 60 ℃, followed by a subsequent wash step at 60 ℃. In the case of nucleic acid molecules having an average length of less than 700 base pairs, hybridization should be extended to overnight at 60 ℃. In this setup, the subsequent wash is performed at 60 ℃. In the case where the library set comprises nucleic acid molecules of different lengths (e.g., if the set comprises both short (e.g., < 700 bp) and long (e.g., > 700 bp) nucleic acid molecules), hybridization can be extended to overnight at 60 ℃, followed by washing at 60 ℃. For example, a percentage (e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%) of the nucleic acid molecules can be less than 700 base pairs in length, in which case the alternative method is hybridization overnight at 60 ℃, followed by washing at 60 ℃.
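The length-dependent choice of hybridization and wash conditions described above can be sketched as a small decision function. This is an illustrative sketch only. The names `HybridizationPlan`, `SHORT_LIBRARY_PLAN`, and `choose_plan` are hypothetical, and treating any fragment below 700 bp as sufficient to select the overnight protocol is an assumption. The conditions for libraries without short fragments are not fixed by the text, so the caller supplies them.

```python
from dataclasses import dataclass
from statistics import mean

SHORT_FRAGMENT_BP = 700  # length threshold named in the text


@dataclass(frozen=True)
class HybridizationPlan:
    hybridization_temp_c: int
    overnight: bool
    wash_temp_c: int


# Conditions the text gives for short or mixed-length libraries:
# overnight hybridization at 60 C followed by a wash at 60 C.
SHORT_LIBRARY_PLAN = HybridizationPlan(60, True, 60)


def choose_plan(fragment_lengths_bp, default_plan):
    """Pick hybridization/wash conditions from library fragment lengths.

    default_plan is a caller-supplied HybridizationPlan (an assumption of
    this sketch) used when no fragment is shorter than the threshold.
    """
    lengths = list(fragment_lengths_bp)
    if not lengths:
        raise ValueError("at least one fragment length is required")
    average_is_short = mean(lengths) < SHORT_FRAGMENT_BP
    has_short_fraction = any(n < SHORT_FRAGMENT_BP for n in lengths)
    if average_is_short or has_short_fraction:
        return SHORT_LIBRARY_PLAN
    return default_plan
```

In practice the fragment lengths could come from the quality-control electrophoresis trace, and a laboratory would likely apply its own cutoff for how large the short fraction must be before switching protocols.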

In some embodiments, one or more detectable moieties may be bound (e.g., directly or indirectly linked) to the decoy oligonucleotide. In some embodiments, one or more detectable moieties can be used to detect (or enhance detection of) decoy oligonucleotides (e.g., decoy oligonucleotides that hybridize to nucleic acids).

In some embodiments, the enriched nucleic acid may be amplified. After amplification, the enriched and amplified nucleic acids can be used to generate a nucleic acid library and sequenced using any method known in the art, including the exemplary sequencing methods described herein. In some embodiments, sequencing may include determining all or part of the sequence of a spatial barcode or its complement in a nucleic acid. In some embodiments, sequencing comprises determining all or part of the sequence of an analyte or its complement from a biological sample in a nucleic acid. In some embodiments, sequencing comprises high throughput sequencing.

In some cases, the amplification step uses a composition that includes a buffer and a library primer set. A thermal cycler can be used to amplify enriched nucleic acid libraries (e.g., with cycling steps at 98 ℃, 67 ℃, and 72 ℃). It will be appreciated that one skilled in the art can determine temperature and run time parameters to amplify a nucleic acid library. In some cases, the total number of cycles of amplification may be varied to optimize detection of one or more decoy oligonucleotides. In some cases, the number of PCR cycles performed to amplify the library is at least 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more cycles. It will be appreciated that if the expected expression of a given target is lower, more cycles are required to detect the target. In some cases, amplification can be performed while the beads (e.g., avidin or streptavidin beads) are in the mixture. In some cases, one or more beads may be removed using methods known in the art (e.g., using a magnet).

In some cases, the average fragment size of the nucleic acids in the library may be determined after amplification. In some cases, an automated electrophoresis system (e.g., TapeStation from Agilent) may be run as a quality control for nucleic acid library samples. In some cases, the size, quantity, and integrity of the samples may be analyzed prior to downstream analysis (e.g., sequencing).

In some embodiments, targeted spatial gene expression profiling using one or more sets (e.g., cancer set, immune set, pathway set, or nerve set) as described herein can enrich for about 1-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, or more target nucleic acids as compared to unbiased spatial profiling. In some embodiments, target nucleic acids may be enriched about 1-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, or more of each gene using targeted spatial gene expression profiling of one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein as compared to unbiased spatial profiling.

In some embodiments, using targeted spatial gene expression profiling of one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein can increase the percentage of on-target reads by about 1-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, or more as compared to unbiased spatial profiling. In some embodiments, the number of on-target reads using targeted spatial gene expression profiling of one or more groups as described herein is at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% as compared to using unbiased spatial profiling. In some embodiments, using targeted spatial gene expression profiling of a group as described herein (e.g., cancer group, immune group, pathway group, or nerve group) can reduce the percentage of off-target reads by about 1-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, or more as compared to unbiased spatial profiling.
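As a minimal sketch of how such a comparison might be computed, the fraction of reads that fall on genes in the group can be calculated for the targeted and the unbiased library, and the fold change taken as their ratio. The function names and the representation of each read by the gene it maps to are assumptions made for illustration, not part of the described method.

```python
def on_target_fraction(read_genes, panel_genes):
    """Fraction of reads whose assigned gene belongs to the targeted group.

    read_genes: iterable with one gene name per mapped read.
    panel_genes: iterable of gene names in the targeted group.
    """
    reads = list(read_genes)
    if not reads:
        raise ValueError("no reads supplied")
    panel = set(panel_genes)
    on_target = sum(1 for gene in reads if gene in panel)
    return on_target / len(reads)


def fold_change(targeted_fraction, unbiased_fraction):
    """Fold increase of the targeted fraction over the unbiased fraction."""
    if unbiased_fraction <= 0:
        raise ValueError("unbiased fraction must be positive")
    return targeted_fraction / unbiased_fraction
```

The off-target percentage is the complement of the on-target fraction, so the same two functions can also be used to express the fold reduction in off-target reads.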

In some embodiments, detection of enriched target analytes is analyzed using targeted spatial gene expression profiles of one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein. In some embodiments, detection of one or more analytes of interest is enriched using targeted spatial gene expression profiling analysis of one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein. In some embodiments, enriching comprises increasing the number of sequencing reads detected when sequencing a plurality of nucleic acids and/or probes. In some embodiments, enriching comprises increasing the number of sequencing reads detected when sequencing a plurality of nucleic acids after avidin-biotin pull-down as described herein. In some embodiments, enriching comprises increasing the number of sequencing reads detected when sequencing a plurality of nucleic acids after streptavidin-biotin pulldown as described herein. In some embodiments, the number of enriched target analyte sequence reads is increased by about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 1-fold, about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, about 8-fold, about 9-fold, about 10-fold, about 20-fold, about 50-fold, about 100-fold, or more compared to the number of sequence reads in the same target analyte when whole locus sequencing is performed.

In some embodiments, enriched target analyte sequence reads obtained from targeted spatial gene expression profiling using one or more sets (e.g., cancer set, immune set, pathway set, or nerve set) as described herein are highly correlated with sequence reads obtained when performing unbiased (e.g., non-targeted) spatial gene expression profiling in the same biological sample. In some embodiments, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the UMIs detected from the enriched analyte sequence reads match the sequence reads obtained from unbiased (e.g., non-targeted) spatial gene expression profiling in the same biological sample. In some embodiments, the R² of the matched UMI counts is at least 0.95, at least 0.96, at least 0.97, at least 0.98, or at least 0.99.
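One way to obtain an R² value of this kind is to square the Pearson correlation between per-gene UMI counts from the targeted and the unbiased library. The sketch below assumes that counts are compared gene by gene on the raw scale and that only genes present in both libraries are matched. The text does not state how the counts are matched or transformed, so both choices, and the function name, are illustrative assumptions.

```python
from math import fsum


def matched_umi_r_squared(targeted_counts, unbiased_counts):
    """Squared Pearson correlation of per-gene UMI counts.

    Both arguments map gene name -> UMI count. Only genes present in
    both mappings are compared (an assumption of this sketch).
    """
    genes = sorted(set(targeted_counts) & set(unbiased_counts))
    if len(genes) < 2:
        raise ValueError("need at least two matched genes")
    x = [unbiased_counts[g] for g in genes]
    y = [targeted_counts[g] for g in genes]
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("counts must vary across genes")
    return (sxy * sxy) / (sxx * syy)
```

A log transform of the counts before correlation is a common alternative and would only require transforming the two mappings before the call.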

In some embodiments, the number of reads per spot (reads/spot) obtained from targeted spatial gene expression profiling using one or more sets (e.g., cancer set, immune set, pathway set, or nerve set) described herein is less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1% of the number of reads per spot (reads/spot) obtained from unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, the total reads obtained from targeted spatial gene expression profiling using one or more of the groups (e.g., cancer group, immune group, pathway group, or nerve group) described herein are less than 20%, less than 15%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or less than 1% of the total reads obtained from unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, although the total reads and/or reads/spot are reduced by using targeted spatial gene expression profiling of one or more groups as described herein, the biological pattern (e.g., the spatial pattern of specific gene expression in a biological sample) remains consistent with the results obtained from unbiased (e.g., non-targeted) spatial gene expression profiling.

In some embodiments, about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or more of the group gene recovery can be obtained using targeted spatial gene expression profiling of one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein relative to unbiased spatial gene expression profiling.
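A minimal sketch of one possible gene-recovery calculation follows. It assumes that recovery is the share of group genes detected by unbiased profiling that are also detected by targeted profiling. The text does not define the metric formally, so this definition and the function name are hypothetical.

```python
def panel_gene_recovery(targeted_genes, unbiased_genes, panel_genes):
    """Share of group genes seen in unbiased data that are also seen in targeted data."""
    panel = set(panel_genes)
    baseline = set(unbiased_genes) & panel  # group genes detected without enrichment
    if not baseline:
        raise ValueError("no group genes detected in the unbiased data")
    recovered = set(targeted_genes) & baseline
    return len(recovered) / len(baseline)
```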

In some embodiments, about 50%, about 55%, about 60%, about 62%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or more of the group UMI recovery and/or UMI retention can be obtained using targeted spatial gene expression profiling of one or more groups (e.g., cancer group, immune group, pathway group or nerve group) as described herein relative to unbiased (e.g., non-targeted) spatial gene expression profiling. In some embodiments, targeted spatial gene expression profiling using one or more groups (e.g., cancer group, immune group, pathway group, or nerve group) as described herein can reduce the complexity of the targeted genes or UMIs to about 90%, about 80%, about 70%, about 60%, about 50%, about 40%, about 30%, about 20% or less of that of unbiased (e.g., non-targeted) spatial gene expression profiling.

In some embodiments, a dual-index library may be used for targeted spatial gene expression profiling. In some embodiments, individually indexed libraries may be pooled prior to the hybridization/capture enrichment step to generate targeted libraries for downstream high throughput sequencing. In some embodiments, individually indexed libraries can be pooled after the hybridization/capture enrichment step to generate targeted libraries for downstream high throughput sequencing. In some embodiments, targeted spatial gene expression profiling of one or more groups as described herein provides comparable, or even more detailed, spatial gene profiling results as compared to annotations obtained from a pathologist and/or unbiased (e.g., non-targeted) spatial gene profiling.

(2) Direct hybridization of decoy oligonucleotides to target analytes

In some embodiments, decoy oligonucleotides (e.g., a collection of decoy oligonucleotides from one or more of the sets disclosed herein) can be added directly to an analyte. In some embodiments, the decoy oligonucleotide is directly associated (i.e., conjugated) at the 3' end with a detectable moiety or molecular tag. In some embodiments, the decoy oligonucleotide is directly associated (i.e., conjugated) at the 5' end with a detectable moiety or molecular tag. In some embodiments, the detectable moiety is a fluorophore or radioisotope.

In some embodiments, the target-specific reaction is performed in a biological sample. In some embodiments, the nucleic acid analyte may be denatured prior to contact with the decoy oligonucleotide to produce single stranded analytes. In some embodiments, the decoy oligonucleotide directly binds (e.g., hybridizes) to an analyte (e.g., a single-stranded analyte), and one or more imaging techniques can be used to identify the detectable moiety (e.g., a fluorophore). In some embodiments, the target-specific reaction comprises in situ hybridization of a decoy oligonucleotide to an analyte (e.g., a single-stranded analyte). In some embodiments, the in situ hybridization is Fluorescence In Situ Hybridization (FISH). In some embodiments, a fluorophore or molecular tag can be used as a marker to identify (e.g., enrich) an analyte of interest. In some embodiments, fluorescence microscopy can be used to locate fluorophore-labeled decoy oligonucleotides in a biological sample or elsewhere. In some embodiments, analytes that do not hybridize to decoy oligonucleotides may be washed away. In some embodiments, the washing method includes separating only those analytes that fluoresce. In some embodiments, the washing step comprises any of the washing steps provided herein.

In some embodiments, a fluorophore (e.g., any of the fluorophores disclosed herein) can be directly conjugated (e.g., coupled) to a decoy oligonucleotide. In some embodiments, the decoy oligonucleotide may include a non-fluorescent moiety that is directly bound (e.g., coupled) to the decoy oligonucleotide. In some embodiments, the fluorophore may bind to a non-fluorescent moiety, thereby enhancing detection of the analyte.

In some embodiments, the fluorescent in situ hybridization methods disclosed herein detect a specific set of transcripts. In some embodiments, the fluorescent in situ hybridization methods disclosed herein detect cancer-related transcripts (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, the fluorescent in situ hybridization methods disclosed herein detect immune-related transcripts (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture of targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture of targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, the fluorescent in situ hybridization methods disclosed herein detect pathway-specific transcripts (e.g., as disclosed in U.S. patent application No. 62/970,066 (entitled "Capture of targeted genetic target using hybridization/Capture method") and U.S. patent application No. 62/929,686 (entitled "Capture of targeted genetic target using hybridization/Capture method"), each of which is incorporated herein by reference in its entirety). In some embodiments, the fluorescent in situ hybridization methods disclosed herein detect a neurological-specific transcript as described herein.

In some embodiments, the hybridized analyte/decoy oligonucleotide complex can be cleaved from the substrate. In some embodiments, a spatially barcoded array populated with a plurality of capture probes may be contacted with a biological sample. The spatially barcoded capture probes can be cleaved and then interact with cells within the biological sample. In some embodiments, the capture probes of the plurality of capture probes may optionally include at least one cleavage domain, which is the portion of the capture probe used to reversibly attach the capture probe to an array feature. Furthermore, one or more segments or regions of the capture probes may optionally be released from the array features by cleavage of the cleavage domain. For example, a spatial barcode may be released by cleavage of a cleavage domain. In some embodiments, the capture probe may further comprise a cleavage site (e.g., a cleavage recognition site for a restriction endonuclease), a photolabile bond, a heat-sensitive bond, or a chemically sensitive bond. In some embodiments, once the spatially barcoded capture probes are associated with a particular cell or analyte thereof, the biological sample may optionally be removed.

In some embodiments, the cleaved or enriched analyte/decoy oligonucleotide complexes may be purified prior to downstream steps (e.g., sequencing). Sequencing can be performed using any method known in the art, including the exemplary sequencing methods described herein. In some cases, the methods disclosed herein comprise sequencing a single-index library. In some cases, the methods disclosed herein comprise sequencing a dual-index library. In some cases, the library comprises a standard Illumina paired-end construct, including P5 and P7 sequences. For a single-index library, an 8 bp sample index sequence is included as the i7 sample index sequence. For a dual-index library, 10 bp sample index sequences are included as the i7 and i5 sample index sequences.
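The index lengths stated above lend themselves to a simple sanity check before libraries are pooled. The sketch below is illustrative only: the function name `check_sample_indexes` and the `"single"`/`"dual"` labels are hypothetical, and the check covers only length and base composition, not membership in any vendor's index set.

```python
VALID_BASES = set("ACGT")

# Index lengths stated in the text: 8 bp (i7 only) for single-index
# libraries, 10 bp (i7 and i5) for dual-index libraries.
INDEX_LENGTH_BP = {"single": 8, "dual": 10}


def check_sample_indexes(library_type, i7, i5=None):
    """Return True if the supplied index sequences fit the library type."""
    if library_type not in INDEX_LENGTH_BP:
        raise ValueError("library_type must be 'single' or 'dual'")
    expected = INDEX_LENGTH_BP[library_type]
    if library_type == "single":
        if i5 is not None:
            return False  # a single-index library carries no i5 index
        indexes = [i7]
    else:
        indexes = [i7, i5]
    return all(
        isinstance(seq, str)
        and len(seq) == expected
        and set(seq.upper()) <= VALID_BASES
        for seq in indexes
    )
```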

In some embodiments, nucleic acid sequencing comprises determining all or part of the sequence of a spatial barcode or its complement. In some embodiments, sequencing comprises determining all or part of the sequence of an analyte or its complement from a biological sample. In some embodiments, sequencing comprises high throughput sequencing. In some embodiments, sequencing comprises ligating or adding one or more adaptor sequences or complements thereof, poly (GI) sequences or complements thereof, template switching oligonucleotide sequences or complements thereof, primer binding sites or complements thereof to the nucleic acid.

In some embodiments, a plurality of decoy oligonucleotides (e.g., two or more decoy oligonucleotides) can be used to interrogate spatial gene expression in a biological sample through RNA templated ligation.

(3) Downstream application of decoy oligonucleotides after hybridization

In some embodiments, the methods disclosed herein comprise post-capture amplification. In some cases, hybridized library molecules that are bound to streptavidin or avidin beads may be amplified prior to a downstream sequencing event. In some cases, primers (e.g., P5 adaptors and/or P7 adaptors) are added to the nucleic acid sequence or the enriched nucleic acid sequence. In some embodiments, amplification is performed prior to enrichment of the nucleic acid of interest with the decoy oligonucleotide. In some embodiments, amplification occurs after enrichment of the nucleic acid of interest with the decoy oligonucleotide.

In some embodiments of the methods provided herein, RNA templated ligation is used to interrogate spatial gene expression in a biological sample (e.g., FFPE tissue section). In some aspects, the steps of RNA templated ligation comprise: (1) hybridizing a decoy oligonucleotide pair to a nucleic acid (e.g., single stranded cDNA or RNA molecule); (2) ligating adjacent annealed probe pairs in situ; (3) RNase H treatment that (i) releases RNA-templated ligation products from tissue (e.g., into solution) and (ii) destroys unwanted DNA-templated ligation products; and optionally, (4) amplifying the RNA-templated ligation product (e.g., by multiplex PCR).

In some aspects, RNA templated ligation is used to detect target analytes, determine sequence identity, and/or expression monitoring and transcript analysis. In some aspects, the RNA templated ligation allows detection of a specific change in nucleic acid (e.g., a mutation or Single Nucleotide Polymorphism (SNP)), detection or expression of a specific nucleic acid, or detection or expression of a specific set of nucleic acids (e.g., expression in a similar cellular pathway or in a specific pathology). In some embodiments, methods involving RNA templated ligation are used to analyze nucleic acids, e.g., by genotyping, DNA copy number or quantification of RNA transcripts, localization of specific transcripts within a sample, etc. In some aspects, the systems and methods provided herein, including RNA templated ligation, are capable of identifying Single Nucleotide Polymorphisms (SNPs). In some aspects, such systems and methods are capable of identifying mutations.

In some aspects, disclosed herein are methods of detecting RNA expression comprising contacting a first decoy oligonucleotide, a second decoy oligonucleotide, and a ligase (e.g., T4 RNA ligase). In some embodiments, the first decoy oligonucleotide and the second decoy oligonucleotide are designed to hybridize to a target sequence such that the 5' end of the first decoy oligonucleotide and the 3' end of the second decoy oligonucleotide are adjacent and can be ligated. After hybridization, if the target sequence is present in the target sample, a ligase (e.g., T4 RNA ligase) ligates the first decoy oligonucleotide and the second decoy oligonucleotide, but if the target sequence is not present in the target sample, the ligase (e.g., T4 RNA ligase) does not ligate the first decoy oligonucleotide and the second decoy oligonucleotide. The presence or absence of a target sequence in a biological sample can be determined by determining whether the first decoy oligonucleotide and the second decoy oligonucleotide are ligated in the presence of a ligase.

In some aspects, two or more RNA analytes are analyzed using a method that includes RNA templated ligation. In some aspects, when two or more analytes are analyzed, a first decoy oligonucleotide and a second decoy oligonucleotide are used that are specific (e.g., specifically hybridize) to each RNA analyte.

In some aspects, three or more decoy oligonucleotides are used in the RNA templated ligation methods provided herein. In some embodiments, the three or more decoy oligonucleotides are designed to hybridize to a target sequence adjacent to one another, such that the 5' and 3' ends of adjacent probes can be ligated. In some embodiments, the presence or absence of a target sequence in a biological sample can be determined by determining whether the three or more decoy oligonucleotides are ligated in the presence of a ligase.
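The presence/absence logic of RNA templated ligation described above can be illustrated in silico. The sketch below assumes perfect complementarity, takes the first matching site on the target, and treats ligation as possible only when consecutive binding sites abut with no gap. It does not model ligase chemistry (e.g., end phosphorylation or mismatch tolerance), and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def binding_site(target_rna, probe_dna):
    """Return (start, end) of the probe's binding site on the target, or None.

    The target is written 5' to 3'; U is treated as T for matching.
    Only the first perfect match is considered.
    """
    target = target_rna.upper().replace("U", "T")
    site = reverse_complement(probe_dna)
    start = target.find(site)
    return None if start < 0 else (start, start + len(site))


def probes_can_be_ligated(target_rna, probes):
    """True if every probe binds and consecutive binding sites abut.

    probes: DNA probes (5' to 3') listed in the order of their binding
    sites along the target, from the target's 5' side to its 3' side.
    With this ordering the 5' end of each probe lies next to the 3' end
    of the following probe.
    """
    sites = [binding_site(target_rna, probe) for probe in probes]
    if len(sites) < 2 or any(site is None for site in sites):
        return False
    return all(left[1] == right[0] for left, right in zip(sites, sites[1:]))
```

A `False` result corresponds to the case in which the target sequence is absent, or the probes do not anneal adjacent to one another, so that no ligation product would be expected.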

(e) Compositions and kits

Also provided herein are kits comprising components and instructions for performing the methods described herein. In some cases, the kit includes a pool of decoy oligonucleotides. In some cases, the pool of decoy oligonucleotides is specific for a particular set of transcripts (e.g., a cancer group, an immune group, a pathway group, or a nerve group as disclosed herein). In some embodiments, the kit comprises decoy oligonucleotides specific for a cancer group as described herein. In some embodiments, the kit comprises decoy oligonucleotides specific for the immune groups described herein. In some embodiments, the kit comprises decoy oligonucleotides specific for the pathway groups described herein. In some embodiments, the kit comprises decoy oligonucleotides specific for the nerve groups described herein. It will also be appreciated that custom groups may be designed using the methods of designing decoy oligonucleotides as described herein. In some cases, the kit includes multiple sets of decoy oligonucleotides for the same group, so that multiple (e.g., duplicate, triplicate) reactions can be performed simultaneously on the same slide. In some cases, other decoy oligonucleotides can be added to one of the groups described herein (e.g., cancer group, immune group, pathway group, or nerve group as disclosed herein). In some cases, the kit comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or more reactions, wherein each reaction refers to testing one sample.

In some cases, a moiety (e.g., biotin) is attached to each decoy in the pool of decoy oligonucleotides. In some cases, the kit further comprises streptavidin beads. In some cases, the kit further comprises avidin beads. In some cases, the kit includes an array comprising a plurality of capture probes (i.e., as described herein). The capture probes on the array each comprise a spatial barcode and a capture domain.

The kit may also include reagents for performing the methods disclosed herein. For example, in some cases, the kit includes reagents for performing a reverse transcription reaction (e.g., reverse transcriptase), nucleic acid quantification as described herein, adding fragments to a nucleic acid sequence, and nucleic acid cleanup. In some cases, the kit includes reagents necessary for fixing, staining, and destaining the biological sample.

In some cases, the kit includes components capable of performing multiple reactions. In some cases, the kit includes components for target hybridization. In some cases, the components for target hybridization include one or more of a universal blocker, hybridization buffer, hybridization enhancer, equilibration buffer (concentrated or 1×), wash buffer (concentrated or 1×), primer, control sample, or any combination thereof. In some cases, the kit includes components for amplifying the nucleic acid library. In some cases, the components used to amplify the nucleic acid library include an amplification mix, primers, or any combination thereof.

In some cases, the kit further comprises instructions for library preparation and nucleic acid detection using the methods described herein.

Also provided herein are systems for analyzing one or more biological analytes present in a biological sample, comprising an array comprising a plurality of capture probes. Any of the capture probes described herein may be included in part of the system. In some cases, the capture probes each comprise a spatial barcode and a capture domain. In some cases, the system includes a composition (e.g., an enzyme) that can cleave the probes from the array. In some cases, cleavage of the probe occurs after hybridization of the probe to the analyte.

In some cases, the system includes a pool of decoy oligonucleotides specific for a target of interest. In some cases, the system includes decoy oligonucleotides specific for the cancer groups described herein. In some cases, the system includes decoy oligonucleotides specific for the immune groups described herein. In some cases, the system includes decoy oligonucleotides specific for the sets of pathways described herein. In some cases, the system includes decoy oligonucleotides specific for the neural groups described herein.

The system may also include reagents for performing the methods disclosed herein. For example, in some cases, the system includes reagents for performing reverse transcription reactions (e.g., reverse transcriptase), nucleic acid quantification as described herein, adding fragments to nucleic acid sequences, and nucleic acid cleanup. In some cases, the system includes reagents necessary to fix, stain, and destain the biological sample.

In some cases, the system includes a reagent delivery system. The reagent delivery system includes an instrument capable of delivering reagents to discrete portions of the biological sample, thereby preserving the integrity of the spatial pattern of the addressing scheme. In some cases, the reagent delivery system of the present assay system includes optional imaging means, reagent delivery hardware, and control software. Reagent delivery may be achieved in a number of different ways. It should be noted that the reagent may be delivered to a variety of different biological samples at once. Individual tissue sections are exemplified herein; however, multiple biological samples may be simultaneously manipulated and analyzed. For example, serial sections of a tissue sample may be analyzed in parallel and the data combined to construct a 3D map. In one exemplary aspect, the reagent delivery system may be a flow-based system. Flow-based systems for reagent delivery in the present invention may include instrumentation such as one or more pumps, valves, fluid containers, channels, and/or reagent storage units.

Also provided herein are compositions comprising decoy oligonucleotides bound to nucleic acids, wherein the nucleic acids comprise (i) a spatial barcode or complement thereof and (ii) a partial analyte binding moiety barcode or complement thereof. In some cases, the composition is an intermediate composition produced by the methods disclosed herein.

The decoy oligonucleotide may be any of the decoy oligonucleotides described herein. In some cases, the decoy oligonucleotide binds to the nucleic acid through a capture domain that specifically binds to (i) all or a portion of the nucleic acid and/or (ii) all or a portion of the analyte binding moiety barcode or its complement.

In some cases, the compositions described herein further comprise a molecular tag. In some cases, the composition further comprises an agent that specifically binds to the molecular tag. The molecular tag may be any suitable molecular tag described herein. In some cases, the molecular tag comprises a protein. In some cases, the protein is streptavidin, avidin, or biotin. In some cases, the molecular tag comprises a small molecule. In some cases, the molecular tag comprises a nucleic acid. In some cases, the molecular tag includes a carbohydrate.

In some cases, the molecular tag is located 5' to the domain in the decoy oligonucleotide. In some cases, the molecular tag is located 3' of the domain in the decoy oligonucleotide.

The agent that specifically binds to the molecular tag may be any suitable agent described herein. In some cases, the agent comprises streptavidin, avidin, or biotin. In some cases, the molecular tag comprises biotin and the agent that specifically binds to the molecular tag comprises avidin. In some cases, the molecular tag comprises biotin and the agent that specifically binds to the molecular tag comprises streptavidin. In some cases, the molecular tag comprises avidin and the agent that specifically binds to the molecular tag comprises biotin. In some cases, the molecular tag comprises streptavidin and the agent that specifically binds to the molecular tag comprises biotin.

In some cases, the agent that specifically binds to the molecular tag comprises a protein. In some cases, the protein is an antibody. In some cases, the agent that specifically binds to the molecular tag comprises a nucleic acid. In some cases, the agent that specifically binds to the molecular tag comprises a small molecule.

In some cases, the compositions described herein further comprise a substrate. The substrate may be any suitable substrate described herein. In some embodiments, the agent that specifically binds to the molecular tag is immobilized to the substrate. In some embodiments, the molecular tag is immobilized to a substrate. In some cases, the substrate is a bead. In some cases, the substrate is a well. In some cases, the substrate is a slide.

Exemplary embodiments

In some embodiments, disclosed herein are methods for identifying a location of an analyte of interest (e.g., a nucleic acid) in a biological sample, the method comprising (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein the nucleic acids of the plurality of nucleic acids comprise (i) a spatial barcode or complement thereof and (ii) a portion of an analyte binding moiety barcode or complement thereof; and the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise a domain that specifically binds to all or part of the sequence of interest in (i) the nucleic acid or its complement, and a molecular tag; (b) Capturing and/or isolating a complex of decoy oligonucleotides that specifically bind to nucleic acids using a substrate comprising an agent that specifically binds to a molecular tag; and (c) determining all or part of the sequence of (i) the spatial barcode or its complement and (ii) the sequence of the analyte binding moiety barcode, and using the sequences determined in (i) and (ii) to identify the location of the analyte in the biological sample.
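As an illustration only, step (c) can be pictured as a lookup from each sequenced spatial barcode to an array position, followed by counting the analyte binding moiety barcodes observed at that position. The Python sketch below is not part of the disclosed methods; the function name `locate_analytes`, the barcode strings, and the coordinates are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def locate_analytes(reads, barcode_to_position):
    """Count analyte binding moiety barcodes per array position.

    reads: iterable of (spatial_barcode, analyte_barcode) pairs.
    barcode_to_position: dict mapping a spatial barcode to an (x, y) position.
    Reads whose spatial barcode is not in the map are skipped.
    """
    counts = defaultdict(Counter)
    for spatial_barcode, analyte_barcode in reads:
        position = barcode_to_position.get(spatial_barcode)
        if position is None:
            continue  # spatial barcode not present on the array map
        counts[position][analyte_barcode] += 1
    return {position: dict(counter) for position, counter in counts.items()}


# Hypothetical toy data: two array positions and four sequenced reads.
barcode_to_position = {"AAAC": (0, 0), "GGTT": (0, 1)}
reads = [
    ("AAAC", "ABM_1"),
    ("AAAC", "ABM_1"),
    ("GGTT", "ABM_2"),
    ("TTTT", "ABM_1"),  # unknown spatial barcode, ignored
]
located = locate_analytes(reads, barcode_to_position)
print(located)
```

In this sketch the pair of sequences determined in (i) and (ii) is all that is needed to place an analyte on the array; real data would additionally require barcode error correction and UMI deduplication, which are omitted here.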

In some embodiments, disclosed herein are methods for enriching a biological sample for an analyte of interest (e.g., a nucleic acid), the method comprising (a) contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein the nucleic acids of the plurality of nucleic acids comprise (i) a spatial barcode or complement thereof and (ii) a portion of an analyte binding moiety barcode or complement thereof; and the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise a domain that specifically binds to all or part of the sequence of interest in (i) the nucleic acid or its complement, and a molecular tag; (b) Capturing and/or isolating a complex of decoy oligonucleotides that specifically bind to nucleic acids using a substrate comprising an agent that specifically binds to a molecular tag; thereby enriching the biological sample for the analyte (e.g., nucleic acid).

In some embodiments, the analyte from the biological sample is associated with a disease or disorder. In some embodiments, the domain of the decoy oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides. In some embodiments, the molecular tag comprises a protein. In some embodiments, the protein is biotin. In some embodiments, the molecular tag comprises a small molecule. In some embodiments, the molecular tag comprises a nucleic acid. In some embodiments, the molecular tag comprises a carbohydrate. In some embodiments, the molecular tag is located 5' to the domain in the decoy oligonucleotide. In some embodiments, the molecular tag is located 3' of the domain in the decoy oligonucleotide. In some embodiments, the agent that specifically binds to the molecular tag comprises a protein. In some embodiments, the protein is an antibody. In some embodiments, the agent that specifically binds to the molecular tag comprises streptavidin or avidin. In some embodiments, an agent that specifically binds to a molecular tag is attached to a substrate. In some embodiments, the substrate is a bead, a slide, or a well. In some embodiments, the nucleic acid is DNA. In some embodiments, the nucleic acid further comprises a primer binding sequence or a complement thereof. In some embodiments, the nucleic acid further comprises a unique molecular sequence or complement thereof. In some embodiments, the nucleic acid further comprises other primer binding sequences or complements thereof.

In some embodiments, the biological sample is a tissue sample. In some embodiments, the tissue sample is a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample. In some embodiments, the biological sample is pre-stained with a detectable label. In some embodiments, the biological sample is pre-stained with hematoxylin and eosin (H & E). In some embodiments, the biological sample is a permeabilized biological sample. In some embodiments, the permeabilized biological sample has been permeabilized with a permeabilizing reagent. In some embodiments, the permeabilizing agent is selected from the group consisting of organic solvents, cross-linking agents, detergents, and enzymes, or combinations thereof.

In some embodiments, the analyte is an RNA molecule. In some embodiments, the RNA molecule is an mRNA molecule. In some embodiments, the determining in step (c) comprises sequencing all or part of the sequence of (i) the spatial barcode or its complement and (ii) the analyte binding moiety barcode. In some embodiments, the sequencing is high throughput sequencing. In some embodiments, sequencing comprises ligating an adapter to the nucleic acid.

In some embodiments, the method further comprises generating a plurality of nucleic acids comprising: (a) Contacting a plurality of analyte capture agents with a biological sample disposed on a substrate, wherein the analyte capture agents of the plurality of analyte capture agents comprise (i) an analyte binding moiety that specifically binds to an analyte, (ii) an analyte binding moiety barcode, and (iii) an analyte capture sequence; the substrate comprises a plurality of capture probes, wherein the capture probes of the plurality of capture probes comprise a spatial barcode and a capture domain, wherein the capture domain specifically binds to an analyte capture sequence; and (b) extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and (c) amplifying the extended capture probe to produce nucleic acid. In some embodiments, the amplification is isothermal. In some embodiments, the nucleic acid produced is released from the extended capture probe. In some embodiments, the analyte is an analyte that is deregulated or differentially expressed in cancer cells. In some embodiments, the analyte is an analyte that is deregulated or differentially expressed in immune cells. In some embodiments, the analyte is an analyte that is deregulated in a cell signal transduction pathway. In some embodiments, the analyte is an analyte that is deregulated or differentially expressed in neural cells.

Examples

Example 1: targeted spatial gene expression

FIG. 7 shows an exemplary targeted spatial gene expression workflow. Typically, the slide with the tissue attached is stored at -80 ℃ for less than one week. After thawing, the biological sample is fixed in methanol, stained with H & E (hematoxylin and eosin), imaged using a bright field microscope, and permeabilized.

After intracellular analytes (e.g., mRNA) are made accessible by permeabilization, the analytes are captured by capture probes attached to the slide, and first and second strand cDNA synthesis is then performed, wherein the second strand cDNA can be used as a surrogate for the captured analytes (e.g., mRNA). After cDNA synthesis, the double-stranded analytes (e.g., cDNA molecules) are denatured, transferred from the slide, and quantified by qPCR. After quantification, the cDNA is fragmented and end repaired. Adaptors are ligated to the cDNA molecules and Sample Index (SI) PCR is performed. After SI-PCR, the library prepared as above may be dried.

A panel comprising decoy oligonucleotides specific for a cancer target, immune target, pathway target, or neurological target hybridizes to a cDNA analyte. Decoy oligonucleotides are associated with biotin. After hybridization of the cDNA analyte to the biotinylated decoy oligonucleotide, the biotinylated decoy is captured using avidin or streptavidin beads. After a washing step to remove unbound analyte, the retained library is reamplified into a new nucleic acid library. The purified targeted library set is then sequenced.

As shown in FIGS. 8A-8B, spatial libraries from multiple samples (i.e., heart, breast cancer, and lymph) showed a very high percentage of on-target and mapped reads, enrichment, and UMI complexity (relative to the parental samples) using two different groups (a cancer group of 200 genes (i.e., hl1 k.200) and an immune group of 400 genes). These data demonstrate that the targeted hybridization enrichment methods disclosed herein are compatible with prepared spatial libraries.

Example 2: targeted gene capture

In another example, the step of targeting oligonucleotide hybridization may be performed prior to the adaptor ligation and SI-PCR steps.

Fig. 9 shows an exemplary workflow. The workflow utilizes a plurality of spatially generated library molecules obtained from a gene expression library. Optionally, the library is amplified by PCR to increase the number of library molecules used for targeted enrichment. In this example, up to 8 different libraries were pooled prior to enrichment. Optionally, blocking oligonucleotides are hybridized to the library to prevent modification of the targeting sequence, thereby preventing downstream interference during sequencing applications. The pooled library was dried in SpeedVac for 2-4 hours, which allowed the sample to dry at low pressure without heating. Biotinylated decoy oligonucleotides were added and hybridized to pooled library samples for 2 hours at 65 ℃. Avidin or streptavidin coated beads were provided to capture biotinylated oligonucleotide/targeted capture complex for 5 min at 65 ℃. The avidin or streptavidin beads are washed to remove any unbound sequence reads and post-capture PCR is performed to re-amplify the library of interest (e.g., to amplify the target sequence hybridized to the biotinylated oligonucleotide decoys and then captured by the avidin or streptavidin beads), followed by SPRI clean-up for 15 minutes. The purified targeted library set is then sequenced.

Example 3: hybridization and capture using targeting groups

An exemplary workflow of a hybridization and capture method using a spatial gene expression library is shown in fig. 10.

In particular, biological samples (e.g., tissue sections) are obtained using the suitable methods described herein, such as by frozen sections of the tissue samples. The tissue sections are then stained, for example by H & E staining, and imaged using any suitable method described herein (e.g., bright field microscopy). Stained tissue sections are then destained using the appropriate methods described herein.

After the tissue section is imaged and destained, it is permeabilized and then contacted with an array comprising a plurality of capture probes. Each capture probe comprises a spatial barcode and a poly(T) sequence that hybridizes to the poly(A) tail of an mRNA analyte in the tissue section, thereby capturing a plurality of mRNA analytes on the array. The captured mRNA is then reverse transcribed into cDNA and amplified using the appropriate methods described herein. Thus, a cDNA library (final library from the GEX spatial array) is constructed for spatial gene expression analysis.

Hybridization and capture using the bait sets were then performed. Blocking oligonucleotides are hybridized to analytes in the library to prevent modification of sequencing-related positions, thereby preventing downstream interference during sequencing applications. The library was then dried in a SpeedVac for 2-4 hours, which allowed the sample to dry at low pressure without heating. Biotinylated bait sets were added to hybridize to the library for 2 hours at 65 ℃. Avidin or streptavidin coated beads were provided to capture biotinylated bait set-sequence read complexes for 5 minutes at 65 ℃. The avidin or streptavidin beads were washed at 65 ℃ to remove any unbound sequence reads. Alternatively, biotinylated bait sets are added to hybridize to library samples overnight (e.g., at least 8 hours) at 65 ℃ and the avidin or streptavidin beads are washed at 60 ℃ to remove any unbound sequence reads. The avidin or streptavidin beads are subjected to a 15 minute PCR to re-amplify the targeted library (e.g., amplify a sequence read that hybridizes to the biotinylated bait and is then captured by the avidin or streptavidin beads), followed by a 15 minute SPRI purification. The purified targeted library set is then sequenced.

Example 4: mouse tissue and neural group spatial transcriptomics

A mouse glia group of decoy oligonucleotides was generated, and targeted gene expression was performed on mouse brain tissue sections. Experiments were performed as described in the 10x Genomics Targeted Gene Expression - Spatial user guide protocol, except that mouse brain tissue was used instead of human tissue and the mouse glia group was used.

The whole transcriptome (control) was run on a Visium array against the targeted 65-gene mouse neuro-glia group. As shown in FIG. 11, comparison of the control (Visium whole transcriptome assay) with the targeted gene group shows very high quality UMI recovery efficiency and retention of spatial distribution. FIGS. 12A-12B illustrate the spatial distribution of OLIG2 in the control library (FIG. 12A) compared to OLIG2 in the targeted gene group (FIG. 12B). For the control, approximately 72k reads/spot had 2211 UMI counts, while for the targeted gene group approximately 3.5k reads/spot had 3315 UMI counts. The OLIG2 gene encodes oligodendrocyte transcription factor 2, which is expressed in oligodendrocyte brain cells; in humans, the OLIG2 gene is expressed in brain oligodendrocyte tumors. The results demonstrate that the same targeted gene expression enrichment method can also be performed using mouse decoy oligonucleotides in mouse tissue samples.
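To make the comparison of the two read depths easier to follow, the figures quoted above can be normalized to UMI counts per 1,000 reads. The sketch below is only a worked calculation, assuming the quoted UMI counts were obtained at the quoted reads/spot; the function name `umis_per_thousand_reads` is hypothetical.

```python
def umis_per_thousand_reads(umi_count, reads_per_spot):
    """Return the number of UMI counts recovered per 1,000 reads."""
    if reads_per_spot <= 0:
        raise ValueError("reads_per_spot must be positive")
    return umi_count / reads_per_spot * 1000.0


# Values quoted in the text: 2211 UMIs at ~72k reads/spot (control),
# 3315 UMIs at ~3.5k reads/spot (targeted).
control_yield = umis_per_thousand_reads(2211, 72_000)
targeted_yield = umis_per_thousand_reads(3315, 3_500)
relative_yield = targeted_yield / control_yield

print(f"control:  {control_yield:.1f} UMIs per 1,000 reads")
print(f"targeted: {targeted_yield:.1f} UMIs per 1,000 reads")
print(f"targeted / control: {relative_yield:.1f}x")
```

Under that assumption, the targeted library recovers roughly thirty times more UMI counts per read than the control.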

Example 5: targeted gene groups applied to human tissue sections

The targeted gene groups were used for targeted capture of gene expression in different human tissues from a Visium array. The targeted gene expression method described in the 10x Genomics Targeted Gene Expression - Spatial user guide protocol was followed.

FIGS. 13A-13B show general data results comparing target enrichment using four different gene groups on human glioblastoma tissue sections with a conventional whole transcriptome control Visium (non-targeted) spatial array. For all four groups (i.e., the pan-cancer, immunology, gene signature, and neuroscience enrichment groups), the on-target reads were between 70% and 90% regardless of the targeted group used (FIG. 13A). In the Visium control, approximately 3%-10% of the mapped whole transcriptome reads corresponded to genes in the targeted groups. Furthermore, when comparing the correlation between the Visium control and the pan-cancer targeted group (as a representative targeted group, although evidence exists for all groups), a high degree of reproducibility and correlation of the expression profiles was observed (R² = 0.98) (FIG. 13B).
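The on-target percentages above can be read as the fraction of mapped reads whose gene belongs to the targeted group, and the enrichment as the ratio of that fraction between the targeted library and the control. The sketch below illustrates this reading with hypothetical toy reads; the function names and the gene lists are assumptions made for the example and are not taken from the disclosure.

```python
def on_target_fraction(read_genes, panel_genes):
    """Fraction of reads whose mapped gene is in the targeted group."""
    panel = set(panel_genes)
    reads = list(read_genes)
    if not reads:
        return 0.0
    hits = sum(1 for gene in reads if gene in panel)
    return hits / len(reads)


def fold_enrichment(targeted_fraction, control_fraction):
    """Ratio of on-target fractions (targeted library over control)."""
    if control_fraction <= 0:
        raise ValueError("control_fraction must be positive")
    return targeted_fraction / control_fraction


# Hypothetical toy data: each list entry is the gene one read mapped to.
panel_genes = {"ERBB2", "OLIG2"}
control_reads = ["ERBB2"] + ["ACTB"] * 19        # 1 of 20 reads on target
targeted_reads = ["ERBB2"] * 8 + ["ACTB"] * 2    # 8 of 10 reads on target

control_fraction = on_target_fraction(control_reads, panel_genes)
targeted_fraction = on_target_fraction(targeted_reads, panel_genes)
enrichment = fold_enrichment(targeted_fraction, control_fraction)
print(control_fraction, targeted_fraction, enrichment)
```

The toy fractions were chosen to fall inside the ranges reported in the text (3%-10% for the control, 70%-90% for the targeted groups).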

To demonstrate that multiple tissue types are compatible with the targeted gene enrichment workflow, twelve different tissue types were evaluated for each of the four targeted gene groups (FIGS. 14A-14B) against the Visium control whole transcriptome spatial experiment. The twelve tissue types tested were all from humans: breast IDC, breast ILC, triple negative breast cancer, cerebellum, colorectal cancer, cortex, glioblastoma, heart, kidney disease, lung, ovarian tumor, and spinal cord. Regardless of tissue type or targeted gene enrichment group, more than 90% of the targeted genes were recovered relative to the control Visium whole transcriptome assay (FIG. 14A), and more than 80% of UMIs matched (FIG. 14B). Thus, a variety of tissue types are compatible with the targeted gene enrichment workflow described herein.

To demonstrate that clustering is maintained when performing the targeted gene enrichment methods described herein, a Visium whole transcriptome spatial array on human brain cortical tissue was compared to a spatial array subjected to targeted gene enrichment with the neuroscience group. As shown in FIG. 15A (Visium whole transcriptome control) and FIG. 15B (neuroscience group), the clusters were comparable. In fact, for the neuroscience group, only 5k raw sequencing reads/spot were needed to recapitulate the major biological patterns shown in the whole transcriptome assay (the whole transcriptome control was 50k reads/spot). FIG. 15C corroborates this, showing excellent correlation between the two data sets of UMIs plotted for each gene.
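One common way to quantify a per-gene correlation of this kind is the squared Pearson correlation coefficient (R²) between the UMI counts of the two libraries. The sketch below shows that calculation on hypothetical per-gene counts; the disclosure does not state which correlation statistic was used, so both the choice of Pearson R² and the numbers are assumptions.

```python
def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical UMI counts for the same five genes in each library.
whole_transcriptome_umis = [120, 45, 300, 8, 60]
targeted_umis = [118, 50, 290, 10, 63]

r_squared = pearson_r_squared(whole_transcriptome_umis, targeted_umis)
print(f"R^2 = {r_squared:.4f}")
```

In practice such comparisons are often made on log-transformed counts so that a few highly expressed genes do not dominate the statistic; that step is left out of this sketch.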

The next question was whether the targeted gene enrichment protocol can resolve subsets of related genes in the tissue and how the result compares to a pathologist's annotation of the sample. Thus, experiments were performed on human breast cancer tissue sections as described herein. FIG. 16A shows the pathologist's annotations of different cells in breast cancer tissue: DCIS, adipose, fibrous tissue, immune cells, invasive cancer cells, and normal glandular cells were identified. Specifically, the areas representing invasive cancer cells, fibrous tissue, and immune cells are circled with dashed lines, dotted lines, and solid lines, respectively. FIG. 16B shows the translation of the pathologist's annotations to Visium whole transcriptome data. FIG. 16C shows 196 breast cancer genes from the pan-cancer enrichment group, narrowing to the expression of the ERBB2 gene from the pan-cancer enrichment group (FIG. 16D). As shown, targeting with the pan-cancer group, and with the 196 breast cancer genes from that group, demonstrated targeted gene expression comparable to the pathologist's annotation, and further showed that ERBB2 is most abundantly expressed in the invasive cancer compartment, as expected. Furthermore, all data were obtained using only 5k reads/spot compared to 50k reads/spot for the Visium whole transcriptome. This efficiency requires only a fraction of the sequencing cost of a whole transcriptome spatial analysis, so targeted gene expression is not only effective but also saves cost.
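The sequencing saving implied by these depths can be worked out directly. The sketch below assumes, as a simplification, that sequencing cost scales linearly with the number of reads; the spot count `n_spots` is a hypothetical value chosen only to make the arithmetic concrete.

```python
def depth_fraction(targeted_reads_per_spot, control_reads_per_spot):
    """Targeted sequencing depth as a fraction of the control depth."""
    if control_reads_per_spot <= 0:
        raise ValueError("control_reads_per_spot must be positive")
    return targeted_reads_per_spot / control_reads_per_spot


# Depths quoted in the text: 5k reads/spot (targeted), 50k reads/spot (control).
fraction = depth_fraction(5_000, 50_000)

n_spots = 1_000  # hypothetical number of spots under tissue
reads_saved = (50_000 - 5_000) * n_spots

print(f"targeted depth is {fraction:.0%} of the control depth")
print(f"reads saved across {n_spots} spots: {reads_saved:,}")
```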

Thus, these data demonstrate that targeted gene groups can be used with Visium spatial gene expression to obtain gene expression information for specific gene targets associated with cancer and other diseases important to human health, across the different targeted gene groups.

Example 6: cancer group

FIGS. 17A-17D detail the sequence listing of each probe in the cancer group. The cancer group represents SEQ ID NO: 1 to SEQ ID NO: 1041963. Each entry in the cancer group is the SEQ ID NO of a probe in the cancer group found in the sequence listing. Thus, entry 843 of the cancer group is SEQ ID NO: 843. Turning to SEQ ID NO: 843, it can be seen that the name of the sequence is probe ENSG00000002016_0. The base part ENSG00000002016 of probe ENSG00000002016_0 represents the gene ENSG00000002016, or RAD52, in the Ensembl version 101 database, i.e., the gene that probe ENSG00000002016_0 is designed to pull down during targeted sequencing. See Cunningham et al., Ensembl 2019, PubMed PMID: 3049321, doi:10.1093/nar/gky1113, incorporated by reference, for details of the Ensembl database. Each sequence in the sequence listing provides the name of the entry in the Ensembl database for the corresponding gene for which the probe was designed. For simplicity, in FIGS. 17A-17D, when the cancer group includes a contiguous range of sequences, for example SEQ ID NOs: 843-887, the range is listed as 843-887 rather than as the individual sequence numbers (e.g., 843, 844, 845, …, 887). However, the cancer group includes each probe within each range provided in FIGS. 17A-17D. Genetic targets in the cancer group are associated with cancer. See, e.g., Kandoth et al., 2013, "Mutational landscape and significance across 12 major cancer types," Nature 502(7471), pages 333-339; Hoadley et al., 2018, "Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer," Cell 173(2), pages 291-304; and Bailey et al., 2018, "Comprehensive Characterization of Cancer Driver Genes and Mutations," Cell 173(2), pages 371-385, each of which is incorporated by reference. See also Cancer Genome Atlas Research Network, Illumina TruSight 500, NanoString nCounter PanCancer Pathways, IDT xGen Pan-Cancer, and Agilent ClearSeq Comprehensive Cancer.
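The two conventions described above (contiguous SEQ ID NO ranges written as "843-887", and probe names formed from an Ensembl gene identifier plus a numeric suffix) can be handled mechanically. The sketch below is a hypothetical helper, not part of the disclosure; the function names are assumptions, and it only relies on the naming pattern stated in the text.

```python
def expand_seq_id_ranges(entries):
    """Expand entries such as "843-887" or "632" into sorted SEQ ID numbers."""
    seq_ids = set()
    for entry in entries:
        text = entry.strip()
        if "-" in text:
            start_text, end_text = text.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {entry!r}")
            seq_ids.update(range(start, end + 1))  # ranges are inclusive
        else:
            seq_ids.add(int(text))
    return sorted(seq_ids)


def gene_id_from_probe_name(probe_name):
    """Return the base part of a probe name, e.g. ENSG00000002016_0 -> ENSG00000002016."""
    base, separator, index = probe_name.rpartition("_")
    if not separator or not base or not index.isdigit():
        raise ValueError(f"unexpected probe name: {probe_name!r}")
    return base


cancer_example = expand_seq_id_ranges(["843-887"])
print(len(cancer_example), cancer_example[0], cancer_example[-1])
print(gene_id_from_probe_name("ENSG00000002016_0"))
```

The same helpers would apply unchanged to the range notation used for the other probe groups, since they follow the same listing convention.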

Example 7: immunological probe set

FIGS. 18A-18D detail the sequence listing of each probe in the immunological probe set. The immunological probe set represents SEQ ID NO: 1 to SEQ ID NO: 1041963. Each entry in the immunological probe set is the SEQ ID NO of a probe in the immunological probe set found in the sequence listing. Thus, entry 888 of the immunological probe set is SEQ ID NO: 888. Turning to SEQ ID NO: 888, it can be seen that the name of the sequence is probe ENSG00000002549_0. The base part ENSG00000002549 of probe ENSG00000002549_0 represents the gene ENSG00000002549, or LAP3, in the Ensembl version 101 database, i.e., the gene that probe ENSG00000002549_0 is designed to pull down during targeted sequencing. See Cunningham et al., Ensembl 2019, PubMed PMID: 3049321, doi:10.1093/nar/gky1113, incorporated by reference, for details of the Ensembl database. Each sequence in the sequence listing provides the name of the entry in the Ensembl database for the corresponding gene for which the probe was designed. For simplicity, in FIGS. 18A-18D, when the immunological probe set comprises a contiguous range of sequences, for example SEQ ID NOs: 888-894, the range is listed as 888-894 rather than as the individual sequence numbers (e.g., 888, 889, 890, …, 894). However, the immunological probe set includes each probe within each range provided in FIGS. 18A-18D. Genetic targets in the immunological probe set are immunologically relevant. See, e.g., Thorsson et al., 2018, "The Immune Landscape of Cancer," Immunity 48(4), pages 812-830, which is incorporated by reference. See also BD (T cell, immune response), HTG (HTG EdgeSeq Immuno-Oncology Assay), and NanoString (PanCancer Immune Profiling, Adaptive Immunity, Innate Immunity).

Example 8: pathway set

FIGS. 19A-19D detail the sequence listing of each probe in the pathway set. The pathway set represents SEQ ID NO: 1 to SEQ ID NO: 1041963. Each entry in the pathway set is the SEQ ID NO of a probe in the pathway set found in the sequence listing. Thus, entry 1 of the pathway set is SEQ ID NO: 1. Turning to SEQ ID NO: 1, it can be seen that the name of the sequence is probe ENSG00000000003_0. The base part ENSG00000000003 of probe ENSG00000000003_0 represents the gene ENSG00000000003, or TSPAN6, in the Ensembl version 101 database, i.e., the gene that probe ENSG00000000003_0 is designed to pull down during targeted sequencing. See Cunningham et al., Ensembl 2019, PubMed PMID: 3049321, doi:10.1093/nar/gky1113, incorporated by reference, for details of the Ensembl database. Each sequence in the sequence listing provides the name of the entry in the Ensembl database for the corresponding gene for which the probe was designed. For simplicity, in FIGS. 19A-19D, when the pathway set comprises a contiguous range of sequences, for example SEQ ID NOs: 1-37, the range is listed as 1-37 rather than as the individual sequence numbers (e.g., 1, 2, 3, …, 37). However, the pathway set includes each probe within the ranges provided in FIGS. 19A-19D. Genetic targets in the pathway set are associated with biological pathways. See, e.g., Behan et al., 2019, "Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens," Nature 568, pages 511-516; Sanchez-Vega et al., 2018, "Oncogenic Signaling Pathways in The Cancer Genome Atlas," Cell 173(2), pages 321-337; and Fang et al., 2019, "A genetics-led approach defines the drug target landscape of 30 immune-related traits," Nature Genetics 51(7), pages 1082-1091, each of which is incorporated by reference.

Example 9: nerve group

FIGS. 20A-20D detail the sequence listing of each probe in the nerve group. The nerve group represents SEQ ID NO: 1 to SEQ ID NO: 1041963. Each entry in the nerve group is the SEQ ID NO of a probe in the nerve group found in the sequence listing. Thus, entry 632 of the nerve group is SEQ ID NO: 632. Turning to SEQ ID NO: 632, it can be seen that the name of the sequence is probe ENSG00000001626_0. The base part ENSG00000001626 of probe ENSG00000001626_0 represents the gene ENSG00000001626, or CFTR, in the Ensembl version 101 database, i.e., the gene that probe ENSG00000001626_0 is designed to pull down during targeted sequencing. See Cunningham et al., Ensembl 2019, PubMed PMID: 3049321, doi:10.1093/nar/gky1113, incorporated by reference, for details of the Ensembl database. Each sequence in the sequence listing provides the name of the entry in the Ensembl database for the corresponding gene for which the probe was designed. For simplicity, in FIGS. 20A-20D, when the nerve group includes a contiguous range of sequences, for example SEQ ID NOs: 632-705, the range is listed as 632-705 rather than as the individual sequence numbers (e.g., 632, 633, 634, …, 705). However, the nerve group includes each probe within the ranges provided in FIGS. 20A-20D. Genetic targets in the nerve group are related to neurobiology.

Genes represented in the nerve group are ABAT, ABCA1, ABCA7, ABCB1, ABCG2, ABI3, ABL1, ACA 1, ACTG1, ACTN1, ACVRL1, ADAM10, ADAM15, ADAM17, ADRB1, ADCY5, ADCY8, ADCY9, ADCYAP1, ADGRG1, ADM, ADORA1, ADORA 22 21, ADRB2, 1, AIF1, AK3, AKT1S1, AKT2, AKT3, ALB, ALDH1A1, ALDH1L1, ALDH7A1, ALK, ALOX12, ALS2, AMIGO1, 2, ANO3, ANXA11, AP1S1, AP2A2, AP2B1, AP3M 2AP 3S1, AP4S1, APAF1, APBB1, APC, APEX1, APOA1, AQP4, 44, ARHGEF10, ARMC10, ARRB2, 1, ASCL2, 4, ATF6, ATG5, ATM, ATP13A2, ATP1A3, ATP2B3, ATP6AP2, ATP6V0 6V0D1, ATP6V0D1 ATP6V0E1, ATP6V0E2, ATP6V1 6V1E1, ATP6V1G2, ATP6V 17A 2, ATXN3, ATXN7, AVP, B4GALT6, BACE1, BACE2, BAD, BAI1, BAX, BCAS1, BCAS2, BCHE, BCL2L1, BDNF, BECN1, BID BIN1, BIRC5, BMI1, BNIP3, BRAF, BRMS 12, C21ORF2, C3, C5, C6, C9ORF72, CA1, CA2, CAB39, CACNA 11 11 11 11 12, CANB 4, CADM3, CADPS, CALB1, CALB2, CALCA, CALM1, CALML5, CAMK 22 24, CAPN1, CASK, CASP1, CASP3, CASP6, CASP7, CASP8, CASP9, CASS4, CAST, CAV1, 2, CCL5, CCND1, CCNH, CCR2, CCR5, CCS, CD11, CD163, CD2AP, CD31, CD33, CD34, CD39, CD4, CD40, CD44, CD68, CD8, CD9 CDC27, CDC40, CDC6, CDH1, CDH2, CDK5R1, CDK5RAP2, CDK5RAP3, CDK7, CDKL5, CDKN 11 21, CELF1, CEND1, CENPJ, CEP135, CEP41, CER1, CERS2, CERS4, CERS6, 10, CHCHD2, CHD4, CHGA, CHI3L1, CHL1, CHMP 12, CHRM3, CHRM5, CHRNA1, CHRNA3, CHRNA4, CHRNA5, CHRNA7, CHRNB2, CKB, CLCN6, CLDN15, CLDN5, CLN3, CLN5, CLN8, CLU, CNR2, CNP, CNR1, CNR2, device for detecting and controlling temperature of water, 1 3-DRA 1, HLA-DRB1, HLA-a, HMGB1, HMOX1, HNRNPA2B1, HNRNPM, HOMER1, HOXA2, HPCAL1, 3, HSPA6, HSPB1, HTR 12 22 34, HTR5 2, HTT, ICAM1, ICAM2, IDE, IDH1, IDH2, IDO1, IFNG, IGF 12, IGFBP2, IKBKB, IL10RA, IL13RA1, IL15RA, IL18, IL1R1, IL1RN, IL2, IL 46, IL6 4 5 5 5, IRF8 ISLR2, ITGA5, ITGA7, 3, ITM 21, ITPR2, ITPR3, JAM3, JUN, KATNA1, KCNA1, KCNB1, KCNC3, KCND3, KCNJ10, KCNK9, KCNMA1, KCNN3, KCNQ2, KCNQ3, KDELR2, KDR, KEAP1, KEL, KIAA1161, KIF 35 6, KNL1, KRAS, L1CAM, LAMA2, LAMB2, LAMP1, LARGE1, LCLAT1, 3, LGI1, LIF, LINGO1, LMNA, LMNB1, LOX, LPAR1, 
LPL LRP1, LRRC25, LRRC4, LRRC61, LRRK2, LSM7, 1, MAG, MAGEE1, MAL, MAN2B1, MAP2K1, MAP2K2, MAPK1, MAPK10, MAPK3, MAPK8, MAPK9, MAPKAPK2, MARK4, ENSG, MBP, MDM2, MEAF6, MECP2, MED10, MEF2, MFSD8, 67, MKS1, MME, MMP12, MMP14, MMP16, MMP19, MMP2, MMP24, MMP3, MMP9, MMRN2, MMP3, MMP9, MMP MNAT1, 4A6, MSI1, MSN, MT-ATP6, MT-ND1, MT1, MTA2, 1, MYD88, MYH10, 1, NCF1, NEK1, NELFA, NELL2, NEO1, NFE2L2, NFKB1, 2, NKX2-1, NKX6-2, NLGN4 3, NMB, NME5, NME8, NMNAT2, NOG, NOL3, NOS1, NOS2, NOS3, 1, NOTCH2, NOTCH3, NOTCH4, NOVA1, NPAS4, NPC1, NPC2, NPHP1, NPPB, NPR1, NR3C2, NR4A2, NRG1, NRP2, NRXN1, NSF, NTF3, NTN1, NTNG1, NTRK2, NTRK3, NTS, NWD1, OCLN, OLFM3, OLIG2, OLR1, ONECUT2, OPA1, OPHN1, OPRK1, OPRM1, OPTN, ORC4, ORC6, OSMR, OTUD4, OXR1, OXTRR, P2RX4, P2RX7, P2RY12, P4HA1, PADI2, PAFAH1B1, PAH, PAK1, PALM, PANK2, PARK5, PARK7, PARP1, PAX2, PAX3, PAX6, PCDH19, PARK 22, PDE 14 1, PECAM1, PFN1, PGAM1, PGK1, PHF19, PHF2, PHF21 3CA, PIK3CB, PIK3R1, PINK1, PKN1, PLA2G16, PLA2G 4G 6, 1, PLCB2, PLCB3, PLCB4 PLCG2, PLCL2, PLEKHG4, PLEKHO2, PLLP, PLOD2, PLP1, PLS1, PLXNB3, PLXNC1, PMCH, PMP22, PNKD, POC 11, POLG, POLR 22 22 21, POU5F1, PPARG, PPARGC1 1R 12 CA, PPP2R5 CA, PPP3CB PPP3CC, PPT1, PQBP1, PRF1, PRKAA2, 11, PRPF3, PRPF31, PRPH, PRRT2, PSEN1, PSEN2, PSMB8, PSMB9, PTDSS1, PTDSS2, PTK2, QKI, RAB2 38, RAB39 3 3L1, RABGEF1, RAC1, PTDSS2, PTK2, QKI, RAB38, RAB39 3L1, RABGEF1, RAC1, RAG 1, PSK 2, PSP RAD23 1, RAN, RAGEF 2, RARS2, RASGRP1, RB1, RBBP8, RBFOX3, RCAN1, RIN3, RING1, RIT2, RNF216, ROBO1, RPL39 25, RRAS, RTN4, RYR1, RYR2, RYR3, S100 1, SART1, SCAMP2, SCARB2, RABO 1, RPL39 25, RRAS, RYR2, RYR3, S100 1, SART1, SCAMP2, SCARB2, RABO 1, RAGG 2, RASGR 1, RASGR 2, RA SCN 12 5 8 9 3 6, SERPINE1, SERPINI1, SETX, SF3A2, SF3B4, SGK1, SGPL1, SGTA, SH3TC2, SHANK2, SHH, sigma R1, SIRT2, SIRT7, SIX3, SLA, SLC11A1, SLC12A5, SLC17A6, SLC17A7, SLC18A2, SLC18A3, SLC1A1, SLC1A2, SLC1A3, SLC24A4, SLC25a19, SLC2A1, SLC2A3, SLC32A1, SLC4a10, SLC6A3, SLC6A4, SLC6A6, SLC8A1, SLC9A6, SLIT1, SLIT2, 
SLIT3, SLU7, SMARCB1, SMN1, SMPD4, SMYD1, SNAP91, SNCA, SNCAIP, SNCB, SNRPA, SOD1, SOD2, SORCS3, SORL1, SOX10, SOX2, SOX9, SP1, SP100, SPAG4, SPARC, SPAST, SPG, SPI1, SPP1, SPTBN2, SQSTM1, SRC, SRGAP1, SRGAP2, SRGAP3, SRI, SRSF4, SRY, SST, ST3GAL3, ST3GAL5, starb 1, starbpl 1, STAT3, STC1, STC2, STIL, STX1A, STX1B, STX2, SUCLA2, SYN1, SYNJ1, SYP, SYT1, SYT13, SYT4, SYT7, TAC1, TACR1, TAF10, TAF4B, TAF6L, TAF9, TARDBP, TAU, TAZ, TBC D24, TBK1, TBP, TBPL1, TBR1, TCERG1, TCIRG1, TENM2, TERT, TF, TFAM, TGFB1, TGFB2, TGFBR2, TGIF1, TGM2, TH, THAP1, THY1, TIA1, TIE1, TLR2, TLR3, TLR4, TMEM106B, TMEM119, TMEM216, TMEM230, TMEM237, TMEM67, B, TMEM 10B, TMEM 10B, TMEM 10B, TMEM 11B, TMEM 12B, TMEM 1B, TMEM10, TNNI3, TNR, TOMM40, TMEM67, TMEM TOR 1B, TMEM, TP53INP1, TP73, TPH1, TPH2, TPM1, TPP1, TRADD, TREM1, TREM2, TRIM28, TRIM37, TRIP4, TRPA1, TRPM2, TRPV1, TRPV4, TSC1, TSC2, TSEN34, TSPAN13, TSPO, TUBA 4B, TMEM 8, TUBB3, B, TMEM1, TYROBP, U2AF2, UBB, UBE 2B, TMEM 2B, TMEM 3B, TMEM1, UBQLN2, UCHL1, UGCG, UGT8, UNC 13B, TMEM21, USP30, B, TMEM 13B, TMEM 13B, TMEM 35, WDR62, WFDC2, WFS1, WNT 10B, TMEM6, WT1, XAB2, XBP1, B, TMEM CL, ZCWPW1, ZEB2, ZIC2 and ZNF24.

Claims

1. A method for identifying an abundance and location of an analyte in a biological sample, the method comprising:

(a) Contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein

the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or complement thereof, and (ii) all or part of the sequence of the analyte, analyte derivative, or complement thereof; and

the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise:

(i) A capture domain that hybridizes to all or a portion of the sequence of the analyte, analyte derivative, or complement thereof, and

(ii) A molecular tag;

(b) Capturing decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent bound to the molecular tag; and

(c) Determining (i) all or part of the sequence of the spatial barcode or its complement, and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of the analyte in the biological sample.

2. The method of claim 1, wherein the analyte is a nucleic acid.

3. The method of claim 2, wherein the method further comprises generating the plurality of nucleic acids, the generating comprising:

(a) Contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein the capture probes of the plurality of capture probes comprise (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) Hybridizing the capture probes to nucleic acids;

(c) Extending the 3' end of the capture probe using the nucleic acid bound to the capture domain as a template to generate an extended capture probe; and

(d) Amplifying the extended capture probe to produce an extended nucleic acid.

4. The method of claim 3, wherein the extended nucleic acid is released from the extended capture probe.

5. The method of claim 1, wherein the analyte is a protein.

6. The method of claim 1, wherein the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence.

7. The method of claim 6, wherein the method further comprises generating the plurality of nucleic acids, the generating comprising:

(a) Contacting a plurality of analyte capture agents with the biological sample, wherein:

the analyte capture agents of the plurality of analyte capture agents comprise (i) an analyte binding moiety that binds to a protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence;

(b) Contacting the plurality of analyte capture agents with a substrate comprising a plurality of capture probes, wherein the capture probes of the plurality of capture probes comprise a spatial barcode and a capture domain, wherein the capture domain binds to an analyte capture sequence;

(c) Extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and

(d) Amplifying the extended capture probe to produce an extended nucleic acid.

8. The method of claim 7, wherein the extended nucleic acid is released from the extended capture probe.

9. The method of claim 7, wherein step (a) and step (b) are performed substantially simultaneously.

10. A method for enriching a biological sample for an analyte or analyte derivative, the method comprising:

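(a) Contacting a plurality of nucleic acids with a plurality of decoy oligonucleotides, wherein

the plurality of nucleic acids comprises an extended nucleic acid comprising (i) a spatial barcode or complement thereof, and (ii) all or part of the sequence of the analyte, analyte derivative, or complement thereof; and

the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise:

(i) A capture domain that hybridizes to all or a portion of the sequence of the analyte, analyte derivative, or complement thereof, and

(ii) A molecular tag;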

(b) Capturing a complex of decoy oligonucleotides bound to an extended nucleic acid using a substrate comprising an agent bound to the molecular tag; and

(c) Isolating the complex of decoy oligonucleotides bound to the extended nucleic acids, thereby enriching the biological sample for the analyte or analyte derivative.

11. The method of claim 10, wherein the analyte is a nucleic acid.

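12. The method of claim 11, wherein the method further comprises generating the plurality of nucleic acids, the generating comprising:

(a) Contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein the capture probes of the plurality of capture probes comprise (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) Hybridizing the capture probes to nucleic acids;

(c) Extending the 3' end of the capture probe using the nucleic acid bound to the capture domain as a template to generate an extended capture probe; and

(d) Amplifying the extended capture probe to produce an extended nucleic acid.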

13. The method of claim 12, wherein the extended nucleic acid is released from the extended capture probe.

14. The method of claim 10, wherein the analyte is a protein.

15. The method of claim 10 or 14, wherein the analyte derivative is an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence.

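16. The method of claim 15, wherein the method further comprises generating the plurality of nucleic acids, the generating comprising:

(a) Contacting a plurality of analyte capture agents with the biological sample, wherein:

the analyte capture agents of the plurality of analyte capture agents comprise (i) an analyte binding moiety that binds to a protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence;

(b) Contacting the plurality of analyte capture agents with a substrate comprising a plurality of capture probes, wherein the capture probes of the plurality of capture probes comprise a spatial barcode and a capture domain, wherein the capture domain binds to an analyte capture sequence;

(c) Extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe; and

(d) Amplifying the extended capture probe to produce an extended nucleic acid.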

17. The method of claim 16, wherein the extended nucleic acid is released from the extended capture probe.

18. The method of claim 16, wherein step (a) and step (b) are performed substantially simultaneously.

19. The method of any one of claims 1-18, wherein the analyte from the biological sample is associated with a disease or disorder.

20. The method of any one of claims 1-19, wherein the capture domain of the decoy oligonucleotide binds to a 3' portion, a 5' portion, an intron, an exon, a 3' untranslated region, or a 5' untranslated region of the sequence of the analyte, analyte derivative, or complement thereof.

21. The method of any one of claims 1-20, wherein the capture domain of the decoy oligonucleotide comprises a total of about 10 nucleotides to about 300 nucleotides.

22. The method of any one of claims 1-21, wherein the molecular tag comprises a protein, a small molecule, a nucleic acid, or a carbohydrate.

23. The method of any one of claims 1-21, wherein the molecular tag is streptavidin, avidin, biotin, or a fluorophore.

24. The method of any one of claims 1-23, wherein the agent that binds to a molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

25. The method of any one of claims 1-24, wherein the molecular tag is biotin and the agent bound to the molecular tag is avidin or streptavidin.

26. The method of any one of claims 1-25, wherein the agent that specifically binds to a molecular tag is attached to the substrate.

27. The method of claim 26, wherein the substrate is a bead, a slide, or a well.

28. The method of any one of claims 1-27, wherein the extended nucleic acid is a DNA molecule (e.g., a cDNA molecule).

29. The method of any one of claims 1-28, wherein the extended nucleic acid further comprises a primer sequence or complement thereof; a unique molecular identifier or complement thereof; or an additional primer binding sequence or complement thereof.

30. The method of any one of claims 1-29, wherein the biological sample is a tissue sample selected from a formalin fixed, paraffin embedded (FFPE) tissue sample or a frozen tissue sample.

31. The method of any one of claims 1-30, wherein the biological sample is pre-stained with a detectable label, hematoxylin and eosin (H&E) stain, immunofluorescence, or immunohistochemistry.

32. The method of any one of claims 1-31, wherein the biological sample is a permeabilized biological sample.

33. The method of any one of claims 1-9 and 19-32, wherein the determining step comprises sequencing (i) all or part of the sequence of the spatial barcode or its complement and (ii) all or part of the sequence of nucleic acid from the biological sample.

34. The method of claim 33, wherein the sequencing is high throughput sequencing.

35. The method of any one of claims 1-34, wherein the analyte is dysregulated or differentially expressed in cancer cells, immune cells, cell signaling pathways, or neural cells.

36. A method for identifying nucleic acid abundance and location in a biological sample, the method comprising:

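(a) Contacting the biological sample with a substrate comprising a plurality of attached capture probes, wherein the capture probes of the plurality of capture probes comprise (i) a spatial barcode and (ii) a capture domain that binds to a sequence present in the nucleic acid;

(b) Hybridizing the capture probes to nucleic acids;

(c) Extending the 3' end of the capture probe using the nucleic acid bound to the capture domain as a template to generate an extended capture probe;

(d) Amplifying the extended capture probes to produce extended nucleic acids; wherein the extended nucleic acid comprises all or part of the sequence of (i) a spatial barcode or complement thereof and (ii) a nucleic acid or complement thereof;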

(e) Releasing the extended nucleic acid from the extended capture probe;

(f) Contacting a plurality of released nucleic acids with a plurality of decoy oligonucleotides, the released nucleic acids comprising the extended nucleic acids from step (e), wherein

the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise:

(i) A capture domain that hybridizes to all or a portion of the sequence of the nucleic acid or complement thereof, and

(ii) A molecular tag;

(g) Capturing decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent bound to the molecular tag; and

(h) Determining (i) all or part of the sequence of the spatial barcode or its complement, and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of nucleic acid in the biological sample.

37. A method for identifying protein abundance and location in a biological sample, the method comprising:

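(a) Contacting a plurality of analyte capture agents with the biological sample, wherein:

the analyte capture agents of the plurality of analyte capture agents comprise (i) an analyte binding moiety that binds to a protein and (ii) an oligonucleotide comprising an analyte binding moiety barcode and an analyte capture sequence;

(b) Contacting the plurality of analyte capture agents with a substrate comprising a plurality of capture probes, wherein the capture probes of the plurality of capture probes comprise a spatial barcode and a capture domain, wherein the capture domain binds to an analyte capture sequence;

(c) Extending the 3' end of the capture probe using the analyte binding moiety barcode as a template to generate an extended capture probe;

(d) Amplifying the extended capture probes to produce extended nucleic acids; wherein the extended nucleic acid comprises all or part of the sequence of (i) a spatial barcode or complement thereof and (ii) an oligonucleotide or complement thereof;

(e) Releasing the extended nucleic acid from the extended capture probe;

(f) Contacting a plurality of released nucleic acids with a plurality of decoy oligonucleotides, the released nucleic acids comprising the extended nucleic acids from step (e), wherein

the decoy oligonucleotides of the plurality of decoy oligonucleotides comprise:

(i) A capture domain that hybridizes to all or a portion of the sequence of the oligonucleotide or complement thereof, and

(ii) A molecular tag;

(g) Capturing decoy oligonucleotides bound to the extended nucleic acids using a substrate comprising an agent bound to the molecular tag; and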

(h) Determining (i) all or part of the sequence of the spatial barcode or its complement, and (ii) all or part of the sequence of the extended nucleic acid, and using the sequences determined in (i) and (ii) to identify the abundance and location of the protein in the biological sample.

38. A composition comprising a decoy oligonucleotide and an extended nucleic acid, wherein the decoy oligonucleotide is bound to the extended nucleic acid,

wherein the extended nucleic acid comprises (i) a spatial barcode or complement thereof, and (ii) all or part of the sequence of the analyte, analyte derivative, or complement thereof; and

wherein the decoy oligonucleotide comprises a molecular tag, wherein the molecular tag is selected from the group consisting of streptavidin, avidin, biotin, and a fluorophore.

39. The composition of claim 38, wherein the decoy oligonucleotide binds to the extended nucleic acid through a capture domain that hybridizes to all or a portion of the sequence of the analyte, analyte derivative, or complement thereof.

40. The composition of claim 38 or 39, comprising an agent that binds to a molecular tag.

41. The composition of claim 40, wherein the agent that binds to a molecular tag comprises a protein (e.g., an antibody), a nucleic acid, or a small molecule.

42. The composition of claim 40, wherein the molecular tag is biotin and the agent that binds to the molecular tag is avidin or streptavidin.

43. The composition of any one of claims 38-42, further comprising a substrate, wherein the agent that specifically binds to a molecular tag is attached to the substrate.

44. The composition of claim 43, wherein the substrate is a bead, well or slide.

45. A kit, comprising:

an array comprising a plurality of capture probes, wherein the capture probes of the plurality of capture probes comprise a spatial barcode and a capture domain;

a plurality of decoy oligonucleotides; and

instructions for carrying out the method of any one of claims 1-4, 10-13 and 19-36.

46. A kit, comprising:

a plurality of analyte capture agents, wherein the analyte capture agents of the plurality of analyte capture agents comprise an analyte binding moiety, an analyte binding moiety barcode, and an analyte capture sequence;

a plurality of decoy oligonucleotides; and

instructions for carrying out the method of any one of claims 1, 5-9, 10, 14-35 and 37.

47. The kit of claim 45 or 46, further comprising reagents and/or enzymes for performing the method.
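
Illustrative note (not part of the claims). The following is a minimal in-silico sketch of the data flow recited in claim 1: extended nucleic acids carrying a spatial barcode are selected by decoy oligonucleotides whose capture domains are complementary to a target sequence, and the spatial barcode together with the captured sequence is then used to tally abundance per location. All sequences, target names, barcode coordinates, the exact-match hybridization model, and the function names are hypothetical assumptions introduced only for illustration. Collapsing duplicate reads by a unique molecular identifier is likewise an assumption about how abundance could be counted, not a step stated in the claims.

```python
from collections import defaultdict

# Hypothetical toy model; no sequence, name, or coordinate here comes from the source.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def enrich(extended_nucleic_acids, decoy_capture_domains):
    """Stand-in for steps (a)-(b): keep molecules that a decoy could hybridize to.

    Hybridization is approximated (assumption) as an exact match between the
    molecule's insert and the reverse complement of a decoy capture domain.
    """
    sites = {name: reverse_complement(dom) for name, dom in decoy_capture_domains.items()}
    captured = []
    for mol in extended_nucleic_acids:
        for name, site in sites.items():
            if site in mol["insert"]:
                captured.append({**mol, "target": name})
                break  # one decoy match is enough to pull the molecule down
    return captured


def abundance_by_location(captured, barcode_to_xy):
    """Stand-in for step (c): count distinct UMIs per (location, target)."""
    umis = defaultdict(set)
    for mol in captured:
        xy = barcode_to_xy.get(mol["spatial_barcode"])
        if xy is None:
            continue  # barcode not on the array; skip the read
        umis[(xy, mol["target"])].add(mol["umi"])
    return {key: len(values) for key, values in umis.items()}


# Hypothetical decoy panel and array layout.
decoys = {"GENE_A": "GGATCCTA", "GENE_B": "TTTGACCA"}
barcode_to_xy = {"AAAC": (0, 0), "CCGT": (0, 1)}

# Hypothetical extended nucleic acids (spatial barcode + UMI + analyte-derived insert).
molecules = [
    {"spatial_barcode": "AAAC", "umi": "U1", "insert": "AATAGGATCCGG"},
    {"spatial_barcode": "AAAC", "umi": "U1", "insert": "AATAGGATCCGG"},  # amplification duplicate
    {"spatial_barcode": "AAAC", "umi": "U2", "insert": "TTTAGGATCCAA"},
    {"spatial_barcode": "CCGT", "umi": "U3", "insert": "CCTGGTCAAATT"},
    {"spatial_barcode": "CCGT", "umi": "U4", "insert": "GGGGGGGGGGGG"},  # off-target, not captured
]

captured = enrich(molecules, decoys)
counts = abundance_by_location(captured, barcode_to_xy)
print(counts)
```

In this toy run the off-target molecule is dropped at the enrichment step, and the two reads sharing a barcode and UMI are counted once. A physical assay would depend on hybridization conditions and sequencing error, which this sketch does not model.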