WO2024026376A2

WO2024026376A2 - Methods and systems for multiomic analysis

Info

Publication number: WO2024026376A2
Application number: PCT/US2023/071068
Authority: WO
Inventors: Jay A.A. West; Jon Stanley ZAWISTOWSKI; Victor WEIGMAN
Original assignee: BioSkryb Genomics, Inc.
Priority date: 2022-07-27
Filing date: 2023-07-26
Publication date: 2024-02-01
Also published as: WO2024026376A3

Abstract

The present disclosure provides methods and systems for performing experiments and computational methods for generating, analyzing, and using multi-omics data and leveraging such multiomics data and computational analysis for applications such as identifying biomarkers, diagnostics, prognostics, drug and vaccine discovery and development, personalized and precision medicine, and any combination thereof. In some aspects, a correlation between genomics data and transcriptomics/proteomics data is used to determine the effects of a genetic event on a transcriptomics/proteomics effect and/or the effect of a genomics event in development of the course of a disease. Such information and analyses are then used for the aforementioned applications.

Description

METHODS AND SYSTEMS FOR MULTIOMIC ANALYSIS

CROSS-REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application No. 63/392,580, filed on July 27, 2022, which is incorporated herein by reference in its entirety and for all purposes.

BACKGROUND

[0002] The present disclosure is generally related to the fields of genomics, transcriptomics, and bioinformatics and high-throughput single cell analysis. High-throughput single cell analysis can provide extensive and valuable information about a subject (e.g., a human patient) or a population which can be used to make informed decisions regarding health-related matters. Such methods and systems may have vast applications in diagnostics, prognostics, personalized and precision medicine, drug design, discovery, and development.

SUMMARY

[0003] There is an unmet need for comprehensive and effective approaches to generate one or more datasets including genomics, transcriptomics, proteomics, and methylomics, and to use such datasets to identify correlations (in some cases direct correlations) therebetween, such as to diagnose patients, identify biomarkers, design therapeutics or vaccines, prescribe medications, and/or implement individualized/personalized medicine approaches. In some aspects, provided herein is a comprehensive approach comprising elements of high-throughput single cell analysis, genomics, transcriptomics, proteomics, bioinformatics, software engineering, and data analysis for generating and analyzing data sets that have vast applications for identifying disease biomarkers, diagnosing patients, and designing drugs or vaccines. Provided are also methods for conducting such biomarker identifications, diagnosis, prognosis, and drug design.

[0004] In an aspect, provided herein is a method of single cell analysis comprising: (a) providing or obtaining a plurality of cells; (b) performing one or more experiments on single cells of the plurality of cells to generate at least a first data set and a second data set from the plurality of cells, wherein the first data set is a genomic data set and the second data set is a transcriptomic data set and/or a proteomic data set; (c) identifying a correlation between the first data set and the second data set for at least a portion of the plurality of cells; and (d) using the correlation obtained in (c), identifying a disease biomarker, designing a therapeutic, or designing a vaccine for a disease.

[0005] In some embodiments, performing the one or more experiments comprises performing primary template directed amplification (PTA). In some embodiments, the one or more experiments or screens comprise a genomics experiment, a transcriptomic experiment, a proteomics experiment, or any combination thereof. In some embodiments, the one or more experiments comprise high-throughput single cell analysis, wherein single cells of the plurality of cells are screened in high-throughput. In some embodiments, the one or more experiments are performed using a miniaturized high-throughput single cell screening system. In some embodiments, the method comprises compartmentalizing the plurality of cells into a plurality of partitions, wherein a partition of the plurality of partitions comprises a single cell of the plurality of cells. In some embodiments, the plurality of partitions comprises a plurality of wells, a plurality of droplets, or both. In some embodiments, the wells are miniaturized wells. In some embodiments, the miniaturized high-throughput single cell screening system comprises a microfluidic device, a miniaturized array, or both.

[0006] In some embodiments, the one or more experiments comprise performing one or more reactions. In some embodiments, a partition of the plurality of partitions comprises a single cell therein, and the one or more experiments or screens comprise performing one or more reactions on the single cell in the partition. In some embodiments, the one or more reactions comprise cell lysis. In some embodiments, the one or more reactions comprise an amplification reaction. In some embodiments, the amplification reaction comprises primary template directed amplification (PTA).

[0007] In some embodiments, the one or more reactions comprise lysing the single cell, extracting the molecular information of the single cell, thereby releasing cellular nucleic acids, proteins, lipids, and metabolites from the single cell in the partition, and performing an amplification reaction on a cellular nucleic acid molecule.

[0008] In some embodiments, performing the one or more reactions comprises using one or more reagents. In some embodiments, the one or more reagent(s) comprise one or more of at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase.

[0009] In some embodiments, the terminator nucleotide is an irreversible terminator. In some embodiments, the terminator nucleotide is selected from the group consisting of nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids. In some embodiments, the nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides. In some embodiments, the terminator nucleotide comprises modifications of the R group of the 3' carbon of the deoxyribose. In some embodiments, the terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.

[0010] In some embodiments, a partition of the plurality of partitions comprises at least a single cell and a bead. In some embodiments, the bead delivers a reagent for performing a reaction on the single cell in the partition. In some embodiments, the reagent is bound to the bead via a cleavable linker and is configured to be released from the bead via cleavage of the cleavable linker. In some embodiments, the reagent comprises a barcode configured to identify the cell or a constituent of the cell. In some embodiments, the bead can envelop the entire cell to enable chemical reactions at a miniaturized scale.

[0011] In some embodiments, the constituent of the cell comprises genomic material of the cell, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or any combination thereof. In some embodiments, the method comprises lysing the cell in the partition, releasing a cellular nucleic acid molecule of the cell in the partition, releasing the barcode from the bead via cleavage of the cleavable linker, and hybridizing the cellular nucleic acid molecule to the barcode. In some embodiments, the one or more reactions comprise lysing the single cell, thereby releasing cellular nucleic acid molecules in the partition, performing one or more amplification reactions on the cellular nucleic acid molecules thereby generating amplified cellular nucleic acid molecules, and wherein the method further comprises extracting the amplified cellular nucleic acid molecules from the partition, and sequencing the amplified cellular nucleic acid molecules.
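The role of the barcode in identifying the cell of origin after sequencing can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with a fixed-length barcode; the barcode length, the barcode sequences, and the function name `group_reads_by_barcode` are hypothetical and are not taken from the present disclosure.

```python
from collections import defaultdict

BARCODE_LENGTH = 8  # hypothetical value; the disclosure does not specify a barcode length


def group_reads_by_barcode(reads, known_barcodes, barcode_length=BARCODE_LENGTH):
    """Group reads by the cell barcode assumed to sit at the start of each read.

    Reads with a recognized barcode are stored without the barcode prefix.
    Reads with an unrecognized prefix are kept whole under the key None.
    """
    groups = defaultdict(list)
    for read in reads:
        prefix = read[:barcode_length]
        if prefix in known_barcodes:
            groups[prefix].append(read[barcode_length:])
        else:
            groups[None].append(read)
    return dict(groups)


# Toy reads: two cells plus one read with an unreadable barcode.
reads = [
    "AAAACCCC" + "GATTACA",
    "GGGGTTTT" + "CATCAT",
    "AAAACCCC" + "TTTGGG",
    "NNNNNNNN" + "ACGT",
]
cells = group_reads_by_barcode(reads, known_barcodes={"AAAACCCC", "GGGGTTTT"})
```

In this sketch, exact matching is used for simplicity; a practical workflow would likely tolerate sequencing errors in the barcode, which is outside the scope of the example.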

[0012] In some embodiments, generating the first data set comprises performing primary template directed amplification (PTA) and generating the second data set comprises performing a reverse transcription reaction. In some embodiments, performing the reverse transcription reaction comprises generating a cDNA library. In some embodiments, generating the first data set comprises determining a methylation site in a cellular nucleic acid molecule using PTA, thereby generating a methylation library. In some embodiments, the method further comprises comparing the methylation library to a reference library for a single cell of the plurality of cells, wherein the methylation library and the reference library are generated from the same cell.

[0013] In some embodiments, identifying the correlation comprises calculating or assigning a penetrance score to the correlation of these molecular data (biomarkers), wherein the penetrance score quantifies the correlation. In some embodiments, the penetrance score guides identifying the disease biomarker, identifying a collection of biomarkers which may comprise one or more of the multiomic modalities, designing the therapeutic, designing the vaccine for the disease, or any combination thereof. In some embodiments, a high penetrance score indicates a strong correlation between the first data set and the second data set. In some embodiments, the high penetrance score indicates that the expression of a gene identified in the first data set leads to a transcriptomic event, a proteomic event, or both, and wherein the gene is identified as a disease biomarker. In some embodiments, a low penetrance score indicates a weak correlation between the first data set and the second data set, and that the expression of a gene identified in the first data set does not lead to a transcriptomic event or a proteomic event, and wherein the gene is not identified as a disease biomarker.

[0014] In some embodiments, identifying the correlation is performed with the aid of a computer system comprising a computer program. In some embodiments, the computer program comprises one or more bioinformatics algorithms or workflows. In some embodiments, the first data set and the second data set are combined or integrated into a database with or without links to related datasets independently generated across the research community.

[0015] In some aspects, described herein is a system for determining a penetrance score comprising: a computing system comprising at least one processor and instructions executable by the at least one processor to provide an application configured to perform operations comprising: receiving multiomics data from one or more sources and at least one biological state; and applying an algorithm configured to process the data and generate a penetrance score. In some embodiments, the computing system comprises a cloud computing platform. In some embodiments, the multiomics data comprises data obtained from analysis of one or more of genomic DNA, transcript RNA, proteins, lipids, or metabolites.

[0016] In some embodiments, the correlation is quantified by a penetrance score. In some embodiments, the penetrance score is at least 0.5. In some embodiments, the penetrance score is at least 0.9.
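The disclosure states that the penetrance score quantifies a correlation and gives thresholds of 0.5 and 0.9, but it does not give a formula. The following sketch therefore assumes, for illustration only, that the score is the absolute Pearson correlation between a per-cell genotype (0, 1, or 2 alternate alleles) and a per-cell expression value. The function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def penetrance_score(genotypes, expression):
    """ASSUMPTION: score = |Pearson r|, giving a value between 0 and 1."""
    return abs(pearson(genotypes, expression))


def tier(score):
    """Map a score onto the two thresholds named in the text (0.5 and 0.9)."""
    if score >= 0.9:
        return "at least 0.9"
    if score >= 0.5:
        return "at least 0.5"
    return "below 0.5"


# Toy per-cell data: alternate-allele count and expression of the same gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
score = penetrance_score(genotypes, expression)
```

Any other bounded association measure (for example a rank correlation) could be substituted for `pearson` without changing the thresholding step.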

[0017] In an aspect, provided herein is a method of developing a treatment for a disease, wherein the method comprises: (a) generating multiomics data from one or more single cells, wherein generating comprises performing Primary Template Directed Amplification (PTA), and wherein the multiomics data comprises two or more of genome data, transcriptome data, and proteomics data; (b) correlating one or more mutations in genome data with corresponding mutations in one or both of (i) an mRNA of the transcriptome data and (ii) a protein of the proteome data; and (c) generating a treatment targeting one or both of the mRNA and the protein, thereby developing the treatment for the disease. In some embodiments, the disease comprises or is cancer.
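Step (b) above, correlating mutations in genome data with corresponding mutations in the mRNA, can be sketched as a per-cell set intersection. The sketch assumes that DNA and RNA variant calls have already been placed in the same coordinate system and are represented as (chromosome, position, reference allele, alternate allele) tuples; this representation, the cell identifiers, and the coordinates are hypothetical.

```python
def shared_variants(dna_variants, rna_variants):
    """Return the variants present in both the DNA and RNA calls, sorted for stable output."""
    return sorted(set(dna_variants) & set(rna_variants))


def correlate_per_cell(dna_by_cell, rna_by_cell):
    """For each cell with DNA calls, list the DNA variants that are also seen in its RNA."""
    return {
        cell_id: shared_variants(dna, rna_by_cell.get(cell_id, []))
        for cell_id, dna in dna_by_cell.items()
    }


# Toy calls with placeholder coordinates (not real genomic positions).
dna_by_cell = {
    "cell_1": [("chrA", 100, "G", "T"), ("chrA", 250, "C", "A")],
    "cell_2": [("chrA", 100, "G", "T")],
}
rna_by_cell = {
    "cell_1": [("chrA", 100, "G", "T")],
    "cell_2": [],
}
expressed_mutations = correlate_per_cell(dna_by_cell, rna_by_cell)
```

A variant that appears in the DNA of a cell but not in its RNA may be unexpressed or simply not covered by reads, so absence in this sketch should not be read as evidence of silencing.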

[0018] In some embodiments, the treatment comprises an mRNA vaccine. In some embodiments, the treatment comprises reprogramming a dendritic cell to target one or both of the mRNA or protein. In some embodiments, the mutation in genome data comprises a DNA mutation. In some embodiments, the DNA mutation is selected from the group consisting of SNV*X, CNV*X, translocation, INDEL, frameshift, stop codon, mitochondrial, promoter/enhancer, TCR/BCR, and other change. In some embodiments, the mRNA comprises a transcript change. In some embodiments, the transcript change is selected from the group consisting of expression, splice variant, fusion, lncRNA, miRNA, TCR/BCR, promoter, truncated gene, mitochondrial, or mutation.

[0019] In some embodiments, the protein comprises a protein change. In some embodiments, the protein change is selected from the group consisting of over/under expressed, truncated, surface bound, frameshift, misfolded, metabolic, ligand independence, conformation, activity change, or fused.

[0020] In some embodiments, the multiomics data comprises one or more measurements. In some embodiments, one or more of the measurements is a silent change. In some embodiments, the multiomics data comprises data from one or more of a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome. In some embodiments, the multiomics data comprises data from a genome. In some embodiments, the one or more measurements are selected from the group consisting of: copy number variation, translocation, and mutation burden.

[0021] In some embodiments, the disease comprises cancer. In some embodiments, cancer comprises breast cancer. In some embodiments, the breast cancer comprises ductal carcinoma. In some embodiments, the cancer comprises leukemia. In some embodiments, the single cells (e.g., single cancer cells) are obtained from an FFPE sample.

[0022] In some embodiments, the multiomics data comprises data from a methylome. In some embodiments, the one or more measurements are selected from the group consisting of: methylation at CpG sites, gene activation, and gene repression. In some embodiments, the multiomics data comprises data from a transcriptome. In some embodiments, the one or more measurements are selected from the group consisting of: expressed genes, gene fusions, and splice variants.

[0023] In some embodiments, the multiomics data comprises data from a proteome. In some embodiments, the one or more measurements are selected from the group consisting of: translation level, phosphorylation state, and protein modification. In some embodiments, the one or more sources comprise an individual organism. In some embodiments, the one or more sources comprise cells. In some embodiments, the cells are mammalian cells, human cells, bacterial cells, cancer cells, an immortalized cell line, a primary patient cell line, or any combination thereof. In some embodiments, the cells are obtained from a tissue. In some embodiments, the cells are obtained from a tissue cross-section. In some embodiments, the biological state comprises a disease state. In some embodiments, the disease state comprises cancer.

[0024] In some embodiments, the algorithm further generates a mechanism based on the data. In some embodiments, the mechanism is generated by detecting one or more changes in one or more measurements.

[0025] In some embodiments, the change comprises a genome DNA change. In some embodiments, the genome DNA change is selected from the group consisting of SNV*X, CNV*X, translocation, INDEL, frameshift, stop codon, mitochondrial, promoter/enhancer, TCR/BCR, and other change. In some embodiments, the change comprises a transcript change. In some embodiments, the transcript change is selected from the group consisting of expression, splice variant, fusion, lncRNA, miRNA, TCR/BCR, promoter, truncated gene, mitochondrial, or mutation. In some embodiments, the change comprises a protein change. In some embodiments, the protein change is selected from the group consisting of over/under expressed, truncated, surface bound, frameshift, misfolded, metabolic, ligand independence, conformation, activity change, or fused. In some embodiments, the mechanism is determined to be one or more of a genomic, transcriptomic, proteomic, lipidomic, or metabolomic mechanism.

[0026] In some aspects, described herein is a method for validating a disease target for a disease comprising (a) selecting cells from a tissue; (b) banking the cells; (c) performing one or more multiomic methods on the cells to generate multiomics data; and (d) applying a computer algorithm to process the multiomics data and generate a disease target. In some embodiments, selecting the cells comprises FACS sorting, microfluidics, spatial cell selection, or ultra-high throughput cell sorting. In some embodiments, the number of cells is at least about 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 6000, 10,000 or greater. In some embodiments, the disease is cancer. In some embodiments, the multiomics methods comprise PTA. In some embodiments, the multiomics data comprises data from one or more of a genome, epigenome, transcriptome, proteome, lipidome, or metabolome. In some embodiments, the method further comprises a treatment based on the disease target. In some embodiments, the treatment comprises an mRNA vaccine or small molecule.

[0027] In some embodiments, the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher. In some embodiments, the method or system is capable of detecting a number of genes per cell of from about 1000 to about 8000. In some embodiments, the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher and a number of genes per cell of from about 1000 to about 8000.

[0028] In some embodiments, the methods comprise full length synthesis of RNA transcripts in the cell wherein a plurality of amplification products achieved from performing the method are substantially unbiased over a range of 5'-3' gene body percentiles.
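Bias over 5'-3' gene body percentiles is commonly assessed by binning read depth along each transcript. The sketch below is one possible way to compute such a profile and is not the method of the present disclosure. It assumes read alignments are given as half-open (start, end) intervals in transcript coordinates; the function names, the bin count, and the use of the coefficient of variation as a flatness measure are illustrative assumptions.

```python
import statistics


def gene_body_profile(transcript_length, alignments, bins=10):
    """Mean read depth in each of `bins` equal slices along the 5'->3' transcript body."""
    depth = [0] * transcript_length
    for start, end in alignments:  # half-open [start, end), transcript coordinates
        for pos in range(max(start, 0), min(end, transcript_length)):
            depth[pos] += 1
    profile = []
    for b in range(bins):
        lo = b * transcript_length // bins
        hi = (b + 1) * transcript_length // bins
        window = depth[lo:hi]
        profile.append(sum(window) / len(window) if window else 0.0)
    return profile


def coverage_cv(profile):
    """Coefficient of variation of the profile; 0.0 means perfectly even coverage."""
    mean = statistics.fmean(profile)
    return statistics.pstdev(profile) / mean if mean else 0.0


# Even coverage across a 100-base transcript versus coverage piled up at the 3' end.
even = gene_body_profile(100, [(0, 50), (50, 100)])
three_prime_biased = gene_body_profile(100, [(80, 100)] * 5)
```

A lower value of `coverage_cv` corresponds to a flatter profile; what counts as "substantially unbiased" is not quantified in the text, so no cutoff is suggested here.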

[0029] In some embodiments, the methods and systems of the present disclosure are capable of amplifying and detecting transcripts of at least 1 kb, 1.5 kb, 2 kb, 2.5 kb, 3 kb, 3.5 kb, 4 kb, 4.5 kb, or longer. In some embodiments, these transcripts may consist of coding information from one or more genes and represent aberrations of splicing which can affect, but are not limited to, transcript isoforms or gene fusion events.

INCORPORATION BY REFERENCE

[0030] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

[0033] FIG. 1 depicts a workflow comprising providing a sample, cell selection, and multiomic analysis (including genome, methylome, transcriptome, and proteome);

[0034] FIG. 2 depicts various multiomic modalities which contribute to penetrance score;

[0035] FIG. 3 depicts a workflow schematic of measuring penetrance score using multiomic analysis;

[0036] FIG. 4 depicts a list of various biological inquiries useful for multiomics measurements;

[0037] FIG. 5 depicts another workflow schematic for the types of changes in multiomics measurements which in some instances is used for determining a mechanism;

[0038] FIG. 6 depicts a workflow schematic for spatially selecting cells from a frozen specimen, banking the cells, performing multiomic chemistry processes, providing multiomic data/measurements to a computational engine process, and validating targets;

[0039] FIG. 7 depicts a schematic of factors which in some instances dictate cell fate;

[0040] FIG. 8 schematically illustrates the various components and applications of the methods and systems of the present disclosure;

[0041] FIG. 9 depicts a workflow schematic for mammalian and bacterial multiomics analysis using the methods and systems of the present disclosure;

[0042] FIG. 10 schematically illustrates a workflow involving the computational components and systems of the present disclosure;

[0043] FIG. 11 depicts an example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces;

[0044] FIG. 12 depicts an example of a cloud-based web/mobile application provision system; in this case, a system comprising elastically load balanced, auto-scaling web server and application server resources as well as synchronously replicated databases;

[0045] FIG. 13 depicts change in cellular growth rates of MOLM-13 cell lines in the presence of the cancer drug quizartinib where resistant clones are thriving over genetically native cells;

[0046] FIG. 14A depicts a genomic view of allele variation in the FLT3 gene in resistant and parental strains;

[0047] FIG. 14B depicts CNV genomic data of resistant and parental strains;

[0048] FIG. 14C depicts karyotypes of resistant and parental strains;

[0049] FIG. 14D depicts a principal component analysis of the transcriptomics data of parental and resistant cells;

[0050] FIG. 14E depicts a clustered heat map of transcriptomic data;

[0051] FIG. 14F depicts a mechanism for transcriptional bypass of FLT3 signaling in resistant cells;

[0052] FIGS. 14G-14H depict alternative exon utilization in transcriptional data;

[0053] FIG. 15A depicts a PCA of SNV data, showing discrimination between groups based on genomic variation;

[0054] FIG. 15B depicts clustered SNV data, showing groups of genomic positions with similar zygosity across biological groups;

[0055] FIG. 16A depicts SNV-gene expression interactions, highlighting specific mutations within genes associated with expression changes significant across biology groups;

[0056] FIG. 16B depicts the location of a SNV in the MYC gene;

[0057] FIG. 16C depicts a plot of MYC gene expression and SNV genotype for the parental and resistant cells showing similar grouping of resistant cells with the signature;

[0058] FIG. 17 depicts H&E and α-ER staining of the primary cancer cells prior to sequencing;

[0059] FIG. 18A depicts heterogeneity in CNV in primary breast cancer cells;

[0060] FIG. 18B depicts known CNV in DCIS;

[0061] FIG. 19 depicts SNV PIK3CA mutations detected in single cells derived from 3 separate patients;

[0062] FIG. 20 depicts SNV and CNV detected in single cells of a DCIS patient;

[0063] FIG. 21 depicts correlations between genomic and transcriptomic data;

[0064] FIGs. 22A-22C show experimental data generated using the methods and systems of the present disclosure (ResolveOME) and its comparison to droplet RNA sequencing demonstrating superior RNA performance with respect to enhanced gene body coverage, increased representation across transcript sizes, and robust variant calling;

[0065] FIG. 23A shows significant isoforms across parental or resistant clones of MOLM-13 (transcript ‘A’ and ‘B’) from the same genes;

[0066] FIG. 23B shows transcripts that are significantly associated to changes in copy number ploidy across the genomes of MOLM-13 cells;

[0067] FIG. 23C shows genomic variants of MOLM-13, in regulatory regions of the genome (depicted by color), that are also significantly associated with transcript changes across resistant cells.

DETAILED DESCRIPTION

[0068] There is an unmet need for comprehensive and effective approaches to generate one or more datasets including genomics, transcriptomics, proteomics, and methylomics, and identifying correlations therebetween, such as to diagnose patients, identify biomarkers, design therapeutics or vaccines, prescribe medications, and/or implement individualized/personalized medicine approaches. In some aspects, provided herein is a comprehensive approach comprising elements of high-throughput single cell analysis, genomics, transcriptomics, proteomics, bioinformatics, software engineering, and data analysis for generating and analyzing data sets that have vast applications for identifying disease biomarkers, diagnosing patients, and designing drugs or vaccines. Provided are also methods for conducting such biomarker identifications, diagnosis, prognosis, and drug design.

[0069] Provided herein are systems and methods for processing and visualization of biological data (e.g., biomarkers). Further provided herein are systems and methods that generate a penetrance score. Further provided herein are systems and methods to interrogate disease mechanisms. Further provided herein are systems and methods for validating therapeutic targets using penetrance data and mechanism. Provided herein are systems and methods for providing accurate and scalable Primary Template-Directed Amplification (PTA) and sequencing in combination with additional cell analysis techniques (multiomics). Further provided herein are methods of multiomic analysis, including analysis of proteins, DNA, and RNA from single cells, and corresponding post-transcriptional or post-translational modifications in combination with PTA. Such methods and compositions facilitate highly accurate amplification of target (or “template”) nucleic acids, which increases accuracy and sensitivity of downstream applications, such as Next-Generation Sequencing.

[0070] The methods and systems described herein in some instances automate many of the required functions that formerly required labor-intensive processes as well as dedicated personnel to curate, analyze, and interpret complex biological data.

[0071] Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

[0072] Unless defined otherwise, all terms of art, notations and other technical and scientific terms or terminology used herein are intended to have the same meaning as is commonly understood by one of ordinary skill in the art to which the claimed subject matter pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art.

[0073] Throughout this application, various embodiments may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

[0074] As used in the specification and claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a sample” includes a plurality of samples, including mixtures thereof.

[0075] The terms “determining,” “measuring,” “evaluating,” “assessing,” “assaying,” and “analyzing” are often used interchangeably herein and can refer to forms of measurement. The terms include determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing can be relative or absolute. “Detecting the presence of” can include determining the amount of something present in addition to determining whether it is present or absent depending on the context.

[0076] As used herein, the term “gene” can refer to a linear sequence of nucleotides along a segment of DNA that provides the coded instructions for synthesis of RNA, which, when translated into protein, leads to the expression of hereditary character.

[0077] As used herein, the term “nucleic acid molecule” can mean DNA, RNA, single-stranded, double-stranded or triple-stranded and any chemical modifications thereof. Virtually any modification of the nucleic acid is contemplated. A “nucleic acid molecule” can be of almost any length, from 10, 20, 30, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 75,000, 100,000, 150,000, 200,000, 500,000, 1,000,000, 1,500,000, 2,000,000, 5,000,000 or even more bases in length, including increments therein, up to a full-length chromosomal DNA molecule. For methods that analyze expression of a gene, the nucleic acid isolated from a sample is typically RNA.

[0078] A single-stranded nucleic acid molecule is “complementary” to another single-stranded nucleic acid molecule, in certain embodiments of the subject matter described herein, when it can base-pair (hybridize) with all or a portion of the other nucleic acid molecule to form a double helix (double-stranded nucleic acid molecule), based on the ability of guanine (G) to base pair with cytosine (C) and adenine (A) to base pair with thymine (T) or uridine (U). For example, the nucleotide sequence 5'-TATAC-3' is complementary to the nucleotide sequence 5'-GTATA-3'.
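The example pair above can be checked with a short reverse-complement routine. The sketch assumes both sequences are written 5' to 3' and contain only A, C, G, T, or U; the function names are illustrative, and only full-length complementarity is tested even though the definition above also covers pairing with a portion of the other molecule.

```python
# Watson-Crick partners; U is treated like T so RNA input is accepted.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq):
    """Return the DNA reverse complement of a 5'->3' sequence, itself written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_complementary(a, b):
    """True if `b` (5'->3') pairs with `a` (5'->3') along its full length."""
    return reverse_complement(a) == b.upper().replace("U", "T")


result = reverse_complement("TATAC")  # the example sequence from the text
```

Reversal is needed because the two strands of a double helix run antiparallel, which is why 5'-TATAC-3' pairs with 5'-GTATA-3' rather than with 5'-ATATG-3'.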

[0079] As used herein, the term “mutation” can refer to a change in the genome with respect to the standard wild-type sequence. Mutations can be deletions, insertions, or rearrangements of nucleic acid sequences at a position in the genome, or they can be single base changes at a position in the genome, referred to as “point mutations.” Mutations can be inherited, or they can occur in one or more cells during the lifespan of an individual. In some instances, mutation and variant are used synonymously.

[0080] As used herein, the term “kit” or “research kit” can refer to a collection of products that are used to perform a biological research reaction, procedure, or synthesis, such as, for example, a detection, assay, separation, purification, etc., which are typically shipped together, usually within a common packaging, to an end user.

[0081] Described herein is a cloud-based solution for the storage, query, and analysis of longitudinal data comprising a multiplicity of whole genomes, a large number of public and proprietary annotation sources as well as associated high quality phenotypic data, including microbiome metagenomes and metabolomics profiles. In various embodiments, the data analyzed by the platforms, systems, media, and methods described herein comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, or more than 1,000,000 whole genomes.

[0082] The data analyzed by the platforms, systems, media, and methods described herein comprises genomic data. The genomic data is produced, by way of example, at a next generation sequencing (NGS) lab. In some cases, an AWS analysis pipeline based on Illumina’s HiSeq X and the ISIS Analysis Software are utilized to produce the genomic data. Sequencing reads are mapped to the hg38 human reference sequence and variant callers are used to call single nucleotide variants (SNVs) and insertions and deletions (indels). The genomic data comprises a multiplicity of unique SNVs. By way of example, the genomic data comprises over 1 million, over 10 million, over 50 million, over 100 million, over 500 million, or over 1 billion unique SNVs.

[0083] The data analyzed by the platforms, systems, media, and methods described herein comprises metadata. The whole genomes are associated with high quality phenotypic information. A proprietary phenotype ingestion process enables the cleaning and standardization of phenotype data across disparate data sources. In some embodiments, the ingestion process includes: data integrity checks; standardization of units; standardization of terms; ontology/vocabulary mapping; and maintenance of the proprietary data dictionary.

[0084] In various embodiments, the phenotype data comprises more than 1000, more than 5000, more than 10,000, more than 100,000, more than 1,000,000, or more than 10,000,000 phenotype data fields with more than 1 million, more than 5 million, more than 10 million, more than 50 million, more than 100 million, more than 500 million, or more than 1 billion data points. Phenotypic data in some instances comprises cellular phenotype data. In some instances, cellular phenotypic data is obtained from microscopy. In some instances, cell phenotypic data comprises one or more observable phenotypic traits such as cell shape or morphology, size, texture, internal structure, patterns of distribution of one or more specific proteins, glycosylated proteins, nucleic acid molecules, lipid molecules, glycosylated lipid molecules, carbohydrate molecules, metabolites, and ions. In some instances, phenotypic data describes populations of cells described herein. In some instances, phenotypic data describes phenotypic traits of an organism such as a human. In some instances, phenotypic data comprises a clinical designation or category, for example, a clinical diagnosis, a clinical parameter name, a clinical parameter value, a laboratory test name or a laboratory test value. In some instances, a phenotype is associated with an observable disease characteristic.

[0085] The data analyzed by the platforms, systems, media, and methods described herein comprises annotation data. Annotation data is also cleaned and standardized through an automated end-to-end solution, which allows: idempotence, immutability, persistence; high quality data; consistency between data sources; and scalability and flexibility.

[0086] Samples described herein may represent biologic information obtained from individuals or populations of individuals (e.g., genomic information). In some instances, samples comprise single cells. In some instances, samples comprise 1, 2, 5, 10, 20, 25, 50, 75, 100, 200, 500, or more than 1000 cells from the same or different individual. In some instances, samples comprise 1000, 2000, 5000, 10,000, 20,000, 50,000, 75,000, or at least 100,000 cells from the same or different individual. Samples may be obtained from any species, including but not limited to viruses, bacteria, plants, fungi, protozoa, archaea, or animals. In some instances, samples are obtained from vertebrates. In some instances, samples are obtained from mammals. In some instances, samples are obtained from humans. Samples in some instances are obtained from any bodily fluid or tissue. In some instances, samples are obtained from diseased tissue such as a tumor.

[0087] In an aspect, provided herein is a method of single cell analysis comprising: (a) providing or obtaining a plurality of cells; (b) performing one or more experiments on single cells of the plurality of cells to generate at least a first data set and a second data set from the plurality of cells, wherein the first data set is a genomic data set and the second data set is a transcriptomic data set and/or a proteomic data set and/or a methylomic data set; (c) identifying a correlation between the first data set and the second data set for at least a portion of the plurality of cells; and (d) using the correlation obtained in (c), identifying a disease biomarker, designing a therapeutic, or designing a vaccine for a disease.

[0088] In some embodiments, performing the one or more experiments comprises performing primary template directed amplification (PTA). In some embodiments, the one or more experiments or screens comprise a genomics experiment, a transcriptomic experiment, a proteomics experiment, a methylomics experiment or any combination thereof. In some embodiments, the one or more experiments comprise high-throughput single cell analysis, wherein single cells of the plurality of cells are screened in high-throughput. In some embodiments, the one or more experiments are performed using a miniaturized high-throughput single cell screening system. In some embodiments, the method comprises compartmentalizing the plurality of cells into a plurality of partitions, wherein a partition of the plurality of partitions comprises a single cell of the plurality of cells. In some embodiments, the plurality of partitions comprises a plurality of wells, a plurality of droplets, or both. In some embodiments, the wells are miniaturized wells. In some embodiments, the miniaturized high-throughput single cell screening system comprises a microfluidic device, a miniaturized array, or both.

[0089] In some embodiments, the one or more experiments comprise performing one or more reactions. In some embodiments, a partition of the plurality of partitions comprises a single cell therein, and the one or more experiments or screens comprise performing one or more reactions on the single cell in the partition. In some embodiments, the one or more reactions comprise cell lysis. In some embodiments, the one or more reactions comprise an amplification reaction. In some embodiments, the amplification reaction comprises primary template directed amplification (PTA).

[0090] In some embodiments, the one or more reactions comprise lysing the single cell, extracting the genomic material of the single cell, thereby releasing a cellular nucleic acid molecule from the single cell in the partition, and performing an amplification reaction on the cellular nucleic acid molecule.

[0091] In some embodiments, performing the one or more reactions comprises using one or more reagents. In some embodiments, the one or more reagent(s) comprise one or more of at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase.

[0092] In some embodiments, the terminator nucleotide is an irreversible terminator. In some embodiments, the terminator nucleotide is selected from the group consisting of nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids. In some embodiments, the nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides. In some embodiments, the terminator nucleotide comprises modifications of the r group of the 3' carbon of the deoxyribose. In some embodiments, the terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.

[0093] In some embodiments, a partition of the plurality of partitions comprises at least a single cell and a bead. In some embodiments, the bead delivers a reagent for performing a reaction on the single cell in the partition. In some embodiments, the reagent is bound to the bead via a cleavable linker and is configured to be released from the bead via cleavage of the cleavable linker. In some embodiments, the reagent comprises a barcode configured to identify the cell or a constituent of the cell.

[0094] In some embodiments, the constituent of the cell comprises genomic material of the cell, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or any combination thereof. In some embodiments, the method comprises lysing the cell in the partition, releasing a cellular nucleic acid molecule of the cell in the partition, releasing the barcode from the bead via cleavage of the cleavable linker, and hybridizing the cellular nucleic acid molecule to the barcode. In some embodiments, the one or more reactions comprise lysing the single cell, thereby releasing cellular nucleic acid molecules in the partition, performing one or more amplification reactions on the cellular nucleic acid molecules thereby generating amplified cellular nucleic acid molecules, and wherein the method further comprises extracting the amplified cellular nucleic acid molecules from the partition, and sequencing the amplified cellular nucleic acid molecules.

[0095] In some embodiments, generating the first data set comprises performing primary template directed amplification (PTA) and generating the second data set comprises performing a reverse transcription reaction. In some embodiments, performing the reverse transcription reaction comprises generating a cDNA library. In some embodiments, generating the first data set comprises determining a methylation site in a cellular nucleic acid molecule using PTA, thereby generating a methylation library. In some embodiments, the method further comprises comparing the methylation library to a reference library for a single cell of the plurality of cells, wherein the methylation library and the reference library are generated from the same cell.

[0096] In some embodiments, identifying the correlation comprises calculating or assigning a penetrance score to the correlation, wherein the penetrance score quantifies the correlation. In some embodiments, the penetrance score guides identifying the disease biomarker, designing the therapeutic, designing the vaccine for the disease, or any combination thereof. In some embodiments, a high penetrance score indicates a strong correlation between the first data set and the second data set. In some embodiments, the high penetrance score indicates that the expression of a gene identified in the first data set leads to a transcriptomic event, a proteomic event or both, and wherein the gene is identified as a disease biomarker. In some embodiments, a low penetrance score indicates a weak correlation between the first data set and the second data set, and that the expression of a gene identified in the first data set does not lead to a transcriptomic event, a proteomic event, or either, and wherein the gene is not identified as a disease biomarker.

[0097] In some embodiments, identifying the correlation is performed with the aid of a computer system comprising a computer program. In some embodiments, the computer program comprises a bioinformatics algorithm. In some embodiments, the first data set and the second data set are combined or integrated into a database.

[0098] In an aspect, provided herein is a method of developing a treatment for a disease, wherein the method comprises: (a) generating multiomics data from one or more single cells, wherein generating comprises performing Primary Template Directed Amplification (PTA), and wherein the multiomics data comprises two or more of genome data, transcriptome data, and proteomics data; (b) correlating one or more mutations in genome data with corresponding mutations in one or both of (i) an mRNA of the transcriptome data and (ii) a protein of the proteome data; and (c) generating a treatment targeting one or both of the mRNA and the protein, thereby developing the treatment for the disease. In some embodiments, the disease comprises or is cancer.

[0099] In some embodiments, the correlation is quantified by a penetrance score. In some embodiments, the penetrance score is at least 0.5. In some embodiments, the penetrance score is at least 0.9.
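The disclosure does not fix a formula for the penetrance score. As one hedged illustration, the sketch below assumes the score is the fraction of cells carrying a given DNA mutation in which the corresponding transcript-level change is also observed, and then tests that value against the 0.5 and 0.9 thresholds named above. The function name, the per-cell boolean inputs, and the scoring rule itself are assumptions made for illustration only.

```python
from typing import Sequence


def penetrance_score(dna_mutated: Sequence[bool], rna_changed: Sequence[bool]) -> float:
    """Hypothetical score: fraction of DNA-mutant cells that also show the transcript change."""
    if len(dna_mutated) != len(rna_changed):
        raise ValueError("inputs must describe the same cells in the same order")
    # Keep the transcript observation only for cells that carry the DNA mutation.
    carriers = [rna for dna, rna in zip(dna_mutated, rna_changed) if dna]
    if not carriers:
        return 0.0  # assumption: no carrier cells means nothing propagates
    return sum(carriers) / len(carriers)


# Toy per-cell calls (invented values, one entry per single cell).
dna = [True, True, True, True, False, False]
rna = [True, True, True, False, False, True]

score = penetrance_score(dna, rna)
print(score, score >= 0.5, score >= 0.9)
```

Other definitions, for example a correlation coefficient across cells or a score that also incorporates protein-level calls, would fit the text equally well; the fraction is used here only because it is bounded between 0 and 1 like the thresholds in the paragraph.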

[0100] In some embodiments, the treatment comprises an mRNA vaccine. In some embodiments, the treatment comprises reprogramming a dendritic cell to target one or both of the mRNA or protein. In some embodiments, the mutation in genome data comprises a DNA mutation. In some embodiments, the DNA mutation is selected from the group consisting of SNV*X, CNV*X, translocation, INDEL, frameshift, stop codon, mitochondrial, promoter/enhancer, TCR/BCR, and other change. In some embodiments, the mRNA comprises a transcript change. In some embodiments, the transcript change is selected from the group consisting of expression, splice variant, fusion, lncRNA, miRNA, TCR/BCR, promoter, truncated gene, mitochondrial, or mutation.

[0101] In some embodiments, the protein comprises a protein change. In some embodiments, the protein change is selected from the group consisting of over/under expressed, truncated, surface bound, frameshift, misfolded, metabolic, ligand independence, conformation, activity change, or fused.

[0102] In some embodiments, the disease comprises cancer. In some embodiments, cancer comprises breast cancer. In some embodiments, the breast cancer comprises ductal carcinoma. In some embodiments, the cancer comprises leukemia. In some embodiments, the single cells (e.g., single cancer cells) are obtained from an FFPE sample.

[0103] In some embodiments, the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher. In some embodiments, the method or system is capable of detecting a number of genes per cell of from about 1000 to about 8000. In some embodiments, the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher and a number of genes per cell of from about 1000 to about 8000.

[0104] In some embodiments, the methods comprise full length synthesis of RNA transcripts in the cell wherein a plurality of amplification products achieved from performing the method are substantially unbiased over a range of 5’-3’ gene body percentiles.

[0105] In some embodiments, the methods and systems of the present disclosure are capable of amplifying and detecting transcripts of at least 1 kb, 1.5 kb, 2 kb, 2.5 kb, 3 kb, 3.5 kb, 4 kb, 4.5 kb, or longer.

Multiomics

[0106] Provided herein are methods for multiomics sample preparation and/or analysis. In some embodiments, multiomics may include analysis of at least one feature of a proteome, genome, transcriptome, metabolome, lipidome, or epigenome. Proteomics may include translation level, phosphorylation state, and protein modification. Transcriptomics may include, without limitation, analysis of ribosomal RNA (rRNA), messenger RNA (mRNA), transfer RNA (tRNA), micro-RNA (miRNA), and other non-coding RNA (ncRNA), or a combination thereof. Epigenomics may include, without limitation, analysis of methylation patterns (e.g., “methylome”) or histone modifications.

[0107] In some instances, a method comprises one or more steps of isolating a single cell from a population of cells, wherein the single cell comprises RNA and genomic DNA; amplifying the RNA by RT-PCR to generate a cDNA library; isolating the cDNA from the genomic DNA; contacting the genomic DNA with at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides; and sequencing the cDNA library and the genomic DNA library. In some instances, the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase to generate a genomic DNA library.

[0108] Methods described herein (e.g., PTA) may be used as a replacement for any number of other known methods in the art which are used for single cell sequencing (multiomics or the like). In some instances, a method described herein comprises PTA and a method of polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of non-polyadenylated mRNA transcripts. In some instances, a method described herein comprises PTA and a method of total (polyadenylated and non-polyadenylated) mRNA transcripts. PTA may substitute genomic DNA sequencing methods such as MDA, PicoPlex, DOP-PCR, MALBAC, or target-specific amplifications. In some instances, PTA replaces the standard genomic DNA sequencing method in a multiomics method including DR-seq (Dey et al., 2015), G&T seq (MacAulay et al., 2015), scMT-seq (Hu et al., 2016), sc-GEM (Cheow et al., 2016), scTrio-seq (Hou et al., 2016), simultaneous multiplexed measurement of RNA and proteins (Darmanis et al., 2016), scCOOL-seq (Guo et al., 2017), CITE-seq (Stoeckius et al., 2017), REAP-seq (Peterson et al., 2017), scNMT-seq (Clark et al., 2018), or SIDR-seq (Han et al., 2018).

[0109] In some instances, PTA is combined with a standard RNA sequencing method to obtain genome and transcriptome data. In some instances, a multiomics method described herein comprises PTA and one of the following: Drop-seq (Macosko, et al. 2015), mRNA-seq (Tang et al., 2009), InDrop (Klein et al., 2015), MARS-seq (Jaitin et al., 2014), Smart-seq2 (Hashimshony, et al., 2012; Fish et al., 2016), CEL-seq (Jaitin et al., 2014), STRT-seq (Islam, et al., 2011), Quartz-seq (Sasagawa et al., 2013), CEL-seq2 (Hashimshony, et al. 2016), cytoSeq (Fan et al., 2015), SuPeR-seq (Fan et al., 2011), RamDA-seq (Hayashi, et al. 2018), MATQ-seq (Sheng et al., 2017), or SMARTer (Verboom et al., 2019).

[0110] Various reaction conditions and mixes may be used for generating cDNA libraries for transcriptome analysis. In some instances, an RT reaction mix is used to generate a cDNA library. In some instances, the RT reaction mixture comprises a crowding reagent, at least one primer, a template switching oligonucleotide (TSO), a reverse transcriptase, and a dNTP mix. In some instances, an RT reaction mix comprises an RNAse inhibitor. In some instances, an RT reaction mix comprises one or more surfactants. In some instances, an RT reaction mix comprises Tween-20 and/or Triton-X. In some instances, an RT reaction mix comprises Betaine. In some instances, an RT reaction mix comprises one or more salts. In some instances, an RT reaction mix comprises a magnesium salt (e.g., magnesium chloride) and/or tetramethyl ammonium chloride. In some instances, an RT reaction mix comprises gelatin. In some instances, an RT reaction mix comprises PEG (PEG1000, PEG2000, PEG4000, PEG6000, PEG8000, or PEG of other length).

[0111] Multiomic methods described herein may provide both genomic and RNA transcript information from a single cell (e.g., a combined or dual protocol). In some instances, genomic information from the single cell is obtained from the PTA method, and RNA transcript information is obtained from reverse transcription to generate a cDNA library. In some instances, a whole transcript method is used to obtain the cDNA library. In some instances, 3’ or 5’ end counting is used to obtain the cDNA library. In some instances, cDNA libraries are not obtained using UMIs. In some instances, a multiomic method provides RNA transcript information from the single cell for at least 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or at least 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the single cell for about 500, 1000, 2000, 5000, 8000, 10,000, 12,000, or about 15,000 genes. In some instances, a multiomic method provides RNA transcript information from the single cell for 100-12,000, 1000-10,000, 2000-15,000, 5000-15,000, 10,000-20,000, 8000-15,000, or 10,000-15,000 genes. In some instances, a multiomic method provides genomic sequence information for at least 80%, 90%, 92%, 95%, 97%, 98%, or at least 99% of the genome of the single cell. In some instances, a multiomic method provides genomic sequence information for about 80%, 90%, 92%, 95%, 97%, 98%, or about 99% of the genome of the single cell. RNA may be amplified in the multiomics methods described herein. In some instances, RNA is amplified to isolate mRNA transcripts. In some instances, template-switching polynucleotides are used. In some instances, amplification of RNA uses labeled primers. In some instances, a label comprises biotin. In some instances, at least some of the cDNA polynucleotides are isolated with affinity binding to the label. In some instances, multiomics methods comprise amplification of RNA to generate a cDNA library.
In some instances, a cDNA library is generated having at least 10, 20, 30, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, or at least 500 ng of DNA. In some instances, a cDNA library is generated having 10-500, 20-500, 30-500, 50-500, 50-400, 50-300, 100-500, 100-400, 100-300, 100-200, 200-500, 300-500, or 400-750 ng of DNA. In some instances, at least some polynucleotides in the cDNA library comprise a barcode. In some instances, the cDNA comprises polynucleotides corresponding to at least 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, or at least 4000 genes. In some instances, the cDNA comprises a 5’ to 3’ transcript bias of 0.5-1.5, 0.6-1.5, 0.7-1.5, 0.8-1.5, 0.9-1.5, 0.8-1.5, 1-1.5, 1-2.0, 1.2-2.0, or 0.5-2.0.

[0112] Multiomic methods may comprise analysis of single cells from a population of cells. In some instances, at least 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or at least 8000 cells are analyzed. In some instances, about 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, or about 8000 cells are analyzed. In some instances, 5-100, 10-100, 50-500, 100-500, 100-1000, 50-5000, 100-5000, 500-1000, 500-10000, 1000-10000, or 5000-20,000 cells are analyzed.

[0113] Multiomic methods may generate yields of amplified genomic DNA from the PTA reaction based on the type of single cell. In some instances, the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 micrograms. In some instances, the amount of DNA generated from a single cell is about 0.1, 1, 1.5, 2, 3, 5, or about 10 femtograms. In some instances, the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 micrograms. In some instances, the amount of DNA generated from a single cell is at least 0.1, 1, 1.5, 2, 3, 5, or at least 10 femtograms. In some instances, the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-50, 1-3, or 0.5-3.5 micrograms. In some instances, the amount of DNA generated from a single cell is about 0.1-10, 1-10, 1.5-10, 2-20, 2-4, 1-3, or 0.5-4 femtograms. In some instances, the amount of DNA generated from a single cell is about 0.5-2.5, 0.5-3, 0.5-5, 0.2-5, 1-2.5, or 1-5 ng of DNA. In some instances, the amount of DNA generated from a single cell is at least 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, or at least 5 ng of DNA.

[0114] DNA libraries may comprise an allelic balance. In some instances, the allelic balance is 50-100, 60-100, 70-100, 80-100, 60-95, 70-95, 80-95, 85-95, 90-95, 90-98, 90-99, 85-99, or 95-99 percent. In some instances, the allelic balance is at least 50, 60, 70, 80, 83, 85, 87, 90, 92, 95, 98, or at least 99 percent.

[0115] DNA libraries may comprise a sensitivity for one or more SNVs. In some instances, the sensitivity is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99. In some instances, the sensitivity is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.

[0116] DNA libraries may comprise a precision for one or more SNVs. In some instances, the precision is 0.50-1, 0.60-1, 0.70-1, 0.80-1, 0.60-0.95, 0.70-0.95, 0.80-0.95, 0.85-0.95, 0.90-0.95, 0.90-0.98, 0.90-0.99, 0.85-0.99, or 0.95-0.99. In some instances, the precision is at least 0.50, 0.60, 0.70, 0.80, 0.83, 0.85, 0.87, 0.90, 0.92, 0.95, 0.98, or at least 0.99.
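Sensitivity and precision for SNV calls are conventionally computed against a truth set as TP/(TP+FN) and TP/(TP+FP), respectively. The sketch below assumes variants are represented as (chromosome, position, ref, alt) tuples and that a truth set is available for comparison; the representation, the function name, and the toy variants are illustrative and are not taken from the disclosure.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def snv_sensitivity_precision(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of a call set relative to a truth set."""
    tp = len(called & truth)   # called and present in truth
    fn = len(truth - called)   # present in truth but missed
    fp = len(called - truth)   # called but absent from truth
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Toy data, invented for illustration.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr3", 75, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
          ("chr2", 50, "G", "A"), ("chr5", 10, "A", "C")}

sens, prec = snv_sensitivity_precision(called, truth)
print(sens, prec)
```

In this toy case three of four true variants are recovered and one of four calls is spurious, so both values are 0.75, which falls within the 0.70-0.95 range recited above.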

Penetrance

[0117] Provided herein are systems and methods for quantifying penetrance (e.g., penetrance score). In some instances, a penetrance score represents the contribution of one or more pieces of molecular information that associate with the physical signs and symptoms of a genetic disorder. In some instances, subjects with one or more biomarkers do not develop physical features of the disorder, and the condition has incomplete (or low) penetrance. In some instances, penetrance is determined from one or more biomarkers and/or biological mechanisms (pathways). In some instances, changes to biomarkers (type, measurement, etc.) are used to determine penetrance score.

[0118] In some instances, these phenotypic changes are due to a functional element (e.g., an RNA or a protein). In some instances, a change is silent (having no impact on the protein). In some instances, a phenotypic change is manifested as a change in measurements obtained from one or more multiomic modalities.

[0119] In some instances, multiomics comprises DNA (e.g., genome/epigenome), RNA (e.g., transcriptome), protein (proteome), and/or other molecules (e.g., lipidome, metabolome). In some instances, multiomics enables determination of a mechanism with interdependent components for a disease state or disorder. In some instances, a penetrance score and/or mechanism are used to identify validated therapeutic targets. In some instances, treatments are generated based on the therapeutic target. In some instances, the treatment comprises a vaccine, antibody, a genetic therapy, modified immune cells, or small molecule.

[0120] Provided herein are systems and methods comprising workflows for measuring penetrance (e.g., penetrance score). In some instances, a workflow comprises one or more steps of detecting a genetic change, transcript change, methylation change, and protein change. In some instances, systems and methods comprise a workflow according to FIG. 3. In some instances, a first step comprises detecting a genetic change. In some instances, a lack of a genetic change indicates no genomic mechanism (e.g., allele-related). In some instances, an optional second step comprises detecting a methylation change. In some instances, a lack of methylation change indicates a gene is not silenced. In some instances, a third step comprises detecting a transcript change. In some instances, a lack of transcript change indicates no transcriptome mechanism. In some instances, a lack of transcript changes indicates a transient expression of the expressed gene. In some instances, a lack of transcript change indicates incomplete penetrance. In some instances, a fourth step comprises detecting a protein change. In some instances, a lack of change in the proteome indicates no proteomic mechanism. In some instances, lack of a change in the proteome indicates incomplete penetrance. In some instances, a change detected in two or more steps indicates high penetrance. In some instances, a change detected in three or more steps indicates high penetrance. In some instances, a change detected in four or more steps indicates high penetrance. In some instances, detected changes in the genome, transcriptome, and proteome indicate high penetrance. Systems and methods described herein in some instances comprise one or more steps shown in FIG. 5. Systems and methods described herein in some instances comprise one or more measurements shown in FIG. 5.

[0121] Provided herein are systems for determining a penetrance score.
In some instances, systems comprise one or more of a computing system comprising at least one processor and instructions executable by the at least one processor to provide an application configured to perform operations. In some instances, the operations comprise one or more of: receiving multiomics data from one or more sources and at least one biological state; and applying an algorithm configured to process the data and generate a penetrance score. In some instances, the system comprises a standalone computing platform. In some instances, the system comprises a cloud computing platform. In some instances, the multiomics data comprises data from one or more of a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome (such as a methylome). In some instances, the multiomics data comprises data from two or more of a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome. In some instances, the multiomics data comprises data from a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome. In some instances, the multiomics data comprises data obtained from processes which analyze one or more of a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome. In some instances, multiomics data is obtained from a sample described herein. In some instances, multiomics data is obtained from a single cell. In some instances, multiomics data is obtained from a single cell from a tissue. In some instances, systems described herein analyze multiomics data from single cells in a tissue. In some instances, one or more measurements are selected from copy number variation, translocation, mutation burden, methylation at CpG sites, gene activation, gene repression, expressed genes, gene fusions, splice variants, translation level, phosphorylation state, and protein modification.
In some instances, two or more measurements are selected from copy number variation, translocation, mutation burden, methylation at CpG sites, gene activation, gene repression, expressed genes, gene fusions, splice variants, translation level, phosphorylation state, and protein modification. In some instances, four or more measurements are selected from copy number variation, translocation, mutation burden, methylation at CpG sites, gene activation, gene repression, expressed genes, gene fusions, splice variants, translation level, phosphorylation state, and protein modification. In some instances, eight or more measurements are selected from copy number variation, translocation, mutation burden, methylation at CpG sites, gene activation, gene repression, expressed genes, gene fusions, splice variants, translation level, phosphorylation state, and protein modification.

[0122] Penetrance scores may be measured from one or more changes to measurements obtained from multiomics data. In some instances, a change is established against a reference sequence. In some instances, the reference sequence is obtained from a healthy or non-disease control sample. In some instances, a reference sequence is obtained from bulk measurements of a sample population. In some instances, a change comprises one or more of a genome DNA change, a transcript change, and a proteome change. In some instances, a change comprises one or more of a genomic SNV*X (single nucleotide change), genomic CNV*X (copy number variation change), genomic translocation, genomic INDEL, genomic frameshift, genomic stop codon, genomic mitochondrial, genomic promoter/enhancer, genomic TCR/BCR, transcript expression, transcript splice variant, transcript fusion, transcript lncRNA, transcript miRNA, transcript TCR/BCR, transcript promoter, transcript truncated gene, transcript mitochondrial, transcript mutation, over/under expressed protein, truncated protein, surface bound protein, frameshift protein, misfolded protein, metabolic protein, protein ligand independence, protein conformation, protein activity change, and fused protein. In some instances, a change comprises two or more of a genomic SNV*X, genomic CNV*X, genomic translocation, genomic INDEL, genomic frameshift, genomic stop codon, genomic mitochondrial, genomic promoter/enhancer, genomic TCR/BCR, transcript expression, transcript splice variant, transcript fusion, transcript lncRNA, transcript miRNA, transcript TCR/BCR, transcript promoter, transcript truncated gene, transcript mitochondrial, transcript mutation, over/under expressed protein, truncated protein, surface bound protein, frameshift protein, misfolded protein, metabolic protein, protein ligand independence, protein conformation, protein activity change, and fused protein.
In some instances, a change comprises five or more of a genomic SNV*X, genomic CNV*X, genomic translocation, genomic INDEL, genomic frameshift, genomic stop codon, genomic mitochondrial, genomic promoter/enhancer, genomic TCR/BCR, transcript expression, transcript splice variant, transcript fusion, transcript lncRNA, transcript miRNA, transcript TCR/BCR, transcript promoter, transcript truncated gene, transcript mitochondrial, transcript mutation, over/under expressed protein, truncated protein, surface bound protein, frameshift protein, misfolded protein, metabolic protein, protein ligand independence, protein conformation, protein activity change, and fused protein. In some instances, a measurement change is used to determine a mechanism. In some instances, a mechanism comprises a determinant of cell fate. In some instances, a cell fate is shown in FIG. 7. A penetrance score may be represented in different ways. In some instances, a penetrance score comprises a numerical value. In some instances, a penetrance score is categorical. In some instances, a numerical value is used to determine a categorical value. In some instances, categorical values comprise high or low.
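As a minimal sketch of how a numerical penetrance score could be used to determine a categorical value, the following assumes (hypothetically) that the score is a fraction between 0 and 1 and that a single cutoff separates "high" from "low". The function name, the 0-to-1 scale, and the default threshold of 0.5 are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PenetranceScore:
    value: float    # numerical representation of the score
    category: str   # categorical representation: "high" or "low"


def categorize_penetrance(value: float, threshold: float = 0.5) -> PenetranceScore:
    """Derive a categorical penetrance value from a numerical one.

    Assumptions (not from the disclosure): the score lies in [0, 1]
    and a single hypothetical threshold separates "high" from "low".
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("this sketch assumes a score between 0 and 1")
    category = "high" if value >= threshold else "low"
    return PenetranceScore(value=value, category=category)


if __name__ == "__main__":
    print(categorize_penetrance(0.8))
    print(categorize_penetrance(0.2))
```

Other scales or more than two categories would work equally well with the same pattern; only the mapping function changes.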

[0123] Biological inquiries may be used to interrogate changes in measurements obtained from multiomics data. In some instances, methods described herein perform one or more biological inquiries. In some instances, a biological inquiry comprises throughput number of cells processed, throughput number of cells recovered, throughput sequencing, DNA mutation - SNV, DNA copy number variation, RNA - 3’ gene expression, RNA - genes analyzed/detected, RNA - low level genes detected, RNA - mitochondrial gene expression, protein - translation panel, RNA - chromatin panel, RNA - chromatin state, RNA - BCR/TCR, and RNA - full transcript gene. An example of a workflow comprising biological inquiries for both mammalian and bacterial samples is shown in FIG. 9. In some instances, systems and methods described herein comprise obtaining a sample, and performing one or more methods comprising biological inquiries. In some instances, obtaining cells comprises one or more of FACS sorting, microfluidics, spatial cell selection, and ultra-high throughput methods. In some instances, methods comprise simultaneous genome/transcriptome analysis to prepare libraries (e.g., using PTA). In some instances, libraries are then sequenced to obtain multiomics data.

[0124] Provided herein are methods of target validation. In some instances, a target is associated with a disease state or condition. In some instances, a target validation workflow comprises one or more steps of FIG. 6. In some instances, a workflow for validating a target comprises one or more of obtaining a sample, storing a sample, performing one or more multiomic methods on the sample to generate multiomics data, using a computation engine to process the data, and validating a target. In some instances, the sample comprises cells from a tissue. In some instances, the sample comprises cells from a frozen tissue. In some instances, the sample comprises a section of tissue. In some instances, cells are collected and then banked. 
In some instances, no more than 5000, 4000, 3000, 2000, 1000, 900, 800, 700, 600, 500, 400, 300, 200, 100, 50, 25, or no more than 10 cells are banked. Multiomic methods comprise methods described herein such as those which analyze and provide data on any one of the genome, methylome, transcriptome, and proteome. In some instances, target validation is associated with targets related to immunology, cancer genomics, neurology, PGT, microbiome, toxicology, bioprocessing, or cardiology.

Methylome analysis

[0125] Described herein are methods comprising PTA, wherein sites of methylated DNA in single cells are determined using the PTA method. In some instances, sites of methylated DNA are detected using enzymatic methods. In some instances, sites of methylated DNA are detected using non-enzymatic methods. In some instances, these methods further comprise parallel analysis of the transcriptome and/or proteome of the same cell. Methods of detecting methylated genomic bases include selective restriction with methylation-sensitive endonucleases, followed by processing with the PTA method. Sites cut by such enzymes are determined from sequencing, and methylated bases are identified. In some instances, libraries are amplified with methylation-specific primers which selectively anneal to methylated sequences.

[0126] In another instance, bisulfite treatment of genomic DNA libraries is used to detect a methylation signature. Bisulfite conversion of DNA results in conversion of unmodified cytosine (C) to uracil (U) that will be read as thymine (T) upon sequencing of PCR amplified DNA. Both 5meC and 5hmC are protected against conversion and will not be converted to U. Therefore, they will both be read as C upon sequencing. Alternatively, non-methylation-specific PCR is conducted, followed by one or more methods to discriminate between bisulfite-reacted bases, including direct pyrosequencing, Ms-SNuPE, HRM, COBRA, MS-SSCA, or base-specific cleavage/MALDI-TOF. In some instances, genomic DNA samples are split for parallel analysis of the genome (or an enriched portion thereof) and methylome analysis. In some instances, analysis of the genome and methylome comprises enrichment of genomic fragments (e.g., exome, or other targets) or whole genome sequencing.
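The bisulfite read-out described above (unmodified C read as T; 5meC and 5hmC read as C) can be illustrated with a short sketch. It assumes, for simplicity, that the bisulfite-converted read is already aligned to the reference without gaps and comes from the same strand; the function name and the category labels are hypothetical and chosen only for illustration.

```python
def call_cytosine_status(reference: str, bisulfite_read: str) -> dict[int, str]:
    """Classify each reference cytosine from a gap-free, same-strand bisulfite read.

    "protected"  : read still shows C (consistent with 5meC or 5hmC)
    "unmodified" : read shows T (C converted to U, then read as T)
    "ambiguous"  : any other base (sequencing error or true variant)
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("this sketch assumes a gap-free alignment of equal length")

    calls: dict[int, str] = {}
    for position, (ref_base, read_base) in enumerate(
        zip(reference.upper(), bisulfite_read.upper())
    ):
        if ref_base != "C":
            continue  # only cytosines are informative for bisulfite conversion
        if read_base == "C":
            calls[position] = "protected"
        elif read_base == "T":
            calls[position] = "unmodified"
        else:
            calls[position] = "ambiguous"
    return calls


if __name__ == "__main__":
    # Reference cytosines at 0-based positions 1, 4 and 5.
    print(call_cytosine_status("ACGTCC", "ACGTTC"))
```

Note that bisulfite sequencing alone cannot distinguish 5meC from 5hmC, which is why both are grouped under one label here.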

[0127] In some instances, the methylation signature is preserved during PTA. In some instances, processing with the PTA method while preserving the methylation signature is used to create a reference library. In some instances, after a reference library is created, methylation patterns are detected using the methods described herein to create a methylation-specific library. In some embodiments, the methylation-specific library is compared to the reference library. In some instances, the methylation-specific library and the reference library are prepared from the same cell. In some instances, comparing the methylation-specific library to the reference library allows for identification of a methylation signature. In some instances, after a reference library is created, the genomic DNA library is treated with bisulfite. In some instances, the genomic library treated with bisulfite is amplified with the PTA method to produce a methylation-specific library.

Bioinformatics

[0128] The data obtained from single-cell analysis methods utilizing PTA described herein may be compiled into a database. Described herein are methods and systems of bioinformatic data integration. Data from the proteome, genome, transcriptome, methylome or other data is in some instances combined/integrated into a database and analyzed. Bioinformatic data integration methods and systems in some instances comprise one or more of protein detection (FACS and/or NGS), mRNA detection, and/or genome variance detection. In some instances, this data is correlated with a disease state or condition. In some instances, data from a plurality of single cells is compiled to describe properties of a larger cell population, such as cells from a specific sample, region, organism, or tissue. In some instances, protein data is acquired from fluorescently labeled antibodies which selectively bind to proteins on a cell. In some instances, a method of protein detection comprises grouping cells based on fluorescent markers and reporting sample location post-sorting. In some instances, a method of protein detection comprises detecting sample barcodes, detecting protein barcodes, comparing to designed sequences, and grouping cells based on barcode and copy number. In some instances, protein data is acquired from oligo barcoded antibodies which selectively bind to proteins on a cell. Such oligo barcodes covalently linked to the antibody are used as a reference to the specific antigen binding site for the detection of a particular antigen or translated protein. In some instances, transcriptome data is acquired from sample and RNA specific barcodes. In some instances, a method of mRNA detection comprises detecting sample and RNA specific barcodes, aligning to the genome, aligning to RefSeq/Encode, reporting Exon/Intron/Intergenic sequences, analyzing exon-exon junctions, grouping cells based on barcode and expression variance, and clustering analysis of variance and top variable genes. 
In some instances, genomic data is acquired from sample and DNA specific barcodes. In some instances, a method of genome variance detection comprises detecting sample and DNA specific barcodes, aligning to the genome, determining genome recovery and SNV mapping rate, filtering reads on exon-exon junctions, generating a variant call file (VCF), and clustering analysis of variance and top variable mutations.
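The first step shared by the detection methods above, detecting sample barcodes and grouping reads by them, can be sketched as follows. This is an illustrative simplification that assumes the sample barcode is a fixed-length prefix of each read and that only exact matches are accepted; the barcode sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def group_reads_by_sample_barcode(
    reads: list[str],
    barcode_to_sample: dict[str, str],
    barcode_length: int,
) -> dict[str, list[str]]:
    """Assign each read to a sample using an exact-match barcode prefix.

    Reads whose prefix is not a known barcode are collected under
    the key "unassigned". The barcode is trimmed from the returned insert.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        groups[sample].append(insert)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical 4-base barcodes for two cells.
    barcodes = {"AAAC": "cell_1", "GGTT": "cell_2"}
    reads = ["AAACTTGACC", "GGTTCATG", "AAACGGGG", "CCCCATAT"]
    print(group_reads_by_sample_barcode(reads, barcodes, barcode_length=4))
```

Production demultiplexers typically tolerate one or more barcode mismatches; exact matching is used here only to keep the sketch short.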

Mutations

[0129] In some instances, the methods (e.g., multiomic PTA) described herein result in higher detection sensitivity and/or lower rates of false positives for the detection of mutations. In some instances, a mutation is a difference between an analyzed sequence (e.g., using the methods described herein) and a reference sequence. Reference sequences are in some instances obtained from other organisms, other individuals of the same or similar species, populations of organisms, or other areas of the same genome. In some instances, mutations are identified on a plasmid or chromosome. In some instances, a mutation is an SNV (single nucleotide variation), SNP (single nucleotide polymorphism), or CNV (copy number variation, or CNA/copy number aberration). In some instances, a mutation is a base substitution, insertion, or deletion. In some instances, a mutation is a transition, transversion, nonsense mutation, silent mutation, synonymous or non-synonymous mutation, non-pathogenic mutation, missense mutation, or frameshift mutation (deletion or insertion). In some instances, PTA results in higher detection sensitivity and/or lower rates of false positives for the detection of mutations when compared to methods such as in-silico prediction, ChIP-seq, GUIDE-seq, circle-seq, HTGTS (High-Throughput Genome-Wide Translocation Sequencing), IDLV (integration-deficient lentivirus), Digenome-seq, FISH (fluorescence in situ hybridization), or DISCOVER-seq.
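Several of the mutation classes listed above can be distinguished from the reference and alternate alleles alone. The following sketch classifies a variant as a transition, transversion, insertion, or deletion, and flags length changes that are not a multiple of three. The function names are hypothetical, and the frameshift check assumes the variant lies within a coding sequence, which this sketch does not verify.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID_BASES = PURINES | PYRIMIDINES


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or not set(ref + alt) <= VALID_BASES:
        raise ValueError("alleles must be non-empty strings over A, C, G, T")
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "transition" if same_class else "transversion"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-base substitution"


def is_frameshift(ref: str, alt: str) -> bool:
    """True if the length change is not a multiple of three.

    Only meaningful for variants inside a coding sequence (assumed, not checked).
    """
    return (len(alt) - len(ref)) % 3 != 0


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "C"), ("A", "AT"), ("ATG", "A")]:
        print(ref, alt, classify_variant(ref, alt), is_frameshift(ref, alt))
```

Classes such as nonsense, missense, or synonymous additionally require the codon context and a genetic code table, which are outside the scope of this sketch.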

Primary Template-Directed Amplification

[0130] Described herein are nucleic acid amplification methods, such as “Primary Template-Directed Amplification (PTA).” In some instances, PTA is combined with other analysis workflows for multiomic analysis. With the PTA method, amplicons are preferentially generated from the primary template (“direct copies”) using a polymerase (e.g., a strand displacing polymerase). Consequently, errors are propagated at a lower rate from daughter amplicons during subsequent amplifications compared to MDA. The result is an easily executed method that, unlike existing WGA protocols, can amplify low DNA input including the genomes of single cells with high coverage breadth and uniformity in an accurate and reproducible manner. Moreover, the terminated amplification products can undergo direct ligation after removal of the terminators, allowing for the attachment of a cell barcode to the amplification primers so that products from all cells can be pooled after undergoing parallel amplification reactions. In some instances, template nucleic acids are not bound to a solid support. In some instances, direct copies of template nucleic acids are not bound to a solid support. In some instances, one or more primers are not bound to a solid support. In some instances, no primers are bound to a solid support. In some instances, a primer is attached to a first solid support, and a template nucleic acid is attached to a second solid support, wherein the first and the second solid supports are not the same. In some instances, PTA is used to analyze single cells from a larger population of cells. In some instances, PTA is used to analyze more than one cell from a larger population of cells, or an entire population of cells.

[0131] Described herein are methods employing nucleic acid polymerases with strand displacement activity for amplification. In some instances, such polymerases comprise strand displacement activity and low error rate. In some instances, such polymerases comprise strand displacement activity and proofreading exonuclease activity, such as 3’->5’ proofreading activity. In some instances, nucleic acid polymerases are used in conjunction with other components such as reversible or irreversible terminators, or additional strand displacement factors. In some instances, the polymerase has strand displacement activity, but does not have exonuclease proofreading activity. For example, in some instances such polymerases include bacteriophage phi29 (Φ29) polymerase, which also has very low error rate that is the result of the 3’->5’ proofreading exonuclease activity (see, e.g., U.S. Pat. Nos. 5,198,543 and 5,001,050). In some instances, examples of strand displacing nucleic acid polymerases include, e.g., genetically modified phi29 (Φ29) DNA polymerase, Klenow Fragment of DNA polymerase I (Jacobsen et al., Eur. J. Biochem. 45:623-627 (1974)), phage M2 DNA polymerase (Matsumoto et al., Gene 84:247 (1989)), phage phiPRD1 DNA polymerase (Jung et al., Proc. Natl. Acad. Sci. USA 84:8287 (1987); Zhu and Ito, Biochim. Biophys. Acta. 1219:267-276 (1994)), Bst DNA polymerase (e.g., Bst large fragment DNA polymerase (Exo(-) Bst; Aliotta et al., Genet. Anal. (Netherlands) 12:185-195 (1996))), exo(-)Bca DNA polymerase (Walker and Linn, Clinical Chemistry 42:1604-1608 (1996)), Bsu DNA polymerase, VentR DNA polymerase including VentR (exo-) DNA polymerase (Kong et al., J. Biol. Chem. 268:1965-1975 (1993)), Deep Vent DNA polymerase including Deep Vent (exo-) DNA polymerase, IsoPol DNA polymerase, DNA polymerase I, Therminator DNA polymerase, T5 DNA polymerase (Chatterjee et al., Gene 97:13-19 (1991)), Sequenase (U.S. Biochemicals), T7 DNA polymerase, T7-Sequenase, T7 gp5 DNA polymerase, PRD1 DNA polymerase, and T4 DNA polymerase (Kaboord and Benkovic, Curr. Biol. 5:149-157 (1995)). Additional strand displacing nucleic acid polymerases are also compatible with the methods described herein. The ability of a given polymerase to carry out strand displacement replication can be determined, for example, by using the polymerase in a strand displacement replication assay (e.g., as disclosed in U.S. Pat. No. 6,977,148). Such assays in some instances are performed at a temperature suitable for optimal activity for the enzyme being used, for example, 32°C for phi29 DNA polymerase, from 46°C to 64°C for exo(-) Bst DNA polymerase, or from about 60°C to 70°C for an enzyme from a hyperthermophilic organism. Another useful assay for selecting a polymerase is the primer-block assay described in Kong et al., J. Biol. Chem. 268:1965-1975 (1993). The assay consists of a primer extension assay using an M13 ssDNA template in the presence or absence of an oligonucleotide that is hybridized upstream of the extending primer to block its progress. Enzymes capable of displacing the blocking primer in this assay are in some instances useful for the disclosed method. In some instances, polymerases incorporate dNTPs and terminators at approximately equal rates. In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is about 1:1, about 1.5:1, about 2:1, about 3:1, about 4:1, about 5:1, about 10:1, about 20:1, about 50:1, about 100:1, about 200:1, about 500:1, or about 1000:1. In some instances, the ratio of rates of incorporation for dNTPs and terminators for a polymerase described herein is 1:1 to 1000:1, 2:1 to 500:1, 5:1 to 100:1, 10:1 to 1000:1, 100:1 to 1000:1, 500:1 to 2000:1, 50:1 to 1500:1, or 25:1 to 1000:1.

[0132] Described herein are methods of amplification wherein strand displacement can be facilitated through the use of a strand displacement factor, such as, e.g., helicase. Such factors are in some instances used in conjunction with additional amplification components, such as polymerases, terminators, or other components. In some instances, a strand displacement factor is used with a polymerase that does not have strand displacement activity. In some instances, a strand displacement factor is used with a polymerase having strand displacement activity. Without being bound by theory, strand displacement factors may increase the rate that smaller, double stranded amplicons are reprimed. In some instances, any DNA polymerase that can perform strand displacement replication in the presence of a strand displacement factor is suitable for use in the PTA method, even if the DNA polymerase does not perform strand displacement replication in the absence of such a factor. Strand displacement factors useful in strand displacement replication in some instances include (but are not limited to) BMRF1 polymerase accessory subunit (Tsurumi et al., J. Virology 67(12):7648-7653 (1993)), adenovirus DNA-binding protein (Zijderveld and van der Vliet, J. Virology 68(2):1158-1164 (1994)), herpes simplex viral protein ICP8 (Boehmer and Lehman, J. Virology 67(2):711-715 (1993); Skaliter and Lehman, Proc. Natl. Acad. Sci. USA 91(22):10665-10669 (1994)); single-stranded DNA binding proteins (SSB; Rigler and Romano, J. Biol. Chem. 270:8910-8919 (1995)); phage T4 gene 32 protein (Villemain and Giedroc, Biochemistry 35:14395-14404 (1996)); T7 helicase-primase; T7 gp2.5 SSB protein; Tte-UvrD (from Thermoanaerobacter tengcongensis), calf thymus helicase (Siegel et al., J. Biol. Chem. 267:13629-13635 (1992)); bacterial SSB (e.g., E. coli SSB), Replication Protein A (RPA) in eukaryotes, human mitochondrial SSB (mtSSB), and recombinases (e.g., Recombinase A (RecA) family proteins, T4 UvsX, T4 UvsY, Sak4 of Phage HK620, Rad51, Dmc1, or RadB). Combinations of factors that facilitate strand displacement and priming are also consistent with the methods described herein. For example, a helicase is used in conjunction with a polymerase. In some instances, the PTA method comprises use of a single-strand DNA binding protein (SSB, T4 gp32, or other single stranded DNA binding protein), a helicase, and a polymerase (e.g., Sau DNA polymerase, Bsu polymerase, Bst2.0, GspM, GspM2.0, GspSSD, or other suitable polymerase). In some instances, reverse transcriptases are used in conjunction with the strand displacement factors described herein. In some instances, amplification is conducted using a polymerase and a nicking enzyme (e.g., “NEAR”), such as those described in US 9,617,586. In some instances, the nicking enzyme is Nt.BspQI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nt.BbvCI, Nt.BstNBI, Nt.CviPII, Nb.Bpu10I, or Nt.Bpu10I.

[0133] Described herein are amplification methods comprising use of terminator nucleotides, polymerases, and additional factors or conditions. For example, such factors are used in some instances to fragment the nucleic acid template(s) or amplicons during amplification. In some instances, such factors comprise endonucleases. In some instances, factors comprise transposases. In some instances, mechanical shearing is used to fragment nucleic acids during amplification. In some instances, nucleotides are added during amplification that may be fragmented through the addition of additional proteins or conditions. For example, uracil is incorporated into amplicons; treatment with uracil-DNA glycosylase fragments nucleic acids at uracil-containing positions. Additional systems for selective nucleic acid fragmentation are also in some instances employed, for example an engineered DNA glycosylase that cleaves modified cytosine-pyrene base pairs (Kwon, et al. Chem Biol. 2003, 10(4), 351).

[0134] Described herein are amplification methods comprising use of terminator nucleotides, which terminate nucleic acid replication thus decreasing the size of the amplification products. Such terminators are in some instances used in conjunction with polymerases, strand displacement factors, or other amplification components described herein. In some instances, terminator nucleotides reduce or lower the efficiency of nucleic acid replication. Such terminators in some instances reduce extension rates by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Such terminators in some instances reduce extension rates by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, terminators reduce the average amplicon product length by at least 99.9%, 99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, or at least 65%. Terminators in some instances reduce the average amplicon length by 50%-90%, 60%-80%, 65%-90%, 70%-85%, 60%-90%, 70%-99%, 80%-99%, or 50%-80%. In some instances, amplicons comprising terminator nucleotides form loops or hairpins which reduce a polymerase's ability to use such amplicons as templates. Use of terminators in some instances slows the rate of amplification at initial amplification sites through the incorporation of terminator nucleotides (e.g., dideoxynucleotides that have been modified to make them exonuclease-resistant to terminate DNA extension), resulting in smaller amplification products. By producing smaller amplification products than the currently used methods (e.g., an average length of 50-2000 nucleotides for PTA methods as compared to an average product length of >10,000 nucleotides for MDA methods), PTA amplification products in some instances undergo direct ligation of adapters without the need for fragmentation, allowing for efficient incorporation of cell barcodes and unique molecular identifiers (UMI).

[0135] Terminator nucleotides are present at various concentrations depending on factors such as polymerase, template, or other factors. For example, the amount of terminator nucleotides in some instances is expressed as a ratio of non-terminator nucleotides to terminator nucleotides in a method described herein. Such concentrations in some instances allow control of amplicon lengths. In some instances, the ratio of terminator to non-terminator nucleotides is modified for the amount of template present or the size of the template. In some instances, the ratio of terminator to non-terminator nucleotides is reduced for smaller sample sizes (e.g., femtogram to picogram range). In some instances, the ratio of non-terminator to terminator nucleotides is about 2:1, 5:1, 7:1, 10:1, 20:1, 50:1, 100:1, 200:1, 500:1, 1000:1, 2000:1, or 5000:1. In some instances, the ratio of non-terminator to terminator nucleotides is 2:1-10:1, 5:1-20:1, 10:1-100:1, 20:1-200:1, 50:1-1000:1, 50:1-500:1, 75:1-150:1, or 100:1-500:1. In some instances, at least one of the nucleotides present during amplification using a method described herein is a terminator nucleotide. Each terminator need not be present at approximately the same concentration; in some instances, ratios of each terminator present in a method described herein are optimized for a particular set of reaction conditions, sample type, or polymerase. Without being bound by theory, each terminator may possess a different efficiency for incorporation into the growing polynucleotide chain of an amplicon, in response to pairing with the corresponding nucleotide on the template strand. For example, in some instances a terminator pairing with cytosine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. 
In some instances, a terminator pairing with thymine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with guanine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with adenine is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. In some instances, a terminator pairing with uracil is present at about 3%, 5%, 10%, 15%, 20%, 25%, or 50% higher concentration than the average terminator concentration. Any nucleotide capable of terminating nucleic acid extension by a nucleic acid polymerase in some instances is used as a terminator nucleotide in the methods described herein. In some instances, a reversible terminator is used to terminate nucleic acid replication. In some instances, a non-reversible terminator is used to terminate nucleic acid replication. In some instances, non-limiting examples of terminators include reversible and non-reversible nucleic acids and nucleic acid analogs, such as, e.g., 3’ blocked reversible terminator comprising nucleotides, 3’ unblocked reversible terminator comprising nucleotides, terminators comprising 2’ modifications of deoxynucleotides, terminators comprising modifications to the nitrogenous base of deoxynucleotides, or any combination thereof. In one embodiment, terminator nucleotides are dideoxynucleotides. 
Other nucleotide modifications that terminate nucleic acid replication and may be suitable for practicing the invention include, without limitation, any modifications of the R group of the 3’ carbon of the deoxyribose such as inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof. In some instances, terminators are polynucleotides comprising 1, 2, 3, 4, or more bases in length. In some instances, terminators do not comprise a detectable moiety or tag (e.g., mass tag, fluorescent tag, dye, radioactive atom, or other detectable moiety). In some instances, terminators do not comprise a chemical moiety allowing for attachment of a detectable moiety or tag (e.g., “click” azide/alkyne, conjugate addition partner, or other chemical handle for attachment of a tag). In some instances, all terminator nucleotides comprise the same modification that reduces amplification to a region (e.g., the sugar moiety, base moiety, or phosphate moiety) of the nucleotide. In some instances, at least one terminator has a different modification that reduces amplification. In some instances, all terminators have substantially similar fluorescent excitation or emission wavelengths. In some instances, terminators without modification to the phosphate group are used with polymerases that do not have exonuclease proofreading activity. Terminators, when used with polymerases which have 3’->5’ proofreading exonuclease activity (such as, e.g., phi29) that can remove the terminator nucleotide, are in some instances further modified to make them exonuclease-resistant. 
For example, dideoxynucleotides are modified with an alpha-thio group that creates a phosphorothioate linkage which makes these nucleotides resistant to the 3’->5’ proofreading exonuclease activity of nucleic acid polymerases. Such modifications in some instances reduce the exonuclease proofreading activity of polymerases by at least 99.5%, 99%, 98%, 95%, 90%, or at least 85%. Examples of other terminator nucleotide modifications providing resistance to the 3’->5’ exonuclease activity include in some instances: nucleotides with modification to the alpha group, such as alpha-thio dideoxynucleotides creating a phosphorothioate bond, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' Fluoro bases, 3' phosphorylation, 2'-O-Methyl modifications (or other 2’-O-alkyl modification), propyne-modified bases (e.g., deoxycytosine, deoxyuridine), L-DNA nucleotides, L-RNA nucleotides, nucleotides with inverted linkages (e.g., 5’-5’ or 3’-3’), 5’ inverted bases (e.g., 5’ inverted 2’,3’-dideoxy dT), methylphosphonate backbones, and trans nucleic acids. In some instances, nucleotides with modification include base-modified nucleic acids comprising free 3’ OH groups (e.g., 2-nitrobenzyl alkylated HOMedU triphosphates, bases comprising modification with large chemical groups, such as solid supports or other large moiety). In some instances, a polymerase with strand displacement activity but without 3’->5’ exonuclease proofreading activity is used with terminator nucleotides with or without modifications to make them exonuclease resistant. Such nucleic acid polymerases include, without limitation, Bst DNA polymerase, Bsu DNA polymerase, Deep Vent (exo-) DNA polymerase, Klenow Fragment (exo-) DNA polymerase, Therminator DNA polymerase, and VentR(exo-).
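To illustrate how the ratio of non-terminator to terminator nucleotides could control amplicon length, the following sketch uses a deliberately simple probabilistic model that is an assumption of this example and not part of the disclosure: at every extension step the polymerase incorporates either a normal nucleotide or a terminator, with odds equal to the concentration ratio multiplied by the relative incorporation rate, and extension stops at the first terminator. Under that assumption, product length is geometrically distributed. The function and parameter names are hypothetical, and effects such as polymerase processivity, template ends, and exonuclease removal of terminators are ignored.

```python
def termination_probability(ratio: float, relative_rate: float = 1.0) -> float:
    """Per-position probability that a terminator is incorporated.

    ratio         : non-terminator to terminator concentration ratio (e.g., 100 for 100:1)
    relative_rate : incorporation rate of dNTPs relative to terminators (1.0 = equal rates)
    """
    if ratio <= 0 or relative_rate <= 0:
        raise ValueError("ratio and relative_rate must be positive")
    return 1.0 / (1.0 + ratio * relative_rate)


def expected_amplicon_length(ratio: float, relative_rate: float = 1.0) -> float:
    """Mean product length (including the terminator) under the geometric model.

    The mean of a geometric distribution with success probability p is 1/p,
    which here simplifies to 1 + ratio * relative_rate.
    """
    termination_probability(ratio, relative_rate)  # reuse input validation
    return 1.0 + ratio * relative_rate


if __name__ == "__main__":
    for ratio in (10, 100, 1000):
        print(ratio, expected_amplicon_length(ratio))
```

In this toy model, raising the ratio lengthens the expected product roughly in proportion, which is consistent in direction with the statement that such concentrations allow control of amplicon lengths; actual lengths would need to be measured empirically.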

Visualization of biological information

[0136] Described herein are computer-implemented systems for visualization of biological data. In some instances, the data comprises genomic, transcriptomic, proteomic, methylation and epigenomic data. Further described herein are computer-implemented systems comprising one or more modules. Further described herein are computer-implemented systems comprising at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions, wherein the computer-executable instructions comprise one or more of a frontend, a backend, and a pipeline module. In some instances, an exemplary arrangement of modules is shown in FIG. 10. In some instances, modules are accessed from a cloud-based database or interface. Methods and systems described herein in some instances comprise one or more steps of accessing a web-based software application; providing or otherwise linking an input file (such as a file comprising whole genome sequencing, RNA, or other biological information); processing the file; applying one or more filters or annotations to the data in the file; querying one or more databases; and displaying a visualization of the filtered and/or annotated data.

[0137] The systems and methods described herein may comprise a frontend module. In some instances, the frontend module comprises a Vue.js application that provides the user interface and visualizations for the systems and methods described herein. In some instances, the frontend makes requests to the backend to query data. In some instances, a frontend comprises computer-executable instructions for one or more of: displaying complex visualizations such as the circos plot, phylogenetic tree, etc. (e.g., as navigable tabs); displaying quality metrics; visualizing filters and filtering interactions; and presenting data tables for cell information. In some instances, a web version of IGV is integrated into the frontend.

[0138] The systems and methods described herein may comprise a backend module. In some instances, the backend comprises a Flask framework application and provides one or more backend features for the methods and systems described herein. In some instances, the backend is written in Python. In some instances, a backend comprises computer-executable instructions for one or more of: user authentication and registration; data computations and filtering; access of a Vaex open-source library for speeding up data interactions; interacting with a database and HDF5 files to process data requests; presenting and encoding data for visualizations; and presenting data for IGV.

[0139] The systems and methods described herein may comprise a pipeline module. In some instances, the pipeline comprises a computationally intensive workflow that runs genomics analysis tools to extract signatures of biomarkers from sequencing files and loads them into a database. In some instances, the methods and systems described herein comprise one or more pipeline modules. In some instances, pipeline modules comprise multi-omics, such as WGS/exome, methylation, proteome, proteome bacterial, or RNA-seq/transcriptome. In some instances, a pipeline comprises one or more sub-modules. In some instances, a pipeline comprises one or more data files. In some instances, a pipeline comprises one or more of sequencing input files, sub-pipeline modules, and summary files.

[0140] Pipelines may be configured for whole genome or exome sequencing data. In some instances, a WGS/exome pipeline is configured to input one or more fastQ files. In some instances, a WGS/exome pipeline comprises one or more of the alignment, haplotype caller, joint genotyping, heterozygous site detector (a pipeline used for the analysis of cell lines without a priori knowledge of reference heterozygous variant sites), statistics, ADO, and CNV sub-pipelines needed to drive insights from sequencing data. In some instances, the files contain sequence(ing) information/data. In some instances, files comprise sequence data from the clusters that pass filter on a flow cell. In some instances, the files comprise FastQ files. In some instances, the database comprises a PostgreSQL database. In some instances, the databases are accessed from a backend module. In some instances, the pipeline comprises computer-executable instructions for one or more of: accepting a sequencing information file as input (e.g., FastQ); running joint genotyping to produce a VCF file; and linking variants to COSMIC, ClinVar, or another variant list. In some instances, a VCF file contains the variants called from multiple samples (cells) all together and represents high confidence variants distributed across the cells. These variants in some instances represent changes in nucleotides observed in a cell in relation to the reference genome. In some instances, these variants are placed along the genome using genomic coordinates (e.g., chr1 base 18903). Such a configuration having a specific location for a variant allows in some instances association of information compiled in databases to this given variant.
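By way of a minimal, non-limiting sketch, the association of a variant's genomic coordinate with compiled database information may be illustrated as follows. The sketch assumes a plain-text VCF whose first five columns are CHROM, POS, ID, REF, and ALT; the names `Variant`, `parse_vcf`, `annotate`, and the toy annotation index are hypothetical and are not taken from the disclosure or from any actual COSMIC or ClinVar content.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name, e.g. "chr1"
    pos: int    # 1-based position, as in VCF
    ref: str    # reference allele
    alt: str    # alternate allele


def parse_vcf(lines: Iterable[str]) -> List[Variant]:
    """Read the first five VCF columns, skipping header lines that start with '#'."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        # A record may list several comma-separated alternate alleles.
        for alt in alts.split(","):
            variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def annotate(variants: Iterable[Variant],
             index: Dict[Variant, List[str]]) -> Dict[Variant, List[str]]:
    """Attach any labels recorded for each exact (chrom, pos, ref, alt) key."""
    return {v: index.get(v, []) for v in variants}


# Toy joint-genotyped VCF and a made-up annotation index (illustrative only).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t18903\t.\tA\tG\t50\tPASS\t.",
    "chr2\t500\t.\tC\tT,G\t60\tPASS\t.",
]
EXAMPLE_INDEX = {Variant("chr1", 18903, "A", "G"): ["ClinVar:example-entry"]}

annotated = annotate(parse_vcf(EXAMPLE_VCF), EXAMPLE_INDEX)
```

Keying the lookup on the full coordinate and allele pair is one possible design choice; a production pipeline would more likely rely on an established VCF library and normalized variant representations.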

[0141] Pipelines may be configured for multi-omics analysis. In some instances, multi-omics comprises two or more types of biological information. In some instances, multi-omics comprises two or more of transcript (transcriptome), genomic, proteomic, methylome, or other form of sample analysis. In some instances, methods described herein display and/or process multi-omics data. Data in some instances is obtained from a single cell. Data in other instances is obtained by evaluation of a population of cells. In some instances, methods described herein display transcript and genomic data. In some instances, methods described herein utilize transcript, genomic data, and proteomics data. In some instances, methods described herein utilize transcript, genomic data, and methylome data.

[0142] In some instances, an alignment pipeline comprises one or more of a compressed alignment file (e.g., a .bam file) describing the alignment information of the reads in the project against a given reference (e.g., hg38); and an index file of the alignment file. In some instances, the pipeline comprises a .bam file.

[0143] In some instances, a haplotype caller pipeline comprises one or more of a genomic variant call format (GVCF) file containing the detected variants for a given sample; and an indexer file associated with the GVCF file.

[0144] In some instances, a joint-genotyping pipeline comprises one or more of a genomic variant call format (GVCF) file containing the joint variant calling of multiple samples; and an indexer file associated with the joint-genotyped GVCF file.

[0145] In some instances, a heterozygous site detector pipeline comprises one or more of a genomic variant call format (GVCF) file containing the called variants with a high degree of prevalence across a dataset and high confidence; and an indexer file associated with the GVCF file.

[0146] In some instances, a statistics pipeline comprises one or more of a tabulator-separated value table describing whole genome sequence (WGS) level statistics estimated from the aligned reads (e.g., 1X, 5X, 10X coverage, etc.); and a tabulator-separated value table showing exome-panel specific statistics (e.g., on-target, off-target, and near-target events).
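As a non-limiting illustration, coverage statistics of the kind listed above (e.g., 1X, 5X, 10X) may be summarized as the fraction of positions covered at or above each depth. The sketch below assumes that per-position read depths have already been extracted from the aligned reads; the function name `coverage_breadth` and the example depths are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def coverage_breadth(depths: Sequence[int],
                     thresholds: Tuple[int, ...] = (1, 5, 10)) -> Dict[int, float]:
    """Return, for each threshold, the fraction of positions with depth >= threshold."""
    if not depths:
        raise ValueError("depths must not be empty")
    total = len(depths)
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


# Toy per-position depths (illustrative only).
EXAMPLE_DEPTHS = [0, 1, 5, 10, 12, 3, 0, 7]
breadth = coverage_breadth(EXAMPLE_DEPTHS)
```

Each entry of `breadth` could then be written as one column of a tab-separated statistics table, one row per cell.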

[0147] In some instances, an ADO pipeline comprises one or more of a tabulator-separated value table showing allele frequencies of N number of queried heterozygous sites. This table is in some instances used to estimate WGS allele balance.

[0148] In some instances, a CNV pipeline comprises one or more of a tabulator-separated value table describing, for a sample, the estimated copy number for bins of size N across the whole genome; and a tabulator-separated value table describing, for a sample, the type of event (insertion, deletion) for all bins of size N across the genome.
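As a non-limiting sketch of how a per-bin copy number table and a per-bin event table might be derived, the example below normalizes read counts per bin against the median bin and rounds to an integer copy number. The median normalization, the default ploidy of 2, the labels "gain", "loss", and "neutral", and the function names are assumptions made for illustration and are not the method of the disclosure.

```python
from statistics import median
from typing import List, Sequence


def estimate_copy_numbers(bin_counts: Sequence[float], ploidy: int = 2) -> List[int]:
    """Scale each bin's read count by the median bin and round to a copy number."""
    baseline = median(bin_counts)
    if baseline <= 0:
        raise ValueError("median bin count must be positive")
    return [round(ploidy * count / baseline) for count in bin_counts]


def classify_events(copy_numbers: Sequence[int], ploidy: int = 2) -> List[str]:
    """Label each bin relative to the expected ploidy."""
    labels = []
    for cn in copy_numbers:
        if cn > ploidy:
            labels.append("gain")
        elif cn < ploidy:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Toy read counts for six bins of equal size (illustrative only).
EXAMPLE_COUNTS = [100, 98, 102, 50, 150, 101]
copy_numbers = estimate_copy_numbers(EXAMPLE_COUNTS)
events = classify_events(copy_numbers)
```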

[0149] Pipelines may be configured for bacterial sequencing data. In some instances, a bacterial pipeline is configured to input a fastQ file. In some instances, a bacterial pipeline comprises one or more of: compressed FASTQ files containing trimmed and filtered high-quality sequences; a tabulator-separated value table describing the taxonomic assignment of each read to a given species using a database (such as Kraken’s database); a fasta file describing the genome assembly, at the level of contigs, constructed from the reads in the dataset; a fasta file describing the genome assembly, at the level of scaffolds, constructed from the reads in the dataset; and a BAM file describing the alignments of the reads in reference to the assembled genome (e.g., contigs). In some instances, a bacterial pipeline comprises one or more summary files. In some instances, summary files comprise one or more of: a tabulator-separated value table describing the taxonomic assignment of contigs in an assembly based on the proportion of reads mapped to them; and a tabulator-separated value table showing the estimated completeness of a given assembly based on a set of phylogenetic marker genes.

[0150] Pipelines may be configured for RNA-seq data. In some instances, an RNA-seq pipeline is configured to accept one or more of a compressed alignment file describing the alignment information of the reads in the project against a given reference (e.g., hg38); an index file of the compressed alignment file; a compressed alignment file describing the alignment information of the reads in the project against an RNA-Seq specific index for a given reference; and an index file for the alignment file. In some instances, an RNA-seq pipeline comprises one or more summary files. In some instances, summary files comprise one or more of a tabulator-separated table describing the matrix of counts of the genomic features (e.g., exons in a gene) across samples; a tabulator-separated table describing the number of unique splice-junction overlaps; a tabulator-separated table describing overall alignment metrics (e.g., number of genes with counts, etc.); and a tabulator-separated table showing the estimated ratio of exon to non-exon alignment events.

[0151] Systems and methods described herein may comprise filters for visualizing data. In some instances, filters comprise one or more of: Germline mutation, Somatic mutation, Copy number variation, Single nucleotide variation, Insertions and deletions, Tumor Mutation Burden (TMB) Analysis, Catalog of somatic mutation in cancer (COSMIC), ClinVar, and Predicted Coding Change.
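By way of a non-limiting sketch, filters of the kind listed above may be expressed as composable predicates over variant records. The record fields (`origin`, `coding_change`), the function names, and the definition of tumor mutation burden used here (somatic coding variants per megabase of covered sequence) are assumptions made for illustration; the disclosure does not specify a particular formula.

```python
from typing import Callable, Dict, Iterable, List

VariantRecord = Dict[str, object]
VariantFilter = Callable[[VariantRecord], bool]


def is_somatic(variant: VariantRecord) -> bool:
    return variant["origin"] == "somatic"


def is_coding_change(variant: VariantRecord) -> bool:
    return bool(variant["coding_change"])


def apply_filters(variants: Iterable[VariantRecord],
                  filters: Iterable[VariantFilter]) -> List[VariantRecord]:
    """Keep only the variants that satisfy every supplied filter."""
    active = list(filters)
    return [v for v in variants if all(f(v) for f in active)]


def tumor_mutation_burden(variants: Iterable[VariantRecord], covered_bases: int) -> float:
    """Somatic coding variants per megabase of covered sequence (assumed definition)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    kept = apply_filters(variants, [is_somatic, is_coding_change])
    return len(kept) / (covered_bases / 1_000_000)


# Toy variant records (illustrative only).
EXAMPLE_VARIANTS = [
    {"id": "v1", "origin": "somatic", "coding_change": True},
    {"id": "v2", "origin": "somatic", "coding_change": False},
    {"id": "v3", "origin": "germline", "coding_change": True},
    {"id": "v4", "origin": "somatic", "coding_change": True},
]
tmb = tumor_mutation_burden(EXAMPLE_VARIANTS, covered_bases=2_000_000)
```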

[0152] Further described herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to perform: receiving a query, wherein the query comprises genomic data from one or more samples; querying a database, wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, the genome summary comprising genes and gene variants of the cohort; determining a graphical representation of the genome summary; and sending the graphical representation to a display device.

[0153] Also described herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to generate a graphical user interface (GUI) that accepts a query from a user comprising sequencing information from one or more samples and presents to the user genome information. In some instances, a GUI comprises a project browser or dashboard. In some instances, a GUI comprises drop down menus for one or more of project, owner, analysis type, and status. In some instances, a GUI comprises a list of previous and current projects. In some instances, projects and data are shared among a group of users. In some instances, projects are saved for future modification or access. In some instances, the GUI is facilitated by a frontend. In some instances, user control queries a backend to reflect actions of the user through.

[0154] Computer-implemented systems may comprise a genome browser. In some instances, a genome browser is configured to display sections of a genome and/or variants. In some instances, a genome browser comprises an IGV (Integrative Genomics Viewer). In some instances, in the IGV window the bin size is selectable from the entire genome down to the individual base. In this view individual mutations in some instances are viewed to determine the alternative allele or base change. In some instances, each mutation is selectable, further detailing the nature of the modification and presenting it to the user.

[0155] Computer-implemented systems may comprise an interface for annotating variants. This is an important step to empower interpretation of downstream coding changes in protein structure and function. Variant information in some instances comprises one or more of features (name, gene id, gene type, strand, Tdl, HgncId), predictions (SIFT/sorting intolerant from tolerant, LRT/likelihood ratio test, FATHMM, PROVEAN/protein variation effect analyzer, MetaSVM, MetaLR), conservation among species (e.g., vertebrates, mammals, etc.), evidence (pathology-related data from databases such as COSMIC), and biological population. In some instances, a variant annotation interface assesses the degree of conservation among 100 vertebrates and 30 mammals. In some instances, this display is helpful in the investigation of de novo variant alleles which are not annotated by ClinVar, COSMIC, GeneCards, or Ensembl. The comparison allows the determination of conservation of alleles found in the sample compared to the same allele found in an alternative species. Alleles with high conservation are shifted right, whereas alleles with low conservation are shifted left. As an example, in the Phylo 30-way mammal plot the allele is highly conserved across all 30 mammals, indicating the gene is highly conserved and likely to be important for the health of all mammalian species. Having assessed the potential for the mutation to be pathogenic, the user in some instances navigates, if the variant is annotated, to a variety of external databases (e.g., GeneCards, Ensembl, ClinVar, and COSMIC) by simply selecting the hyperlink for that specific database.

[0156] Variants in some instances are annotated as one or more of Germline mutation, Somatic mutation, Copy number variation (CNV), Single nucleotide variation (SNV), Insertions and deletions, Catalog of somatic mutation in cancer (COSMIC), ClinVar, and Predicted Coding Change. Additional resources are also accessed in some instances, such as GeneCards, Ensembl, ClinVar, and COSMIC. In some instances, variants comprise complex markers such as those obtained using Tumor Mutation Burden (TMB) Analysis.

[0157] Computer-implemented systems may comprise an interface for tracing variant lineages. In some instances, lineages comprise somatic, ancestral, or reference lineages. Lineage trees in some instances are generated from specific chromosomes, and graphically display variants in a chart format.

[0158] Computer-implemented systems may comprise an interface for analyzing cells. In some instances, samples comprise one or more cells. Cells in some instances are searched, or summary information about each cell is displayed, such as cell name and variants detected (somatic, germline, SNPs, and indels). In some instances, metrics high, medium, and low are used to describe confidence of variant calls for each cell. In some instances, inter-cell distances are graphed.
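As a non-limiting illustration of an inter-cell distance, the sketch below computes a Jaccard distance between the sets of variants called in each pair of cells. The choice of the Jaccard measure, the string encoding of variants, and the function names are assumptions made for illustration; the disclosure does not specify which distance is graphed.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus the shared fraction of variants; 0.0 means identical variant sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(cells: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of cells, keyed by sorted cell names."""
    return {
        (x, y): jaccard_distance(cells[x], cells[y])
        for x, y in combinations(sorted(cells), 2)
    }


# Toy per-cell variant calls (illustrative only).
EXAMPLE_CELLS = {
    "cell_A": {"chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A"},
    "cell_B": {"chr1:200:C>T", "chr2:50:G>A", "chr3:10:T>C"},
    "cell_C": {"chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A"},
}
distances = pairwise_distances(EXAMPLE_CELLS)
```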

[0159] Computer-implemented systems may comprise an interface for visualizing sequencing metrics (e.g., Picard metrics). Metrics include but are not limited to chromosome M population, percent pass/fail reads aligned, WGS mean coverage, and WGS percent excluded duplicate reads. Each metric in some instances is also displayed on an individual per-cell basis.

[0160] Computer-implemented systems may comprise an interface for visualizing genomic data. In some instances, data may be visualized using a circos plot. Circos plots in some instances comprise additional variant information, such as number of somatic, germline, SNP, or indel variants. Variants in some instances are visualized at the chromosome level. In some instances, a circos plot comprises a lineage tree. In some instances, a user interface is configured to apply one or more filters to the circos plot. In some instances, two or more groups of cells or samples are compared (optionally filtered by number of variants). In some instances, views of one or more chromosomes are displayed or hidden. In some instances, data from one or more cells is hidden or displayed. In some instances, variant filters comprise one or more of variant type (SNP, indel), origin (somatic vs. germline), annotation (COSMIC, ClinVar, coding change), or features. In some instances, features comprise name, gene id, gene type, strand, Tdl, and HgncId. In some instances, variant filters comprise predictions (SIFT, FATHMM, PROVEAN, MetaSVM, and MetaLR). Upon selection of a region or chromosome within a cell's genome, a pop-up window is in some instances presented to the user which includes a genome viewing frame (e.g., IGV) plot. This window can be configured in terms of genome window bin size, allowing visualization from the entire chromosome down to the individual bases across that genome, which can be completed in a matter of seconds. The window in some instances is scrollable by simply dragging the window left or right. In the IGV window, each sample in some instances is interrogated to determine, for example, the specific change, which is highlighted by a color change from the parental allele. The alternative allele is selected to determine the base change, while the parent allele can be selected to determine a pathogenic risk score based on several public algorithms as well as the conservation of the allele across several vertebrate and mammalian species. This variant annotation further provides links to several databases to provide greater detail of the impact of the genomic alteration.
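By way of a non-limiting sketch, placing a variant on a circular (circos-style) layout amounts to mapping its chromosome and position to an angle around the circle. The example below assumes chromosomes are laid end to end in dictionary order with no gaps between them; the function name `genome_angle` and the toy chromosome lengths are hypothetical and do not reflect any real reference genome.

```python
import math
from typing import Dict


def genome_angle(chrom: str, pos: int, chrom_lengths: Dict[str, int]) -> float:
    """Map (chrom, pos) to an angle in radians, with 0 at the start of the first chromosome."""
    if chrom not in chrom_lengths:
        raise KeyError(f"unknown chromosome: {chrom}")
    if not 0 <= pos <= chrom_lengths[chrom]:
        raise ValueError("position lies outside the chromosome")
    total = sum(chrom_lengths.values())
    offset = 0
    # Accumulate the lengths of all chromosomes drawn before the requested one.
    for name, length in chrom_lengths.items():
        if name == chrom:
            break
        offset += length
    return 2 * math.pi * (offset + pos) / total


# Toy chromosome lengths (illustrative only).
EXAMPLE_LENGTHS = {"chr1": 100, "chr2": 300}
angle = genome_angle("chr2", 100, EXAMPLE_LENGTHS)
```

A frontend could convert the returned angle to x and y coordinates on a ring of fixed radius; precomputing the cumulative offsets once per genome would avoid repeating the loop for every variant.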

[0161] The systems and methods described herein may provide a visualization of genomic and multiomic data having a large number of data sets. In some instances, the genomic data comprises at least 1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 750, 1000, or at least 1500 data sets. In some instances, the genomic data comprises 1-1000, 5-1000, 10-1000, 5-10,000, 100-10,000, 100-1000, 10-500, 10-750, 50-750, or 50-500 data sets. In some instances, each sample data set corresponds to a single cell.

[0162] The systems and methods described herein may provide a visualization of genomic data comprising data sets with a large number of variants. In some instances, each data set comprises at least 500, 1000, 2000, 5000, 10,000, 50,000, 100,000, 150,000, 250,000, 500,000, 1 million, 2 million, 3 million, 4 million, 5 million, 6 million, 10 million, or at least 15 million variants. In some instances, each data set comprises about 500, 1000, 2000, 5000, 10,000, 50,000, 100,000, 150,000, 250,000, 500,000, 1 million, 2 million, 3 million, 4 million, 5 million, 6 million, 10 million, or about 15 million variants. In some instances, each data set comprises 100-1 million, 100-100,000, 100,000-1 million, 100,000-5 million, 100-500,000, 500-5 million, 1 million-2 million, 2 million to 6 million, 3 million to 10 million, or 4 million to 7 million variants.

[0163] The systems and methods described herein may provide a visualization of genomic data within data sets represented in table format. In some instances, data sets comprise at least 1, 2, 5, 10, 20, 25, 50, 75, 80, 85, 90, 95, 100, 110, 120, 150, 200, or at least 250 million rows of data. In some instances, data sets comprise no more than 1, 2, 5, 10, 20, 25, 50, 75, 80, 85, 90, 95, 100, 110, 120, 150, 200, or no more than 250 million rows of data. In some instances, data sets comprise 1-250, 1-100, 1-50, 1-25, 5-25, 5-50, 10-100, 10-200, 50-200, 50-150, 100-400 or 100-300 million rows of data.

[0164] The systems and methods described herein may provide a visualization of genomic data in a short period of time. In some instances, a system for visualizing genomic data comprises one or more devices comprising at least one processor and instructions executable by the at least one processor to provide a first application configured to perform operations comprising: i. accessing one or more data sets comprising genomic data; and ii. generating a visual representation of the one or more data sets. In some instances, the visualization comprises a circos plot. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for a data set having at least 5 cells. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a data set having at least 5 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for a data set having at least 10 cells. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a data set having at least 10 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for a data set having at least 20 cells. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a data set having at least 20 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for a data set having at least 1 million variants per cell. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a data set having at least 1 million variants per cell. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for a data set having at least 4 million variants per cell. In some instances, the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a data set having at least 4 million variants per cell. In some instances, the circos plot is generated using no more than 1, 2, 3, 4, 5, 6, 7, or no more than 8 processors. Alternatively, in some examples, more processors may be used. As many processors as needed may be used. In some instances, the circos plot is generated using at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40 or more processors. In some instances, the circos plot is generated using about 1, 2, 3, 4, 5, 6, 7, 8, 10, 20, 30, 40 or more processors.

[0165] In some instances, the visualization further comprises a phylogenic tree. In some instances, the visualization further comprises sequencing quality metrics. In some instances, the visualization further comprises annotated variations. In some instances, the visualization further comprises number of variations. In some instances, the visualization further comprises cell and cell population statistics.

[0166] Also described herein are platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals or samples, the biologic information comprising genome data, the biologic information obtained by analysis of one or more biologic samples from each sample and/or individual, each individual and/or sample having an ID; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of the chromosome demonstrating the spatial relationship on the chromosome of the genes and the variants; and a software module generating a lineage viewer, the lineage viewer comprising: a geographic display of autosomal ancestry, a geographic display of maternal line ancestry, a geographic display of paternal line ancestry, and multi-omic displays.
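As a non-limiting illustration, the density of gene variants indicated along a chromosome display may be computed by counting variants in fixed-size bins. The sketch below assumes 1-based variant positions on a single chromosome; the function name `variant_density` and the example values are hypothetical.

```python
from typing import Iterable, List


def variant_density(positions: Iterable[int], chrom_length: int, bin_size: int) -> List[int]:
    """Count variants per consecutive bin of bin_size bases along one chromosome."""
    if chrom_length <= 0 or bin_size <= 0:
        raise ValueError("chrom_length and bin_size must be positive")
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        # Positions are 1-based, so position bin_size still falls in the first bin.
        counts[(pos - 1) // bin_size] += 1
    return counts


# Toy positions on a 30-base chromosome, binned every 10 bases (illustrative only).
density = variant_density([1, 5, 10, 11, 25], chrom_length=30, bin_size=10)
```

The resulting counts could drive the shading of each chromosome icon or of each segment of the linear chromosome display.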

Population of individuals

[0167] The platforms, systems, media, and methods described herein include biologic data pertaining to a population of individuals, or use of the same. In various embodiments, the population of individuals comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, more than 1,000,000, more than 10,000,000, more than 50,000,000, or more than 100,000,000 individuals. In some cases, the individuals in the population participated in academic medical research studies using consents allowing for genetic testing of specimens. In such cases, biologic specimens and phenotype data are collected for individuals from pharmaceutical clinical trials, academic research, and health care settings. In some cases, biologic data pertaining to a population of individuals is collected from integrated health records for individuals representing a spectrum of diseases with unmet medical needs.

Biologic information

[0168] The platforms, systems, media, and methods described herein include biologic information, or use of the same. In some instances, biologic information comprises genetic information. In some embodiments, the biologic information comprises whole human genome sequencing information. In some embodiments, the biologic information comprises human transcriptome sequencing information. In some instances, biologic information comprises genetic information from humans, non-human primates, animals, plants, fungi, protozoa, archaea, or bacteria. In some instances, biologic information comprises genetic information from the microbiome.

[0169] The biologic information may comprise genomic information. As used herein, genomic information refers to genetic information found within a biological sample arising from the genome (or DNA, whether nuclear, mitochondrial, or otherwise). In some instances, genomic information comprises nucleic acid sequence copy number, location, and sequence. The genomic information is not limited to protein-coding sequence; it may refer to intronic sequence and intergenic sequence, each known to harbor multiple functional elements whereby DNA changes at those elements may be consequential in normal development and disease. In some instances, genomic information comprises post-transcriptional modifications such as methylation. In some instances, genomic information is found within a chromosome, plasmid, or other medium comprising nucleic acids.

[0170] The biologic information may comprise transcript information. As used herein, transcript information refers to information obtained from a transcriptome within a biological sample. In some instances, transcript information comprises expression levels of genes and sequence of corresponding nucleic acids expressed from genes.

[0171] The biologic information may comprise microbiome information. As used herein, “microbiome” refers to the bacteria and other microorganisms that live in and on the human body. In some embodiments, the microbiome information comprises metagenomic microbiome characterization. In various embodiments, the microbiome information comprises one or more of: microflora genus and/or species information, microflora relative abundance information, and microflora gene and/or gene variant information.

[0172] The biologic information may comprise proteome information. In some embodiments, the proteome information comprises information regarding abundance, localization, identity, post-translational modifications, or other protein information.

[0173] The biologic information may comprise methylome information. In some embodiments, methylome information comprises post-transcriptional modifications such as the location of 5-methylcytosine (5-mC), 5-hydroxymethylcytosine (5-hmC), CpG islands, ATAC seq, methyl histone modification, other post-transcriptional modification to nucleic acids, and/or any combinations thereof.

[0174] The biologic information may comprise metabolome information. As used herein, “metabolome” refers to the small-molecule chemicals found within a biological sample. In some embodiments, metabolome information comprises the presence of one or more small-molecule chemicals. In further embodiments, the metabolome information comprises a qualitative measurement of one or more small-molecule chemicals. In still further embodiments, the metabolome information comprises a quantitative measurement of one or more small-molecule chemicals. In various embodiments, the metabolome information comprises measurements of at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, at least 1200, at least 1300, at least 1400, or at least 1500 substances (e.g., molecules).

Data security

[0175] Databases and visualizations described herein may comprise sensitive information pertaining to an individual’s health. Provided herein are platforms for data security comprising one or more of an access control for one or more users; a security framework; and biological data from an individual. In some instances, one or more security measures are implemented via security frameworks to restrict access or protect an individual’s health information. Security frameworks in some instances comprise standards. In some instances, security frameworks include HIPAA standards. In some instances, security frameworks comprise the NIST cybersecurity framework. Access controls in some instances restrict access to certain individuals or groups of individuals. Access controls in some instances comprise passwords, biometrics, or other methods of user authentication.
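By way of a non-limiting sketch, a password-based access control that restricts projects to certain groups of users may be illustrated with the Python standard library. The iteration count, the function names, and the group model are assumptions made for illustration; the sketch is not a statement of what any particular security framework requires.

```python
import hashlib
import hmac
import os
from typing import Iterable, Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this illustration


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; only salt and digest are stored."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def can_view(user_groups: Iterable[str], project_groups: Iterable[str]) -> bool:
    """Grant access only when the user shares at least one group with the project."""
    return bool(set(user_groups) & set(project_groups))


# Toy credentials and groups (illustrative only).
stored_salt, stored_digest = hash_password("example-passphrase")
login_ok = verify_password("example-passphrase", stored_salt, stored_digest)
```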

Use Cases and applications

[0176] The system can be applied in a variety of fields. In some instances, the system provides useful data and analysis to pharmaceutical companies, including informaticians, bench scientists, medical directors, the senior executive team, or commercial organizations. Such data and analysis, in some instances, includes analysis of clinical trial data for patient stratification and biomarker discovery, identification and in silico validation of novel genetic targets, discovery of novel disease and dose response biomarkers/signatures, compound repurposing and expansion of indications of marketed drugs, rescue of failed clinical trial assets, real time genetic analysis of adverse events, or targeted accelerated recruitment for clinical trials. For academic research groups, including physicians/principal investigators, informaticians, research scientists, and geneticists, the system in some instances offers analysis of specific cohorts, analysis of individual patients, or large-scale analysis of variation in populations. Clinics, hospitals, and cancer centers, including physicians and genetic counsellors, in some instances will find the system useful in the analysis of individuals, analysis of cohorts, wellness focus, or oncology focus. The data and analysis in some instances also have value to insurance companies, actuarial teams, or health economists.

[0177] Specifically, for pharma and researchers, the system can serve as or enable a reference set of knowledge/evidence, a hypothesis generation engine, a platform for analysis of pharma’s own data, a platform for combination of pharma data and data and analysis provided by the system, a platform for combining data from multiple collaborators, a platform for sharing data within a company, etc. For physicians or genetic counsellors, the system can similarly be used as part of a care tool to identify the most relevant results for treatment and prevention, a reference set of knowledge/evidence, or a tool to identify other physicians with similar patients and share knowledge. In addition, for insurance companies, the system can be useful as part of a tool to detect individual care pathways and incentivize healthy living, or a tool to help quantify the risk that they have in the insured population.

Kits

[0178] The systems described herein may accompany or be provided as a service with a kit. In some instances, the kit comprises reagents for acquiring biological information. In some instances, the kit is configured to obtain genomic or transcriptome data. In some instances, the kit is configured to obtain genomic, methylome, transcriptome or proteome data from single cells. In some instances, provided herein are kits comprising reagents for obtaining biological data from single cells, and instructions for using the kit. In some instances, the instructions comprise links to a web-based portal or mobile based software application to import, analyze, and/or compare biological data obtained from the kit.

Digital processing device

[0179] In some embodiments, the platforms, systems, media, and methods described herein include a digital processing device, or use of the same. In further embodiments, the digital processing device includes one or more hardware central processing units (CPUs) or general-purpose graphics processing units (GPGPUs) that carry out the device’s functions. In still further embodiments, the digital processing device further comprises an operating system configured to perform executable instructions. In some embodiments, one or more resources related to the systems described herein are stored locally. In some embodiments, the digital processing device is optionally connected to a computer network. In further embodiments, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. In still further embodiments, the digital processing device is optionally connected to a cloud computing infrastructure. In other embodiments, the digital processing device is optionally connected to an intranet. In other embodiments, the digital processing device is optionally connected to a data storage device.

[0180] In accordance with the description herein, suitable digital processing devices include, by way of examples, server computers, desktop computers, laptop computers, and notebook computers.

[0181] In some embodiments, the digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing.

[0182] In some embodiments, the device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. In further embodiments, the non-volatile memory comprises flash memory. In some embodiments, the non-volatile memory comprises ferroelectric random-access memory (FRAM). In some embodiments, the non-volatile memory comprises phase-change random access memory (PRAM). In other embodiments, the device is a storage device including, by way of examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing-based storage. In further embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein. In some embodiments, data may be stored on and/or using a DNA data storage system. Any suitable data storage system and database may be used.

[0183] In some embodiments, the digital processing device includes a display to send visual information to a user. In some embodiments, the display is a cathode ray tube (CRT). In some embodiments, the display is a liquid crystal display (LCD). In further embodiments, the display is a thin film transistor liquid crystal display (TFT-LCD). In some embodiments, the display is an organic light emitting diode (OLED) display. In various further embodiments, an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display. In some embodiments, the display is a plasma display. In other embodiments, the display is a video projector. In some embodiments, the display is a wearable display. In still further embodiments, the display is a combination of devices such as those disclosed herein.

[0184] In some embodiments, the digital processing device includes an input device to receive information from a user. In some embodiments, the input device is a keyboard. In some embodiments, the input device is a pointing device including, by way of examples, a mouse, trackball, track pad, joystick, game controller, or stylus. In some embodiments, the input device is a touch screen or a multi-touch screen. In other embodiments, the input device is a microphone to capture voice or other sound input. In other embodiments, the input device is a video camera or other sensor to capture motion or visual input. In further embodiments, the input device is a Kinect, Leap Motion, or the like. In still further embodiments, the input device is a combination of devices such as those disclosed herein.

Non-transitory computer readable storage medium

[0185] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In further embodiments, a computer readable storage medium is a tangible component of a digital processing device. In still further embodiments, a computer readable storage medium is optionally removable from a digital processing device. In some embodiments, a computer readable storage medium includes, by way of examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.

Computing system

[0186] Referring to FIG. 10, a block diagram is shown depicting an exemplary machine that includes a computer system 1100 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies for static code scheduling of the present disclosure. The components in FIG. 10 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.

[0187] Computer system 1100 may include one or more processors 1101, a memory 1103, and a storage 1108 that communicate with each other, and with other components, via a bus 1140. The bus 1140 may also link a display 1132, one or more input devices 1133 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 1134, one or more storage devices 1135, and various tangible storage media 1136. All of these elements may interface directly or via one or more interfaces or adaptors to the bus 1140. For instance, the various tangible storage media 1136 can interface with the bus 1140 via storage medium interface 1126. Computer system 1100 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.

[0188] Computer system 1100 includes one or more processor(s) 1101 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions. Processor(s) 1101 optionally contain a cache memory unit 1102 for temporary local storage of instructions, data, or computer addresses. Processor(s) 1101 are configured to assist in execution of computer readable instructions. Computer system 1100 may provide functionality for the components depicted in FIG. 10 as a result of the processor(s) 1101 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such as memory 1103, storage 1108, storage devices 1135, and/or storage medium 1136. The computer-readable media may store software that implements particular embodiments, and processor(s) 1101 may execute the software. Memory 1103 may read the software from one or more other computer-readable media (such as mass storage device(s) 1135, 1136) or from one or more other sources through a suitable interface, such as network interface 1120. The software may cause processor(s) 1101 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 1103 and modifying the data structures as directed by the software.

[0189] The memory 1103 may include various components (e.g., machine readable media) including, but not limited to, a random-access memory component (e.g., RAM 1104) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random-access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 1105), and any combinations thereof. ROM 1105 may act to communicate data and instructions unidirectionally to processor(s) 1101, and RAM 1104 may act to communicate data and instructions bidirectionally with processor(s) 1101. ROM 1105 and RAM 1104 may include any suitable tangible computer-readable media described below. In one example, a basic input/output system 1106 (BIOS), including basic routines that help to transfer information between elements within computer system 1100, such as during start-up, may be stored in the memory 1103.

[0190] Fixed storage 1108 is connected bidirectionally to processor(s) 1101, optionally through storage control unit 1107. Fixed storage 1108 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein. Storage 1108 may be used to store operating system 1109, executable(s) 1110, data 1111, applications 1112 (application programs), and the like. Storage 1108 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above. Information in storage 1108 may, in appropriate cases, be incorporated as virtual memory in memory 1103.

[0191] In one example, storage device(s) 1135 may be removably interfaced with computer system 1100 (e.g., via an external port connector (not shown)) via a storage device interface 1125. Particularly, storage device(s) 1135 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 1100. In one example, software may reside, completely or partially, within a machine-readable medium on storage device(s) 1135. In another example, software may reside, completely or partially, within processor(s) 1101.

[0192] Bus 1140 connects a wide variety of subsystems. Herein, reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate. Bus 1140 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures. As an example, such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, an Accelerated Graphics Port (AGP) bus, a HyperTransport (HTX) bus, a serial advanced technology attachment (SATA) bus, and any combinations thereof.

[0193] Computer system 1100 may also include an input device 1133. In one example, a user of computer system 1100 may enter commands and/or other information into computer system 1100 via input device(s) 1133. Examples of an input device(s) 1133 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof. In some embodiments, the input device is a Kinect, Leap Motion, or the like. Input device(s) 1133 may be interfaced to bus 1140 via any of a variety of input interfaces 1123 (e.g., input interface 1123) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.

[0194] In particular embodiments, when computer system 1100 is connected to network 1130, computer system 1100 may communicate with other devices, specifically mobile devices and enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like, connected to network 1130. Communications to and from computer system 1100 may be sent through network interface 1120. For example, network interface 1120 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 1130, and computer system 1100 may store the incoming communications in memory 1103 for processing. Computer system 1100 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 1103, to be communicated to network 1130 from network interface 1120. Processor(s) 1101 may access these communication packets stored in memory 1103 for processing.

[0195] Examples of the network interface 1120 include, but are not limited to, a network interface card, a modem, and any combination thereof. Examples of a network 1130 or network segment 1130 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof. A network, such as network 1130, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.

[0196] Information and data can be displayed through a display 1132. Examples of a display 1132 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic light emitting diode (OLED) display such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof. The display 1132 can interface to the processor(s) 1101, memory 1103, and fixed storage 1108, as well as other devices, such as input device(s) 1133, via the bus 1140. The display 1132 is linked to the bus 1140 via a video interface 1122, and transport of data between the display 1132 and the bus 1140 can be controlled via the graphics control 1121. In some embodiments, the display is a video projector. In some embodiments, the display is a head-mounted display (HMD) such as a VR headset. In further embodiments, suitable VR headsets include, by way of example, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like. In still further embodiments, the display is a combination of devices such as those disclosed herein.

[0197] In addition to a display 1132, computer system 1100 may include one or more other peripheral output devices 1134 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof. Such peripheral output devices may be connected to the bus 1140 via an output interface 1124. Examples of an output interface 1124 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.

[0198] In addition, or as an alternative, computer system 1100 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein. Reference to software in this disclosure may encompass logic, and reference to logic may encompass software. Moreover, reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware, software, or both.

[0199] Those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality.

[0200] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

[0201] The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by one or more processor(s), or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

[0202] In accordance with the description herein, suitable computing devices include, by way of example, server computers, desktop computers, laptop computers, notebook computers, subnotebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, and personal digital assistants. Suitable tablet computers, in various embodiments, include those with booklet, slate, and convertible configurations, known to those of skill in the art.

[0203] In some embodiments, the computing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications. Operating systems in some instances are stored locally or accessed via a network. Those of skill in the art will recognize that suitable server operating systems include, by way of example, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of example, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smartphone operating systems include, by way of example, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®.

[0204] Computer systems described herein may be utilized as part of the systems and methods of the present invention. In some embodiments, a computer system may be utilized as a device configured for use by a researcher, patient, partner, caretaker, or healthcare provider.

Non-transitory computer readable storage medium

[0205] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device. In further embodiments, a computer readable storage medium is a tangible component of a computing device. In still further embodiments, a computer readable storage medium is optionally removable from a computing device. In some embodiments, a computer readable storage medium includes, by way of example, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.

Computer program

[0206] In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device’s CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.

[0207] The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules or features. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.

[0208] In some embodiments, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized to perform the methods as described herein. In some embodiments, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized as part of the systems as described herein. In some embodiments, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized to fully or partially automate the systems and methods as described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.

Web application

[0209] In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of examples, relational, non-relational, object oriented, associative, XML, and document-oriented database systems. In further embodiments, suitable relational database systems include, by way of examples, Microsoft® SQL Server, MySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy. In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.

[0210] Referring to FIG. 11, in a particular embodiment, an application provision system comprises one or more databases 1200 accessed by a relational database management system (RDBMS) 1210. Suitable RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like. In this embodiment, the application provision system further comprises one or more application servers 1220 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 1230 (such as Apache, IIS, GWS and the like). The web server(s) optionally expose one or more web services via application programming interfaces (APIs) 1240. Via a network, such as the Internet, the system provides browser-based and/or mobile native user interfaces.
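
By way of illustration only, the layering of FIG. 11 (a data store reached through a web service API) can be sketched as a minimal WSGI callable. This is a hedged sketch, not the disclosed implementation: the in-memory `RECORDS` dictionary stands in for the databases 1200, and the function name `application`, the record key, and the JSON response shape are assumptions introduced for this example.

```python
import json

# Hypothetical stand-in for the databases 1200; a deployed system would
# query an RDBMS rather than an in-memory dictionary.
RECORDS = {"1": {"name": "example record"}}


def application(environ, start_response):
    """Minimal WSGI callable exposing one read-only web service endpoint."""
    # Treat the request path (e.g. "/1") as the record key.
    key = environ.get("PATH_INFO", "").strip("/")
    record = RECORDS.get(key)
    if record is not None:
        status, payload = "200 OK", record
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI-compliant web server could host such a callable; the standard-library `wsgiref.simple_server.make_server` is one option for local experimentation.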

[0211] Referring to FIG. 12, in a particular embodiment, an application provision system alternatively has a distributed, cloud-based architecture 1300 and comprises elastically load balanced, auto-scaling web server resources 1310 and application server resources 1320 as well as synchronously replicated databases 1330.

[0212] The web applications may be utilized as part of the systems as described herein. The web applications may be utilized to perform the methods as described herein. In some applications, web applications are utilized to provide features or modules of the systems described herein. In some applications, web applications are utilized to fully or partially automate systems and methods described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.

Mobile application

[0213] In some embodiments, a computer program includes a mobile application provided to a mobile computing device. In some embodiments, the mobile application is provided to a mobile computing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile computing device via the computer network described herein.

[0214] In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.

[0215] Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of examples, Lazarus, MobiFlex, MoSync, and PhoneGap. Also, mobile device manufacturers distribute software developer kits including, by way of examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.

[0216] Those of skill in the art will recognize that several commercial forums are available for distribution of mobile applications including, by way of examples, Apple® App Store, Google® Play, Chrome WebStore, BlackBerry® App World, App Store for Palm devices, App Catalog for webOS, Windows® Marketplace for Mobile, Ovi Store for Nokia® devices, and Samsung® Apps.

[0217] The mobile applications may be utilized as part of the systems as described herein. The mobile applications may be utilized to perform the methods as described herein. In some applications, mobile applications are utilized to provide features or modules of the systems described herein. In some applications, mobile applications are utilized to fully or partially automate systems and methods described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.

Web browser plug-in

[0218] In some embodiments, the computer program includes a web browser plug-in (e.g., extension, etc.). In computing, a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®. In some embodiments, the toolbar comprises one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar comprises one or more explorer bars, tool bands, or desk bands.

[0219] In view of the disclosure provided herein, those of skill in the art will recognize that several plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of examples, C++, Delphi, Java™, PHP, Python™, and VB .NET, or combinations thereof.
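
As a non-limiting illustration of the plug-in concept described in paragraph [0218], the sketch below shows a host application that accepts independently written components and dispatches work to them by name. It is a generic pattern under stated assumptions, not the interface of any particular browser: the class `PluginHost` and its `register` and `run` methods are hypothetical names chosen for this example.

```python
from typing import Callable, Dict


class PluginHost:
    """Hypothetical host application that can be extended by plug-ins."""

    def __init__(self) -> None:
        # Maps a plug-in name to the callable that implements it.
        self._plugins: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        """Add a plug-in; reject duplicate names so one cannot shadow another."""
        if name in self._plugins:
            raise ValueError(f"plug-in already registered: {name}")
        self._plugins[name] = handler

    def run(self, name: str, payload: str) -> str:
        """Dispatch a payload to the named plug-in and return its result."""
        if name not in self._plugins:
            raise KeyError(f"no such plug-in: {name}")
        return self._plugins[name](payload)


# Usage: a third party supplies new functionality without modifying the host.
host = PluginHost()
host.register("upper", str.upper)
result = host.run("upper", "variant report")
```

The host carries only the dispatch logic, which reflects the stated motivations for plug-ins: features are added without changing the application, and the application itself stays small.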

[0220] Web browsers (also called Internet browsers) are software applications, designed for use with network-connected computing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of examples, Microsoft® Edge®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile computing devices including, by way of examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, and personal digital assistants (PDAs). Suitable mobile web browsers include, by way of examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.

Software modules

[0221] In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of examples, a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.

Databases

[0222] In some instances, users query one or more databases to identify information about biological data in their data sets. For example, a user may use an interface to display specific information about a variant, such as the variant's role in cancer or other diseases. In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of, for example, patient, photo, video, skin condition, visit, physician, and insurance information. In various embodiments, suitable databases include, by way of examples, relational databases, nonrelational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document-oriented databases, and graph databases. Further examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB. In some embodiments, a database is Internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing based. In a particular embodiment, a database is a distributed database. In other embodiments, a database is based on one or more local computer storage devices.

[0223] Databases may comprise information (e.g., annotations) regarding genetic variants. In some instances, databases provide information on somatic, germline, or somatic and germline variants. In some instances, a database comprises one or more of ClinVar, COSMIC, NCBI database of Genotypes and Phenotypes (dbGaP), gnomAD, 69 genomes from CGI, Personalized Genome Project, NCI Genomic Data Commons (GDC), cBioPortal, Intogen, and the Pediatric Cancer Genome Project. In some instances, databases provide information on variants related to cancer or other disease.
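
As a minimal sketch of the kind of variant lookup described in paragraphs [0222]-[0223], the following Python example stores variant annotations in an in-memory SQLite database and retrieves the entry for one variant. The table name, column names, the `build_demo_db` and `lookup_variant` helpers, and all gene names and annotation text are hypothetical placeholders chosen for illustration; they do not reflect the schema or content of ClinVar, COSMIC, or any other database named above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant_annotation ("
        "gene TEXT, protein_change TEXT, origin TEXT, note TEXT)"
    )
    # Placeholder rows only; not real annotations.
    rows = [
        ("GENE_A", "X100Y", "somatic", "placeholder note 1"),
        ("GENE_A", "X200Z", "germline", "placeholder note 2"),
        ("GENE_B", "Q50R", "somatic", "placeholder note 3"),
    ]
    conn.executemany(
        "INSERT INTO variant_annotation VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def lookup_variant(conn, gene, protein_change):
    """Return (origin, note) tuples for one variant, or an empty list."""
    cur = conn.execute(
        "SELECT origin, note FROM variant_annotation "
        "WHERE gene = ? AND protein_change = ?",
        (gene, protein_change),
    )
    return cur.fetchall()


conn = build_demo_db()
hits = lookup_variant(conn, "GENE_A", "X100Y")
print(hits)
```

The query uses bound parameters rather than string formatting, so user-supplied gene or variant names are not interpolated into the SQL text.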

EXAMPLES

[0224] The following examples are included for illustrative purposes only and are not intended to limit the scope of the invention.

EXAMPLE 1: PROCESSING CELLS FOR A MULTIOMIC WORKFLOW

[0225] An exemplary workflow is shown in FIG. 1. A sample is obtained from a diseased tissue, such as a frozen or FFPE sample. Cells are collected from the sample using any number of techniques known in the art. In some instances, cells are collected from specific geographic (spatial) locations on a tissue. The cells are then processed using one or more multiomics workflows to collect measurements (FIG. 2) from the genome, transcriptome, methylome, and proteome. Such workflows in some instances target biological inquiries (FIG. 4). In some instances, data from measurements are entered as an input file into a cloud computing platform (FIG. 10). Using the measurements, a penetrance score (FIG. 3) and mechanism (FIG. 5) are generated using an algorithm. Changes having a high penetrance score are selected to validate as specific drug targets (FIG. 6). Personalized treatments (e.g., vaccine or small molecule) are then designed for the specific drug target.

EXAMPLE 2: SAMPLE PROCESSING STEPS AND MODALITIES

[0226] Following the general procedure of Example 1, both mammalian and bacterial cells can be analyzed according to the workflow of FIG. 9.

[0227] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

EXAMPLE 3: A MULTIOMIC VIEW OF AML

[0228] A MOLM-13 drug-resistant model was generated using quizartinib to target FLT3. The patient from whom the MOLM-13 cell line was generated harbored an internal tandem duplication (ITD) in the receptor kinase FLT3 gene, resulting in hyperactive growth signaling and sensitivity to the FLT3 inhibitor quizartinib. The generation of resistance in culture can be seen in FIG. 13. The quizartinib-resistant cells also harbor an N841K mutation, which has also been found in AML patients. A genetic analysis of parental and resistant genes can be seen in FIG. 14A.

[0229] Genomic and transcriptomic libraries were prepared. First, the cytosol was lysed. Then the mRNA transcriptome was converted to cDNA using first-strand synthesis. Next, the nucleus was lysed, and whole genome amplification was performed via PTA. The transcriptome cDNA and genomic DNA were then isolated. The cDNA was pre-amplified via PTA and a library was prepared for NGS of the transcriptomic library. Likewise, a library was prepared from the PTA-amplified genomic DNA, and the genomic library was analyzed via NGS. Resistant cells showed a loss of Chromosome 5 and a gain of 19q, consistent with karyotypic data, as depicted in FIGS. 14B-14C.

[0230] Analysis of the transcriptome showed differences between single cells in the parental cell lines and the resistant cell lines. FIG. 14D depicts a principal component analysis of the transcriptomics data of parental and resistant cells. A clustered heat map, as depicted in FIG. 14E, showed that resistant cells had an upregulation of the enhancer factor CEBPA (mutated in AML patients). GAS6 was also upregulated. Transcriptional bypass of FLT3 signaling by GAS6 upregulation can drive Axl signaling in resistant cells, as depicted in FIG. 14F. Full-transcript coverage (compared to end-counting) allows for insights into exon usage, as depicted in FIGS. 14G-14H. Isoform biases in parental versus resistant cells manifest both as alternative 5’ exon utilization (PPP1R14B) and alternative internal exon utilization (HADHA), resulting in different transcript lengths (FIGS. 14G-14H).

[0231] Single nucleotide variations were also analyzed between parental and resistant genotypes. An SNV matrix was created, and genotypes were coded as -1 (0/0), 0 (0/1 or 1/0), and 1 (1/1). The matrix described the presence of 28,134 SNVs across samples. A PCA was performed using the matrix and projected into two dimensions. The PCA is depicted in FIG. 15A. Multinomial logistic regression of the SNV matrix was performed whereby the condition Parental or Resistant was modeled. Subsequently, p-values were derived using a Wald test and filtered at p < 0.01, which resulted in 520 SNVs that appear in the heatmap (FIG. 15B). Hierarchical clustering was applied over the matrix using Manhattan distance and ward.D as the clustering algorithm.
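
The genotype coding and distance measure of paragraph [0231] can be sketched as follows. This is an illustrative Python example only: the cell names and genotype calls are made-up toy data, the `encode_genotypes` and `manhattan` helpers are hypothetical, and missing genotype calls are not handled. The PCA, logistic regression, Wald test, and ward.D clustering steps are not reproduced here.

```python
from itertools import combinations

# Coding taken from paragraph [0231]: 0/0 -> -1, 0/1 or 1/0 -> 0, 1/1 -> 1.
GENOTYPE_CODE = {"0/0": -1, "0/1": 0, "1/0": 0, "1/1": 1}


def encode_genotypes(calls):
    """Convert a list of genotype strings into the -1/0/1 coding."""
    return [GENOTYPE_CODE[call] for call in calls]


def manhattan(u, v):
    """Manhattan (L1) distance between two equally long coded vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy data (hypothetical): three cells, four SNV positions each.
cells = {
    "parental_1": ["0/0", "0/1", "1/1", "0/0"],
    "parental_2": ["0/0", "0/1", "1/1", "0/1"],
    "resistant_1": ["1/1", "1/0", "0/0", "1/1"],
}

# Rows of the SNV matrix, one coded vector per cell.
matrix = {name: encode_genotypes(calls) for name, calls in cells.items()}

# Pairwise distances, the input a hierarchical clustering step would use.
distances = {
    (a, b): manhattan(matrix[a], matrix[b])
    for a, b in combinations(matrix, 2)
}
print(distances)
```

In this toy data the two parental cells are closer to each other than either is to the resistant cell, which is the kind of structure a clustering step over the distance matrix would pick up.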

[0232] The genomic and transcriptomic data can be correlated. Linking the SNV and transcription modulation data reveals that an intronic single nucleotide genotypic shift between parental and resistant cells within the MYC gene correlated with differential MYC transcript levels. Results are depicted in FIGS. 16A-16C. Overall, the genome had approximately two orders of magnitude more plasticity than the transcriptome. There were 300 expression variants and 28,134 genetic variants. Genome plasticity drove greater differentiation of cell clusters. These cell foundational changes were verified within the transcriptome. The evolutionary pressure on the drug resistance is high.
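
The "approximately two orders of magnitude" comparison in paragraph [0232] can be checked against the two counts reported there. The short Python sketch below uses only those two numbers; the variable names are illustrative.

```python
import math

genetic_variants = 28134      # SNVs reported in the SNV matrix
expression_variants = 300     # expression variants reported in [0232]

# Ratio of genetic to expression variants and its base-10 logarithm.
ratio = genetic_variants / expression_variants
orders_of_magnitude = math.log10(ratio)

print(f"ratio = {ratio:.2f}, log10 = {orders_of_magnitude:.2f}")
```

The ratio is a little under 100, so the difference is just short of two full orders of magnitude.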

EXAMPLE 4: A MULTIOMIC VIEW OF DUCTAL CARCINOMA IN SITU(DCIS)/INVASIVE DUCTAL CARCINOMA

[0233] A 7 cm DCIS (grade II) and a 1.2 cm invasive cancer (grade I) were analyzed. The cancer was ER+ PR+ HER2-. Normal and tumor tissue were digested to single cells. The tissue was stained with H&E staining and formalin-fixed, paraffin embedded prior to genomic DNA isolation (FIG. 17). The transcriptome and genome were analyzed using the methods described in Example 5.

[0234] There was single-cell heterogeneity in CNV profiles of the primary breast cancer cells. Additionally, high and low EpCAM cells showed specificity in CNV profiles, as depicted in FIG. 18A. Known DCIS copy number alterations harbor prototypical tumor suppressor genes, as depicted in FIG. 18B.

[0235] An analysis of SNV in primary breast cancer cells showed a variety of mutually exclusive single-cell oncogene PIK3CA mutations, as depicted in FIG. 19. Patient 1 had 2/19 cells with a PIK3CA H1047R mutation and 13/19 cells with a PIK3CA N345K mutation. Patient 2 had 10/13 cells with a PIK3CA E545K mutation. Patient 3 had 0/8 cells with PIK3CA mutations. For patient 1, SNV and CNV were compared across the 19 cells analyzed. Heterogeneity was observed within single cells. However, some cells showed neither SNV nor CNV mutations (e.g., FIG. 20).

[0236] A principal component analysis of the gene expression profiles results in a separation of EpCAM high and low cells, as depicted in FIG. 21. Clustering by genes enriched in breast cancer showed low levels of expression in the EpCAM low cells. IL-2 and CD4 expression suggests these cells are tumor infiltrating lymphocytes.

[0237] The plasticity of the genome is significantly higher than found in the transcriptome and is the driver of cellular evolution. This method described transcriptional signatures that exposed the presence of tumor infiltrating lymphocytes in the tumor sample and guided interpretation of genotype. RNA mechanisms of resistance were jointly identified, including transcriptional bypass mechanisms in response to drug treatment. Unification of these DNA/RNA data identified candidate regulatory SNVs proximal to genes differentially influencing their expression between parental and resistant cells, thereby exposing novel genes and modes of drug resistance.

EXAMPLE 5: DEVELOPING A VACCINATION TARGET BASED ON A “HIGH PENETRANCE” TARGET

[0238] Following the general procedure of Example 1, the data are used to create a vaccination target against a specific “high penetrance” target according to the workflow of FIG. 9.

[0239] If a change is detected in the genome (a single-base change, a translocation, or a copy number variant) and the same mutation is also detected in the transcriptome, then the penetrance for this change may be high. The change can also involve a mutation in a promoter, enhancer, or pioneer factor for a splice variant, or a splice variant arising from an alternate single nucleotide variant. If, for example, this splice variant codes for a surface marker presentation or translation/expression, the same genomic or transcriptomic sequence can be used to target the immune system to the specific cell carrying the specific mutation or genomic, transcriptomic, or proteomic alteration. Similar to an mRNA vaccine, this oligonucleotide can be introduced to the same animal (person or study subject) to elicit a response to the modified gene in its transcriptomic or proteomic state. Alternatively, a dendritic cell may be “reprogrammed” with this information.
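
One way to make the reasoning of paragraph [0239] concrete is sketched below in Python. This is an assumption for illustration, not the scoring algorithm of the disclosure, which is not specified in this section: the hypothetical `penetrance_score` function simply reports the fraction of cells carrying a genomic change in which the same change is also seen in the transcriptome, giving a value between 0 and 1. The per-cell observations are made-up toy data.

```python
def penetrance_score(cells):
    """Fraction of genomic carriers whose change is also seen in RNA.

    `cells` is a list of (in_genome, in_transcriptome) booleans, one
    tuple per single cell. Hypothetical definition for illustration.
    """
    carriers = [cell for cell in cells if cell[0]]
    if not carriers:
        # No cell carries the genomic change, so there is nothing to score.
        return 0.0
    concordant = sum(1 for _, in_rna in carriers if in_rna)
    return concordant / len(carriers)


# Toy data: four cells carry the genomic change, three also express it.
cells = [
    (True, True),
    (True, True),
    (True, False),
    (False, False),
    (True, True),
]
score = penetrance_score(cells)
print(score)
```

Under this assumed definition, a change seen in the genome and consistently seen in the transcriptome of the same cells scores close to 1, matching the qualitative statement that such a change may have high penetrance.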

EXAMPLE 6: SUPERIOR RNA REPRESENTATION COMPARED TO FIELD

[0240] The methods and systems of the present disclosure (Resolve amplification of cDNA) enable full-length synthesis of most transcripts found in the cell. In some examples, including this example, the cDNA is enriched across its entire length. Therefore, a bias of amplification or subsequent sequencing reads from the 5’ or 3’ end of a transcript does not occur. FIGs. 22A-22C show example data illustrating this point. On the data graph shown in FIG. 22A, ResolveOME refers to data generated using the methods and systems of the present disclosure. Droplet-RNAseq shows an example data set generated using a system other than the systems of the present disclosure as a comparison. Data generated using ResolveOME (a method according to the present disclosure which may comprise using PTA), as detailed anywhere and throughout the present disclosure, do not demonstrate a bias of amplification or subsequent sequencing reads from the 5’ or 3’ end of a transcript. The symmetrical pattern seen in the data points of the ResolveOME data set demonstrates this point. Such bias is observed in the other data set on the graph (Droplet-RNAseq). The methods of the present disclosure demonstrate superior performance in analyzing RNA, with high coverage over a wide range of 5’-3’ gene body percentile values, as shown on the graph. Conversely, the Droplet-RNAseq method leads to low coverage in the early sections of the x-axis and higher coverage further along the x-axis and toward the end. As such, this data set is asymmetrical and biased.

[0241] FIG. 22B shows a graph of single cell analysis data comparing ResolveOME to droplet RNAseq in terms of transcript length. This graph shows that ResolveOME (methods and systems of the present disclosure involving PTA) demonstrates superior RNA performance with respect to increased representation across various transcript sizes. This coverage is shown across a broader and longer set of transcript lengths. The competing technology, droplet RNAseq, starts losing enrichment after 1.5 kb, while ResolveOME (the method of the present disclosure) is capable of amplifying and detecting transcripts over 4 kb. This increased evenness of coverage and broader detection of transcripts impacts robustness of downstream biomarker detection, such as allele variation, where a 2.5x increase in variant detection is achieved, coming from variation detected outside the 5’ or 3’ end of the transcript (FIG. 22C). FIG. 22C shows an additional graph characterizing number of detected DNA variants per cell vs. number of detected genes per cell. The data set on the top (depicted with circles) is generated using the methods and systems of the present disclosure, demonstrating more robust variant calling for a wider range of number of detected genes per cell. The competing technology (depicted in squares) detects variants over a narrower range of number of detected genes per cell. As such, the competing technology is more limited. The methods and systems of the present disclosure demonstrate a number of detected RNA variants per cell ranging from 150 to 2750. The competing technology (droplet RNAseq) is limited to a number of detected RNA variants per cell ranging from 750 to 1250.

EXAMPLE 7: ISOFORM EXPRESSION UNDERLYING COMPLEX DISEASE MECHANISMS

[0242] FIGs. 23A-23C demonstrate unification of genomic lesions and gene expression in an AML model of drug resistance. FIG. 23A shows differential transcript utilization (DTU) between MOLM-13 parental and drug-resistant single cells. Color intensity indicates transcript proportion of A or B isoform of indicated transcript. FIG. 23B shows a heatmap with transcripts in the y-axis that show a statistical (ZLM p < 0.01) association with ploidy level across all cells in the MOLM-13 dataset. Color of the tiles represents the average standardized expression value at a given ploidy level. The right panel shows the output of the ZLM model testing the expression given the ploidy. Bars are colored based on the -log10 p-value of the ZLM model testing transcriptional differences between parental and resistant cells. Blocks of concordance (ploidy and expression increased or decreased concomitantly) or discordance (ploidy and expression inversely correlated) are shown for a given transcript and chromosomal location. FIG. 23C shows a bubble plot showing SNV-transcript expression associations (p < 0.05) determined by ZLM modeling between parental and resistant cells. Candidate SNVs are shown in the y-axis and genotypes in the x-axis. Size of the circle denotes the genotype prevalence of the variant in the MOLM-13 cell type set (parental or resistant). Colors of points denote the standardized mean expression level of the transcript in the set. ENCODE genotypic features mapping to the given single nucleotide variant are indicated in the right bar and are categorized in the heatmap as regulatory (top) or genic (bottom).
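
The concordance and discordance labels used for FIG. 23B can be illustrated with a short Python sketch. The `classify` helper and its inputs (signed changes in ploidy and in standardized expression, resistant relative to parental) are hypothetical and chosen only to mirror the definitions in paragraph [0242]; the ZLM modeling itself is not reproduced, and the example values are made up.

```python
def classify(ploidy_delta, expression_delta):
    """Label a transcript by the direction of ploidy and expression change.

    Concordant: both increase or both decrease.
    Discordant: one increases while the other decreases.
    """
    if ploidy_delta == 0 or expression_delta == 0:
        return "no change"
    same_direction = (ploidy_delta > 0) == (expression_delta > 0)
    return "concordant" if same_direction else "discordant"


# Toy inputs (hypothetical): (ploidy change, expression change).
examples = {
    "transcript_1": (1, 0.8),     # gain with higher expression
    "transcript_2": (-1, -0.5),   # loss with lower expression
    "transcript_3": (1, -0.3),    # gain with lower expression
}
labels = {name: classify(*deltas) for name, deltas in examples.items()}
print(labels)
```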

EXAMPLE 8: METAGENOMIC APPLICATIONS

[0243] As highlighted in FIG. 9, bacterial colonies represent a unique opportunity, as they comprise naturally discrete cells which tend to be subject to accelerated evolutionary forces. The ability to process a high number of bacteria can have a tangible impact on human health.

[0244] In situations where infection accelerates in the presence of antimicrobials, understanding the molecular determinants of antimicrobial resistance (AMR) is key in determining treatment. Standard genomic methods, which typically involve bulk sequencing of colonies, lack the sensitivity to detect rare mutations (present in small numbers of bacteria) that drive the change. In addition, traditional sequencing represents genomic information independently across component nucleotides such that it is unknown whether more than one mutation is found in the same bacterium. A unique advantage of the workflow here is that we fully characterize each allele for each bacterium sequenced, so we can report phased mutations or specific cells in a quantifiable manner (e.g., a mutation detected in 10 out of 1000 cells would have a 1% allele frequency).

[0245] In addition to the review of genetic changes, empowering the multiome enables us to capture multiple mechanisms of AMR beyond genetic changes. By reviewing transcriptomic measurements within the bacterium, we can see expression states of pathways and membrane channels. Expression levels can be used to see if there is active enrichment (increased expression) or repression (decreased expression) of genes that may be involved in drug uptake or active drug efflux. We can also look at expressed mutations to see whether genomic variants have carried through to proteins that may metabolize specific drugs or drug classes.

[0246] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. None of the descriptions are meant to be limiting. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the methods and systems of the present disclosure. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
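
The per-cell reporting described in paragraph [0244] of Example 8 (a mutation in 10 out of 1000 cells giving a 1% allele frequency, and detection of mutations that occur together in the same bacterium) can be sketched in Python as follows. The helper names, mutation labels, and cell counts beyond the 10-in-1000 figure are hypothetical toy values for illustration only.

```python
def allele_frequency(cells, mutation):
    """Fraction of sequenced cells in which `mutation` was detected."""
    if not cells:
        return 0.0
    carriers = sum(1 for mutations in cells if mutation in mutations)
    return carriers / len(cells)


def co_occurring(cells, mut_a, mut_b):
    """Number of cells carrying both mutations (same-cell co-occurrence)."""
    return sum(1 for m in cells if mut_a in m and mut_b in m)


# Toy data: each cell is the set of mutations detected in that bacterium.
# 10 of 1000 cells carry "mutA"; 4 of those also carry "mutB".
cells = [{"mutA", "mutB"}] * 4 + [{"mutA"}] * 6 + [set()] * 990

freq_a = allele_frequency(cells, "mutA")
both = co_occurring(cells, "mutA", "mutB")
print(f"mutA frequency = {freq_a:.1%}, cells with both = {both}")
```

Because each record is a single cell, the co-occurrence count is read directly from the data, which is the information a bulk measurement cannot provide.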

Claims

WHAT IS CLAIMED IS:

1. A method of single cell analysis comprising:

(a) providing or obtaining a plurality of cells;

(b) performing one or more experiments on single cells of the plurality of cells to generate at least a first data set and a second data set from the plurality of cells, wherein the first data set is a genomic data set and the second data set is a transcriptomic data set and/or a proteomic data set;

(c) identifying a correlation between the first data set and the second data set for at least a portion of the plurality of cells; and

(d) using the correlation obtained in (c), identifying a disease biomarker, designing a therapeutic, or designing a vaccine for a disease.

2. The method of claim 1, wherein performing the one or more experiments comprises performing primary template directed amplification (PTA).

3. The method of claim 1, wherein the one or more experiments or screens comprise a genomics experiment, a transcriptomic experiment, a proteomics experiment, a methylomic experiment or any combination thereof.

4. The method of claim 1, wherein the one or more experiments comprise high-throughput single cell analysis, wherein single cells of the plurality of cells are screened in high-throughput.

5. The method of claim 4, wherein the one or more experiments are performed using a miniaturized high-throughput single cell screening system.

6. The method of claim 5, wherein the method comprises compartmentalizing the plurality of cells into a plurality of partitions, wherein a partition of the plurality of partitions comprises a single cell of the plurality of cells.

7. The method of claim 6, wherein the plurality of partitions comprises a plurality of wells, a plurality of droplets, or both.

8. The method of claim 7, wherein the wells are miniaturized wells.

9. The method of claim 5, wherein the miniaturized high-throughput single cell screening system comprises a microfluidic device, a miniaturized array, or both.

10. The method of claim 1, wherein the one or more experiments comprise performing one or more reactions.

11. The method of claim 6, wherein a partition of the plurality of partitions comprises a single cell therein, and the one or more experiments or screens comprise performing one or more reactions on the single cell in the partition.

12. The method of claim 11, wherein the one or more reactions comprise cell lysis.

13. The method of claim 11, wherein the one or more reactions comprise an amplification reaction.

14. The method of claim 13, wherein the amplification reaction comprises primary template directed amplification (PTA).

15. The method of claim 11, wherein the one or more reactions comprise lysing the single cell, extracting the genomic material of the single cell, thereby releasing a cellular nucleic acid molecule from the single cell in the partition, and performing an amplification reaction on the cellular nucleic acid molecule.

16. The method of any one of claims 10-15, wherein performing the one or more reactions comprises using one or more reagents.

17. The method of claim 16, wherein the one or more reagent(s) comprise one or more of at least one amplification primer, at least one nucleic acid polymerase, and a mixture of nucleotides, wherein the mixture of nucleotides comprises at least one terminator nucleotide which terminates nucleic acid replication by the polymerase.

18. The method of claim 17, wherein the terminator nucleotide is an irreversible terminator.

19. The method of claim 17, wherein the terminator nucleotide is selected from the group consisting of nucleotides with modification to the alpha group, C3 spacer nucleotides, locked nucleic acids (LNA), inverted nucleic acids, 2' fluoro nucleotides, 3' phosphorylated nucleotides, 2'-O-Methyl modified nucleotides, and trans nucleic acids.

20. The method of claim 19, wherein the nucleotides with modification to the alpha group are alpha-thio dideoxynucleotides.

21. The method of claim 17, wherein the terminator nucleotide comprises modifications of the r group of the 3' carbon of the deoxyribose.

22. The method of claim 17, wherein the terminator nucleotide is selected from the group consisting of dideoxynucleotides, inverted dideoxynucleotides, 3' biotinylated nucleotides, 3' amino nucleotides, 3'-phosphorylated nucleotides, 3'-O-methyl nucleotides, 3' carbon spacer nucleotides including 3' C3 spacer nucleotides, 3' C18 nucleotides, 3' Hexanediol spacer nucleotides, acyclonucleotides, and combinations thereof.

23. The method of any one of claims 5-9 or 11-22, wherein a partition of the plurality of partitions comprises at least a single cell and a bead.

24. The method of claim 23, wherein the bead delivers a reagent for performing a reaction on the single cell in the partition.

25. The method of claim 24, wherein the reagent is bound to the bead via a cleavable linker and is configured to be released from the bead via cleavage of the cleavable linker.

26. The method of claim 24 or 25, wherein the reagent comprises a barcode configured to identify the cell or a constituent of the cell.

27. The method of claim 26, wherein the constituent of the cell comprises genomic material of the cell, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), or any combination thereof.

28. The method of claim 26, wherein the method comprises lysing the cell in the partition, releasing a cellular nucleic acid molecule of the cell in the partition, releasing the barcode from the bead via cleavage of the cleavable linker, and hybridizing the cellular nucleic acid molecule to the barcode.

29. The method of claim 11, wherein the one or more reactions comprise lysing the single cell, thereby releasing cellular nucleic acid molecules in the partition, performing one or more amplification reactions on the cellular nucleic acid molecules thereby generating amplified cellular nucleic acid molecules, and wherein the method further comprises extracting the amplified cellular nucleic acid molecules from the partition, and sequencing the amplified cellular nucleic acid molecules.

30. The method of claim 1, wherein generating the first data set comprises performing primary template directed amplification (PTA) and generating the second data set comprises performing a reverse transcription reaction.

31. The method of claim 30, performing the reverse transcription reaction comprises generating a cDNA library.

32. The method of claim 1, wherein generating the first data set comprises determining a methylation site in a cellular nucleic acid molecule using PTA, thereby generating a methylation library.

33. The method of claim 32, further comprising comparing the methylation library to a reference library for a single cell of the plurality of cells, wherein the methylation library and the reference library are generated from the same cell.

34. The method of any one of the preceding claims, wherein identifying the correlation comprises calculating or assigning a penetrance score to the correlation, wherein the penetrance score quantifies the correlation.

35. The method of claim 34, wherein the penetrance score guides identifying the disease biomarker, designing the therapeutic, designing the vaccine for the disease, or any combination thereof.

36. The method of claim 34, wherein a high penetrance score indicates a strong correlation between the first data set and the second data set.

37. The method of claim 36, wherein the high penetrance score indicates that the expression of a gene identified in the first data set leads to a transcriptomic event, a proteomic event or both, and wherein the gene is identified as a disease biomarker.

38. The method of any one of claims 34-37, wherein a low penetrance score indicates a weak correlation between the first data set and the second data set, and that the expression of a gene identified in the first data set does not lead to a transcriptomic event, a proteomic event, or either, and wherein the gene is not identified as a disease biomarker.

39. The method of any one of the preceding claims, wherein identifying the correlation is performed with the aid of a computer system comprising a computer program.

40. The method of claim 39, wherein the computer program comprises a bioinformatics algorithm.

41. The method of claim 1, wherein the first data set and the second data set are combined or integrated into a database.

42. A method of developing a cancer treatment comprising:

(a) generating multiomics data from one or more single cells, wherein generating comprises performing Primary Template Directed Amplification (PTA), and wherein the multiomics data comprises two or more of genome data, transcriptome data, and proteomics data;

(b) correlating one or more mutations in genome data with corresponding mutations in one or both of (i) an mRNA of the transcriptome data and (ii) a protein of the proteome data; and

(c) generating a treatment targeting one or both of the mRNA and the protein.

43. The method of claim 42, wherein the correlation is quantified by a penetrance score.

44. The method of claim 43, wherein the penetrance score is at least 0.5.

45. The method of claim 43, wherein the penetrance score is at least 0.9.

46. The method of claim 43, wherein the treatment comprises an mRNA vaccine.

47. The method of claim 43, wherein the treatment comprises reprogramming a dendritic cell to target one or both of the mRNA or protein.

48. The method of claim 43, wherein the mutation in genome data comprises a DNA mutation.

49. The method of claim 48, wherein the DNA mutation is selected from the group consisting of SNV*X, CNV*X, translocation, INDEL, frameshift, stop codon, mitochondrial, promoter/enhancer, TCR/BCR, and other change.

50. The method of claim 43, wherein the mRNA comprises a transcript change.

51. The method of claim 50, wherein the transcript change is selected from the group consisting of expression, splice variant, fusion, lncRNA, miRNA, TCR/BCR, promoter, truncated gene, mitochondrial, or mutation.

52. The method of claim 43, wherein the protein comprises a protein change.

53. The method of claim 52, wherein the protein change is selected from the group consisting of over/under expressed, truncated, surface bound, frameshift, misfolded, metabolic, ligand independence, conformation, activity change, or fused.

54. The method of claim 43, wherein the cancer comprises breast cancer.

55. The method of claim 43, wherein the breast cancer comprises ductal carcinoma.

56. The method of claim 43, wherein the cancer comprises leukemia.

57. The method of claim 43, wherein the single cancer cells are obtained from an FFPE sample.

58. A method for validating a disease target for a disease comprising:

(a) selecting cells from a tissue;

(b) banking the cells;

(c) performing one or more multiomic methods on the cells to generate multiomics data; and

(d) applying a computer algorithm to process the multiomics data and generate a disease target.

59. The method of claim 58, wherein selecting the cells comprises FACS sorting, microfluidics, spatial cell selection, or ultra-high throughput cell sorting.

60. The method of claim 58, wherein the number of cells is at least about 200, 300, 400, 500, 1000, 2000, 3000, 4000, 5000, 6000, 10,000 or greater.

61. The method of claim 58, wherein the disease is cancer.

62. The method of claim 58, wherein the multiomics methods comprise PTA.

63. The method of claim 58, wherein the multiomics data comprises data from one or more of a genome, epigenome, transcriptome, proteome, lipidome, or methylome.

64. The method of claim 58, wherein the method further comprises a treatment based on the disease target.

65. The method of claim 58, wherein the treatment comprises an mRNA vaccine or small molecule.

66. A system for determining a penetrance score comprising: a computing system comprising at least one processor and instructions executable by the at least one processor to provide an application configured to perform operations comprising: receiving multiomics data from one or more sources and at least one biological state; and applying an algorithm configured to process the data and generate a penetrance score.

67. The system of claim 66, wherein the computing system comprises a cloud computing platform.

68. The system of claim 66, wherein the multiomics data comprises data obtained from analysis of one or more of genomic DNA, transcript RNA, proteins, lipids, or metabolites.

69. The system of claim 66, wherein the multiomics data comprises one or more measurements.

70. The system of claim 68, wherein one or more of the measurements is a silent change.

71. The system of claim 66, wherein the multiomics data comprises data from one or more of a genome, a transcriptome, a proteome, a metabolome, a lipidome, or an epigenome.

72. The system of claim 69, wherein the multiomics data comprises data from a genome.

73. The system of claim 71, wherein the one or more measurements are selected from the group consisting of: copy number variation, translocation, and mutation burden.

74. The system of claim 69, wherein the multiomics data comprises data from a methylome.

75. The system of claim 73, wherein the one or more measurements are selected from the group consisting of: methylation at CpG sites, gene activation, and gene repression.

76. The system of claim 69, wherein the multiomics data comprises data from a transcriptome.

77. The system of claim 75, wherein the one or more measurements are selected from the group consisting of: expressed genes, gene fusions, expressed variants and splice variants.

78. The system of claim 69, wherein the multiomics data comprises data from a proteome.

79. The system of claim 77, wherein the one or more measurements are selected from the group consisting of: translation level, phosphorylation state, and protein modification.

80. The system of claim 66, wherein the one or more sources comprise an individual organism.

81. The system of claim 66, wherein the one or more sources comprise cells.

82. The system of claim 66, wherein the cells are mammalian cells, human cells, bacterial cells, cancer cells, an immortalized cell line, a primary patient cell line, or any combination thereof.

83. The system of claim 66, wherein the cells are obtained from a tissue.

84. The system of claim 66, wherein the cells are obtained from a tissue cross-section.

85. The system of claim 66, wherein the biological state comprises a disease state.

86. The system of claim 66, wherein the disease state comprises cancer.

87. The system of claim 66, wherein the algorithm further generates a mechanism based on the data.

88. The system of claim 66, wherein the mechanism is generated by detecting one or more changes in one or more measurements.

89. The system of claim 88, wherein the change comprises a genome DNA change.

90. The system of claim 88, wherein the genome DNA change is selected from the group consisting of SNV*X, CNV*X, translocation, INDEL, frameshift, stop codon, mitochondrial, promoter/enhancer, TCR/BCR, and other change.

91. The system of claim 88, wherein the change comprises a transcript change.

92. The system of claim 88, wherein the transcript change is selected from the group consisting of expression, splice variant, fusion, lncRNA, miRNA, TCR/BCR, promoter, truncated gene, mitochondrial, or mutation.

93. The system of claim 88, wherein the change comprises a genome change.

94. The system of claim 88, wherein the protein change is selected from the group consisting of over/under expressed, truncated, surface bound, frameshift, misfolded, metabolic, ligand independence, conformation, activity change, or fused.

95. The system of claim 88, wherein the mechanism is determined to be one or more of a genomic, transcriptomic, proteomic, methylomic, lipidomic, or metabolomic mechanism.

96. The method or system of any one of the preceding claims, wherein the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher.

97. The method or system of any one of the preceding claims, wherein the method or system is capable of detecting a number of genes per cell of from about 1000 to about 8000.

98. The method or system of any one of the preceding claims, wherein the method or system is capable of detecting a number of RNA variants per cell of at least 750, 1000, 1500, 2000, 2500 or higher and a number of genes per cell of from about 1000 to about 8000.

99. The method or system of any one of the preceding claims comprising full length synthesis of RNA transcripts in the cell, wherein a plurality of amplification products achieved from performing the method are substantially unbiased over a range of 5’-3’ gene body percentiles.

100. The method or system of any one of the preceding claims capable of amplifying and detecting transcripts over 1 kb, 2 kb, 3 kb, 4 kb, or longer in length.